Luka Stärk:

Semantic Similarity of Concepts for a Human-Centered Idea Recommendation Feature in the Clustering Application Orchard


  • Semantic Web
Academic Advisor
Ideation, Human-Computer Interaction, Semantic Similarity, Knowledge Graphs
Bachelor of Science (B.Sc.)



The research project Ideas2Market explores the innovation process for applications of new technologies. A central task is to generate many ideas in order to cover as many ways of applying the technology as possible. This is done with collaborative innovation approaches that crowd-source ideas. These ideas are not yet fully developed and remain at a brainstorming level; in the following they are referred to as idea sparks. Nevertheless, idea sparks introduce great variety and creative value because they are created by different people with diverse backgrounds. Still, finding valuable idea sparks has proven challenging: because of their large number, it becomes infeasible to check every idea spark manually and to draw on them for more advanced ideas. Later in the process, experts develop these ideas further, refining and transforming them into product opportunities that are brought to market as the last step. The project Ideas2Market aims to solve these problems with software support and by researching human needs in creative processes. The software-supported collaborative ideation process can be described in three phases:

  1. Divergent Phase
  2. Clustering Phase
  3. Convergent Phase

When clustering, the categories are not always clear in advance. The decision to create a cluster is based on feeling and intuition and can be reversed at any time. During the process, at least in theory, an ordering emerges and the relationships between idea sparks become more visible. This process is beneficial as an activity for acquiring a more profound understanding of the idea space [Siangliulue et al., 2016] and for producing more valuable ideas in the Convergent Phase. As the number of idea sparks grows, it becomes more challenging to organize the idea space and to take into account all potential idea sparks for one cluster. The task can then become monotonous and time-consuming. This thesis aims to counteract this problem and to increase efficiency in the clustering process with a recommender system that proposes idea sparks based on selected content, which can be either an idea spark or individual concepts.


The goal is to improve the clustering process in the Orchard application with a recommender system. To this end, concept similarity is calculated for all concepts extracted from 60 idea sparks. The similarity measure for concepts is Knowledge Graph-based and is calculated with the wpath metric of Zhu and Iglesias [2017]. Given the similarities between concepts, the similarities between concepts and ideas and the Word Mover’s Distance between ideas are then calculated. These are the measurements needed to integrate the recommender system into the Orchard clustering application.
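The wpath metric named above can be illustrated with a minimal sketch. It assumes the commonly stated form sim(c1, c2) = 1 / (1 + length(c1, c2) * k^IC(lcs(c1, c2))), where length is the shortest path between two concepts in the taxonomy, lcs is their least common subsumer, IC is its information content, and k lies in (0, 1]. The toy taxonomy, the IC values, and k = 0.8 are hypothetical and not taken from the thesis.

```python
# Hypothetical toy taxonomy (child -> parent); not taken from the thesis.
PARENT = {
    "entity": None,
    "device": "entity",
    "sensor": "device",
    "camera": "device",
    "material": "entity",
    "fabric": "material",
}

# Hypothetical information content values (higher = more specific concept).
IC = {
    "entity": 0.0,
    "device": 1.0,
    "sensor": 2.0,
    "camera": 2.0,
    "material": 1.0,
    "fabric": 2.0,
}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def wpath(c1, c2, k=0.8):
    """Assumed wpath form: 1 / (1 + length(c1, c2) * k ** IC(lcs(c1, c2)))."""
    path1, path2 = ancestors(c1), ancestors(c2)
    # Least common subsumer: first ancestor of c1 that also subsumes c2.
    lcs = next(c for c in path1 if c in path2)
    # In a tree, the shortest path between two nodes runs through their lcs.
    length = path1.index(lcs) + path2.index(lcs)
    return 1.0 / (1.0 + length * k ** IC[lcs])


print(wpath("sensor", "camera"))  # siblings under "device"
print(wpath("sensor", "fabric"))  # related only through the root
```

In this toy setup the siblings score higher than the pair related only through the root, because their path is shorter and their shared ancestor is more informative.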


The procedure is divided into four substantial parts:

  • implementation of semantic similarity methods
  • validation of the semantic similarity measurements with five datasets of human word similarity assessments
  • integration of the recommendation feature into the Orchard application
  • validation of the recommendation feature in a user study with five participants
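The validation against human word similarity assessments listed above can be sketched as follows. The abstract does not name the statistic used; a rank correlation such as Spearman's is a common choice for this kind of comparison and is assumed here. The scores in the usage lines are made-up placeholders, not results from the thesis.

```python
def ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = average
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    std_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (std_x * std_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Placeholder scores for four word pairs (not data from the thesis).
human_scores = [9.1, 7.5, 3.2, 1.0]
model_scores = [0.9, 0.6, 0.3, 0.1]
print(spearman(human_scores, model_scores))
```

A value close to 1 would indicate that the computed similarities order the word pairs much as the human raters do.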

