Maximilian Stauss:

Conceptualization and Evaluation of Idea Similarities based on Semantic Enrichment & Knowledge Graphs

Semantic Technologies, Information Extraction
Master of Science (M.Sc.)



Semantic annotation of text snippets faces a major challenge known as word sense disambiguation (1). One way to mitigate this problem is to employ crowd workers to annotate the text. This approach shifts the focus from classic information extraction metrics, namely precision and recall (2), towards multi-step quality metrics. Examples include the time needed to annotate a text, the concepts recommended for the annotation, the click effort workers need to validate a concept, and the average quality of crowd-worker annotations.
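For reference, the two classic metrics mentioned above can be computed as follows. This is a minimal sketch that assumes annotations are represented as sets of concept identifiers. The function name and the example identifiers are hypothetical and not taken from the ICV implementation.

```python
def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Return (precision, recall) of predicted annotations against a gold standard.

    Assumption: each annotation is identified by a concept identifier string.
    """
    true_positives = len(predicted & gold)
    # Precision: share of predicted annotations that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold annotations that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: four predicted concepts, three gold concepts, two shared.
predicted = {"concept:car", "concept:engine", "concept:jaguar_animal", "concept:wheel"}
gold = {"concept:car", "concept:engine", "concept:jaguar_cars"}
precision, recall = precision_recall(predicted, gold)
```

Both values depend only on the final set of annotations, which is why they cannot capture the effort or time that crowd workers invest along the way.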


In classic information extraction, quality metrics are well defined and tested. Introducing an interactive component into the pipeline calls for new ways to validate changes made to different parts of the pipeline, so that algorithmic and design choices can be reasoned about more reliably. Using the application case of Interactive Concept Validation (ICV) (3), this work proposes possible improvements, metrics for the individual steps, and an experimental validation of the impact of selected features.


  • Analyze the existing ICV as an example of a mixed-initiative information extraction pipeline
  • Propose enhancements
  • Collect metrics for the enhancements:
    • Operationalization of human effort
    • Measurable tradeoff: Quality/Time spent/Costs
  • Conduct a crowdsourcing study to validate the impact of the features
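One possible way to operationalize human effort and the quality/time/cost tradeoff from the list above is sketched below. The record fields, the function name, and the choice of aggregates are illustrative assumptions, not the metrics defined in this work. Payment is assumed to be recorded per session in a single currency.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AnnotationSession:
    """One crowd worker annotating one text (all fields are hypothetical)."""
    seconds_spent: float      # time needed to annotate the text
    clicks: int               # click effort during concept validation
    correct_annotations: int  # annotations that match a gold standard
    total_annotations: int    # all annotations submitted by the worker
    payment: float            # amount paid for the session


def summarize(sessions: list[AnnotationSession]) -> dict[str, float]:
    """Aggregate effort, quality, and cost over a batch of sessions."""
    if not sessions:
        raise ValueError("at least one session is required")
    correct = sum(s.correct_annotations for s in sessions)
    total = sum(s.total_annotations for s in sessions)
    paid = sum(s.payment for s in sessions)
    return {
        "mean_seconds": mean(s.seconds_spent for s in sessions),
        "mean_clicks": mean(s.clicks for s in sessions),
        # Quality: share of submitted annotations that are correct.
        "quality": correct / total if total else 0.0,
        # Cost of obtaining one correct annotation.
        "cost_per_correct": paid / correct if correct else float("inf"),
    }


# Hypothetical batch of two sessions.
summary = summarize([
    AnnotationSession(60.0, 10, 4, 5, 0.5),
    AnnotationSession(120.0, 20, 3, 5, 0.5),
])
```

Computing the same summary for a baseline and for a variant with an added feature would give a simple basis for comparing the two conditions.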


