Survey of Named Entity Recognition Tools & APIs

Academic Advisor: Jonas Oppenländer
Discipline: Named Entity Recognition (NER), Entity Extraction, Natural Language Processing (NLP), Information Extraction, Information Retrieval
Degree: Bachelor of Science (B.Sc.) or Master of Science (M.Sc.)

Requirements:

  • Knowledge of a programming language
  • Experience in using web APIs

Contents

Context

Named Entity Recognition (NER) is the task of identifying and classifying named entities, such as persons, organizations, and locations, in unstructured text [1]. NER is also referred to as entity extraction.
A variety of tools and APIs are available to support the task of extracting entities from text [2].

Content

This work should produce a comprehensive survey of entity extraction tools and APIs. A subset of tools and APIs should be selected for further study.

To test the APIs and tools, an extensible piece of software should be developed that can execute an API call for each of the selected tools and APIs.
The features and results of the surveyed tools and APIs should be compared in a structured and consistent manner.
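
One possible way to compare results consistently is a presence matrix that records, for every entity found by any tool, which tools reported it. The following is a minimal sketch, not a prescribed design; the function name comparison_matrix, the tool names, and the sample entities are hypothetical and chosen only for illustration.

```python
from __future__ import annotations


def comparison_matrix(results: dict[str, set[str]]) -> dict[str, dict[str, bool]]:
    """Map each entity to a per-tool flag saying whether that tool found it."""
    # Union of all entities reported by any tool, sorted for stable output.
    all_entities = sorted(set().union(*results.values()))
    return {
        entity: {tool: entity in found for tool, found in results.items()}
        for entity in all_entities
    }


# Hypothetical outputs of two tools for the same sample text.
sample_results = {
    "tool_a": {"Angela Merkel", "Berlin"},
    "tool_b": {"Berlin"},
}

matrix = comparison_matrix(sample_results)

# Print one row per entity and one column per tool.
for entity, row in matrix.items():
    flags = "  ".join(f"{tool}={'yes' if hit else 'no'}" for tool, hit in row.items())
    print(f"{entity:<15} {flags}")
```

A matrix of this kind makes disagreements between tools visible at a glance and can be extended with further columns, for example the entity type each tool assigned.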

Proposed Procedure

  • Consultation of search engines and study of related scientific literature

  • Selection of a subset of the surveyed NER tools and APIs to investigate further

  • Development of an extensible piece of software to test each tool with a sample text. A suitable design pattern is the adapter pattern, in which every tool is wrapped behind a common interface.

  • Testing each tool and API with the software and collecting the results

  • Comparison of results
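
The adapter pattern mentioned in the procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the names Entity, NerAdapter, CapitalizedWordAdapter, GazetteerAdapter, and run_all are hypothetical, and the two adapters are toy stand-ins that run locally. In the actual work, each adapter would instead wrap the API call of one surveyed tool.

```python
from __future__ import annotations

import re
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Common result format that every adapter must produce."""
    text: str
    label: str
    start: int  # character offset of the first character
    end: int    # character offset one past the last character


class NerAdapter(ABC):
    """Common interface; one subclass per surveyed tool or API."""
    name: str

    @abstractmethod
    def extract(self, text: str) -> list[Entity]:
        """Return the entities the wrapped tool finds in the text."""


class CapitalizedWordAdapter(NerAdapter):
    """Toy stand-in: tags runs of capitalized words as MISC."""
    name = "capitalized-words"
    _pattern = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b")

    def extract(self, text: str) -> list[Entity]:
        return [
            Entity(m.group(), "MISC", m.start(), m.end())
            for m in self._pattern.finditer(text)
        ]


class GazetteerAdapter(NerAdapter):
    """Toy stand-in: looks up terms from a fixed lexicon."""
    name = "gazetteer"

    def __init__(self, lexicon: dict[str, str]) -> None:
        self._lexicon = lexicon

    def extract(self, text: str) -> list[Entity]:
        entities = []
        for term, label in self._lexicon.items():
            for m in re.finditer(re.escape(term), text):
                entities.append(Entity(term, label, m.start(), m.end()))
        return sorted(entities, key=lambda e: e.start)


def run_all(adapters: list[NerAdapter], text: str) -> dict[str, list[Entity]]:
    """Run every adapter on the same sample text and collect the results."""
    return {adapter.name: adapter.extract(text) for adapter in adapters}


adapters = [
    CapitalizedWordAdapter(),
    GazetteerAdapter({"Angela Merkel": "PER", "Berlin": "LOC"}),
]
results = run_all(adapters, "Angela Merkel visited Berlin.")
for tool, entities in results.items():
    print(tool, [(e.text, e.label) for e in entities])
```

Because the test harness only depends on NerAdapter, supporting a further tool means adding one new subclass; the code that runs the tests and collects the results does not need to change.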

M.Sc. students are additionally expected to investigate ways of detecting and reducing false positives and false negatives in the results, e.g. by involving the crowd [3].
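
False positives and false negatives can only be counted against a reference annotation of the sample text. The sketch below assumes that such a gold standard exists as a set of (entity text, label) pairs and that a prediction counts as correct only if both text and label match exactly. The function name evaluate and the sample data are hypothetical.

```python
from __future__ import annotations


def evaluate(predicted: set[tuple[str, str]], gold: set[tuple[str, str]]) -> dict[str, float]:
    """Count exact-match errors of one tool against a gold standard."""
    tp = len(predicted & gold)  # true positives: found and correct
    fp = len(predicted - gold)  # false positives: found but not in the gold standard
    fn = len(gold - predicted)  # false negatives: in the gold standard but missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold standard and tool output for one sample text.
gold = {("Angela Merkel", "PER"), ("Berlin", "LOC"), ("Germany", "LOC")}
predicted = {("Angela Merkel", "PER"), ("Berlin", "ORG"),
             ("Germany", "LOC"), ("Monday", "LOC")}

scores = evaluate(predicted, gold)
print(scores)
```

In this example the wrongly labelled "Berlin" counts both as a false positive (the ORG prediction) and as a false negative (the missed LOC annotation). Whether such partial matches should be penalised twice is a design decision the evaluation would need to state explicitly.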

Please contact Jonas Oppenländer (firstname.lastname@fu-berlin.de), Königin-Luise-Str. 24-26, room 115, for further information.

References

[1] Marrero, M., Urbano, J., Sánchez-Cuadrado, S., Morato, J., Gómez-Berbís, J.M. (2013): Named Entity Recognition: Fallacies, challenges and opportunities. Computer Standards & Interfaces, Volume 35, Issue 5, pp. 482-489.

[2] Nadeau, D., Sekine, S. (2007): A survey of named entity recognition and classification. Lingvisticæ Investigationes, Volume 30, Issue 1, pp. 3-26.

[3] Braunschweig, K., Thiele, M., Eberius, J., Lehner, W. (): Enhancing Named Entity Extraction by Effectively Incorporating the Crowd. BTW Workshops, 13, pp. 181-195.