Natural Language Processing to support the analysis of biographical interviews

Requirements

  • Python (sklearn, numpy etc.)
  • NLP
Academic Advisor
Discipline
HCI, Machine Learning, Natural Language Processing, History
Degree
Bachelor of Science (B.Sc.)

Contents

The Center for Digital Systems (CeDiS) of Freie Universität Berlin maintains several collections of biographical interviews. A new collection, in which alumni are interviewed about their experience at FU, is currently being made available [1]. In this process the collection is digitized, manually annotated, and provided with metadata to make it searchable. This work could be supported by methods from natural language processing (NLP), a field of computer science that deals with the analysis of natural language. We will use an unsupervised machine learning pipeline based on neural language models (e.g. BERT [2]) to provide an overview of the topics discussed in the interviews. Such an overview allows viewing the collection from a global perspective; it can support content-based search and help find previously unnoticed relations between interviews.
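To make the idea of an unsupervised topic overview more concrete, the following is a minimal sketch and not the pipeline the thesis will use. It assumes that each interview segment is turned into a vector, that the vectors are grouped with k-means, and that each group is described by its highest-weighted terms. The bag-of-words `embed` function is only a stand-in for a neural sentence embedding such as one derived from BERT; the toy segments and all function names are hypothetical.

```python
import math
from collections import Counter

# Toy stand-ins for transcribed interview segments (hypothetical data).
SEGMENTS = [
    "lecture hall seminar professor exam",
    "exam seminar lecture professor library",
    "protest demonstration student strike politics",
    "politics strike protest student assembly",
]


def embed(text, vocab):
    """L2-normalised bag-of-words vector; placeholder for a BERT-style embedding."""
    counts = Counter(text.lower().split())
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Plain k-means with deterministic farthest-first initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Add the vector that is farthest from its nearest chosen centroid.
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_terms(centroid, vocab, n=4):
    """Return the n vocabulary terms with the largest weight in a centroid."""
    order = sorted(range(len(vocab)), key=lambda i: centroid[i], reverse=True)
    return [vocab[i] for i in order[:n]]


vocab = sorted({word for segment in SEGMENTS for word in segment.split()})
vectors = [embed(segment, vocab) for segment in SEGMENTS]
labels, centroids = kmeans(vectors, k=2)
for topic, centroid in enumerate(centroids):
    print(topic, top_terms(centroid, vocab))
```

In a real setting the hand-written parts would likely be replaced by library components (for example a pretrained language model for the embeddings and a clustering implementation from sklearn), and the number of topics would have to be chosen or estimated for the collection.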

The goal of the bachelor's thesis is to examine the suitability of this technology from both a technical perspective and the perspective of human-computer interaction. To this end, the pipeline will be optimized for this specific use case, and a user interface prototype will be developed that allows historians to view and search the interview collection.
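As an illustration of what content-based search in such a prototype could look like, here is a small sketch under stated assumptions: interviews and the query are mapped to vectors, and interviews are ranked by cosine similarity to the query. Word counts stand in for neural embeddings, and the interview identifiers, texts, and function names are made up for the example.

```python
import math
from collections import Counter

# Hypothetical interview snippets keyed by made-up identifiers.
INTERVIEWS = {
    "interview_a": "student protest and political assembly in the sixties",
    "interview_b": "studying chemistry and working in the laboratory",
    "interview_c": "founding years of the university and the first lectures",
}


def vectorize(text):
    """Word-count vector; a placeholder for a neural text embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    numerator = sum(a[word] * b[word] for word in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    denominator = norm_a * norm_b
    return numerator / denominator if denominator else 0.0


def search(query, documents, top_k=2):
    """Return up to top_k (id, score) pairs with non-zero similarity, best first."""
    query_vec = vectorize(query)
    scored = [
        (cosine(query_vec, vectorize(text)), doc_id)
        for doc_id, text in documents.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:top_k] if score > 0]


for doc_id, score in search("student protest", INTERVIEWS):
    print(doc_id, round(score, 3))
```

Exact word matching misses paraphrases, which is one reason to consider embeddings from a neural language model instead; the ranking step itself would stay the same.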

Possible Procedure

  • Describe and study the use case and stakeholders
  • Formulate a scenario that guides prototype development
  • Understand the data
  • Optimize the model(s)
  • Evaluate the model
  • Integrate the model into the prototype
  • Perform user tests of the prototype

References

[1] Projekt erlebte Geschichte (https://www.fu-berlin.de/sites/erlebte-geschichte/projekt/index.html)

[2] Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL.