Natural Language Processing to support the analysis of biographical interviews
- Python (scikit-learn, NumPy, etc.)
The Center for Digital Systems (CeDiS) of Freie Universität Berlin maintains a number of collections of biographical interviews. Specifically, a new collection in which alumni are interviewed about their experience at FU is currently being made available (Projekt erlebte Geschichte). In this process, the collection is digitized, manually annotated, and provided with metadata to make it searchable. This work could be supported by methods from natural language processing (NLP), a field of computer science concerned with the analysis of natural language. We will use an unsupervised machine learning pipeline based on neural language models (e.g. BERT; Devlin et al., 2019) to provide an overview of the topics discussed in the interviews. Such an overview allows viewing the collection from a global perspective, can support content-based search, and may reveal previously unnoticed relations between interviews.
The bachelor's thesis aims to examine the suitability of this technology from both a technical perspective and the perspective of human-computer interaction. To this end, the pipeline will be optimized for this specific use case, and a user interface prototype will be developed that allows historians to view and search the interview collection.
- Describe and study the use case and stakeholders
- Formulate a scenario that guides prototype development
- Understand the data
- Optimize the model(s)
- Evaluate the model
- Integrate the model into the prototype
- Perform user tests of the prototype
Projekt erlebte Geschichte, Freie Universität Berlin. https://www.fu-berlin.de/sites/erlebte-geschichte/projekt/index.html
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL.