Interactive Visualization Interface for Text Exploration and Annotation
- Lecture "Human-Computer Interaction 1" or lecture "Data Visualization"
- Preferred: Completion of the lecture on "User-Centered Design"/"Human-Computer Interaction I" and the lecture on "Wissenschaftliches Arbeiten in der Informatik"
- Proficiency in German & English
The exploration and annotation (or labeling) of large-scale text corpora is a complex and cumbersome task (Ruppert et al., 2017). Dimensionality reduction (DR) is therefore frequently used to analyze and visualize such high-dimensional data. However, even though DR provides a good first overview of the data, interpreting the resulting visualization is challenging. Clustering techniques can help by grouping the data into topics to provide a better overview. However, effective support for understanding cluster characteristics is still rare. Recent work has explored possible interaction approaches with DR visualizations (e.g., DR parameter tuning), which can be used as a starting point for this work (Sacha et al., 2016). We would like to extend this line of work with the emerging area of ML interpretability (Legg et al., 2019). Providing the analyst with explanations of, for example, the DR parameter tuning opportunities or the impact of the chosen clustering method on the visualization makes it possible to reflect on how such explanations affect the understanding of the DR results. However, which explanations are useful in this area of research remains unclear.
In the context of a current project at the HCC, a text analysis pipeline has been built that uses UMAP as the DR method and k-medoids as the clustering method. This set-up shall be used by non-technical experts for specifying topics in a text corpus. The question is how this process of annotating clusters of text can be supported by explanations that help users in their annotation task.
Objectives
This BSc thesis aims to provide a first interactive visualization interface that allows users to interactively explore shared topics of a text corpus. This prototype shall especially help to determine existing explanation needs of users. A subset of these explanation needs should already be realized in suitable explanations.
- Use the UMAP explorer (https://github.com/GrantCuster/umap-explorer) and select existing interaction approaches from the literature to provide a first interactive visualization interface. For example, allow users to explore the clustering results (e.g., contained documents, relevant feature terms) and quality measures.
- Conduct interview studies (e.g., contextual inquiry) to determine existing explanation needs and translate them into explanations.
- Extend the interactive visualization interface with additional explanations.
- Validate the usefulness of the interactive visualization interface with a usage scenario describing how users can explore document collections in a visual and interactive way.
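To make the first objective more concrete, the following minimal Python sketch shows one way the information mentioned there (contained documents, relevant feature terms, and a quality measure) could be derived per cluster. It is only an illustration under stated assumptions: the cluster labels are assumed to come from the existing k-medoids step and the 2D points from the UMAP projection; the function names `cluster_summaries` and `silhouette`, the toy data, and the TF-IDF-like term score are hypothetical choices and not part of the HCC pipeline or the UMAP explorer.

```python
import math
from collections import Counter, defaultdict


def cluster_summaries(docs, labels, top_n=3):
    """Group documents by cluster and rank terms by a simple distinctiveness score."""
    by_cluster = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_cluster[label].append(doc)

    # Term counts per cluster, plus the number of clusters each term occurs in.
    counts = {c: Counter(w for d in ds for w in d.lower().split())
              for c, ds in by_cluster.items()}
    cluster_freq = Counter(t for cnt in counts.values() for t in cnt)

    n_clusters = len(counts)
    summaries = {}
    for c, cnt in counts.items():
        # TF-IDF-like heuristic (an assumption): each cluster is one "document".
        scores = {t: tf * math.log((1 + n_clusters) / cluster_freq[t])
                  for t, tf in cnt.items()}
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        summaries[c] = {"documents": by_cluster[c], "terms": ranked[:top_n]}
    return summaries


def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean); expects at least two clusters."""
    clusters = defaultdict(list)
    for p, label in zip(points, labels):
        clusters[label].append(p)

    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        # Mean distance to the other members of the same cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # Mean distance to the nearest other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Toy data standing in for a projected and clustered corpus (hypothetical).
docs = ["rent prices rise in berlin", "berlin rent cap debated",
        "new bike lanes open", "bike lanes reduce traffic"]
labels = [0, 0, 1, 1]
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]

summaries = cluster_summaries(docs, labels, top_n=2)
quality = silhouette(points, labels)
for cluster_id, info in summaries.items():
    print(cluster_id, info["terms"], len(info["documents"]))
print("mean silhouette:", round(quality, 3))
```

In an interface, such per-cluster summaries could back a detail panel that opens when a user selects a cluster, while the silhouette value is one of several possible quality measures; which of these actually meet users' explanation needs is what the interview studies are meant to determine.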
 Legg, P., Smith, J., & Downing, A. (2019). Visual analytics for collaborative human-machine confidence in human-centric active learning tasks. Human-centric Computing and Information Sciences, 9(1), 1-25.
 Ruppert, T., Staab, M., Bannach, A., Lücke-Tieke, H., Bernard, J., Kuijper, A., & Kohlhammer, J. (2017). Visual interactive creation and validation of text clustering workflows to explore document collections. Electronic Imaging, 2017(1), 46-57.
 Sacha, D., Zhang, L., Sedlmair, M., Lee, J. A., Peltonen, J., Weiskopf, D., ... & Keim, D. A. (2016). Visual interaction with dimensionality reduction: A structured literature analysis. IEEE Transactions on Visualization and Computer Graphics, 23(1), 241-250.