Implementing a conversational agent for data science assistance in Jupyter Notebooks

Requirements

  • Successful completion of the Human-Computer Interaction or Human-Centered Data Science course
  • Python and JavaScript programming knowledge
  • Knowledge of Jupyter Notebooks
  • Solid data science knowledge
Academic Advisor
Discipline
Data Science, HCI, Data Analysis, Data Visualization
Degree
Master of Science (M.Sc.)

Contents

In the ENKIS project, we are concerned with establishing a responsible usage of AI technologies and incorporating it into CS-related study programs at the Freie Universität Berlin. Thus, we have a pronounced focus on fostering 'critical reflection' among future professionals, scientists, and non-technical experts regarding existing approaches, limitations, and potentials in AI.

However, data science is challenging, since complex concepts must be understood and put into practice. In recent years, people from ever more fields have become interested in the advantages data science offers, such as extracting meaningful information from large amounts of data to make informed decisions. Practicing data science today, however, requires considerable programming and mathematical knowledge, which makes it difficult for non-experts in these areas to engage with it. A possible solution is to abstract complex tasks behind natural language commands. Conversational agents are programs that can understand and execute such commands and then communicate the results to the user in an understandable manner over the course of a conversation. A plethora of conversational agents exists for a variety of problem domains [1][2][3][4]. Particularly interesting for this thesis is the IRIS conversational agent, which can perform open-ended, complex data science tasks by combining commands through nested conversations [5]. While the research of Fast et al. shows that data scientists can perform comparable tasks faster using IRIS, the problem remains that such agents do not reside where data scientists primarily work: inside Jupyter Notebooks [6]. This limits the agents' use cases, since data processing and analysis happen in different places, and transferring resources between the two is cumbersome for users. Conversational agents should therefore integrate neatly into the data science workflow and not require users to switch contexts while working on the same task.

Another problem is the widespread use of black-box models in data science practice, which makes it difficult for practitioners, and for the people affected by a model's decisions, to comprehend its predictions. Assistance tools such as conversational agents make such models even easier to deploy by abstracting away the task of building them from scratch. The recent emergence of Explainable AI (XAI) methods tries to mitigate this by explaining how models reach their predictions; however, research has shown that even data science experts have trouble comprehending these explanations [7]. Conversational agents have the advantage that they can engage with users to find the explanations that best suit them and the context they are working in. They can also slow the process down, encouraging practitioners to critically reflect on what they are trying to achieve, which may lead to a more responsible data science practice.
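
To illustrate the kind of model explanation such an agent could surface, the following is a minimal, self-contained sketch of permutation importance, one common XAI technique: the accuracy drop observed when a feature's values are shuffled indicates how much the model relies on that feature. The toy dataset and "black-box" model are purely illustrative assumptions, not part of the proposed system.

```python
import random

random.seed(0)

# Toy dataset: the label depends only on feature 0; feature 1 is pure noise.
X = [[random.random(), random.random()] for _ in range(200)]
y = [1 if x[0] > 0.5 else 0 for x in X]

def model(x):
    # Stand-in "black box" that in fact thresholds feature 0.
    return 1 if x[0] > 0.5 else 0

def accuracy(X, y):
    return sum(model(x) == label for x, label in zip(X, y)) / len(y)

def permutation_importance(X, y, feature):
    """Accuracy drop when one feature's column is shuffled."""
    base = accuracy(X, y)
    col = [x[feature] for x in X]
    random.shuffle(col)
    X_perm = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(X, col)]
    return base - accuracy(X_perm, y)

print(permutation_importance(X, y, 0))  # large drop: feature 0 drives predictions
print(permutation_importance(X, y, 1))  # no drop: feature 1 is ignored
```

An agent could compute such scores behind the scenes and verbalize them ("the prediction relies mostly on feature 0"), while a reflection prompt might ask whether relying on that feature is appropriate for the people affected.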

This thesis aims to support data science practitioners (and those who wish to become practitioners) in their practice, which often takes place inside Jupyter Notebooks. The support should be provided by a purpose-built conversational agent. For this thesis, a conversational agent should be implemented that interacts with the user directly where the data science work happens: inside the notebook itself. The agent should help users with data science and machine learning tasks, similar to the IRIS conversational agent [5]. Furthermore, explanations should be provided for the ML models that are used. Finally, practitioners interacting with the agent should be encouraged to establish a reflective and responsible practice: reflection prompts should make users slow down, reflect on their actions and results, and consider their impact on the people directly or indirectly affected by them.

  • Port the IRIS conversational agent [5] to a Jupyter Notebook extension or create a new conversational agent based on it.
  • Allow users to interact with the conversational agent using the notebooks' cells (as seen in this YouTube video [8]) or by other appropriate means, e.g., an additional interface.
  • Research XAI methods and extend the conversational agent with the ability to generate explanations for model predictions.
  • Research reflection prompts and extend the conversational agent's capabilities by encouraging the users to reflect on their practice.
  • Evaluate the conversational agent prototype in a user study.
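
As a rough sketch of the cell-based interaction named above, the following shows a hypothetical keyword-dispatch agent in plain Python; in a notebook, the `agent` function could be wrapped in an IPython cell magic so that a cell's text becomes the utterance. All names here (`COMMANDS`, `agent`, the registered keywords) are illustrative assumptions, not part of IRIS.

```python
import statistics

# Hypothetical command registry mapping a keyword to an action.
COMMANDS = {}

def command(keyword):
    """Decorator registering a handler for a keyword in the user's utterance."""
    def register(fn):
        COMMANDS[keyword] = fn
        return fn
    return register

@command("mean")
def mean_command(data):
    return f"The mean is {statistics.mean(data):.2f}."

@command("spread")
def spread_command(data):
    return f"The standard deviation is {statistics.stdev(data):.2f}."

def agent(utterance, data):
    """Dispatch to the first registered keyword found in the utterance."""
    for keyword, handler in COMMANDS.items():
        if keyword in utterance.lower():
            return handler(data)
    return "Sorry, I did not understand. Try asking about 'mean' or 'spread'."

print(agent("What is the mean of my measurements?", [1, 2, 3, 4]))
```

A real implementation would replace the keyword matching with proper natural language understanding and support IRIS-style nested conversations, but the registry pattern suggests how new data science commands could be added incrementally.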

References

[1] Hauswald, Johann, et al. "Sirius: An open end-to-end voice and vision personal assistant and its implications for future warehouse scale computers." Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems. 2015. 223–238. https://doi.org/10.1145/2775054.2694347

[2] John, Rogers J. L. et al. “Ava: From Data to Insights Through Conversations.” Conference on Innovative Data Systems Research, 2017.

[3] Bohus, Dan, and Rudnicky, Alexander I. "The RavenClaw dialog management framework: Architecture and systems." Computer Speech & Language 23, 3 (July 2009), 332–361.

[4] Allen, James, et al. "PLOW: A collaborative task learning agent." Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI'07), Volume 2. AAAI Press, 2007. 1514–1519.

[5] Fast, Ethan, et al. "Iris: A conversational agent for complex tasks." Proceedings of the 2018 CHI conference on human factors in computing systems. 2018. 1–12. https://doi.org/10.1145/3173574.3174047

[6] Project Jupyter. https://jupyter.org/

[7] Kaur, Harmanpreet, et al. "Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning." Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 2020. 1–14. https://doi.org/10.1145/3313831.3376219

[8] AI ChatBot In Python Using Jupyter Notebook | Artificial Intelligence Projects | Code With Dhruv https://www.youtube.com/watch?v=fN225DZKk-c

[9] Jupyter Notebook extensions https://github.com/ipython-contrib/jupyter_contrib_nbextensions