Design and Implementation of Explanation Interfaces for Wikipedia's Machine Learning Service ORES
- Experience in full-stack web development, or interest in learning it (no mobile app development).
- Data science experience, e.g., Jupyter notebooks, Python, pandas, descriptive statistics.
- Experience using an API.
- Basic understanding of, or willingness to learn about, human-computer interaction (HCI) and the human-centered design (HCD) process.
- Willingness to follow best practices and frameworks for reproducible data science.
"Over 20 years, Wikipedia has become the largest collection of open knowledge in history." At the time of writing (February 2021), Wikipedia grows by about 1.9 edits per second, and the English Wikipedia "includes 6,247,065 articles, and it averages 598 new articles per day." Because Wikipedia has to handle such enormous amounts of data, Wikipedians call for automation to support content moderation, content curation, and overall quality and consistency (e.g., counter-vandalism, task routing).
Wikipedia is a commons-based peer production platform: a socio-technical system in which many individuals collaborate toward the shared goal of knowledge creation.
One algorithmic system that supports quality control in Wikipedia is the scoring service ORES (online since 2015). The main goal of ORES is to provide machine learning as a service for Wikimedia projects such as Wikidata and Wikipedia in order to automate tasks like vandalism detection (e.g., identifying edits that should be reverted). ORES users are often developers who integrate ORES into their tools (e.g., gadgets, bots), as well as researchers. The ORES API provides pre-trained models (e.g., "articlequality", "articletopic", "damaging") for various Wikipedia projects (e.g., "enwiki", "dewiki") and serves predictions in real time.
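To make the API description more concrete, here is a minimal Python sketch of requesting and reading ORES scores. It assumes the v3 REST endpoint `https://ores.wikimedia.org/v3/scores/{context}/?models=...&revids=...` and a JSON response laid out as `{context: {"scores": {revid: {model: {"score": ...}}}}}`, as we understand the API around 2021; both may have changed since. The helper names (`build_score_url`, `fetch_scores`, `true_class_probabilities`) are hypothetical, and only the standard library is used.

```python
import json
from typing import Dict, Iterable
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the ORES v3 scores endpoint as of 2021; it may have changed.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def build_score_url(context: str, model: str, revids: Iterable[int]) -> str:
    """Build a URL requesting one model's scores for several revisions of one wiki."""
    query = urlencode({"models": model, "revids": "|".join(str(r) for r in revids)})
    return f"{ORES_BASE}/{context}/?{query}"


def fetch_scores(url: str) -> dict:
    """Send the GET request and decode the JSON response (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


def true_class_probabilities(payload: dict, context: str, model: str) -> Dict[str, float]:
    """Map revision IDs to the probability of the "true" class of a binary model.

    Assumes the layout {context: {"scores": {revid: {model: {"score": ...}}}}};
    revisions whose entry carries an "error" instead of a "score" are skipped.
    """
    probabilities = {}
    for revid, results in payload[context]["scores"].items():
        result = results.get(model, {})
        if "score" in result:
            probabilities[revid] = result["score"]["probability"]["true"]
    return probabilities


if __name__ == "__main__":
    # Hand-written example response with hypothetical values, not real ORES output.
    sample = {
        "enwiki": {
            "scores": {
                "123": {"damaging": {"score": {"prediction": False,
                                               "probability": {"false": 0.9, "true": 0.1}}}},
            }
        }
    }
    print(build_score_url("enwiki", "damaging", [123, 456]))
    print(true_class_probabilities(sample, "enwiki", "damaging"))
```

Chaining the three helpers would return the "damaging" probability for live revisions, provided the endpoint is still reachable. Multi-class models such as "articlequality" return one probability per quality class rather than a `"true"` key, so they would need a different extraction step.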
The use of such automated tools (e.g., automatically reverting edits that had a high predicted probability of being damaging) led to some people's contributions being rejected. Newcomers to Wikipedia were discouraged, and newcomer retention decreased.
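The mechanism behind automatic reverting can be sketched as a threshold rule: an edit is reverted when its predicted probability of being damaging reaches a chosen threshold. The following hypothetical sketch counts how many good edits such a rule would revert and how many damaging edits it would miss. The `ScoredEdit` type, the `revert_outcomes` function, and all numbers are illustrative assumptions, not ORES output or data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoredEdit:
    """An edit with a model score and a hypothetical human-reviewed label."""
    p_damaging: float  # predicted probability that the edit is damaging
    damaging: bool     # ground truth, e.g., from manual review (assumed here)


def revert_outcomes(edits: List[ScoredEdit], threshold: float) -> Dict[str, int]:
    """Simulate the rule 'revert if p_damaging >= threshold' and count its effects."""
    reverted = [e for e in edits if e.p_damaging >= threshold]
    kept = [e for e in edits if e.p_damaging < threshold]
    return {
        "reverted": len(reverted),
        # Non-damaging contributions that the rule would still reject.
        "good_edits_reverted": sum(1 for e in reverted if not e.damaging),
        # Damaging edits that the rule would let through.
        "damaging_missed": sum(1 for e in kept if e.damaging),
    }


# Illustrative numbers only, not ORES data.
EXAMPLE_EDITS = [
    ScoredEdit(0.97, True),
    ScoredEdit(0.91, False),
    ScoredEdit(0.85, True),
    ScoredEdit(0.40, False),
    ScoredEdit(0.10, False),
]

if __name__ == "__main__":
    for t in (0.8, 0.9, 0.95):
        print(t, revert_outcomes(EXAMPLE_EDITS, t))
```

For these five illustrative edits, a threshold of 0.95 reverts no good edit but misses one damaging edit, while 0.9 reverts one good edit and still misses that same damaging edit. Each good edit reverted this way is a contribution, possibly from a newcomer, that gets rejected; choosing such thresholds is the kind of decision that interfaces like PreCall aim to support.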
ORES is an interesting machine learning system because it was developed by and for the Wikipedia community (an approach Halfaker et al. call participatory machine learning), with special attention to algorithmic openness and transparency. Unlike other "user-generated content platforms like Facebook, Twitter, or YouTube", ORES allows its machine learning models to be inspected on various levels. Still, ORES is an API, and only a few interfaces for it exist so far. In particular, interfaces that explain specific ORES model predictions to end-users, e.g., by applying post-hoc interpretability methods, are still missing. Such explanations could answer user questions like "Why was my edit automatically reverted?" and are essential for fairness and transparency in machine learning.
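SHAP, one of the post-hoc explanation methods named among the objectives, is based on Shapley values. As a rough illustration of that underlying idea (not the `shap` library's API), the following sketch computes exact Shapley values for a tiny model by enumerating all feature coalitions, setting "absent" features to a single baseline value. The feature names and the logistic `toy_damaging_model` are hypothetical stand-ins, not an actual ORES model. Exact enumeration grows exponentially with the number of features, so this approach only works for a handful of them.

```python
from itertools import combinations
from math import exp, factorial
from typing import Callable, List, Sequence, Set

# Hypothetical feature names, loosely inspired by edit-level features.
FEATURES = ["chars_added", "badwords_added", "user_is_anonymous"]


def toy_damaging_model(z: Sequence[float]) -> float:
    """Hypothetical logistic model; the weights are made up for illustration."""
    logit = -3.0 + 0.002 * z[0] + 1.5 * z[1] + 1.0 * z[2]
    return 1.0 / (1.0 + exp(-logit))


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x, with absent features set to the baseline."""
    n = len(x)

    def value(coalition: Set[int]) -> float:
        # Features in the coalition take x's value; all others take the baseline's.
        return f([x[j] if j in coalition else baseline[j] for j in range(n)])

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


if __name__ == "__main__":
    edit = [500, 2, 1]      # hypothetical edit being explained
    baseline = [0, 0, 0]    # hypothetical "empty" reference edit
    for name, phi in zip(FEATURES, shapley_values(toy_damaging_model, edit, baseline)):
        print(f"{name}: {phi:+.3f}")
```

Each value states how much a feature moved the predicted probability away from the baseline edit, and together the values sum to that difference. A per-feature breakdown like this is one possible starting point for answering "Why was my edit automatically reverted?".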
The concrete objectives of the thesis can vary and depend on your interests. The objectives listed below are therefore suggestions that should be adapted or refined according to your interest in the topic:
- Design a user interface (an explanation interface, see [9, 10]) that makes the predictions of a specific ORES model more transparent to a specific user group or stakeholders (e.g., end-users or developers).
- Implement explanation methods or algorithms (e.g., SHAP, LIME, InterpretML) for different ORES models.
- Explore the concept of interactive explanations for ORES.
Possible tasks for a Bachelor thesis:
- Read selected papers to get familiar with your field of research.
- Get familiar with one specific ORES model, e.g., "articlequality."
- Get familiar with a specific ORES user group, e.g., Wikipedia end-users writing and editing articles.
- Research explainability needs for this specific user group.
- Design and implement a user interface that targets this user group and supports a specific user goal or task.
- Evaluate the interface by conducting usability tests.
Possible tasks for a Master thesis:
- Get familiar with one or more ORES models, e.g., "articlequality."
- Read selected papers to get familiar with this research field and conduct a literature review in your chosen area.
- Choose a specific user group you want to focus on.
- Redo parts of the study by  and adapt it to ORES.
- Compare different explanation methods for ORES.
- Evaluate the interface or conduct a study or experiment.
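One simple way to start comparing explanation methods is to check how far their feature rankings agree for the same prediction. The sketch below computes top-k feature agreement: the share of the k most important features (by absolute attribution) that two explanations have in common. The function names and attribution values are hypothetical; in a study, the attributions might come from, e.g., SHAP and LIME explanations of the same ORES prediction. This is only one of many possible comparison criteria.

```python
from typing import Sequence, Set


def top_k_features(names: Sequence[str], attributions: Sequence[float], k: int) -> Set[str]:
    """Names of the k features with the largest absolute attribution."""
    ranked = sorted(zip(names, attributions), key=lambda pair: abs(pair[1]), reverse=True)
    return {name for name, _ in ranked[:k]}


def top_k_agreement(names: Sequence[str], attr_a: Sequence[float],
                    attr_b: Sequence[float], k: int) -> float:
    """Share of the top-k features that two explanations have in common (0.0 to 1.0)."""
    if not 1 <= k <= len(names):
        raise ValueError("k must be between 1 and the number of features")
    shared = top_k_features(names, attr_a, k) & top_k_features(names, attr_b, k)
    return len(shared) / k


if __name__ == "__main__":
    names = ["chars_added", "badwords_added", "user_is_anonymous", "user_age_days"]
    method_a = [0.10, 0.50, -0.30, 0.05]   # made-up attributions from one method
    method_b = [0.40, 0.45, 0.02, -0.10]   # made-up attributions from another method
    print(top_k_agreement(names, method_a, method_b, k=2))
```

With these made-up values, both methods rank "badwords_added" first but disagree on the second most important feature, so the agreement for k = 2 is 0.5. Agreement scores like this could complement, but not replace, the user-centered evaluation described in the tasks.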
Both Bachelor's and Master's students should follow an HCD process:
- Become confident with the HCD process; it will be the foundation of your workflow throughout your thesis.
- Follow the HCD process and choose appropriate HCI methods to design, implement, and test your interface.
1. Analysis: user research, data collection
2. Design: low- and high-fidelity prototyping
3. Evaluation: usability study (Bachelor) or experiment (Master)
4. Iteration: incorporate feedback and start again with step 1.
If you are interested, please contact Alexa Schlegel by email or book an appointment during her office hours.
Before booking an appointment, please read the procedure for writing an exposé.
Recent Bachelor's and Master's theses related to ORES
- An alternative confusion matrix visualization for PreCall
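- Concept and Implementation of a Visual Method for Improving the Interpretability of Automated Quality Assessment with ORES in Wikidata (German title: "Konzept und Implementierung einer visuellen Methode zur Verbesserung der Interpretierbarkeit der automatisierten Qualitätsbewertung mit ORES in Wikidata")
- PreCall: A Visual Interface for Threshold Optimization in Machine Learning Model Selection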
https://ores-support-checklist.toolforge.org/, last accessed: 2021-02-11, 11:17 AM
https://wikimediafoundation.org/wikipedia20/, last accessed: 2021-02-11, 10:05 AM
https://en.wikipedia.org/wiki/Wikipedia:Statistics, last accessed: 2021-02-11, 10:09 AM
Y. Benkler and H. Nissenbaum, “Commons-based Peer Production and Virtue,” Journal of Political Philosophy, vol. 14, no. 4, pp. 394–419, Dec. 2006, doi: 10.1111/j.1467-9760.2006.00235.x.
A. Halfaker and R. S. Geiger, “ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia,” Proceedings of the ACM on Human-Computer Interaction, vol. 4, no. CSCW2, pp. 1–37, 2020.
N. TeBlunthuis, A. Shaw, and B. M. Hill, “Revisiting ‘The Rise and Decline’ in a Population of Peer Production Projects,” in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, New York, NY, USA: Association for Computing Machinery, 2018, pp. 1–7.
S. M. Lundberg and S.-I. Lee, “A Unified Approach to Interpreting Model Predictions,” in Advances in Neural Information Processing Systems, 2017. Code: https://github.com/slundberg/shap
M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144. Code: https://github.com/marcotcr/lime
H. Nori, S. Jenkins, P. Koch, and R. Caruana, “InterpretML: A Unified Framework for Machine Learning Interpretability,” arXiv:1909.09223 [cs, stat], Sep. 2019, Accessed: Feb. 11, 2021. [Online]. Available: http://arxiv.org/abs/1909.09223.
H. Kaur, H. Nori, S. Jenkins, R. Caruana, H. Wallach, and J. Wortman Vaughan, “Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning,” in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI '20), New York, NY, USA: Association for Computing Machinery, 2020, pp. 1–14, doi: 10.1145/3313831.3376219.
Q. V. Liao, D. Gruen, and S. Miller, “Questioning the AI: Informing Design Practices for Explainable AI User Experiences,” in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI '20), New York, NY, USA: Association for Computing Machinery, 2020, pp. 1–15, doi: 10.1145/3313831.3376590.
M. Chromik, “reSHAPe: A Framework for Interactive Explanations in XAI Based on SHAP,” 2020, doi: 10.18420/ecscw2020_p06.
C. Kinkeldey et al., “PreCall: A Visual Interface for Threshold Optimization in ML Model Selection,” arXiv preprint arXiv:1907.05131, 2019. (Example interface for ORES.)
D. V. Carvalho, E. M. Pereira, and J. S. Cardoso, “Machine Learning Interpretability: A Survey on Methods and Metrics,” Electronics, vol. 8, no. 8, p. 832, 2019.