Lecture with Exercise: Human-Centered Data Science
(L: 19331101 E: 19331102)
| Type | Lecture with Exercise |
| --- | --- |
| Instructor | Prof. Dr. Claudia Müller-Birn |
| Homepage | |
| Room | Online only (streamed) |
| Start | Nov 02, 2020 |
| End | Feb 23, 2021 |
| Time | Lecture: Monday, 4 pm – 6 pm |
Content
In recent years, data science has developed rapidly, primarily due to the progress in machine learning. This development has opened up new opportunities in a variety of social, scientific, and technological areas. From the experience of recent years, however, it is becoming increasingly clear that the concentration on purely statistical and numerical aspects in data science fails to capture social nuances or take ethical criteria into account. The research area Human-Centered Data Science closes this gap at the intersection of Human-Computer Interaction (HCI), Computer-Supported Cooperative Work (CSCW), Human Computation, and the statistical and numerical techniques of Data Science.
Human-Centered Data Science (HCDS) focuses on fundamental principles of data science and its human implications, including research ethics; data privacy; legal frameworks; algorithmic bias, transparency, fairness, and accountability; data provenance, curation, preservation, and reproducibility; user experience design and research for big data; human computation; effective oral, written, and visual scientific communication; and societal impacts of data science.
At the end of this course, students will understand the main concepts, theories, and practices of human-centered data science, as well as the different perspectives from which data can be collected and manipulated. Furthermore, students will be able to recognize the impact that current technological developments may have on society.
This course curriculum was initially developed by Jonathan T. Morgan, Cecilia Aragon, Os Keyes, and Brock Craft. We have adapted the curriculum for the European context and our specific understanding of the field.
Here you can find our Code of Conduct.
Literature
Aragon, C. M., Hutto, C., Echenique, A., Fiore-Gartland, B., Huang, Y., Kim, J., et al. (2016). Developing a Research Agenda for Human-Centered Data Science. (pp. 529–535). Presented at the CSCW Companion, New York, New York, USA: ACM Press. http://doi.org/10.1145/2818052.2855518
Baumer, E. P. (2017). Toward human-centered algorithm design. Big Data & Society, 4(2). http://doi.org/10.1177/2053951717718854
Kogan, M., Halfaker, A., Guha, S., Aragon, C., Muller, M., & Geiger, S. (2020). Mapping Out Human-Centered Data Science: Methods, Approaches, and Best Practices. In Companion of the 2020 ACM International Conference on Supporting Group Work, (pp. 151-156). https://doi.org/10.1145/3323994.3369898
Schedule
01 | 02.11.2020 - Introduction to Human-Centered Data Science
02 | 09.11.2020 - Reproducibility of Data Science Practice
Resources
Human-Centered Data Science
Cecilia Aragon, Clayton Hutto, Andy Echenique, Brittany Fiore-Gartland, Yun Huang, Jinyoung Kim, Gina Neff, Wanli Xing, and Joseph Bayer. 2016. Developing a Research Agenda for Human-Centered Data Science. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion (CSCW '16 Companion). Association for Computing Machinery, New York, NY, USA, 529–535. DOI: https://doi.org/10.1145/2818052.2855518
Marina Kogan, Aaron Halfaker, Shion Guha, Cecilia Aragon, Michael Muller, and Stuart Geiger. 2020. Mapping Out Human-Centered Data Science: Methods, Approaches, and Best Practices. In Companion of the 2020 ACM International Conference on Supporting Group Work (GROUP ’20). Association for Computing Machinery, New York, NY, USA, 151–156. DOI: https://doi.org/10.1145/3323994.3369898
Human-Centered System Design
Rob Kling and Susan Leigh Star. 1998. Human centered systems in the perspective of organizational and social informatics. SIGCAS Comput. Soc. 28, 1 (March 1998), 22–29. DOI:https://doi.org/10.1145/277351.277356
Further Reading
Experiences of running data science workshops: Hill, B. M., Dailey, D., Guy, R. T., Lewis, B., Matsuzaki, M., & Morgan, J. T. (2017). Democratizing Data Science: The Community Data Science Workshops and Classes. In N. Jullien, S. A. Matei, & S. P. Goggins (Eds.), Big Data Factories: Scientific Collaborative approaches for virtual community data collection, repurposing, recombining, and dissemination. New York, New York: Springer Nature. https://doi.org/10.1007/978-3-319-59186-5_9
Tim Harford. 2014. Big data: A big mistake? Significance, 11(5), 14–19. http://doi.org/10.1111/j.1740-9713.2014.00778.x
Misc
Peter Bull discusses the importance of human-centered design in data science. https://www.datacamp.com/community/blog/human-centered-design-data-science
Recap your ML knowledge with this course "Machine Learning in a Nutshell" https://web2.qatar.cmu.edu/~gdicaro/15488/
03 | 16.11.2020 - Sources of Bias - Approaches to Identify, Mitigate and Avoid
Resources
Reproducibility
Chapter 2 "Assessing Reproducibility" and Chapter 3 "The Basic Reproducible Workflow Template" from The Practice of Reproducible Research, University of California Press, 2018.
Sharing Jupyter Notebooks https://reproducible-science-curriculum.github.io/sharing-RR-Jupyter/
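A recurring theme in these readings is that an analysis is only reproducible if its sources of randomness are pinned down. A minimal, hypothetical sketch (the function name and split ratio are illustrative, not taken from any of the cited texts):

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Deterministically shuffle and split a dataset.

    Fixing the seed means every rerun of the notebook produces the
    exact same split -- a small but essential step toward the
    reproducible workflows described in the chapters above.
    """
    rng = random.Random(seed)   # isolated RNG, avoids hidden global state
    shuffled = data[:]          # copy so the input stays untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(10)))
```

Because the seed is an explicit parameter rather than implicit global state, the split can be recorded alongside the results it produced.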
Further Reading
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. arXiv preprint arXiv:1803.09010.
Christensen, Garret. Manual of Best Practices in Transparent Social Science Research. 2016. (Chapter 6)
Press, Gil. Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says. Forbes, 2016.
Further Examples of Replication Studies
Halfaker, A., Geiger, R. S., Morgan, J. T., & Riedl, J. (2013). The rise and decline of an open collaboration system: How Wikipedia’s reaction to popularity is causing its decline. American Behavioral Scientist, 57(5), 664-688
TeBlunthuis, N., Shaw, A., and Hill, B.M. (2018). Revisiting "The rise and decline" in a population of peer production projects. In Proceedings of the 2018 ACM Conference on Human Factors in Computing Systems (CHI '18). https://doi.org/10.1145/3173574.3173929
04 | 23.11.2020 - Beyond a Statistical Concept - Dealing with the Complexity of Fairness
Resources
Bias
Friedman, B., & Nissenbaum, H. (1996). Bias in Computer Systems. ACM Trans. Inf. Syst., 14(3), 330–347.
Olteanu, A., Castillo, C., Diaz, F., & Kıcıman, E. (2019). Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries. Frontiers in Big Data, 2, 13. https://doi.org/10.3389/fdata.2019.00013
Further Reading
Suresh, H., & Guttag, J. V. (2019). A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002.
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. arXiv preprint arXiv:1803.09010.
Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587-604.
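The statistical side of fairness discussed in this session can be made concrete with a small check for demographic parity (equal positive-prediction rates across groups). A hypothetical sketch; the group labels and example predictions are illustrative only:

```python
def demographic_parity_gap(predictions, groups):
    """Return the largest difference in positive-prediction rates
    between any two groups (0.0 = perfect demographic parity)."""
    rates = {}
    for pred, group in zip(predictions, groups):
        pos, total = rates.get(group, (0, 0))
        rates[group] = (pos + (1 if pred else 0), total + 1)
    shares = [pos / total for pos, total in rates.values()]
    return max(shares) - min(shares)

# Example: group "a" receives positive predictions 75% of the time,
# group "b" only 25% of the time, so the gap is 0.5.
preds  = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_gap(preds, groups)
```

As the readings stress, a single number like this is only one of several mutually incompatible fairness criteria, not a complete assessment.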
05 | 30.11.2020 - Transparency or how to achieve Intrinsic Interpretability?
06 | 07.12.2020 - Post-hoc Interpretability - Limiting Interpretability by Focusing on Experts
Resources
Transparency and beyond
- Kohli, N., Barreto, R., & Kroll, J. A. (2018). Translation tutorial: a shared lexicon for research and practice in human-centered software systems. In 1st Conference on Fairness, Accountability, and Transparency. New York, NY, USA.
- Molnar, Christoph. "Interpretable machine learning. A Guide for Making Black Box Models Explainable", 2019. https://christophm.github.io/interpretable-ml-book/.
- Carvalho, Diogo V., Eduardo M. Pereira, and Jaime S. Cardoso. "Machine learning interpretability: A survey on methods and metrics." Electronics 8.8 (2019): 832.
- Doshi-Velez, Finale, and Been Kim. "Towards a rigorous science of interpretable machine learning." arXiv preprint arXiv:1702.08608 (2017).
- Poursabzi-Sangdeh, F., Goldstein, D. G., Hofman, J. M., Vaughan, J. W., & Wallach, H. (2018). Manipulating and measuring model interpretability. arXiv preprint arXiv:1802.07810. (Video)
Further Reading
- Walmsley, Joel. "Artificial intelligence and the value of transparency." AI & SOCIETY (2020): 1-11. https://link.springer.com/article/10.1007/s00146-020-01066-z
Resources
Overviews on Interpretability/Explanations
- Miller, Tim. "Explanation in artificial intelligence: Insights from the social sciences." Artificial Intelligence 267 (2019): 1-38.
- Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., & Kagal, L. (2018, October). Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on data science and advanced analytics (DSAA) (pp. 80-89). IEEE.
- Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., ... & Chatila, R. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82-115. (check out Figure 6 in this paper)
Further Resources
- Bias-variance tradeoff https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff
- Anderson, Carl. The role of model interpretability in data science. Medium, 2016.
Explanation Methods
- LIME | Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD. 1135–1144. (github)
- SHAP | Lundberg, Scott M., and Su-In Lee. "A unified approach to interpreting model predictions." Advances in neural information processing systems. 2017. (github)
- Rich Caruana, Harsha Nori, Samuel Jenkins, Paul Koch, Ester de Nicolas. 2019. InterpretML software toolkit (github repo, blog post)
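The idea shared by LIME and SHAP can be sketched without either library: perturb the input around a point of interest, query the black-box model, and fit a simple surrogate whose coefficients serve as the local explanation. A deliberately tiny, hypothetical one-feature version of that recipe (not the actual LIME or SHAP algorithm):

```python
import random

def local_slope(black_box, x, radius=0.5, n_samples=200, seed=0):
    """LIME-style sketch for a single feature: sample points near x,
    query the black-box model, and fit a least-squares line.
    The slope is a local, interpretable stand-in for the model."""
    rng = random.Random(seed)
    xs = [x + rng.uniform(-radius, radius) for _ in range(n_samples)]
    ys = [black_box(v) for v in xs]
    mean_x = sum(xs) / n_samples
    mean_y = sum(ys) / n_samples
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var = sum((a - mean_x) ** 2 for a in xs)
    return cov / var

# Around x = 2 the "black box" f(x) = x**2 behaves roughly like a
# line of slope 4, which is what the local surrogate recovers.
slope = local_slope(lambda v: v * v, x=2.0)
```

Real explanation methods add weighting of the samples by proximity (LIME) or game-theoretic attribution across features (SHAP), but the perturb-query-fit loop is the common core.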
21.12.2020 - No Class
04.01.2021 - No Class
08 | 11.01.2021 - Enhancing Interpretability through Visual Analytics
Resources
Designing Human-Centered Explanations
- Harmanpreet Kaur, Harsha Nori, Samuel Jenkins, Rich Caruana, Hanna Wallach, and Jennifer Wortman Vaughan. 2020. Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI '20). Association for Computing Machinery, New York, NY, USA, 1–14. DOI:https://doi.org/10.1145/3313831.3376219
- Q. Vera Liao, Daniel Gruen, and Sarah Miller. 2020. Questioning the AI: Informing Design Practices for Explainable AI User Experiences. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI '20). Association for Computing Machinery, New York, NY, USA, 1–15. DOI:https://doi.org/10.1145/3313831.3376590
Further Reading
- Ashraf Abdul, Jo Vermeulen, Danding Wang, Brian Y. Lim, and Mohan Kankanhalli. 2018. Trends and Trajectories for Explainable, Accountable and Intelligible Systems: An HCI Research Agenda. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). Association for Computing Machinery, New York, NY, USA, Paper 582, 1–18. DOI:https://doi.org/10.1145/3173574.3174156
- Kacper Sokol and Peter Flach. 2020. Explainability fact sheets: a framework for systematic assessment of explainable approaches. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT* '20). Association for Computing Machinery, New York, NY, USA, 56–67. DOI:https://doi.org/10.1145/3351095.3372870
Further Resources
- Python library Alibi https://docs.seldon.io/projects/alibi/en/stable/index.html
- DALEX R package https://uc-r.github.io/dalex
18.01.2021 - No Class
09 | 25.01.2021 - Artificial Intelligence and Human-Machine Interaction from an Ethical Perspective: A Utopian (and Dystopian) Look into the Future of Patient Care (Guest Lecture by Jun.-Prof. Dr. Susanne Michl)
Resources
- Baumer, E. P. S. (2017). Toward human-centered algorithm design. Big Data & Society.
- Chatzimparmpas, A., Martins, R. M., Jusufi, I., Kucher, K., Rossi, F., & Kerren, A. (2020). The State of the Art in Enhancing Trust in Machine Learning Models with the Use of Visualizations. In Computer graphics forum.
Overview available at https://trustmlvis.lnu.se/
Tool Videos
Further interesting approaches
- MLJar. https://mljar.com/.
- Cloud AutoML. https://cloud.google.com/automl.
- CNN Explainer: https://github.com/DLReseach/cnn-explainer
10 | 01.02.2021 - Privacy Preserving Machine Learning: Threats and Solutions (Guest Lecture by Franziska Boenisch)
More information in our news.
Resources
Further Reading
- Ben Shneiderman. 2020. Bridging the Gap Between Ethics and Practice: Guidelines for Reliable, Safe, and Trustworthy Human-centered AI Systems. ACM Trans. Interact. Intell. Syst. 10, 4, Article 26 (December 2020), 31 pages. DOI:https://doi.org/10.1145/3419764
- Shahriari, K., & Shahriari, M. (2017). IEEE standard review - Ethically aligned design: A vision for prioritizing human wellbeing with artificial intelligence and autonomous systems. Institute of Electrical and Electronics Engineers
- Barocas, S., & Boyd, D. (2017). Engaging the ethics of data science in practice. Communications of the ACM, 60(11), 23–25. https://doi.org/10.1145/3144172
11 | 08.02.2021 - Using Thick Data in Data Science
12 | 15.02.2021 - Algorithmic Accountability - Taking Responsibility
13 | 22.02.2021 - Exam
More information in our news.
Resources
Further Reading
- Javier Salido. 2012. Differential Privacy for Everyone. Microsoft Corporation Whitepaper.
- CACM Staff. 2021. Differential privacy: the pursuit of protections by default. Commun. ACM 64, 2 (February 2021), 36–43. DOI:https://doi.org/10.1145/3434228
- Wood, Alexandra, et al. "Differential privacy: A primer for a non-technical audience." Vand. J. Ent. & Tech. L. 21 (2018): 209. https://dash.harvard.edu/bitstream/handle/1/38323292/4_Wood_Final.pdf?sequence=1
- Papernot, Nicolas. "A Marauder's map of security and privacy in machine learning: an overview of current and future research directions for making machine learning secure and private." Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security. 2018.
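The primers above all build on the Laplace mechanism: answer a query truthfully, then add noise scaled to the query's sensitivity divided by epsilon. A minimal, hypothetical sketch using only the standard library (the dataset and epsilon value are illustrative):

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution
    via inverse transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon, seed=0):
    """Answer a counting query with epsilon-differential privacy.

    A count has sensitivity 1 (adding or removing one person changes
    the result by at most 1), so Laplace noise with scale 1/epsilon
    suffices. Smaller epsilon means stronger privacy and more noise.
    """
    rng = random.Random(seed)
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# How many people in this (made-up) dataset are 40 or older?
ages = [23, 35, 41, 29, 52, 60, 31]
noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0)
```

The seed is fixed here only to make the sketch testable; a real deployment would use fresh randomness for every query, since releasing repeated answers consumes privacy budget.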