Lecture with Exercise: Human-Centered Data Science
(L: 19331101 E: 19331102)
| Type | Lecture with Exercise |
| --- | --- |
| Instructor | Prof. Dr. Claudia Müller-Birn |
| Homepage | |
| Room | Online only (streamed) |
| Start | Nov 02, 2020 |
| End | Feb 23, 2021 |
| Time | Lecture: Monday, 4 pm – 6 pm |
Content
In recent years, data science has developed rapidly, primarily due to the progress in machine learning. This development has opened up new opportunities in a variety of social, scientific, and technological areas. From the experience of recent years, however, it is becoming increasingly clear that the concentration on purely statistical and numerical aspects in data science fails to capture social nuances or take ethical criteria into account. The research area Human-Centered Data Science closes this gap at the intersection of Human-Computer Interaction (HCI), Computer-Supported Cooperative Work (CSCW), Human Computation, and the statistical and numerical techniques of Data Science.
Human-Centered Data Science (HCDS) focuses on fundamental principles of data science and its human implications, including research ethics; data privacy; legal frameworks; algorithmic bias, transparency, fairness, and accountability; data provenance, curation, preservation, and reproducibility; user experience design and research for big data; human computation; effective oral, written, and visual scientific communication; and societal impacts of data science.
At the end of this course, students will understand the main concepts, theories, and practices of human-centered data science, as well as the different perspectives from which data can be collected and manipulated. Furthermore, students will be able to recognize the impact that current technological developments may have on society.
This course curriculum was initially developed by Jonathan T. Morgan, Cecilia Aragon, Os Keyes, and Brock Craft. We have adapted the curriculum for the European context and our specific understanding of the field.
Here you can find our Code of Conduct.
Literature
Aragon, C. M., Hutto, C., Echenique, A., Fiore-Gartland, B., Huang, Y., Kim, J., et al. (2016). Developing a Research Agenda for Human-Centered Data Science. (pp. 529–535). Presented at the CSCW Companion, New York, New York, USA: ACM Press. http://doi.org/10.1145/2818052.2855518
Baumer, E. P. (2017). Toward human-centered algorithm design. Big Data & Society, 4(2). http://doi.org/10.1177/2053951717718854
Kogan, M., Halfaker, A., Guha, S., Aragon, C., Muller, M., & Geiger, S. (2020). Mapping Out Human-Centered Data Science: Methods, Approaches, and Best Practices. In Companion of the 2020 ACM International Conference on Supporting Group Work, (pp. 151-156). https://doi.org/10.1145/3323994.3369898
Schedule
01 | 02.11.2020 - Introduction to Human-Centered Data Science
02 | 09.11.2020 - Reproducibility of Data Science Practice
Resources
Human-Centered Data Science
Cecilia Aragon, Clayton Hutto, Andy Echenique, Brittany Fiore-Gartland, Yun Huang, Jinyoung Kim, Gina Neff, Wanli Xing, and Joseph Bayer. 2016. Developing a Research Agenda for Human-Centered Data Science. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion (CSCW '16 Companion). Association for Computing Machinery, New York, NY, USA, 529–535. DOI: https://doi.org/10.1145/2818052.2855518
Marina Kogan, Aaron Halfaker, Shion Guha, Cecilia Aragon, Michael Muller, and Stuart Geiger. 2020. Mapping Out Human-Centered Data Science: Methods, Approaches, and Best Practices. In Companion of the 2020 ACM International Conference on Supporting Group Work (GROUP ’20). Association for Computing Machinery, New York, NY, USA, 151–156. DOI: https://doi.org/10.1145/3323994.3369898
Human-Centered System Design
Rob Kling and Susan Leigh Star. 1998. Human centered systems in the perspective of organizational and social informatics. SIGCAS Comput. Soc. 28, 1 (March 1998), 22–29. DOI:https://doi.org/10.1145/277351.277356
Further Reading
Experiences of running data science workshops: Hill, B. M., Dailey, D., Guy, R. T., Lewis, B., Matsuzaki, M., & Morgan, J. T. (2017). Democratizing Data Science: The Community Data Science Workshops and Classes. In N. Jullien, S. A. Matei, & S. P. Goggins (Eds.), Big Data Factories: Scientific Collaborative approaches for virtual community data collection, repurposing, recombining, and dissemination. New York, New York: Springer Nature. https://doi.org/10.1007/978-3-319-59186-5_9
Tim Harford. 2014. Big data: A big mistake? Significance, 11(5), 14–19. http://doi.org/10.1111/j.1740-9713.2014.00778.x
Misc
Peter Bull discusses the importance of human-centered design in data science. https://www.datacamp.com/community/blog/human-centered-design-data-science
Recap your ML knowledge with this course "Machine Learning in a Nutshell" https://web2.qatar.cmu.edu/~gdicaro/15488/
03 | 16.11.2020 - Sources of Bias - Approaches to Identify, Mitigate and Avoid
Resources
Reproducibility
Chapter 2 "Assessing Reproducibility" and Chapter 3 "The Basic Reproducible Workflow Template" from The Practice of Reproducible Research, University of California Press, 2018.
Sharing Jupyter Notebooks https://reproducible-science-curriculum.github.io/sharing-RR-Jupyter/
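A recurring theme in these readings is that an analysis is only reproducible if its sources of randomness are pinned down. A minimal, hypothetical sketch (the function name and split ratio are illustrative, not taken from any of the cited texts):

```python
import random

def train_test_split(data, test_ratio=0.2, seed=42):
    """Deterministically shuffle and split a dataset.

    Fixing the seed means every rerun of the notebook produces the
    exact same split -- a small but essential step toward the
    reproducible workflows described in the chapters above.
    """
    rng = random.Random(seed)   # isolated RNG, avoids hidden global state
    shuffled = data[:]          # copy so the input stays untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(list(range(10)))
```

Because the seed is an explicit parameter rather than implicit global state, the split can be recorded alongside the results it produced.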
Further Reading
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. arXiv preprint arXiv:1803.09010.
Christensen, Garret. Manual of Best Practices in Transparent Social Science Research. 2016. (Chapter 6)
Press, Gil. Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says. Forbes, 2016.
Further Examples of Replication Studies
Halfaker, A., Geiger, R. S., Morgan, J. T., & Riedl, J. (2013). The rise and decline of an open collaboration system: How Wikipedia’s reaction to popularity is causing its decline. American Behavioral Scientist, 57(5), 664-688
TeBlunthuis, N., Shaw, A., and Hill, B.M. (2018). Revisiting "The rise and decline" in a population of peer production projects. In Proceedings of the 2018 ACM Conference on Human Factors in Computing Systems (CHI '18). https://doi.org/10.1145/3173574.3173929
04 | 23.11.2020 - Beyond a Statistical Concept - Dealing with the Complexity of Fairness
Resources
Bias
Friedman, B., & Nissenbaum, H. (1996). Bias in Computer Systems. ACM Trans. Inf. Syst., 14(3), 330–347.
Olteanu, A., Castillo, C., Diaz, F., & Kıcıman, E. (2019). Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries. Frontiers in Big Data, 2, 13. https://doi.org/10.3389/fdata.2019.00013
Further Reading
Suresh, H., & Guttag, J. V. (2019). A framework for understanding unintended consequences of machine learning. arXiv preprint arXiv:1901.10002.
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. arXiv preprint arXiv:1803.09010.
Bender, E. M., & Friedman, B. (2018). Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6, 587-604.
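The statistical side of fairness discussed in this session can be made concrete with a small check for demographic parity (equal positive-prediction rates across groups). A hypothetical sketch; the group labels and example predictions are illustrative only:

```python
def demographic_parity_gap(predictions, groups):
    """Return the largest difference in positive-prediction rates
    between any two groups (0.0 = perfect demographic parity)."""
    rates = {}
    for pred, group in zip(predictions, groups):
        pos, total = rates.get(group, (0, 0))
        rates[group] = (pos + (1 if pred else 0), total + 1)
    shares = [pos / total for pos, total in rates.values()]
    return max(shares) - min(shares)

# Example: group "a" receives positive predictions 75% of the time,
# group "b" only 25% of the time, so the gap is 0.5.
preds  = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_gap(preds, groups)
```

As the readings stress, a single number like this is only one of several mutually incompatible fairness criteria, not a complete assessment.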
05 | 30.11.2020 - Transparency or how to achieve Intrinsic Interpretability?
06 | 07.12.2020 - Post-hoc Interpretability - Limiting Interpretability by Focusing on Experts
Resources
Transparency and beyond
- Kohli, N., Barreto, R., & Kroll, J. A. (2018). Translation tutorial: a shared lexicon for research and practice in human-centered software systems. In 1st Conference on Fairness, Accountability, and Transparency. New York, NY, USA.
- Molnar, Christoph. "Interpretable machine learning. A Guide for Making Black Box Models Explainable", 2019. https://christophm.github.io/interpretable-ml-book/.
- Carvalho, Diogo V., Eduardo M. Pereira, and Jaime S. Cardoso. "Machine learning interpretability: A survey on methods and metrics." Electronics 8.8 (2019): 832.
- Doshi-Velez, Finale, and Been Kim. "Towards a rigorous science of interpretable machine learning." arXiv preprint arXiv:1702.08608 (2017).
- Poursabzi-Sangdeh, F., Goldstein, D. G., Hofman, J. M., Vaughan, J. W., & Wallach, H. (2018). Manipulating and measuring model interpretability. arXiv preprint arXiv:1802.07810. (Video)
Further Reading
- Walmsley, Joel. "Artificial intelligence and the value of transparency." AI & SOCIETY (2020): 1-11. https://link.springer.com/article/10.1007/s00146-020-01066-z
Resources
Overviews on Interpretability/Explanations
- Miller, Tim. "Explanation in artificial intelligence: Insights from the social sciences." Artificial Intelligence 267 (2019): 1-38.
- Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., & Kagal, L. (2018, October). Explaining explanations: An overview of interpretability of machine learning. In 2018 IEEE 5th International Conference on data science and advanced analytics (DSAA) (pp. 80-89). IEEE.
- Arrieta, A. B., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., ... & Chatila, R. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82-115. (check out Figure 6 in this paper)
Further Resources
- Bias-variance tradeoff https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff
- Anderson, Carl. The role of model interpretability in data science. Medium, 2016.
Explanation Methods
- LIME | Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD. 1135–1144. (github)
- SHAP | Lundberg, Scott M., and Su-In Lee. "A unified approach to interpreting model predictions." Advances in neural information processing systems. 2017. (github)
- Rich Caruana, Harsha Nori, Samuel Jenkins, Paul Koch, Ester de Nicolas. 2019. InterpretML software toolkit (github repo, blog post)
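The idea shared by LIME and SHAP can be sketched without either library: perturb the input around a point of interest, query the black-box model, and fit a simple surrogate whose coefficients serve as the local explanation. A deliberately tiny, hypothetical one-feature version of that recipe (not the actual LIME or SHAP algorithm):

```python
import random

def local_slope(black_box, x, radius=0.5, n_samples=200, seed=0):
    """LIME-style sketch for a single feature: sample points near x,
    query the black-box model, and fit a least-squares line.
    The slope is a local, interpretable stand-in for the model."""
    rng = random.Random(seed)
    xs = [x + rng.uniform(-radius, radius) for _ in range(n_samples)]
    ys = [black_box(v) for v in xs]
    mean_x = sum(xs) / n_samples
    mean_y = sum(ys) / n_samples
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var = sum((a - mean_x) ** 2 for a in xs)
    return cov / var

# Around x = 2 the "black box" f(x) = x**2 behaves roughly like a
# line of slope 4, which is what the local surrogate recovers.
slope = local_slope(lambda v: v * v, x=2.0)
```

Real explanation methods add weighting of the samples by proximity (LIME) or game-theoretic attribution across features (SHAP), but the perturb-query-fit loop is the common core.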
21.12.2020 - No Class
04.01.2021 - No Class
08 | 11.01.2021 - Enhancing Interpretability through Visual Analytics
Resources
Designing Human-Centered Explanations
- Harmanpreet Kaur, Harsha Nori, Samuel Jenkins, Rich Caruana, Hanna Wallach, and Jennifer Wortman Vaughan. 2020. Interpreting Interpretability: Understanding Data Scientists' Use of Interpretability Tools for Machine Learning. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI '20). Association for Computing Machinery, New York, NY, USA, 1–14. DOI:https://doi.org/10.1145/3313831.3376219
- Q. Vera Liao, Daniel Gruen, and Sarah Miller. 2020. Questioning the AI: Informing Design Practices for Explainable AI User Experiences. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI '20). Association for Computing Machinery, New York, NY, USA, 1–15. DOI:https://doi.org/10.1145/3313831.3376590
Further Reading
- Ashraf Abdul, Jo Vermeulen, Danding Wang, Brian Y. Lim, and Mohan Kankanhalli. 2018. Trends and Trajectories for Explainable, Accountable and Intelligible Systems: An HCI Research Agenda. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). Association for Computing Machinery, New York, NY, USA, Paper 582, 1–18. DOI:https://doi.org/10.1145/3173574.3174156
- Kacper Sokol and Peter Flach. 2020. Explainability fact sheets: a framework for systematic assessment of explainable approaches. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency (FAT* '20). Association for Computing Machinery, New York, NY, USA, 56–67. DOI:https://doi.org/10.1145/3351095.3372870
Further Resources
- Python library Alibi https://docs.seldon.io/projects/alibi/en/stable/index.html
- DALEX R package https://uc-r.github.io/dalex
18.01.2021 - No Class
09 | 25.01.2021 - Artificial Intelligence and Human-Machine Interaction from an Ethical Perspective: A Utopian (and Dystopian) Look into the Future of Patient Care (Guest Lecture by Jun.-Prof. Dr. Susanne Michl)
Resources
- Baumer, E. P. S. (2017). Toward human-centered algorithm design. Big Data & Society.
- Chatzimparmpas, A., Martins, R. M., Jusufi, I., Kucher, K., Rossi, F., & Kerren, A. (2020). The State of the Art in Enhancing Trust in Machine Learning Models with the Use of Visualizations. In Computer graphics forum.
Overview available at https://trustmlvis.lnu.se/
Tool Videos
Further interesting approaches
- MLJar. https://mljar.com/.
- Cloud AutoML. https://cloud.google.com/automl.
- CNN Explainer: https://github.com/DLReseach/cnn-explainer
10 | 01.02.2021 - Privacy Preserving Machine Learning: Threats and Solutions (Guest Lecture by Franziska Boenisch)
More information in our news.
Resources
Further Reading
- Ben Shneiderman. 2020. Bridging the Gap Between Ethics and Practice: Guidelines for Reliable, Safe, and Trustworthy Human-centered AI Systems. ACM Trans. Interact. Intell. Syst. 10, 4, Article 26 (December 2020), 31 pages. DOI:https://doi.org/10.1145/3419764
- Shahriari, K., & Shahriari, M. (2017). IEEE standard review - Ethically aligned design: A vision for prioritizing human wellbeing with artificial intelligence and autonomous systems. Institute of Electrical and Electronics Engineers
- Barocas, S., & Boyd, D. (2017). Engaging the ethics of data science in practice. Communications of the ACM, 60(11), 23–25. https://doi.org/10.1145/3144172
11 | 08.02.2021 - Using Thick Data in Data Science
12 | 15.02.2021 - Algorithmic Accountability - Taking Responsibility
13 | 22.02.2021 - Exam
More information in our news.
Resources
Further Reading
- Javier Salido. 2012. Differential Privacy for Everyone. Microsoft Corporation Whitepaper.
- CACM Staff. 2021. Differential privacy: the pursuit of protections by default. Commun. ACM 64, 2 (February 2021), 36–43. DOI:https://doi.org/10.1145/3434228
- Wood, Alexandra, et al. "Differential privacy: A primer for a non-technical audience." Vand. J. Ent. & Tech. L. 21 (2018): 209. https://dash.harvard.edu/bitstream/handle/1/38323292/4_Wood_Final.pdf?sequence=1
- Papernot, Nicolas. "A Marauder's map of security and privacy in machine learning: an overview of current and future research directions for making machine learning secure and private." Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security. 2018.
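The primers above all build on the Laplace mechanism: answer a query truthfully, then add noise scaled to the query's sensitivity divided by epsilon. A minimal, hypothetical sketch using only the standard library (the dataset and epsilon value are illustrative):

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution
    via inverse transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon, seed=0):
    """Answer a counting query with epsilon-differential privacy.

    A count has sensitivity 1 (adding or removing one person changes
    the result by at most 1), so Laplace noise with scale 1/epsilon
    suffices. Smaller epsilon means stronger privacy and more noise.
    """
    rng = random.Random(seed)
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# How many people in this (made-up) dataset are 40 or older?
ages = [23, 35, 41, 29, 52, 60, 31]
noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0)
```

The seed is fixed here only to make the sketch testable; a real deployment would use fresh randomness for every query, since releasing repeated answers consumes privacy budget.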