Human Centered Data Science (Fall 2019)/Schedule
This page is a work in progress.
Week 1: September 26
- Introduction to Human Centered Data Science
- What is data science? What is human centered? What is human centered data science?
- Assignments due
- Fill out the pre-course survey
- Read (not graded): Provost, Foster, and Tom Fawcett. Data science and its relationship to big data and data-driven decision making. Big Data 1.1 (2013): 51-59.
- Agenda
- Syllabus review
- Pre-course survey results
- What do we mean by data science?
- What do we mean by human centered?
- How does human centered design relate to data science?
- In-class activity
- Intro to assignment 1: Data Curation
- Homework assigned
- Read and reflect on both:
- Hickey, Walt. The Dollars and Cents Case Against Hollywood's Exclusion of Women. FiveThirtyEight, 2014.
- Keegan, Brian. The Need for Openness in Data Journalism. 2014.
- Resources
- Princeton Dialogues on AI & Ethics: Case studies
- Aragon, C. et al. (2016). Developing a Research Agenda for Human-Centered Data Science. Human Centered Data Science workshop, CSCW 2016.
- Kling, Rob and Star, Susan Leigh. Human Centered Systems in the Perspective of Organizational and Social Informatics. 1997.
- Harford, T. (2014). Big data: A big mistake? Significance, 11(5), 14–19.
- Ideo.org The Field Guide to Human-Centered Design. 2015.
Week 2: October 3
- Reproducibility and Accountability
- data curation, preservation, documentation, and archiving; best practices for open scientific research
- Assignments due
- Week 1 reading reflection
- A1: Data curation
- Agenda
- Reading reflection discussion
- Assignment 1 review & reflection
- A primer on copyright, licensing, and hosting for code and data
- Introduction to replicability, reproducibility, and open research
- In-class activity
- Intro to assignment 2: Bias in data
- Homework assigned
- Read and reflect: Duarte, N., Llanso, E., & Loup, A. (2018). Mixed Messages? The Limits of Automated Social Media Content Analysis. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, 81, 106.
- A2: Bias in data
- Resources
- Hickey, Walt. The Bechdel Test: Checking Our Work. FiveThirtyEight, 2014.
- GroupLens, MovieLens datasets
- J. Priem, D. Taraborelli, P. Groth, C. Neylon (2010), Altmetrics: A manifesto, 26 October 2010.
- Chapter 2 "Assessing Reproducibility" and Chapter 3 "The Basic Reproducible Workflow Template" from The Practice of Reproducible Research. University of California Press, 2018.
- Halfaker, A., Geiger, R. S., Morgan, J. T., & Riedl, J. (2013). The rise and decline of an open collaboration system: How Wikipedia’s reaction to popularity is causing its decline. American Behavioral Scientist, 57(5), 664-688
- TeBlunthuis, N., Shaw, A., and Hill, B.M. (2018). Revisiting "The rise and decline" in a population of peer production projects. In Proceedings of the 2018 ACM Conference on Human Factors in Computing Systems (CHI '18). https://doi.org/10.1145/3173574.3173929
- Press, Gil. Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says. Forbes, 2016.
- Christensen, Garret. Manual of Best Practices in Transparent Social Science Research. 2016.
Week 3: October 10
- Interrogating datasets
- causes and consequences of bias in data; best practices for selecting, describing, and implementing training data
- Assignments due
- Week 2 reading reflection
- Agenda
- Reading reflection review
- Sources and consequences of bias in data collection, processing, and re-use
- In-class activity
- Homework assigned
- Read both, reflect on one:
- Wang, Tricia. Why Big Data Needs Thick Data. Ethnography Matters, 2016.
- Kery, M. B., Radensky, M., Arya, M., John, B. E., & Myers, B. A. (2018). The Story in the Notebook: Exploratory Data Science using a Literate Programming Tool. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems - CHI’18, 1–11. https://doi.org/10.1145/3173574.3173748
- Resources
- Bender, E. M., & Friedman, B. (2018). Data Statements for NLP: Toward Mitigating System Bias and Enabling Better Science. To appear in Transactions of the ACL.
- Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. arXiv preprint arXiv:1803.09010.
- Olteanu, A., Castillo, C., Diaz, F., & Kıcıman, E. (2019). Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries. Frontiers in Big Data, 2, 13. https://doi.org/10.3389/fdata.2019.00013
- Rose Eveleth The biggest lie tech people tell themselves — and the rest of us. October 8, 2019, Vox.com.
- Rani Molla The government is using the wrong data to make crucial decisions about the internet. February 8, 2019, Vox.com.
- Isaac L. Johnson, Yilun Lin, Toby Jia-Jun Li, Andrew Hall, Aaron Halfaker, Johannes Schöning, and Brent Hecht. 2016. Not at Home on the Range: Peer Production and the Urban/Rural Divide. DOI: https://doi.org/10.1145/2858036.2858123
- Leo Graiden Stewart, Ahmer Arif, A. Conrad Nied, Emma S. Spiro, and Kate Starbird. 2017. Drawing the Lines of Contention: Networked Frame Contests Within #BlackLivesMatter Discourse. CSCW 2017. DOI: https://doi.org/10.1145/3134920
- Lada A. Adamic and Natalie Glance. 2005. The political blogosphere and the 2004 U.S. election: divided they blog. (LinkKDD '05). DOI=http://dx.doi.org/10.1145/1134271.1134277
- Cristian Danescu-Niculescu-Mizil, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. No country for old members: user lifecycle and linguistic change in online communities. (WWW '13). DOI: https://doi.org/10.1145/2488388.2488416
Week 4: October 17
- Introduction to qualitative and mixed-methods research
- Big data vs thick data; integrating qualitative research methods into data science practice; crowdsourcing
- Assignments due
- Reading reflection
- A2: Bias in data
- Agenda
- Reading reflection review
- Overview of qualitative research
- Introduction to ethnography
- In-class activity: explaining art to aliens
- Mixed methods research and data science
- An introduction to crowdwork
- Overview of assignment 3: Crowdwork ethnography
- Homework assigned
- Read and reflect: Barocas, Solon and Nissenbaum, Helen. Big Data's End Run around Anonymity and Consent. In Privacy, Big Data, and the Public Good. 2014. (PDF available on Canvas)
- A3: Crowdwork ethnography
- Resources
- Singer, P., Lemmerich, F., West, R., Zia, L., Wulczyn, E., Strohmaier, M., & Leskovec, J. (2017, April). Why we read wikipedia. In Proceedings of the 26th International Conference on World Wide Web.
- Taxonomy of reasons why people trust/distrust Wikipedia, Jonathan Morgan, Wikimedia Research report, May 2019.
- Ladner, S. (2016). Practical ethnography: A guide to doing ethnography in the private sector. Routledge.
- Spradley, J. P. (2016). The ethnographic interview. Waveland Press.
- Spradley, J. P. (2016). Participant Observation. Waveland Press.
- Eriksson, P., & Kovalainen, A. (2015). Ch 12: Ethnographic Research. In Qualitative methods in business research: A practical guide to social research. Sage.
- Qualitative research activity: categorizing student responses. Mark Girod, Western Oregon University
- Empirical Epistemologies Applied to Human-Centered Computing Research. Leysia Palen, University of Colorado Boulder, November 16, 2014.
Week 5: October 24
- Research ethics for big data
- privacy, informed consent and user treatment
- Assignments due
- Reading reflection
- Agenda
- Reading reflection review
- Guest lecture
- A2 retrospective
- Final project deliverables and timeline
- A brief history of research ethics in the United States
- Homework assigned
- Read and reflect: Gray, M. L., & Suri, S. (2019). Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. Eamon Dolan Books. (PDF available on Canvas)
- Resources
- National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont Report. U.S. Department of Health and Human Services, 1979.
- Bethan Cantrell, Javier Salido, and Mark Van Hollebeke (2016). Industry needs to embrace data ethics: Here's how it could be done. Workshop on Data and Algorithmic Transparency (DAT'16). http://datworkshop.org/
- Javier Salido (2012). Differential Privacy for Everyone. Microsoft Corporation Whitepaper.
- Markham, Annette and Buchanan, Elizabeth. Ethical Decision-Making and Internet Research. Association of Internet Researchers, 2012.
- Kelley, P. G., Bresee, J., Cranor, L. F., & Reeder, R. W. (2009). A “nutrition label” for privacy. Proceedings of the 5th Symposium on Usable Privacy and Security - SOUPS ’09, 1990, 1. https://doi.org/10.1145/1572532.1572538
- Warncke-Wang, M., Cosley, D., & Riedl, J. (2013). Tell me more: An actionable quality model for wikipedia. Proceedings of the 9th International Symposium on Open Collaboration, WikiSym + OpenSym 2013. https://doi.org/10.1145/2491055.2491063
Week 6: October 31
- Data science and society
- power, data, and society; ethics of crowdwork
- Assignments due
- Reading reflection
- A3: Crowdwork ethnography
- Agenda
- Reading reflections
- Assignment 3 review
- Guest lecture: Stefania Druga
- In-class activity
- Introduction to assignment 4: Final project proposal
- Homework assigned
- Read both, reflect on one:
- Baumer, E. P. S. (2017). Toward human-centered algorithm design. Big Data & Society.
- Amershi, S., Cakmak, M., Knox, W. B., & Kulesza, T. (2014). Power to the People: The Role of Humans in Interactive Machine Learning. AI Magazine, 35(4), 105.
- Resources
- Lilly C. Irani and M. Six Silberman. 2013. Turkopticon: interrupting worker invisibility in amazon mechanical turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13). DOI: https://doi.org/10.1145/2470654.2470742
- Salehi, Niloufar, Lilly C. Irani, Michael S. Bernstein, Ali Alkhatib, Eva Ogbe, and Kristy Milland. We are dynamo: Overcoming stalling and friction in collective action for crowd workers. In Proceedings of the 33rd annual ACM conference on human factors in computing systems, pp. 1621-1630. ACM, 2015.
- Hill, B. M., Dailey, D., Guy, R. T., Lewis, B., Matsuzaki, M., & Morgan, J. T. (2017). Democratizing Data Science: The Community Data Science Workshops and Classes. In N. Jullien, S. A. Matei, & S. P. Goggins (Eds.), Big Data Factories: Scientific Collaborative approaches for virtual community data collection, repurposing, recombining, and dissemination. New York, New York: Springer Nature. https://doi.org/10.1007/978-3-319-59186-5_9
- Ingold, David and Soper, Spencer. Amazon Doesn’t Consider the Race of Its Customers. Should It?. Bloomberg, 2016.
- Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner. Machine Bias: Risk Assessment in Criminal Sentencing. ProPublica, May 2016.
Week 7: November 7
- Human centered machine learning
- algorithmic fairness, transparency, and accountability; methods and contexts for algorithmic audits
- Assignments due
- Reading reflection
- A4: Project proposal
- Agenda
- Reading reflection review
- Algorithmic transparency, interpretability, and accountability
- Auditing algorithms
- In-class activity
- Introduction to assignment 5: Final project plan
- Homework assigned
- Read and reflect: Kocielnik, R., Amershi, S., & Bennett, P. N. (2019). Will You Accept an Imperfect AI? Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems - CHI ’19, 1–14. https://doi.org/10.1145/3290605.3300641
- A5: Final project plan
- Resources
- Christian Sandvig, Kevin Hamilton, Karrie Karahalios, Cedric Langbort (2014/05/22) Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms. Paper presented to "Data and Discrimination: Converting Critical Concerns into Productive Inquiry," a preconference at the 64th Annual Meeting of the International Communication Association. May 22, 2014; Seattle, WA, USA.
- Friedman, B., & Nissenbaum, H. (1996). Bias in Computer Systems. ACM Trans. Inf. Syst., 14(3), 330–347.
- Nate Matias, 2017. How Anyone Can Audit Facebook's Newsfeed. Medium.com
- Hill, Kashmir. Facebook figured out my family secrets, and it won't tell me how. Gizmodo, 2017.
- Blue, Violet. Google’s comment-ranking system will be a hit with the alt-right. Engadget, 2017.
- Anderson, Carl. The role of model interpretability in data science. Medium, 2016.
- Google's Perspective API
Week 8: November 14
- User experience and data science
- algorithmic interpretability; human-centered methods for designing and evaluating algorithmic systems
- Assignments due
- Reading reflection
- A5: Final project plan
- Agenda
- coming soon
- Homework assigned
- Read and reflect: Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miro Dudik, and Hanna Wallach. 2019. Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need? In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19). ACM, New York, NY, USA, Paper 600, 16 pages. DOI: https://doi.org/10.1145/3290605.3300830
- A6: Final project presentation
- Resources
- Sean M. McNee, John Riedl, and Joseph A. Konstan. 2006. Making recommendations better: an analytic model for human-recommender interaction. In CHI '06 Extended Abstracts on Human Factors in Computing Systems (CHI EA '06).
- Sean M. McNee, Nishikant Kapoor, and Joseph A. Konstan. 2006. Don't look stupid: avoiding pitfalls when recommending research papers. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work (CSCW '06).
- Shahriari, K., & Shahriari, M. (2017). IEEE standard review - Ethically aligned design: A vision for prioritizing human wellbeing with artificial intelligence and autonomous systems. Institute of Electrical and Electronics Engineers
- ACM US Policy Council Statement on Algorithmic Transparency and Accountability. January 2017.
- Asilomar AI Principles. Future of Life Institute, 2017.
- Diakopoulos, N., Friedler, S., Arenas, M., Barocas, S., Hay, M., Howe, B., … Zevenbergen, B. (2018). Principles for Accountable Algorithms and a Social Impact Statement for Algorithms. Fatml.Org 2018.
- Jess Holbrook. Human Centered Machine Learning. Google Design Blog. 2017.
- Fabien Girardin. Experience design in the machine learning era. Medium, 2016.
- Xavier Amatriain and Justin Basilico. Netflix Recommendations: Beyond the 5 stars. Netflix Tech Blog, 2012.
- Bart P. Knijnenburg, Martijn C. Willemsen, Zeno Gantner, Hakan Soncu, and Chris Newell. 2012. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction 22, 4-5 (October 2012), 441-504. DOI=http://dx.doi.org/10.1007/s11257-011-9118-4
- Patrick Austin, Facebook, Google, and Microsoft Use Design to Trick You Into Handing Over Your Data, New Report Warns. Gizmodo, 6/18/2018
- Cremonesi, P., Elahi, M., & Garzotto, F. (2017). User interface patterns in recommendation-empowered content intensive multimedia applications. Multimedia Tools and Applications, 76(4), 5275-5309.
Week 9: November 21
- Data science in context
- Doing human centered data science in product organizations; communicating and collaborating across roles and disciplines; HCDS industry trends and trajectories
- Assignments due
- Reading reflection
- Agenda
- coming soon
- Homework assigned
- Read and reflect: Alkhatib, A., & Bernstein, M. (2019). Street-Level Algorithms: A Theory at the Gaps Between Policy and Decisions. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3290605.3300760
- A7: Final project report
- Resources
- Ethical OS Toolkit and Risk Mitigation Checklist. EthicalOS.org.
- Morgan, J. T., 2019. Ethical and Human-centered AI at Wikimedia. Wikimedia Research 2030.
Week 10: November 28 (No Class Session)
- Assignments due
- Reading reflection
- Homework assigned
- Read and reflect: Barocas, S., & Boyd, D. (2017). Engaging the ethics of data science in practice. Communications of the ACM, 60(11), 23–25. https://doi.org/10.1145/3144172 (PDF available on Canvas)
- Resources
- Marilynn Larkin, How to give a dynamic scientific presentation. Elsevier Connect, 2015.
- Megan Risdal, Communicating data science: a guide to presenting your work. Kaggle blog, 2016.
- Megan Risdal, Communicating data science: Why and how to visualize information. Kaggle blog, 2016.
- Megan Risdal, Communicating data science: an interview with a storytelling expert. Kaggle blog, 2016.
- Brent Dykes, Data Storytelling: The Essential Data Science Skill Everyone Needs. Forbes, 2016.
Week 11: December 5
- Final presentations
- presentation of student projects, course wrap up
- Assignments due
- Reading reflection
- A6: Final project presentation
- Readings assigned
- NONE
- Homework assigned
- NONE
- Resources
- NONE
Week 12: Finals Week (No Class Session)
- NO CLASS
- A7: FINAL PROJECT REPORT DUE BY 5:00PM on Tuesday, December 10 via Canvas
- LATE PROJECT SUBMISSIONS NOT ACCEPTED.