This page is a work in progress.
Week 1: September 28
- Assignments due
- fill out the pre-course survey
- Agenda
- Course overview & orientation
- What do we mean by "data science"?
- What do we mean by "human centered"?
- How does human centered design relate to data science?
- Readings assigned
- Watch: Why Humans Should Care About Data Science (Cecilia Aragon, 2016 HCDE Seminar Series)
- Read: Aragon, C. et al. (2016). Developing a Research Agenda for Human-Centered Data Science. Human Centered Data Science workshop, CSCW 2016.
- Read: Provost, Foster, and Tom Fawcett. Data science and its relationship to big data and data-driven decision making. Big Data 1.1 (2013): 51-59.
- Read: Kling, Rob and Star, Susan Leigh. Human Centered Systems in the Perspective of Organizational and Social Informatics. 1997.
- Homework assigned
- Reading reflection
- Resources
- IDEO.org. The Field Guide to Human-Centered Design. 2015.
- Faraway, Julian. The Decline and Fall of Statistics. Faraway Statistics, 2015.
- Press, Gil. Data Science: What's The Half-Life Of A Buzzword? Forbes, 2013.
- Bloor, Robin. A Data Science Rant. Inside Analysis, 2013.
- Various authors. Position papers from 2016 CSCW Human Centered Data Science Workshop. 2016.
Week 2: October 5
- Ethical considerations in data science
- privacy, informed consent, and user treatment
- Assignments due
- Week 1 reading reflection
- Agenda
- Informed consent in the age of Data Science
- Privacy
- User expectations
- Inferred information
- Correlation
- Anonymisation strategies
- Readings assigned
- Read: Markham, Annette and Buchanan, Elizabeth. Ethical Decision-Making and Internet Researchers. Association for Internet Research, 2012.
- Read: Barocas, Solan and Nissenbaum, Helen. Big Data's End Run around Anonymity and Consent. In Privacy, Big Data, and the Public Good. 2014. (PDF on Canvas)
- Homework assigned
- Reading reflection
- Resources
- Wittkower, D.E. Lurkers, creepers, and virtuous interactivity: From property rights to consent and care as a conceptual basis for privacy concerns and information ethics
- National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont Report. U.S. Department of Health and Human Services, 1979.
- Hill, Kashmir. Facebook Manipulated 689,003 Users' Emotions For Science. Forbes, 2014.
- Kramer, Adam D. I.; Guillory, Jamie E.; and Hancock, Jeffrey T. Experimental evidence of massive-scale emotional contagion through social networks. PNAS 111 (24): 8788-8790, 2014. Published ahead of print June 2, 2014.
- Barbaro, Michael and Zeller, Tom. A Face Is Exposed for AOL Searcher No. 4417749. New York Times, 2006.
- Zetter, Kim. Arvind Narayanan Isn’t Anonymous, and Neither Are You. WIRED, 2012.
- Gray, Mary. When Science, Customer Service, and Human Subjects Research Collide. Now What? Culture Digitally, 2014.
- Tene, Omer and Polonetsky, Jules. Privacy in the Age of Big Data. Stanford Law Review, 2012.
- Dwork, Cynthia. Differential Privacy: A Survey of Results. Theory and Applications of Models of Computation, 2008.
- Green, Matthew. What is Differential Privacy? A Few Thoughts on Cryptographic Engineering, 2016.
- Hsu, Danny. Techniques to Anonymize Human Data. Data Sift, 2015.
- Metcalf, Jacob. Twelve principles of data ethics. Ethical Resolve, 2016.
- Poor, Nathaniel and Davidson, Roei. When The Data You Want Comes From Hackers, Or, Looking A Gift Horse In The Mouth. CSCW Human Centered Data Science Workshop, 2016.
Week 3: October 12
- Data provenance, preparation, and reproducibility
- data curation, preservation, documentation, and archiving; best practices for open scientific research
- Assignments due
- Week 2 reading reflection
- Agenda
- Final project overview
- Introduction to open research
- Understanding data licensing and attribution
- Supporting replicability and reproducibility
- Making your research and data accessible
- Working with Wikipedia datasets
- Assignment 1 description
- Readings assigned
- Read: Chapter 2 "Assessing Reproducibility" and Chapter 3 "The Basic Reproducible Workflow Template" from The Practice of Reproducible Research. University of California Press, 2018.
- Read: Hickey, Walt. The Dollars and Cents Case Against Hollywood's Exclusion of Women. FiveThirtyEight, 2014. AND Keegan, Brian. The Need for Openness in Data Journalism. 2014.
- Homework assigned
- Reading reflection
- A1: Data curation
- Examples of well-documented open research projects
- Keegan, Brian. WeatherCrime. GitHub, 2014.
- Geiger, Stuart R. and Halfaker, Aaron. Operationalizing conflict and cooperation between automated software agents in Wikipedia: A replication and expansion of "Even Good Bots Fight". GitHub, 2017.
- Thain, Nithum; Dixon, Lucas; and Wulczyn, Ellery. Wikipedia Talk Labels: Toxicity. Figshare, 2017.
- Narayan, Sneha et al. Replication Data for: The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial for New Users. Harvard Dataverse, 2017.
- Examples of not-so-well documented open research projects
- Eclarke. SWGA paper. GitHub, 2016.
- David Lefevre. Lefevre and Cox: Delayed instructional feedback may be more effective, but is this contrary to learners’ preferences? Figshare, 2016.
- Alneberg. CONCOCT Paper Data. GitHub, 2014.
- Other resources
- Press, Gil. Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says. Forbes, 2016.
- Christensen, Garret. Manual of Best Practices in Transparent Social Science Research. 2016.
- Hickey, Walt. The Bechdel Test: Checking Our Work. FiveThirtyEight, 2014.
- Chapman et al. Cross Industry Standard Process for Data Mining. IBM, 2000.
Week 4: October 19
- Study design
- understanding your data; framing research questions; planning your study
- Assignments due
- Reading reflection
- A1: Data curation
- Agenda
- How Wikipedia works (and how it doesn't)
- guest speaker: Morten Warncke-Wang, Wikimedia Foundation
- Sources of bias in data science research
- Sources of bias in Wikipedia data
- Readings assigned
- Shyong (Tony) K. Lam, Anuradha Uduwage, Zhenhua Dong, Shilad Sen, David R. Musicant, Loren Terveen, and John Riedl. 2011. WP:clubhouse?: an exploration of Wikipedia's gender imbalance. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym '11). ACM, New York, NY, USA, 1-10. DOI=http://dx.doi.org/10.1145/2038558.2038560
- Homework assigned
- Reading reflection
- A2: Bias in data
- Resources
- Aschwanden, Christie. Science Isn't Broken. FiveThirtyEight, 2015.
Week 5: October 26
- Machine learning
- ethical AI, algorithmic transparency, societal implications of machine learning
- Assignments due
- Reading reflection
- A2: Bias in data
- Agenda
- Social implications of machine learning
- Consequences of algorithmic bias
- Sources of algorithmic bias
- Addressing algorithmic bias
- Auditing algorithms
- Readings assigned
- Homework assigned
- Reading reflection
- A3: Final project plan
- Resources
Week 6: November 2
- Mixed-methods research
- Big data vs thick data; qualitative research in data science
- Assignments due
- Reading reflection
- Agenda
- Guest speakers: Aaron Halfaker, Caroline Sinders (Wikimedia Foundation)
- Mixed methods research
- Ethnographic methods in data science
- Project plan brainstorm/Q&A session
- Readings assigned
- Homework assigned
- Reading reflection
- Resources
Week 7: November 9
- Human computation
- ethics of crowdwork, crowdsourcing methodologies for analysis, design, and evaluation
- Assignments due
- Reading reflection
- A3: Final project plan
- Agenda
- the role of qualitative research in human centered data science
- scaling qualitative research through crowdsourcing
- types of crowdwork
- ethical and practical considerations for crowdwork
- Introduction to assignment 4: Mechanical Turk ethnography
- Readings assigned
- Homework assigned
- Reading reflection
- A4: Crowdwork self-ethnography
- Resources
Week 8: November 16
- User experience and big data
- prototyping and user testing; benchmarking and iterative evaluation; UI design for data science
- Assignments due
- Reading reflection
- Agenda
- HCD process in the design of data-driven applications
- understanding user needs, user intent, and context of use in recommender system design
- trust, empowerment, and seamful design
- HCD in data analysis and visualization
- final project lightning feedback sessions
- Readings assigned
- Homework assigned
- Reading reflection
- Resources
Week 9: November 23
- Human-centered data science in the wild
- community data science; data science for social good
- Assignments due
- Reading reflection
- A4: Crowdwork self-ethnography
- Agenda
- NO CLASS - work on your own
- Readings assigned
- Homework assigned
- Reading reflection
- Resources
Week 10: November 30
- Communicating methods, results, and implications
- translating for non-data scientists
- Assignments due
- Reading reflection
- Agenda
- communicating about your research effectively and honestly to different audiences
- publishing your research openly
- disseminating your research
- final project workshop
- Readings assigned
- Homework assigned
- Reading reflection
- A5: Final presentation
- Resources
Week 11: December 7
- Future of human centered data science
- case studies from research, industry, and policy; final presentations
- Assignments due
- Reading reflection
- A5: Final presentation
- Agenda
- future directions of human centered data science
- final presentations
- Readings assigned
- none!
- Homework assigned
- none!
- Resources
Week 12: Finals Week
- NO CLASS
- A6: FINAL PROJECT REPORT DUE BY 11:59PM on Sunday, December 10
- LATE PROJECT SUBMISSIONS NOT ACCEPTED.