HCDS (Fall 2017)/Schedule



Week 1: September 28
Day 1 plan


 * Assignments due
 * Fill out the pre-course survey


 * Agenda


 * Readings assigned
 * Watch: Why Humans Should Care About Data Science (Cecilia Aragon, 2016 HCDE Seminar Series)
 * Read: Aragon, C. et al. (2016). Developing a Research Agenda for Human-Centered Data Science. Human Centered Data Science workshop, CSCW 2016.
 * Read: Provost, Foster, and Tom Fawcett. Data science and its relationship to big data and data-driven decision making. Big Data 1.1 (2013): 51-59.
 * Read: Kling, Rob and Star, Susan Leigh. Human Centered Systems in the Perspective of Organizational and Social Informatics. 1997.


 * Homework assigned
 * Reading reflection


 * Resources
 * IDEO.org. The Field Guide to Human-Centered Design. 2015.
 * Faraway, Julian. The Decline and Fall of Statistics. Faraway Statistics, 2015.
 * Press, Gil. Data Science: What's The Half-Life Of A Buzzword? Forbes, 2013.
 * Bloor, Robin. A Data Science Rant. Inside Analysis, 2013.
 * Various authors. Position papers from 2016 CSCW Human Centered Data Science Workshop. 2016.

Week 2: October 5
Day 2 plan

 * Ethical considerations in data science: privacy, informed consent, and user treatment


 * Assignments due
 * Week 1 reading reflection


 * Agenda


 * Readings assigned
 * Read: Markham, Annette and Buchanan, Elizabeth. Ethical Decision-Making and Internet Research. Association of Internet Researchers, 2012.
 * Read: Barocas, Solan and Nissenbaum, Helen. Big Data's End Run around Anonymity and Consent. In Privacy, Big Data, and the Public Good. 2014. (PDF on Canvas)


 * Homework assigned
 * Reading reflection


 * Resources
 * Wittkower, D.E. Lurkers, creepers, and virtuous interactivity: From property rights to consent and care as a conceptual basis for privacy concerns and information ethics.
 * National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont Report. U.S. Department of Health and Human Services, 1979.
 * Hill, Kashmir. Facebook Manipulated 689,003 Users' Emotions For Science. Forbes, 2014.
 * Kramer, Adam D. I., Guillory, Jamie E., and Hancock, Jeffrey T. Experimental evidence of massive-scale emotional contagion through social networks. PNAS 2014 111 (24) 8788-8790; published ahead of print June 2, 2014.
 * Barbaro, Michael and Zeller, Tom. A Face Is Exposed for AOL Searcher No. 4417749. New York Times, 2006.
 * Zetter, Kim. Arvind Narayanan Isn’t Anonymous, and Neither Are You. WIRED, 2012.
 * Gray, Mary. When Science, Customer Service, and Human Subjects Research Collide. Now What? Culture Digitally, 2014.
 * Tene, Omer and Polonetsky, Jules. Privacy in the Age of Big Data. Stanford Law Review, 2012.
 * Dwork, Cynthia. Differential Privacy: A survey of results. Theory and Applications of Models of Computation, 2008.
 * Green, Matthew. What is Differential Privacy? A Few Thoughts on Cryptographic Engineering, 2016.
 * Hsu, Danny. Techniques to Anonymize Human Data. Data Sift, 2015.
 * Metcalf, Jacob. Twelve principles of data ethics. Ethical Resolve, 2016.
 * Poor, Nathaniel and Davidson, Roei. When The Data You Want Comes From Hackers, Or, Looking A Gift Horse In The Mouth. CSCW Human Centered Data Science Workshop, 2016.

Week 3: October 12
Day 3 plan


 * Data provenance, preparation, and reproducibility: data curation, preservation, documentation, and archiving; best practices for open scientific research


 * Assignments due
 * Week 2 reading reflection


 * Agenda


 * Readings assigned
 * Read: Chapter 2 "Assessing Reproducibility" and Chapter 3 "The Basic Reproducible Workflow Template" from The Practice of Reproducible Research. University of California Press, 2018.
 * Read: Hickey, Walt. The Dollars and Cents Case Against Hollywood's Exclusion of Women. FiveThirtyEight, 2014. AND Keegan, Brian. The Need for Openness in Data Journalism. 2014.


 * Homework assigned
 * Reading reflection
 * A1: Data curation


 * Examples of well-documented open research projects
 * Keegan, Brian. WeatherCrime. GitHub, 2014.
 * Geiger, R. Stuart and Halfaker, Aaron. Operationalizing conflict and cooperation between automated software agents in Wikipedia: A replication and expansion of "Even Good Bots Fight". GitHub, 2017.
 * Thain, Nithum; Dixon, Lucas; and Wulczyn, Ellery. Wikipedia Talk Labels: Toxicity. Figshare, 2017.
 * Narayan, Sneha et al. Replication Data for: The Wikipedia Adventure: Field Evaluation of an Interactive Tutorial for New Users. Harvard Dataverse, 2017.


 * Examples of not-so-well documented open research projects
 * Eclarke. SWGA paper. GitHub, 2016.
 * Lefevre, David. Lefevre and Cox: Delayed instructional feedback may be more effective, but is this contrary to learners’ preferences? Figshare, 2016.
 * Alneberg. CONCOCT Paper Data. GitHub, 2014.


 * Other resources
 * Press, Gil. Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says. Forbes, 2016.
 * Christensen, Garret. Manual of Best Practices in Transparent Social Science Research. 2016.
 * Hickey, Walt. The Bechdel Test: Checking Our Work. FiveThirtyEight, 2014.
 * Chapman et al. Cross Industry Standard Process for Data Mining. IBM, 2000.

Week 4: October 19
Day 4 plan


 * Study design: understanding your data; framing research questions; planning your study


 * Assignments due
 * Reading reflection
 * A1: Data curation


 * Agenda


 * Readings assigned
 * Read: Lam, Shyong (Tony) K., Uduwage, Anuradha, Dong, Zhenhua, Sen, Shilad, Musicant, David R., Terveen, Loren, and Riedl, John. WP:clubhouse? An exploration of Wikipedia's gender imbalance. In Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym '11). ACM, 2011, 1-10. DOI: http://dx.doi.org/10.1145/2038558.2038560


 * Homework assigned
 * Reading reflection
 * A2: Bias in data


 * Resources
 * Aschwanden, Christie. Science Isn't Broken. FiveThirtyEight, 2015.
 * Halfaker, Aaron et al. The Rise and Decline of an Open Collaboration Community: How Wikipedia's reaction to sudden popularity is causing its decline. American Behavioral Scientist, 2012.
 * Warnke-Wang, Morten. Autoconfirmed article creation trial. Wikimedia, 2017.
 * Wikipedia Or Encyclopædia Britannica: Which Has More Bias? Forbes, 2015. Based on Greenstein, Shane, and Zhu, Feng. Do Experts or Collective Intelligence Write with More Bias? Evidence from Encyclopædia Britannica and Wikipedia. Harvard Business School working paper.

Week 5: October 26
Day 5 plan


 * Machine learning: ethical AI, algorithmic transparency, societal implications of machine learning


 * Assignments due
 * Reading reflection


 * Agenda


 * Readings assigned
 * Read: Sandvig, Christian, Hamilton, Kevin, Karahalios, Karrie, and Langbort, Cedric. Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms. Paper presented to "Data and Discrimination: Converting Critical Concerns into Productive Inquiry," a preconference at the 64th Annual Meeting of the International Communication Association. Seattle, WA, USA, May 22, 2014.


 * Homework assigned
 * Reading reflection
 * A3: Final project plan


 * Resources
 * Bamman, David. Interpretability in Human-Centered Data Science. 2016 CSCW workshop on Human-Centered Data Science.
 * Anderson, Carl. The role of model interpretability in data science. Medium, 2016.
 * Hill, Kashmir. Facebook figured out my family secrets, and it won't tell me how. Engadget, 2017.
 * Blue, Violet. Google’s comment-ranking system will be a hit with the alt-right. Engadget, 2017.
 * Ingold, David and Soper, Spencer. Amazon Doesn’t Consider the Race of Its Customers. Should It? Bloomberg, 2016.
 * Whitman, Brian. How music recommendation works - and doesn't work. Variogram, 2012.
 * Lamere, Paul. How good is Google's Instant Mix? Music Machinery, 2011.
 * Mars, Roman. The Age of the Algorithm. 99% Invisible Podcast, 2017.
 * Google's Perspective API

Week 6: November 2
Day 6 plan


 * Mixed-methods research: big data vs. thick data; qualitative research in data science


 * Assignments due
 * Reading reflection
 * A2: Bias in data


 * Agenda


 * Readings assigned


 * Homework assigned
 * Reading reflection


 * Resources

Week 7: November 9
Day 7 plan


 * Human computation: ethics of crowdwork, crowdsourcing methodologies for analysis, design, and evaluation


 * Assignments due
 * Reading reflection
 * A3: Final project plan


 * Agenda


 * Readings assigned


 * Homework assigned
 * Reading reflection
 * A4: Crowdwork self-ethnography


 * Resources

Week 8: November 16
Day 8 plan


 * User experience and big data: prototyping and user testing; benchmarking and iterative evaluation; UI design for data science


 * Assignments due
 * Reading reflection


 * Agenda


 * Readings assigned


 * Homework assigned
 * Reading reflection

 * Resources
 * Snyder, Jaime. Values in the Design of Visualizations. 2016 CSCW workshop on Human-Centered Data Science.

Week 9: November 23
Day 9 plan


 * Human-centered data science in the wild: community data science; data science for social good


 * Assignments due
 * Reading reflection
 * A4: Crowdwork self-ethnography


 * Agenda


 * Readings assigned


 * Homework assigned
 * Reading reflection


 * Resources

Week 10: November 30
Day 10 plan


 * Communicating methods, results, and implications: translating for non-data scientists


 * Assignments due
 * Reading reflection


 * Agenda


 * Readings assigned


 * Homework assigned
 * Reading reflection
 * A5: Final presentation


 * Resources

Week 11: December 7
Day 11 plan


 * Future of human centered data science: case studies from research, industry, and policy; final presentations


 * Assignments due
 * Reading reflection
 * A5: Final presentation


 * Agenda


 * Readings assigned
 * none!


 * Homework assigned
 * none!


 * Resources

Week 12: Finals Week

 * NO CLASS
 * A6: FINAL PROJECT REPORT DUE BY 11:59PM on Sunday, December 10
 * LATE PROJECT SUBMISSIONS NOT ACCEPTED.