Human Centered Data Science (Fall 2019)/Schedule: Difference between revisions

Revision as of 01:03, 10 October 2019

This page is a work in progress.

Week 1: September 26

Introduction to Human Centered Data Science: What is data science? What is human centered? What is human centered data science?

Assignments due

Fill out the pre-course survey
Read (not graded): Provost, Foster, and Tom Fawcett. Data science and its relationship to big data and data-driven decision making. Big Data 1.1 (2013): 51-59.

Agenda

Syllabus review
Pre-course survey results
What do we mean by data science?
What do we mean by human centered?
How does human centered design relate to data science?
In-class activity
Intro to assignment 1: Data Curation

Homework assigned

Read and reflect on both:

Hickey, Walt. The Dollars and Cents Case Against Hollywood's Exclusion of Women. FiveThirtyEight, 2014.
Keegan, Brian. The Need for Openness in Data Journalism. 2014.

A1: Data curation

Resources

Princeton Dialogues on AI & Ethics: Case studies
Aragon, C. et al. (2016). Developing a Research Agenda for Human-Centered Data Science. Human Centered Data Science workshop, CSCW 2016.
Kling, Rob and Star, Susan Leigh. Human Centered Systems in the Perspective of Organizational and Social Informatics. 1997.
Harford, T. (2014). Big data: A big mistake? Significance, 11(5), 14–19.
IDEO.org. The Field Guide to Human-Centered Design. 2015.

Week 2: October 3

Reproducibility and Accountability: data curation, preservation, documentation, and archiving; best practices for open scientific research

Assignments due

Week 1 reading reflection
A1: Data curation

Agenda

Reading reflection discussion
Assignment 1 review & reflection
A primer on copyright, licensing, and hosting for code and data
Introduction to replicability, reproducibility, and open research
In-class activity
Intro to assignment 2: Bias in data

Homework assigned

Read and reflect: Duarte, N., Llanso, E., & Loup, A. (2018). Mixed Messages? The Limits of Automated Social Media Content Analysis. Proceedings of the 1st Conference on Fairness, Accountability and Transparency, 81, 106.
A2: Bias in data

Resources

Hickey, Walt. The Bechdel Test: Checking Our Work. FiveThirtyEight, 2014.
GroupLens, MovieLens datasets
J. Priem, D. Taraborelli, P. Groth, C. Neylon (2010), Altmetrics: A manifesto, 26 October 2010.
Chapter 2 "Assessing Reproducibility" and Chapter 3 "The Basic Reproducible Workflow Template" from The Practice of Reproducible Research. University of California Press, 2018.
Halfaker, A., Geiger, R. S., Morgan, J. T., & Riedl, J. (2013). The rise and decline of an open collaboration system: How Wikipedia’s reaction to popularity is causing its decline. American Behavioral Scientist, 57(5), 664-688
TeBlunthuis, N., Shaw, A., and Hill, B.M. (2018). Revisiting "The rise and decline" in a population of peer production projects. In Proceedings of the 2018 ACM Conference on Human Factors in Computing Systems (CHI '18). https://doi.org/10.1145/3173574.3173929
Press, Gil. Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says. Forbes, 2016.
Christensen, Garret. Manual of Best Practices in Transparent Social Science Research. 2016.

Week 3: October 10

Interrogating datasets: causes and consequences of bias in data; best practices for selecting, describing, and implementing training data

Assignments due

Week 2 reading reflection

Agenda

Reading reflection review
Sources and consequences of bias in data collection, processing, and re-use
In-class activity

Homework assigned

Read both, reflect on one:

Wang, Tricia. Why Big Data Needs Thick Data. Ethnography Matters, 2016.
Kery, M. B., Radensky, M., Arya, M., John, B. E., & Myers, B. A. (2018). The Story in the Notebook: Exploratory Data Science using a Literate Programming Tool. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems - CHI’18, 1–11. https://doi.org/10.1145/3173574.3173748

Resources

Bender, E. M., & Friedman, B. (2018). Data Statements for NLP: Toward Mitigating System Bias and Enabling Better Science. To appear in Transactions of the ACL.
Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). Datasheets for datasets. arXiv preprint arXiv:1803.09010.
Olteanu, A., Castillo, C., Diaz, F., & Kıcıman, E. (2019). Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries. Frontiers in Big Data, 2, 13. https://doi.org/10.3389/fdata.2019.00013
Rose Eveleth The biggest lie tech people tell themselves — and the rest of us. October 8, 2019, Vox.com.
Rani Molla The government is using the wrong data to make crucial decisions about the internet. February 8, 2019, Vox.com.

Week 4: October 17

Introduction to mixed-methods research: Big data vs thick data; integrating qualitative research methods into data science practice; crowdsourcing

Assignments due

Reading reflection
A2: Bias in data

Agenda

Reading reflection review
Review of assignment 2
Survey of qualitative research methods
Mixed-methods case study
Introduction to ethnography
Ethnographic research case study
In-class activity
Introduction to crowdwork
Overview of Assignment 3: Crowdwork ethnography

Homework assigned

Read and reflect: Barocas, Solon and Nissenbaum, Helen. Big Data's End Run around Anonymity and Consent. In Privacy, Big Data, and the Public Good. 2014. (PDF available on Canvas)
A3: Crowdwork ethnography

Qualitative and mixed-methods research resources

Ford, D., Smith, J., Guo, P. J., & Parnin, C. (2016). Paradise unplugged: Identifying barriers for female participation on stack overflow. Proceedings of the ACM SIGSOFT Symposium on the Foundations of Software Engineering, 13-18-Nove, 846–857. https://doi.org/10.1145/2950290.2950331
Ladner, S. (2016). Practical ethnography: A guide to doing ethnography in the private sector. Routledge.
Spradley, J. P. (2016). The ethnographic interview. Waveland Press.
Spradley, J. P. (2016). Participant Observation. Waveland Press.
Eriksson, P., & Kovalainen, A. (2015). Ch 12: Ethnographic Research. In Qualitative methods in business research: A practical guide to social research. Sage.
Usability.gov, System usability scale.
Nielsen, Jakob (2000). Why you only need to test with five users. nngroup.com.

Crowdwork research resources

WeAreDynamo contributors. How to be a good requester and Guidelines for Academic Requesters. Wearedynamo.org

Week 5: October 24

Research ethics for big data: privacy, informed consent and user treatment

Assignments due

Reading reflection

Agenda

Reading reflection review
A brief history of research ethics in the United States
Research ethics in data science
Technological approaches to data privacy
Guest lecture
Procedural approaches to data privacy

Homework assigned

Read and reflect: Gray, M. L., & Suri, S. (2019). Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. Eamon Dolan Books. (PDF available on Canvas)

Resources

National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont Report. U.S. Department of Health and Human Services, 1979.
Bethan Cantrell, Javier Salido, and Mark Van Hollebeke (2016). Industry needs to embrace data ethics: Here's how it could be done. Workshop on Data and Algorithmic Transparency (DAT'16). http://datworkshop.org/
Javier Salido (2012). Differential Privacy for Everyone. Microsoft Corporation Whitepaper.
Markham, Annette and Buchanan, Elizabeth. Ethical Decision-Making and Internet Research. Association of Internet Researchers, 2012.
Hill, Kashmir. Facebook Manipulated 689,003 Users' Emotions For Science. Forbes, 2014.
Adam D. I. Kramer, Jamie E. Guillory, and Jeffrey T. Hancock. Experimental evidence of massive-scale emotional contagion through social networks. PNAS 2014 111 (24) 8788-8790; published ahead of print June 2, 2014.
Barbaro, Michael and Zeller, Tom. A Face Is Exposed for AOL Searcher No. 4417749. New York Times, 2006.
Zetter, Kim. Arvind Narayanan Isn’t Anonymous, and Neither Are You. WIRED, 2012.
Gray, Mary. When Science, Customer Service, and Human Subjects Research Collide. Now What? Culture Digitally, 2014.
Tene, Omer and Polonetsky, Jules. Privacy in the Age of Big Data. Stanford Law Review, 2012.
Dwork, Cynthia. Differential Privacy: A survey of results. Theory and Applications of Models of Computation, 2008.
Hsu, Danny. Techniques to Anonymize Human Data. Data Sift, 2015.

Week 6: October 31

Data science and society: power, data, and society; ethics of crowdwork

Assignments due

Reading reflection
A3: Crowdwork ethnography

Agenda

Reading reflections
Assignment 3 review
In-class activity
Introduction to assignment 4: Final project proposal

Homework assigned

Read both, reflect on one:

Baumer, E. P. S. (2017). Toward human-centered algorithm design. Big Data & Society.
Amershi, S., Cakmak, M., Knox, W. B., & Kulesza, T. (2014). Power to the People: The Role of Humans in Interactive Machine Learning. AI Magazine, 35(4), 105.

A4: Final project proposal

Resources

Lilly C. Irani and M. Six Silberman. 2013. Turkopticon: interrupting worker invisibility in amazon mechanical turk. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13). DOI: https://doi.org/10.1145/2470654.2470742
Ingold, David and Soper, Spencer. Amazon Doesn’t Consider the Race of Its Customers. Should It?. Bloomberg, 2016.
Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner. Machine Bias: Risk Assessment in Criminal Sentencing. ProPublica, May 2016.

Week 7: November 7

Human centered machine learning: algorithmic fairness, transparency, and accountability; methods and contexts for algorithmic audits

Assignments due

Reading reflection
A4: Final project proposal

Agenda

Reading reflection review
Algorithmic transparency, interpretability, and accountability
Auditing algorithms
In-class activity
Introduction to assignment 5: Final project plan

Homework assigned

Read and reflect: Kocielnik, R., Amershi, S., & Bennett, P. N. (2019). Will You Accept an Imperfect AI? Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems - CHI ’19, 1–14. https://doi.org/10.1145/3290605.3300641
A5: Final project plan

Resources

Christian Sandvig, Kevin Hamilton, Karrie Karahalios, Cedric Langbort (2014/05/22) Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms. Paper presented to "Data and Discrimination: Converting Critical Concerns into Productive Inquiry," a preconference at the 64th Annual Meeting of the International Communication Association. May 22, 2014; Seattle, WA, USA.
Shahriari, K., & Shahriari, M. (2017). IEEE standard review - Ethically aligned design: A vision for prioritizing human wellbeing with artificial intelligence and autonomous systems. Institute of Electrical and Electronics Engineers
ACM US Policy Council Statement on Algorithmic Transparency and Accountability. January 2017.
Asilomar AI Principles. Future of Life Institute, 2017.
Diakopoulos, N., Friedler, S., Arenas, M., Barocas, S., Hay, M., Howe, B., … Zevenbergen, B. (2018). Principles for Accountable Algorithms and a Social Impact Statement for Algorithms. Fatml.Org 2018.
Friedman, B., & Nissenbaum, H. (1996). Bias in Computer Systems. ACM Trans. Inf. Syst., 14(3), 330–347.
Nate Matias, 2017. How Anyone Can Audit Facebook's Newsfeed. Medium.com
Hill, Kashmir. Facebook figured out my family secrets, and it won't tell me how. Gizmodo, 2017.
Blue, Violet. Google’s comment-ranking system will be a hit with the alt-right. Engadget, 2017.
Google's Perspective API
Morgan, J. 2016. Evaluating Related Articles recommendations. Wikimedia Research.
Morgan, J. 2017. Comparing most read and trending edits for the top articles feature. Wikimedia Research.
Michael D. Ekstrand, F. Maxwell Harper, Martijn C. Willemsen, and Joseph A. Konstan. 2014. User perception of differences in recommender algorithms. In Proceedings of the 8th ACM Conference on Recommender systems (RecSys '14).

Week 8: November 14

User experience and data science: algorithmic interpretability; human-centered methods for designing and evaluating algorithmic systems

Assignments due

Reading reflection
A5: Final project plan

Agenda

coming soon

Homework assigned

Read and reflect: Kenneth Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miro Dudik, and Hanna Wallach. 2019. Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need? In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19). ACM, New York, NY, USA, Paper 600, 16 pages. DOI: https://doi.org/10.1145/3290605.3300830
A6: Final project presentation

Resources

Ethical OS Toolkit and Risk Mitigation Checklist. EthicalOS.org.
Sean M. McNee, John Riedl, and Joseph A. Konstan. 2006. Making recommendations better: an analytic model for human-recommender interaction. In CHI '06 Extended Abstracts on Human Factors in Computing Systems (CHI EA '06).
Sean M. McNee, Nishikant Kapoor, and Joseph A. Konstan. 2006. Don't look stupid: avoiding pitfalls when recommending research papers. In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work (CSCW '06).
Michael D. Ekstrand and Martijn C. Willemsen. 2016. Behaviorism is Not Enough: Better Recommendations through Listening to Users. In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16).
Jess Holbrook. Human Centered Machine Learning. Google Design Blog. 2017.
Anderson, Carl. The role of model interpretability in data science. Medium, 2016.
Fabien Girardin. Experience design in the machine learning era. Medium, 2016.
Xavier Amatriain and Justin Basilico. Netflix Recommendations: Beyond the 5 stars. Netflix Tech Blog, 2012.
Bart P. Knijnenburg, Martijn C. Willemsen, Zeno Gantner, Hakan Soncu, and Chris Newell. 2012. Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction 22, 4-5 (October 2012), 441-504. DOI=http://dx.doi.org/10.1007/s11257-011-9118-4
Patrick Austin, Facebook, Google, and Microsoft Use Design to Trick You Into Handing Over Your Data, New Report Warns. Gizmodo, 6/18/2018
Cremonesi, P., Elahi, M., & Garzotto, F. (2017). User interface patterns in recommendation-empowered content intensive multimedia applications. Multimedia Tools and Applications, 76(4), 5275-5309.

Week 9: November 21

Data science in context: Doing human centered data science in product organizations; communicating and collaborating across roles and disciplines; HCDS industry trends and trajectories

Assignments due

Reading reflection

Agenda

coming soon

Homework assigned

Read and reflect: Alkhatib, A., & Bernstein, M. (2019). Street-Level Algorithms: A Theory at the Gaps Between Policy and Decisions. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. https://doi.org/10.1145/3290605.3300760
A7: Final project report

Resources

Week 10: November 28 (No Class Session)

Assignments due

Reading reflection

Homework assigned

Read and reflect: Barocas, S., & Boyd, D. (2017). Engaging the ethics of data science in practice. Communications of the ACM, 60(11), 23–25. https://doi.org/10.1145/3144172 (PDF available on Canvas)

Resources

Marilynn Larkin, How to give a dynamic scientific presentation. Elsevier Connect, 2015.
Megan Risdal, Communicating data science: a guide to presenting your work. Kaggle blog, 2016.
Megan Risdal, Communicating data science: Why and how to visualize information. Kaggle blog, 2016.
Megan Risdal, Communicating data science: an interview with a storytelling expert. Kaggle blog, 2016.
Brent Dykes, Data Storytelling: The Essential Data Science Skill Everyone Needs. Forbes, 2016.

Week 11: December 5

Final presentations: presentation of student projects, course wrap up

Assignments due

Reading reflection
A6: Final project presentation

Readings assigned

NONE

Homework assigned

NONE

Resources

NONE

Week 12: Finals Week (No Class Session)

NO CLASS
A7: FINAL PROJECT REPORT DUE BY 5:00PM on Tuesday, December 10 via Canvas
LATE PROJECT SUBMISSIONS NOT ACCEPTED.

@@ Line 120: / Line 120: @@
 * Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W., Wallach, H., Daumé III, H., & Crawford, K. (2018). [https://www.fatml.org/media/documents/datasheets_for_datasets.pdf Datasheets for datasets]. arXiv preprint arXiv:1803.09010.
 * Olteanu, A., Castillo, C., Diaz, F., & Kıcıman, E. (2019). ''[https://www.frontiersin.org/articles/10.3389/fdata.2019.00013/pdf Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries].'' Frontiers in Big Data, 2, 13. https://doi.org/10.3389/fdata.2019.00013
+* Rose Eveleth ''[https://www.vox.com/the-highlight/2019/10/1/20887003/tech-technology-evolution-natural-inevitable-ethics The biggest lie tech people tell themselves — and the rest of us].'' October 8, 2019, Vox.com.
+* Rani Molla ''[https://www.vox.com/2019/2/8/18211794/government-data-internet The government is using the wrong data to make crucial decisions about the internet].'' February 8, 2019, Vox.com.
 <br/>
 <hr/>