User:Groceryheist/drafts/Data Science Syllabus

== Learning Objectives ==
By the end of this course, students will be able to:
* Understand what it means to analyze large and complex data effectively and ethically with an understanding of human, societal, organizational, and socio-technical contexts.
* Take into account the ethical, social, organizational, and legal considerations of data science in organizational and institutional contexts.
* Combine quantitative and qualitative data to generate critical insights into human behavior.
* Discuss and evaluate ethical, social, organizational and legal trade-offs of different data analysis, testing, curation, and sharing methods.
== Schedule ==
<div class="toccolours mw-collapsible">
Course schedule (click to expand)
<div class="mw-collapsible-content">
<noinclude>
<div style="font-family:Rockwell,'Courier Bold',Courier,Georgia,'Times New Roman',Times,serif; min-width:10em;">
<div style="float:left; width:100%; margin-right:2%;">
{{Link/Graphic/Main/2
|highlight color= 27666b
|color=460c40
|link=
|image=
|text-align=left
|top font-size= 1.1em
|top color=FFF
|line color=FFF
|top text=This page is a work in progress.
|bottom font-size= 1em
|bottom color= FFF
|bottom text=
|line= none
}}</div></div>
</noinclude>
=== Week 1:  ===
<!-- [[HCDS_(Fall_2018)/Day_1_plan|Day 1 plan]] -->
<!-- [[:File:HCDS_2018_week_1_slides.pdf|Day 1 slides]] -->
;Introduction to Human Centered Data Science: ''What is data science? What is human centered? What is human centered data science?''
;Assignments due
* Fill out the pre-course survey
* Attend week 1 of CDSW
<!-- ;Agenda -->
<!-- {{:HCDS (Fall 2018)/Day 1 plan}} -->
;Readings assigned
* Read: Provost, Foster, and Tom Fawcett. [http://online.liebertpub.com/doi/pdf/10.1089/big.2013.1508 ''Data science and its relationship to big data and data-driven decision making.''] Big Data 1.1 (2013): 51-59.
* Read: Kling, Rob and Star, Susan Leigh. [https://scholarworks.iu.edu/dspace/bitstream/handle/2022/1798/wp97-04B.html ''Human Centered Systems in the Perspective of Organizational and Social Informatics.''] 1997.
* Read: Barocas, Solan and Nissenbaum, Helen. [https://www.nyu.edu/projects/nissenbaum/papers/BigDatasEndRun.pdf ''Big Data's End Run around Anonymity and Consent'']. In ''Privacy, Big Data, and the Public Good''. 2014.
;Homework assigned
* Reading reflection
<!-- ;Resources -->
<!-- * Aragon, C. et al. (2016). [https://cscw2016hcds.files.wordpress.com/2015/10/cscw_2016_human-centered-data-science_workshop.pdf ''Developing a Research Agenda for Human-Centered Data Science.''] Human Centered Data Science workshop, CSCW 2016. -->
<!-- * Harford, T. (2014). ''[http://doi.org/10.1111/j.1740-9713.2014.00778.x Big data: A big mistake?]'' Significance, 11(5), 14–19. -->
<!-- * Ideo.org [http://www.designkit.org/ ''The Field Guide to Human-Centered Design.''] 2015. -->
<br/>
<hr/>
<br/>
=== Week 2: ===
<!-- [[HCDS_(Fall_2018)/Day_2_plan|Day 2 plan]] -->
<!-- [[:File:HCDS Week 2 slides.pdf|Day 2 slides]] -->
;Ethical considerations: ''privacy, informed consent and user treatment''
;Assignments due
* Week 1 reading reflection
* Attend week 2 of CDSW
<!-- ;Agenda -->
<!-- {{:HCDS (Fall 2018)/Day 2 plan}} -->
;Readings assigned
* Read:  boyd, danah and Crawford, Kate, Six Provocations for Big Data (September 21, 2011). A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, September 2011. Available at SSRN: https://ssrn.com/abstract=1926431 or http://dx.doi.org/10.2139/ssrn.1926431
;Homework assigned
* Reading reflection
* [[Human_Centered_Data_Science_(Fall_2018)/Assignments#A1:_Data_curation|A1: Data curation]]
;Resources
* Nissenbaum, Helen, [https://crypto.stanford.edu/portia/papers/RevnissenbaumDTP31.pdf Privacy as Contextual Integrity]
* National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. [https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/index.html ''The Belmont Report.''] U.S. Department of Health and Human Services, 1979.
* Bethan Cantrell, Javier Salido, and Mark Van Hollebeke (2016). ''[http://datworkshop.org/papers/dat16-final38.pdf Industry needs to embrace data ethics: Here's how it could be done]''. Workshop on Data and Algorithmic Transparency (DAT'16). http://datworkshop.org/
* Javier Salido (2012). ''[http://download.microsoft.com/download/D/1/F/D1F0DFF5-8BA9-4BDF-8924-7816932F6825/Differential_Privacy_for_Everyone.pdf Differential Privacy for Everyone].'' Microsoft Corporation Whitepaper.
* Markham, Annette and Buchanan, Elizabeth. [https://aoir.org/reports/ethics2.pdf ''Ethical Decision-Making and Internet Research.''] Association of Internet Researchers, 2012.
* Hill, Kashmir. [https://www.forbes.com/sites/kashmirhill/2014/06/28/facebook-manipulated-689003-users-emotions-for-science/#6a01653e197c ''Facebook Manipulated 689,003 Users' Emotions For Science.''] Forbes, 2014.
* Adam D. I. Kramer, Jamie E. Guillory, and Jeffrey T. Hancock [http://www.pnas.org/content/111/24/8788.full ''Experimental evidence of massive-scale emotional contagion through social networks.''] PNAS 2014 111 (24) 8788-8790; published ahead of print June 2, 2014.
* Barbaro, Michael and Zeller, Tom. [http://query.nytimes.com/gst/abstract.html?res=9E0CE3DD1F3FF93AA3575BC0A9609C8B63&legacy=true ''A Face Is Exposed for AOL Searcher No. 4417749.''] New York Times, 2006.
* Zetter, Kim. [https://www.wired.com/2012/06/wmw-arvind-narayanan/ ''Arvind Narayanan Isn’t Anonymous, and Neither Are You.''] WIRED, 2012.
* Gray, Mary. [http://culturedigitally.org/2014/07/when-science-customer-service-and-human-subjects-research-collide-now-what/ ''When Science, Customer Service, and Human Subjects Research Collide. Now What?''] Culture Digitally, 2014.
* Tene, Omer and Polonetsky, Jules. [https://www.stanfordlawreview.org/online/privacy-paradox-privacy-and-big-data/ ''Privacy in the Age of Big Data.''] Stanford Law Review, 2012.
* Dwork, Cynthia. [https://www.microsoft.com/en-us/research/wp-content/uploads/2008/04/dwork_tamc.pdf ''Differential Privacy: A survey of results'']. Theory and Applications of Models of Computation, 2008. (A toy code illustration of the Laplace mechanism follows this list.)
* Hsu, Danny. [http://blog.datasift.com/2015/04/09/techniques-to-anonymize-human-data/ ''Techniques to Anonymize Human Data.''] Data Sift, 2015.
* Metcalf, Jacob. [http://ethicalresolve.com/twelve-principles-of-data-ethics/ ''Twelve principles of data ethics'']. Ethical Resolve, 2016.
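The differential privacy resources above (Salido 2012; Dwork 2008) describe the formal guarantees at a high level. The following is a minimal, illustrative sketch of the Laplace mechanism, the simplest of the techniques they discuss; the toy counts, epsilon values, and function name are ours and are not drawn from the readings.
<syntaxhighlight lang="python">
# Illustrative sketch of the Laplace mechanism from the differential privacy
# readings above. The toy data and parameter choices are invented for
# illustration only.
import numpy as np

def laplace_count(true_count, epsilon=0.5, sensitivity=1.0):
    """Return a noisy count that satisfies epsilon-differential privacy.

    Adding or removing one person changes a simple count by at most
    `sensitivity`, so Laplace noise with scale sensitivity/epsilon is enough
    to mask any single individual's presence in the data.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: 42 of 500 survey respondents answered "yes" to a sensitive question.
print(laplace_count(42))               # noisy count, safer to publish
print(laplace_count(42, epsilon=0.1))  # smaller epsilon -> more noise, stronger privacy
</syntaxhighlight>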
<br/>
<hr/>
<br/>
=== Week 3: October 11 ===
[[HCDS_(Fall_2018)/Day_3_plan|Day 3 plan]]
[[:File:HCDS_2018_week_3_slides.pdf|Day 3 slides]]
;Reproducibility and Accountability: ''data curation, preservation, documentation, and archiving; best practices for open scientific research''
;Assignments due
* Week 2 reading reflection
;Agenda
{{:HCDS (Fall 2018)/Day 3 plan}}
;Readings assigned
* Read: Duarte, N., Llanso, E., & Loup, A. (2018). ''[https://cdt.org/files/2017/12/FAT-conference-draft-2018.pdf Mixed Messages? The Limits of Automated Social Media Content Analysis].'' Proceedings of the 1st Conference on Fairness, Accountability and Transparency, 81, 106.
;Homework assigned
* Reading reflection
;Resources
* Hickey, Walt. [https://fivethirtyeight.com/features/the-dollar-and-cents-case-against-hollywoods-exclusion-of-women/ ''The Dollar-and-Cents Case Against Hollywood's Exclusion of Women.''] FiveThirtyEight, 2014.
* Keegan, Brian. [https://github.com/brianckeegan/Bechdel/blob/master/Bechdel_test.ipynb ''The Need for Openness in Data Journalism.''] 2014.
* Hickey, Walt. [https://fivethirtyeight.com/features/the-bechdel-test-checking-our-work/ ''The Bechdel Test: Checking Our Work'']. FiveThirtyEight, 2014.
* J. Priem, D. Taraborelli, P. Groth, C. Neylon (2010), ''[http://altmetrics.org/manifesto Altmetrics: A manifesto]'', 26 October 2010.
<!--
* TeBlunthuis, N., Shaw, A., and Hill, B.M. (2018). Revisiting "The rise and decline" in a population of peer production projects. In ''Proceedings of the 2018 ACM Conference on Human Factors in Computing Systems (CHI '18)''. https://doi.org/10.1145/3173574.3173929
* Press, Gil. [https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#2608257f6f63 ''Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says.''] Forbes, 2016.
* Christensen, Garret. [https://github.com/garretchristensen/BestPracticesManual/blob/master/Manual.pdf ''Manual of Best Practices in Transparent Social Science Research.''] 2016.
* Aschwanden, Christie. [https://fivethirtyeight.com/features/science-isnt-broken/ ''Science Isn't Broken''] FiveThirtyEight, 2015.
-->
;Assignment 1 [[Human_Centered_Data_Science_(Fall_2018)/Assignments#A1:_Data_curation|Data curation]] resources:
*Chapter 2 [https://www.practicereproducibleresearch.org/core-chapters/2-assessment.html "Assessing Reproducibility"] and Chapter 3 [https://www.practicereproducibleresearch.org/core-chapters/3-basic.html "The Basic Reproducible Workflow Template"] from ''The Practice of Reproducible Research'' University of California Press, 2018.
* sample code for API calls ([http://paws-public.wmflabs.org/paws-public/User:Jtmorgan/data512_a1_example.ipynb view the notebook], [http://paws-public.wmflabs.org/paws-public/User:Jtmorgan/data512_a1_example.ipynb?format=raw download the notebook]); a minimal request sketch also appears at the end of this list.
*''See [[Human_Centered_Data_Science/Datasets#Dataset_documentation_examples|the datasets page]] for examples of well-documented and not-so-well documented open datasets.''
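If the example notebook above is unavailable, the sketch below shows the general shape of an API request for Wikipedia pageview data using the public Wikimedia REST API. It is a minimal illustration rather than the course's official starter code; the article title, date range, and contact address are placeholders you would replace.
<syntaxhighlight lang="python">
# Minimal sketch of an API call for Wikipedia pageview data (placeholders, not
# the official A1 starter code). Check the Wikimedia REST API docs for the
# current parameter values.
import json
import requests

ENDPOINT = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
    "{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}"
)

params = {
    "project": "en.wikipedia.org",
    "access": "desktop",        # desktop | mobile-web | mobile-app | all-access
    "agent": "user",            # exclude known spiders and bots
    "article": "Data_science",  # placeholder article title
    "granularity": "monthly",
    "start": "2018010100",      # YYYYMMDDHH
    "end": "2018100100",
}
# Identify yourself to the API operators; replace with your own contact info.
headers = {"User-Agent": "data512-example (your_email@example.com)"}

response = requests.get(ENDPOINT.format(**params), headers=headers)
response.raise_for_status()
data = response.json()

# Save the raw response next to your analysis code so the workflow is reproducible.
with open("pageviews_desktop_monthly.json", "w") as f:
    json.dump(data, f, indent=2)

print("Retrieved", len(data.get("items", [])), "monthly records")
</syntaxhighlight>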
<br/>
<hr/>
<br/>
=== Week 4: October 18 ===
[[HCDS_(Fall_2018)/Day_4_plan|Day 4 plan]]
[[:File:HCDS 2018 week 4 slides.pdf|Day 4 slides]]
;Interrogating datasets: ''causes and consequences of bias in data; best practices for selecting, describing, and implementing training data''
;Assignments due
* Reading reflection
* [[Human_Centered_Data_Science_(Fall_2018)/Assignments#A1:_Data_curation|A1: Data curation]]
;Agenda
{{:HCDS (Fall 2018)/Day 4 plan}}
;Readings assigned (Read both, reflect on one)
* Wang, Tricia. ''[https://medium.com/ethnography-matters/why-big-data-needs-thick-data-b4b3e75e3d7 Why Big Data Needs Thick Data]''. Ethnography Matters, 2016.
* Shilad Sen, Margaret E. Giesel, Rebecca Gold, Benjamin Hillmann, Matt Lesicko, Samuel Naden, Jesse Russell, Zixiao (Ken) Wang, and Brent Hecht. 2015. ''[http://www-users.cs.umn.edu/~bhecht/publications/goldstandards_CSCW2015.pdf Turkers, Scholars, "Arafat" and "Peace": Cultural Communities and Algorithmic Gold Standards]''. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW '15)
;Homework assigned
* Reading reflection
* [[Human_Centered_Data_Science_(Fall_2018)/Assignments#A2:_Bias_in_data|A2: Bias in data]]
;Resources
* Olteanu, A., Castillo, C., Diaz, F., & Kiciman, E. (2016). ''[http://kiciman.org/wp-content/uploads/2017/08/SSRN-id2886526.pdf Social data: Biases, methodological pitfalls, and ethical boundaries].''
* Brian N Larson. 2017. ''[http://www.ethicsinnlp.org/workshop/pdf/EthNLP04.pdf Gender as a Variable in Natural-Language Processing: Ethical Considerations].'' EthNLP, 3: 30–40.
* Bender, E. M., & Friedman, B. (2018). [https://openreview.net/forum?id=By4oPeX9f Data Statements for NLP: Toward Mitigating System Bias and Enabling Better Science]. To appear in Transactions of the ACL.
* Isaac L. Johnson, Yilun Lin, Toby Jia-Jun Li, Andrew Hall, Aaron Halfaker, Johannes Schöning, and Brent Hecht. 2016. ''[https://doi.org/10.1145/2858036.2858123 Not at Home on the Range: Peer Production and the Urban/Rural Divide].'' CHI '16. DOI: https://doi.org/10.1145/2858036.2858123
* Leo Graiden Stewart, Ahmer Arif, A. Conrad Nied, Emma S. Spiro, and Kate Starbird. 2017. ''[https://faculty.washington.edu/kstarbi/Stewart_Starbird_Drawing_the_Lines_of_Contention-final.pdf Drawing the Lines of Contention: Networked Frame Contests Within #BlackLivesMatter Discourse].'' Proc. ACM Hum.-Comput. Interact. 1, CSCW, Article 96 (December 2017), 23 pages. DOI: https://doi.org/10.1145/3134920
* Cristian Danescu-Niculescu-Mizil, Robert West, Dan Jurafsky, Jure Leskovec, and Christopher Potts. 2013. ''[https://web.stanford.edu/~jurafsky/pubs/linguistic_change_lifecycle.pdf No country for old members: user lifecycle and linguistic change in online communities].'' In Proceedings of the 22nd international conference on World Wide Web (WWW '13). ACM, New York, NY, USA, 307-318. DOI: https://doi.org/10.1145/2488388.2488416 
<!-- * Astrid Mager. 2012. Algorithmic ideology: How capitalist society shapes search engines. Information, Communication & Society 15, 5: 769–787. http://doi.org/10.1080/1369118X.2012.676056 (in Canvas) -->
<br/>
<hr/>
<br/>
=== Week 5: October 25 ===
[[HCDS_(Fall_2018)/Day_5_plan|Day 5 plan]]
[[:File:HCDS 2018 week 5 slides.pdf|Day 5 slides]]
;Introduction to mixed-methods research: ''Big data vs thick data; integrating qualitative research methods into data science practice; crowdsourcing''
;Assignments due
* Reading reflection
;Agenda
{{:HCDS (Fall 2018)/Day 5 plan}}
;Readings assigned (Read both, reflect on one)
* Donovan, J., Caplan, R., Matthews, J., & Hanson, L. (2018). ''[https://datasociety.net/wp-content/uploads/2018/04/Data_Society_Algorithmic_Accountability_Primer_FINAL.pdf Algorithmic accountability: A primer]''. Data & Society, 501(c).
;Homework assigned
* Reading reflection
* [[Human_Centered_Data_Science_(Fall_2018)/Assignments#A3:_Crowdwork_ethnography|A3: Crowdwork ethnography]]
;Qualitative research methods resources
* Ladner, S. (2016). ''[http://www.practicalethnography.com/ Practical ethnography: A guide to doing ethnography in the private sector]''. Routledge.
* Spradley, J. P. (2016). ''[https://www.waveland.com/browse.php?t=688 The ethnographic interview]''. Waveland Press.
* Eriksson, P., & Kovalainen, A. (2015). ''[http://study.sagepub.com/sites/default/files/Eriksson%20and%20Kovalainen.pdf Ch 12: Ethnographic Research]''. In Qualitative methods in business research: A practical guide to social research. Sage.
* Usability.gov, ''[https://www.usability.gov/how-to-and-tools/methods/system-usability-scale.html System usability scale]''.
* Nielsen, Jakob (2000). ''[https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/ Why you only need to test with five users]''. nngroup.com.
;Wikipedia gender gap research resources
* Hill, B. M., & Shaw, A. (2013). ''[journals.plos.org/plosone/article?id=10.1371/journal.pone.0065782 The Wikipedia gender gap revisited: Characterizing survey response bias with propensity score estimation]''. PloS one, 8(6), e65782
* Shyong (Tony) K. Lam, Anuradha Uduwage, Zhenhua Dong, Shilad Sen, David R. Musicant, Loren Terveen, and John Riedl. 2011. ''[http://files.grouplens.org/papers/wp-gender-wikisym2011.pdf WP:clubhouse?: an exploration of Wikipedia's gender imbalance.]'' In Proceedings of the 7th International Symposium on Wikis and Open Collaboration (WikiSym '11). ACM, New York, NY, USA, 1-10. DOI=http://dx.doi.org/10.1145/2038558.2038560
* Maximillian Klein. ''[http://whgi.wmflabs.org/gender-by-language.html Gender by Wikipedia Language]''. Wikidata Human Gender Indicators (WHGI), 2017.
* Wagner, C., Garcia, D., Jadidi, M., & Strohmaier, M. (2015, April). ''[https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/viewFile/10585/10528 It's a Man's Wikipedia? Assessing Gender Inequality in an Online Encyclopedia]''. In ICWSM (pp. 454-463).
* Benjamin Collier and Julia Bear. ''[https://static1.squarespace.com/static/521c8817e4b0dca2590b4591/t/523745abe4b05150ff027a6e/1379354027662/2012+-+Collier%2C+Bear+-+Conflict%2C+confidence%2C+or+criticism+an+empirical+examination+of+the+gender+gap+in+Wikipedia.pdf Conflict, criticism, or confidence: an empirical examination of the gender gap in wikipedia contributions]''. In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work (CSCW '12). DOI: https://doi.org/10.1145/2145204.2145265
* Christina Shane-Simpson and Kristen Gillespie-Lynch. ''Examining potential mechanisms underlying the Wikipedia gender gap through a collaborative editing task''. Computers in Human Behavior, 66, 2017. https://doi.org/10.1016/j.chb.2016.09.043 (PDF on Canvas)
* Amanda Menking and Ingrid Erickson. 2015. ''[https://upload.wikimedia.org/wikipedia/commons/7/77/The_Heart_Work_of_Wikipedia_Gendered,_Emotional_Labor_in_the_World%27s_Largest_Online_Encyclopedia.pdf The Heart Work of Wikipedia: Gendered, Emotional Labor in the World's Largest Online Encyclopedia]''. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15). https://doi.org/10.1145/2702123.2702514
;Crowdwork research resources
* WeAreDynamo contributors. ''[http://wiki.wearedynamo.org/index.php?title=Basics_of_how_to_be_a_good_requester How to be a good requester]'' and ''[http://wiki.wearedynamo.org/index.php?title=Guidelines_for_Academic_Requesters Guidelines for Academic Requesters]''. Wearedynamo.org
<br/>
<hr/>
<br/>
=== Week 6: November 1 ===
[[HCDS_(Fall_2018)/Day_6_plan|Day 6 plan]]
[[:File:HCDS 2018 week 6 slides.pdf|Day 6 slides]]
;Interrogating algorithms: ''algorithmic fairness, transparency, and accountability; methods and contexts for algorithmic audits''
;Assignments due
* Reading reflection
* [[Human_Centered_Data_Science_(Fall_2018)/Assignments#A2:_Bias_in_data|A2: Bias in data]]
;Agenda
{{:HCDS (Fall 2018)/Day 6 plan}}
;Readings assigned
* Astrid Mager. 2012. ''[https://computingeverywhere.soc.northwestern.edu/wp-content/uploads/2017/07/Mager-Algorithmic-Ideology-Required.pdf Algorithmic ideology: How capitalist society shapes search engines]''. Information, Communication & Society 15, 5: 769–787. http://doi.org/10.1080/1369118X.2012.676056
;Homework assigned
* Reading reflection
;Resources
* Christian Sandvig, Kevin Hamilton, Karrie Karahalios, Cedric Langbort (2014/05/22) ''[http://www-personal.umich.edu/~csandvig/research/Auditing%20Algorithms%20--%20Sandvig%20--%20ICA%202014%20Data%20and%20Discrimination%20Preconference.pdf Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms].'' Paper presented to "Data and Discrimination: Converting Critical Concerns into Productive Inquiry," a preconference at the 64th Annual Meeting of the International Communication Association. May 22, 2014; Seattle, WA, USA.
* Shahriari, K., & Shahriari, M. (2017). ''[https://ethicsinaction.ieee.org/ IEEE standard review - Ethically aligned design: A vision for prioritizing human wellbeing with artificial intelligence and autonomous systems].'' Institute of Electrical and Electronics Engineers
* ACM US Policy Council ''[https://www.acm.org/binaries/content/assets/public-policy/2017_usacm_statement_algorithms.pdf Statement on Algorithmic Transparency and Accountability].'' January 2017.
* ''[https://futureoflife.org/ai-principles/ Asilomar AI Principles].'' Future of Life Institute, 2017.
* Diakopoulos, N., Friedler, S., Arenas, M., Barocas, S., Hay, M., Howe, B., … Zevenbergen, B. (2018). ''[http://www.fatml.org/resources/principles-for-accountable-algorithms Principles for Accountable Algorithms and a Social Impact Statement for Algorithms].'' Fatml.Org 2018.
* Friedman, B., & Nissenbaum, H. (1996). ''[https://www.vsdesign.org/publications/pdf/64_friedman.pdf Bias in Computer Systems]''. ACM Trans. Inf. Syst., 14(3), 330–347.
* Diakopoulos, N. (2014). ''Algorithmic accountability reporting: On the investigation of black boxes.'' Tow Center for Digital Journalism, 1–33.
* Nate Matias, 2017. ''[https://medium.com/@natematias/how-anyone-can-audit-facebooks-newsfeed-b879c3e29015 How Anyone Can Audit Facebook's Newsfeed].'' Medium.com
* Hill, Kashmir. ''[https://gizmodo.com/facebook-figured-out-my-family-secrets-and-it-wont-tel-1797696163 Facebook figured out my family secrets, and it won't tell me how].'' Gizmodo, 2017.
* Blue, Violet. ''[https://www.engadget.com/2017/09/01/google-perspective-comment-ranking-system/ Google’s comment-ranking system will be a hit with the alt-right].'' Engadget, 2017.
* Ingold, David and Soper, Spencer. ''[https://www.bloomberg.com/graphics/2016-amazon-same-day/ Amazon Doesn’t Consider the Race of Its Customers. Should It?].'' Bloomberg, 2016.
* Paul Lamere. ''[https://musicmachinery.com/2011/05/14/how-good-is-googles-instant-mix/ How good is Google's Instant Mix?].'' Music Machinery, 2011.
* Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner. ''[https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing Machine Bias: Risk Assessment in Criminal Sentencing].'' ProPublica, May 2016.
* [https://www.perspectiveapi.com/#/ Google's Perspective API]
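The Perspective API linked above is the target of this week's in-class audit activity. The sketch below shows one way to request a TOXICITY score for a single comment; it assumes you have obtained an API key, and the endpoint and payload shape should be verified against the current Perspective API documentation.
<syntaxhighlight lang="python">
# Minimal sketch of scoring one comment with Google's Perspective API.
# Assumes a valid API key; verify the endpoint and request format against the
# current documentation before relying on it.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       "comments:analyze?key=" + API_KEY)

payload = {
    "comment": {"text": "You are a wonderful person."},
    "languages": ["en"],
    "requestedAttributes": {"TOXICITY": {}},
}

response = requests.post(URL, json=payload)
response.raise_for_status()
result = response.json()

# The summary score is a probability-like value between 0 and 1.
score = result["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
print("TOXICITY score:", round(score, 3))

# A simple audit loops this request over a constructed set of comments (for
# example, varying identity terms) and compares the resulting score distributions.
</syntaxhighlight>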
<br/>
<hr/>
<br/>
=== Week 7: November 8 ===
[[HCDS_(Fall_2018)/Day_7_plan|Day 7 plan]]
[[:File:HCDS 2018 week 7 slides.pdf|Day 7 slides]]
;Critical approaches to data science: ''power, data, and society; ethics of crowdwork''
;Assignments due
* Reading reflection
* A3: Crowdwork ethnography
;Agenda
{{:HCDS (Fall 2018)/Day 7 plan}}
;Readings assigned (read both, reflect on one)
* Read: Baumer, E. P. S. (2017). ''[http://journals.sagepub.com/doi/pdf/10.1177/2053951717718854 Toward human-centered algorithm design].'' Big Data & Society.
* Read: Amershi, S., Cakmak, M., Knox, W. B., & Kulesza, T. (2014). ''[http://www.aaai.org/ojs/index.php/aimagazine/article/download/2513/2456 Power to the People: The Role of Humans in Interactive Machine Learning].'' AI Magazine, 35(4), 105.
;Homework assigned
* Reading reflection
* [[Human_Centered_Data_Science_(Fall_2018)/Assignments#A4:_Final_project_plan|A4: Final project plan]]
;Resources
* Neff, G., Tanweer, A., Fiore-Gartland, B., & Osburn, L. (2017). Critique and Contribute: A Practice-Based Framework for Improving Critical Data Studies and Data Science. Big Data, 5(2), 85–97. https://doi.org/10.1089/big.2016.0050
* Lilly C. Irani and M. Six Silberman. 2013. ''[https://escholarship.org/content/qt10c125z3/qt10c125z3.pdf Turkopticon: interrupting worker invisibility in amazon mechanical turk]''. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '13). DOI: https://doi.org/10.1145/2470654.2470742
* Bivens, R. and Haimson, O.L. 2016. ''[http://journals.sagepub.com/doi/pdf/10.1177/2056305116672486 Baking Gender Into Social Media Design: How Platforms Shape Categories for Users and Advertisers]''. Social Media + Society. 2, 4 (2016), 205630511667248. DOI:https://doi.org/10.1177/2056305116672486.
* Schlesinger, A. et al. 2017. ''[http://arischlesinger.com/wp-content/uploads/2017/03/chi2017-schlesinger-intersectionality.pdf Intersectional HCI: Engaging Identity through Gender, Race, and Class].'' Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems - CHI ’17. (2017), 5412–5427. DOI:https://doi.org/10.1145/3025453.3025766.
<br/>
<hr/>
<br/>
=== Week 8: November 15 ===
[[HCDS_(Fall_2018)/Day_8_plan|Day 8 plan]]
[[:File:HCDS 2018 week 8 slides.pdf|Day 8 slides]]
;Human-centered algorithm design: ''algorithmic interpretability; human-centered methods for designing and evaluating algorithmic systems''
;Assignments due
* Reading reflection
;Agenda
{{:HCDS (Fall 2018)/Day 8 plan}}
;Readings assigned
* Hill, B. M., Dailey, D., Guy, R. T., Lewis, B., Matsuzaki, M., & Morgan, J. T. (2017). ''[https://mako.cc/academic/hill_etal-cdsw_chapter-DRAFT.pdf Democratizing Data Science: The Community Data Science Workshops and Classes].'' In N. Jullien, S. A. Matei, & S. P. Goggins (Eds.), Big Data Factories: Scientific Collaborative approaches for virtual community data collection, repurposing, recombining, and dissemination.
;Homework assigned
* Reading reflection
;Resources
* Ethical OS ''[https://ethicalos.org/wp-content/uploads/2018/08/Ethical-OS-Toolkit-2.pdf Toolkit]'' and ''[https://ethicalos.org/wp-content/uploads/2018/08/EthicalOS_Check-List_080618.pdf Risk Mitigation Checklist]''. EthicalOS.org.
* Morgan, J. 2016. ''[https://meta.wikimedia.org/wiki/Research:Evaluating_RelatedArticles_recommendations Evaluating Related Articles recommendations]''. Wikimedia Research.
* Morgan, J. 2017. ''[https://meta.wikimedia.org/wiki/Research:Comparing_most_read_and_trending_edits_for_Top_Articles_feature Comparing most read and trending edits for the top articles feature]''. Wikimedia Research.
*Michael D. Ekstrand, F. Maxwell Harper, Martijn C. Willemsen, and Joseph A. Konstan. 2014. ''[https://md.ekstrandom.net/research/pubs/listcmp/listcmp.pdf User perception of differences in recommender algorithms].'' In Proceedings of the 8th ACM Conference on Recommender systems (RecSys '14).
* Sean M. McNee, John Riedl, and Joseph A. Konstan. 2006. ''[http://files.grouplens.org/papers/mcnee-chi06-hri.pdf Making recommendations better: an analytic model for human-recommender interaction].'' In CHI '06 Extended Abstracts on Human Factors in Computing Systems (CHI EA '06).
* Sean M. McNee, Nishikant Kapoor, and Joseph A. Konstan. 2006. ''[http://files.grouplens.org/papers/p171-mcnee.pdf Don't look stupid: avoiding pitfalls when recommending research papers].'' In Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work (CSCW '06).
* Michael D. Ekstrand and Martijn C. Willemsen. 2016. ''[https://md.ekstrandom.net/research/pubs/behaviorism/BehaviorismIsNotEnough.pdf Behaviorism is Not Enough: Better Recommendations through Listening to Users].'' In Proceedings of the 10th ACM Conference on Recommender Systems (RecSys '16).
* Jess Holbrook. ''[https://medium.com/google-design/human-centered-machine-learning-a770d10562cd Human Centered Machine Learning].'' Google Design Blog. 2017.
* Anderson, Carl. ''[https://medium.com/@leapingllamas/the-role-of-model-interpretability-in-data-science-703918f64330 The role of model interpretability in data science].'' Medium, 2016.
<br/>
<hr/>
<br/>
=== Week 9: November 22 (No Class Session) ===
[[HCDS_(Fall_2018)/Day_9_plan|Day 9 plan]]
;Data science for social good: ''Community-based and participatory approaches to data science; Using data science for society's benefit''
;Assignments due
* Reading reflection
* A4: Final project plan
;Agenda
{{:HCDS (Fall 2018)/Day 9 plan}}
;Readings assigned
* Berney, Rachel, Bernease Herman, Gundula Proksch, Hillary Dawkins, Jacob Kovacs, Yahui Ma, Jacob Rich, and Amanda Tan. ''[https://dssg.uchicago.edu/wp-content/uploads/2017/09/berney.pdf Visualizing Equity: A Data Science for Social Good Tool and Model for Seattle].'' Data Science for Social Good Conference, September 2017, Chicago, Illinois USA (2017).
;Homework assigned
* Reading reflection
;Resources
*  Daniela Aiello, Lisa Bates, et al. [https://shelterforce.org/2018/08/22/eviction-lab-misses-the-mark/ Eviction Lab Misses the Mark], ShelterForce, August 2018. 
<br/>
<hr/>
<br/>
=== Week 10: November 29 ===
[[HCDS_(Fall_2018)/Day_10_plan|Day 10 plan]]
[[:File:HCDS 2018 week 10 slides.pdf|Day 10 slides]]
;User experience and big data: ''Design considerations for machine learning applications; human centered data visualization; data storytelling''
;Assignments due
* Reading reflection
;Agenda
{{:HCDS (Fall 2018)/Day 10 plan}}
;Readings assigned
* NONE
;Homework assigned
* A5: Final presentation
;Resources
*Fabien Girardin. ''[https://medium.com/@girardin/experience-design-in-the-machine-learning-era-e16c87f4f2e2 Experience design in the machine learning era].'' Medium, 2016.
* Xavier Amatriain and Justin Basilico. ''[https://medium.com/netflix-techblog/netflix-recommendations-beyond-the-5-stars-part-1-55838468f429 Netflix Recommendations: Beyond the 5 stars].'' Netflix Tech Blog, 2012.
* Jess Holbrook. ''[https://medium.com/google-design/human-centered-machine-learning-a770d10562cd Human Centered Machine Learning].'' Google Design Blog. 2017.
* Bart P. Knijnenburg, Martijn C. Willemsen, Zeno Gantner, Hakan Soncu, and Chris Newell. 2012. ''[https://pure.tue.nl/ws/files/3484177/724656348730405.pdf Explaining the user experience of recommender systems].'' User Modeling and User-Adapted Interaction 22, 4-5 (October 2012), 441-504. DOI=http://dx.doi.org/10.1007/s11257-011-9118-4
* Patrick Austin, ''[https://gizmodo.com/facebook-google-and-microsoft-use-design-to-trick-you-1827168534 Facebook, Google, and Microsoft Use Design to Trick You Into Handing Over Your Data, New Report Warns].'' Gizmodo, 6/18/2018
* Brown, A., Tuor, A., Hutchinson, B., & Nichols, N. (2018). ''[https://arxiv.org/abs/1803.04967 Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection].'' arXiv preprint arXiv:1803.04967.
* Cremonesi, P., Elahi, M., & Garzotto, F. (2017). ''[https://core.ac.uk/download/pdf/74313597.pdf User interface patterns in recommendation-empowered content intensive multimedia applications].'' Multimedia Tools and Applications, 76(4), 5275-5309.
* Marilynn Larkin, ''[https://www.elsevier.com/connect/how-to-give-a-dynamic-scientific-presentation How to give a dynamic scientific presentation].'' Elsevier Connect, 2015.
* Megan Risdal, ''[http://blog.kaggle.com/2016/06/29/communicating-data-science-a-guide-to-presenting-your-work/ Communicating data science: a guide to presenting your work].'' Kaggle blog, 2016.
* Megan Risdal, ''[http://blog.kaggle.com/2016/08/10/communicating-data-science-why-and-some-of-the-how-to-visualize-information/ Communicating data science: Why and how to visualize information].'' Kaggle blog, 2016.
* Megan Risdal, ''[http://blog.kaggle.com/2016/06/13/communicating-data-science-an-interview-with-a-storytelling-expert-tyler-byers/ Communicating data science: an interview with a storytelling expert].'' Kaggle blog, 2016.
* Brent Dykes, ''[https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs/ Data Storytelling: The Essential Data Science Skill Everyone Needs].'' Forbes, 2016.
<br/>
<hr/>
<br/>
=== Week 11: December 6 ===
[[HCDS_(Fall_2018)/Day_11_plan|Day 11 plan]]
;Final presentations: ''course wrap-up, presentation of student projects''
;Assignments due
* A5: Final presentation
;Agenda
{{:HCDS (Fall 2018)/Day 11 plan}}
;Readings assigned
* none!
;Homework assigned
* A6: Final project report (by 11:59pm)
;Resources
* ''none''
<br/>
<hr/>
<br/>
=== Week 12: Finals Week (No Class Session) ===
* NO CLASS
* A6: FINAL PROJECT REPORT DUE BY 11:59PM
<!-- * LATE PROJECT SUBMISSIONS NOT ACCEPTED. -->
</div>
</div>


[[Category:Groceryheist drafts]]

Revision as of 23:52, 8 February 2019

Data Science and Organizational Communication
Principal instructor
Nate TeBlunthuis
Course Catalog Description
Fundamental principles of data science and its implications, including research ethics; data privacy; legal frameworks; algorithmic bias, transparency, fairness and accountability; data provenance, curation, preservation, and reproducibility; human computation; data communication and visualization; the role of data science in organizational context and the societal impacts of data science.

Course Description

The rise of "data science" reflects a broad and ongoing shift in how many teams, organizational leaders, communities of practice, and entire industries create and use knowledge. This class teaches "data science" as practiced by data-intensive knowledge workers, but also as it is positioned in historical, organizational, institutional, and societal contexts. Students will gain an appreciation for the technical and intellectual aspects of data science, consider critical questions about how data science is often practiced, and envision ethical and effective data science practice in their current and future organizational roles. The format of the class will be a mix of lecture, discussion, in-class activities, and qualitative and quantitative research assignments.

The course is designed around two high-stakes projects. In the first stage, students will attend the Community Data Science Workshop (CDSW), a three-week intensive workshop on basic programming and data analysis skills that I help organize and teach. The first course project is to apply these skills, together with the conceptual material covered in the course so far, to conduct an original data analysis on a topic of the student's choosing. The second high-stakes project is a critical analysis of an organization or work team. For this project, students will serve as consultants to an organizational unit involved in data science. Through interviews and workplace observations, they will gain an understanding of the socio-technical and organizational context of their team. They will then synthesize this understanding with the knowledge gained from the course material to compose a report offering actionable insights to their team.

Learning Objectives

By the end of this course, students will be able to:

  • Understand what it means to analyze large and complex data effectively and ethically with an understanding of human, societal, organizational, and socio-technical contexts.
  • Take into account the ethical, social, organizational, and legal considerations of data science in organizational and institutional contexts.
  • Combine quantitative and qualitative data to generate critical insights into human behavior.
  • Discuss and evaluate ethical, social, organizational and legal trade-offs of different data analysis, testing, curation, and sharing methods.

Schedule

Course schedule (click to expand)

This page is a work in progress.


Week 1:

Introduction to Human Centered Data Science
What is data science? What is human centered? What is human centered data science?
Assignments due
  • Fill out the pre-course survey
  • Attend week 1 of CDSW


Readings assigned
Homework assigned
  • Reading reflection





Week 2:

Ethical considerations
privacy, informed consent and user treatment


Assignments due
  • Week 1 reading reflection
  • Attend week 2 of CDSW


Readings assigned


Homework assigned
Resources




Week 3: October 11

Day 3 plan

Day 3 slides

Reproducibility and Accountability
data curation, preservation, documentation, and archiving; best practices for open scientific research
Assignments due
  • Week 2 reading reflection
Agenda
  • Six Provocations for Big Data: Review & Reflections
  • A primer on copyright, licensing, and hosting for code and data
  • Introduction to replicability, reproducibility, and open research
  • Reproducibility case study: fivethirtyeight.com
  • Group activity: assessing reproducibility in data journalism
  • Overview of Assignment 1: Data curation


Readings assigned
Homework assigned
  • Reading reflection
Resources


Assignment 1 Data curation resources





Week 4: October 18

Day 4 plan

Day 4 slides

Interrogating datasets
causes and consequences of bias in data; best practices for selecting, describing, and implementing training data


Assignments due
Agenda
  • Final project: Goal, timeline, and deliverables.
  • Overview of assignment 2: Bias in data
  • Reading reflections review
  • Sources of bias in datasets
  • Introduction to assignment 2: Bias in data
  • Sources of bias in data collection and processing
  • In-class exercise: assessing bias in training data


Readings assigned (Read both, reflect on one)
Homework assigned


Resources




Week 5: October 25

Day 5 plan

Day 5 slides

Introduction to mixed-methods research
Big data vs thick data; integrating qualitative research methods into data science practice; crowdsourcing


Assignments due
  • Reading reflection


Agenda
  • Assignment 1 review & reflection
  • Week 4 reading reflection discussion
  • Survey of qualitative research methods
  • Mixed-methods case study #1: The Wikipedia Gender Gap: causes & consequences
  • In-class activity: Automated Gender Recognition scenarios
  • Introduction to ethnography
  • Ethnographic research case study: Structured data on Wikimedia Commons
  • Introduction to crowdwork
  • Overview of Assignment 3: Crowdwork ethnography


Readings assigned (Read both, reflect on one)


Homework assigned


Qualitative research methods resources
Wikipedia gender gap research resources
Crowdwork research resources





Week 6: November 1

Day 6 plan

Day 6 slides

Interrogating algorithms
algorithmic fairness, transparency, and accountability; methods and contexts for algorithmic audits
Assignments due
Agenda
  • Reading reflections
  • Ethical implications of crowdwork
  • Algorithmic transparency, interpretability, and accountability
  • Auditing algorithms
  • In-class activity: auditing the Perspective API


Readings assigned


Homework assigned
  • Reading reflection


Resources





Week 7: November 8

Day 7 plan

Day 7 slides

Critical approaches to data science
power, data, and society; ethics of crowdwork


Assignments due
  • Reading reflection
  • A3: Crowdwork ethnography


Agenda
  • Guest lecture: Rochelle LaPlante


Readings assigned (read both, reflect on one)
Homework assigned


Resources





Week 8: November 15

Day 8 plan

Day 8 slides

Human-centered algorithm design
algorithmic interpretability; human-centered methods for designing and evaluating algorithmic systems


Assignments due
  • Reading reflection


Agenda
  • Final project overview & examples
  • Guest Lecture: Kelly Franznick, Blink UX
  • Reading reflections
  • Human-centered algorithm design
  • design process
  • user-driven evaluation
  • design patterns & anti-patterns


Readings assigned
Homework assigned
  • Reading reflection
Resources





Week 9: November 22 (No Class Session)

Day 9 plan

Data science for social good
Community-based and participatory approaches to data science; Using data science for society's benefit
Assignments due
  • Reading reflection
  • A4: Final project plan
Agenda


Readings assigned
Homework assigned
  • Reading reflection
Resources





Week 10: November 29

Day 10 plan

Day 10 slides

User experience and big data
Design considerations for machine learning applications; human centered data visualization; data storytelling


Assignments due
  • Reading reflection


Agenda
  • Reading reflections discussion
  • Feedback on Final Project Plans
  • Guest lecture: Steven Drucker (Microsoft Research)
  • UI patterns & UX considerations for ML/data-driven applications
  • Final project presentation: what to expect
  • In-class activity: final project peer review


Readings assigned
  • NONE
Homework assigned
  • A5: Final presentation
Resources





Week 11: December 6

Day 11 plan

Final presentations
course wrap up, presentation of student projects


Assignments due
  • A5: Final presentation


Agenda
  • Student final presentations
  • Course wrap-up


Readings assigned
  • none!
Homework assigned
  • A6: Final project report (by 11:59pm)
Resources
  • none




Week 12: Finals Week (No Class Session)

  • NO CLASS
  • A6: FINAL PROJECT REPORT DUE BY 11:59PM