Editing User:Groceryheist/drafts/Data Science Syllabus
Latest revision | Your text | ||
Line 1: | Line 1: | ||
; | ;Data Science and Organizational Communication: | ||
; | ;Principal instructor: [[User:Groceryheist|Nate TeBlunthuis]] | ||
;Course Catalog Description: Fundamental principles of data science and its implications, including research ethics; data privacy; legal frameworks; algorithmic bias, transparency, fairness and accountability; data provenance, curation, preservation, and reproducibility; human computation; data communication and visualization; the role of data science in organizational context and the societal impacts of data science. | ;Course Catalog Description: Fundamental principles of data science and its implications, including research ethics; data privacy; legal frameworks; algorithmic bias, transparency, fairness and accountability; data provenance, curation, preservation, and reproducibility; human computation; data communication and visualization; the role of data science in organizational context and the societal impacts of data science. | ||
== Course Description == | == Course Description == | ||
The rise of "data science" reflects a broad and ongoing shift in how many teams, organizational leaders, communities of practice, and entire industries create and use knowledge. This class teaches "data science" as practiced by data-intensive knowledge workers but also as it is positioned in historical, organizational, institutional, and societal contexts. Students will gain an | The rise of "data science" reflects a broad and ongoing shift in how many teams, organizational leaders, communities of practice, and entire industries create and use knowledge. This class teaches "data science" as practiced by data-intensive knowledge workers but also as it is positioned in historical, organizational, institutional, and societal contexts. Students will gain an appreciation for the technical and intellectual aspects of data science, consider critical questions about how data science is often practiced, and envision ethical and effective science practice in their current and future organizational roles. The format of the class will be a mix of lecture, discussion, in-class activities, and qualitative and quantitative research assignments. | ||
The course is designed around two high-stakes projects. In the first stage of the course, students will attend the Community Data Science Workshop (CDSW). I am one of the organizers and instructors of this three-week intensive workshop on basic programming and data analysis skills. The first course project is to apply these skills, together with the conceptual material covered in the course so far, to conduct an original data analysis on a topic of the student's interest. The second high-stakes project is a critical analysis of an organization or work team. For this project students will serve as consultants to an organizational unit involved in data science. Through interviews and workplace observations they will gain an understanding of the socio-technical and organizational context of their team. They will then synthesize this understanding with the knowledge they gained from the course material to compose a report offering actionable insights to their team. | The course is designed around two high-stakes projects. In the first stage of the course, students will attend the Community Data Science Workshop (CDSW). I am one of the organizers and instructors of this three-week intensive workshop on basic programming and data analysis skills. The first course project is to apply these skills, together with the conceptual material covered in the course so far, to conduct an original data analysis on a topic of the student's interest. The second high-stakes project is a critical analysis of an organization or work team. For this project students will serve as consultants to an organizational unit involved in data science. Through interviews and workplace observations they will gain an understanding of the socio-technical and organizational context of their team. They will then synthesize this understanding with the knowledge they gained from the course material to compose a report offering actionable insights to their team. | ||
Line 56: | Line 56: | ||
;Assignments due | ;Assignments due | ||
* Fill out the pre-course survey | |||
* Attend week 1 of CDSW | |||
* Read: Provost, Foster, and Tom Fawcett. [http://online.liebertpub.com/doi/pdf/10.1089/big.2013.1508 ''Data science and its relationship to big data and data-driven decision making.''] Big Data 1.1 (2013): 51-59. | * Read: Provost, Foster, and Tom Fawcett. [http://online.liebertpub.com/doi/pdf/10.1089/big.2013.1508 ''Data science and its relationship to big data and data-driven decision making.''] Big Data 1.1 (2013): 51-59. | ||
Line 64: | Line 65: | ||
;Readings assigned | ;Readings assigned | ||
* Read: Barocas, Solan and Nissenbaum, Helen. [https://www.nyu.edu/projects/nissenbaum/papers/BigDatasEndRun.pdf ''Big Data's End Run around Anonymity and Consent'']. In ''Privacy, Big Data, and the Public Good''. 2014. | * Read: Barocas, Solan and Nissenbaum, Helen. [https://www.nyu.edu/projects/nissenbaum/papers/BigDatasEndRun.pdf ''Big Data's End Run around Anonymity and Consent'']. In ''Privacy, Big Data, and the Public Good''. 2014. | ||
;Homework assigned | ;Homework assigned | ||
* | * Reading reflection | ||
* Attend week | * Attend week 2 of CDSW | ||
<!-- ;Resources --> | <!-- ;Resources --> | ||
* Kling, Rob and Star, Susan Leigh. [https://scholarworks.iu.edu/dspace/bitstream/handle/2022/1798/wp97-04B.html ''Human Centered Systems in the Perspective of Organizational and Social Informatics.''] 1997 | |||
<!-- * Aragon, C. et al. (2016). [https://cscw2016hcds.files.wordpress.com/2015/10/cscw_2016_human-centered-data-science_workshop.pdf ''Developing a Research Agenda for Human-Centered Data Science.''] Human Centered Data Science workshop, CSCW 2016. --> | <!-- * Aragon, C. et al. (2016). [https://cscw2016hcds.files.wordpress.com/2015/10/cscw_2016_human-centered-data-science_workshop.pdf ''Developing a Research Agenda for Human-Centered Data Science.''] Human Centered Data Science workshop, CSCW 2016. --> | ||
Line 88: | Line 89: | ||
;Assignments due | ;Assignments due | ||
* Week | * Week 1 reading reflection | ||
* | * | ||
<!-- ;Agenda --> | <!-- ;Agenda --> | ||
Line 98: | Line 99: | ||
;Homework assigned | ;Homework assigned | ||
* | * Reading reflection | ||
* Attend week 2 of CDSW | * Attend week 2 of CDSW | ||
* [[Human_Centered_Data_Science_(Fall_2018)/Assignments#A1:_Data_curation|Assignment 1: Data curation]] | |||
<!-- ;Resources --> | <!-- ;Resources --> | ||
Line 129: | Line 131: | ||
;Assignments due | ;Assignments due | ||
* Week | * Week 2 reading reflection | ||
* Attend week 2 of CDSW | * Attend week 2 of CDSW | ||
<!-- ;Agenda --> | <!-- ;Agenda --> | ||
<!-- {{:HCDS (Fall 2018)/Day 3 plan}} --> | <!-- {{:HCDS (Fall 2018)/Day 3 plan}} --> | ||
Line 138: | Line 141: | ||
;Homework assigned | ;Homework assigned | ||
* | * Reading reflection | ||
* Attend week 3 of CDSW | * Attend week 3 of CDSW | ||
<!-- ;Resources --> | <!-- ;Resources --> | ||
Line 147: | Line 149: | ||
<!-- * Hickey, Walt. [https://fivethirtyeight.com/features/the-bechdel-test-checking-our-work/ ''The Bechdel Test: Checking Our Work'']. FiveThirtyEight, 2014. --> | <!-- * Hickey, Walt. [https://fivethirtyeight.com/features/the-bechdel-test-checking-our-work/ ''The Bechdel Test: Checking Our Work'']. FiveThirtyEight, 2014. --> | ||
<!-- * J. Priem, D. Taraborelli, P. Groth, C. Neylon (2010), ''[http://altmetrics.org/manifesto Altmetrics: A manifesto]'', 26 October 2010. --> | <!-- * J. Priem, D. Taraborelli, P. Groth, C. Neylon (2010), ''[http://altmetrics.org/manifesto Altmetrics: A manifesto]'', 26 October 2010. --> | ||
<!-- <\!-- --> | <!-- <\!-- --> | ||
<!-- * TeBlunthuis, N., Shaw, A., and Hill, B.M. (2018). Revisiting "The rise and decline" in a population of peer production projects. In ''Proceedings of the 2018 ACM Conference on Human Factors in Computing Systems (CHI '18)''. https://doi.org/10.1145/3173574.3173929 --> | <!-- * TeBlunthuis, N., Shaw, A., and Hill, B.M. (2018). Revisiting "The rise and decline" in a population of peer production projects. In ''Proceedings of the 2018 ACM Conference on Human Factors in Computing Systems (CHI '18)''. https://doi.org/10.1145/3173574.3173929 --> | ||
Line 153: | Line 156: | ||
<!-- * Aschwanden, Christie. [https://fivethirtyeight.com/features/science-isnt-broken/ ''Science Isn't Broken''] FiveThirtyEight, 2015. --> | <!-- * Aschwanden, Christie. [https://fivethirtyeight.com/features/science-isnt-broken/ ''Science Isn't Broken''] FiveThirtyEight, 2015. --> | ||
<!-- -\-> --> | <!-- -\-> --> | ||
<!-- *Chapter 2 [https://www.practicereproducibleresearch.org/core-chapters/2-assessment.html "Assessing Reproducibility"] and Chapter 3 [https://www.practicereproducibleresearch.org/core-chapters/3-basic.html "The Basic Reproducible Workflow Template"] from ''The Practice of Reproducible Research'' University of California Press, 2018. --> | <!-- *Chapter 2 [https://www.practicereproducibleresearch.org/core-chapters/2-assessment.html "Assessing Reproducibility"] and Chapter 3 [https://www.practicereproducibleresearch.org/core-chapters/3-basic.html "The Basic Reproducible Workflow Template"] from ''The Practice of Reproducible Research'' University of California Press, 2018. --> | ||
<!-- * sample code for API calls ([http://paws-public.wmflabs.org/paws-public/User:Jtmorgan/data512_a1_example.ipynb view the notebook], [http://paws-public.wmflabs.org/paws-public/User:Jtmorgan/data512_a1_example.ipynb?format=raw download the notebook]). --> | <!-- * sample code for API calls ([http://paws-public.wmflabs.org/paws-public/User:Jtmorgan/data512_a1_example.ipynb view the notebook], [http://paws-public.wmflabs.org/paws-public/User:Jtmorgan/data512_a1_example.ipynb?format=raw download the notebook]). --> | ||
<!-- *''See [[Human_Centered_Data_Science/Datasets#Dataset_documentation_examples|the datasets page]] for examples of well-documented and not-so-well documented open datasets.'' --> | <!-- *''See [[Human_Centered_Data_Science/Datasets#Dataset_documentation_examples|the datasets page]] for examples of well-documented and not-so-well documented open datasets.'' --> | ||
<br/> | <br/> | ||
<hr/> | <hr/> | ||
Line 169: | Line 176: | ||
;Assignments due | ;Assignments due | ||
* | * Reading reflection | ||
* | * [[Human_Centered_Data_Science_(Fall_2018)/Assignments#A1:_Data_curation|A1: Data curation]] | ||
<!-- ;Agenda --> | <!-- ;Agenda --> | ||
Line 181: | Line 187: | ||
;Homework assigned | ;Homework assigned | ||
* | * Reading reflection | ||
<!-- ;Resources --> | <!-- ;Resources --> | ||
<!-- * Olteanu, A., Castillo, C., Diaz, F., & Kiciman, E. (2016). ''[http://kiciman.org/wp-content/uploads/2017/08/SSRN-id2886526.pdf Social data: Biases, methodological pitfalls, and ethical boundaries]. --> | <!-- * Olteanu, A., Castillo, C., Diaz, F., & Kiciman, E. (2016). ''[http://kiciman.org/wp-content/uploads/2017/08/SSRN-id2886526.pdf Social data: Biases, methodological pitfalls, and ethical boundaries]. --> | ||
Line 200: | Line 208: | ||
; Assignments due | ; Assignments due | ||
* Week | * Week 4 reading reflection | ||
* | * [[Human_Centered_Data_Science_(Fall_2018)/Assignments#A1:_Data_curation|A1: Data curation]] | ||
; Readings assigned | ; Readings assigned | ||
Line 207: | Line 216: | ||
; Homework Assigned | ; Homework Assigned | ||
* | * Reading reflection | ||
* A2: | * [[Human_Centered_Data_Science_(Fall_2018)/Assignments#A2:_Bias_in_data|A2: Bias in data]] | ||
<br/> | <br/> | ||
<hr/> | <hr/> | ||
Line 215: | Line 225: | ||
=== Week 6 === | === Week 6 === | ||
; Data science in Organizational Contexts | ; Data science in Organizational Contexts | ||
; Assignments due | ; Assignments due | ||
* Week | * Week 5 reading reflection | ||
* A2: | * [[Human_Centered_Data_Science_(Fall_2018)/Assignments#A2:_Bias_in_data|A2: Bias in data]] | ||
;Readings assigned (Read both, reflect on one) | ;Readings assigned (Read both, reflect on one) | ||
* Wang, Tricia. ''[https://medium.com/ethnography-matters/why-big-data-needs-thick-data-b4b3e75e3d7 Why Big Data Needs Thick Data]''. Ethnography Matters, 2016. | * Wang, Tricia. ''[https://medium.com/ethnography-matters/why-big-data-needs-thick-data-b4b3e75e3d7 Why Big Data Needs Thick Data]''. Ethnography Matters, 2016. | ||
* Shilad Sen, Margaret E. Giesel, Rebecca Gold, Benjamin Hillmann, Matt Lesicko, Samuel Naden, Jesse Russell, Zixiao (Ken) Wang, and Brent Hecht. 2015. ''[http://www-users.cs.umn.edu/~bhecht/publications/goldstandards_CSCW2015.pdf Turkers, Scholars, "Arafat" and "Peace": Cultural Communities and Algorithmic Gold Standards]''. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW '15) | * Shilad Sen, Margaret E. Giesel, Rebecca Gold, Benjamin Hillmann, Matt Lesicko, Samuel Naden, Jesse Russell, Zixiao (Ken) Wang, and Brent Hecht. 2015. ''[http://www-users.cs.umn.edu/~bhecht/publications/goldstandards_CSCW2015.pdf Turkers, Scholars, "Arafat" and "Peace": Cultural Communities and Algorithmic Gold Standards]''. In Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW '15) | ||
<br/> | <br/> | ||
<hr/> | <hr/> | ||
Line 236: | Line 243: | ||
<!-- [[:File:HCDS 2018 week 5 slides.pdf|Day 5 slides]] --> | <!-- [[:File:HCDS 2018 week 5 slides.pdf|Day 5 slides]] --> | ||
;Introduction to mixed-methods research: ''Big data vs thick data; integrating qualitative research methods into data science practice; | ;Introduction to mixed-methods research: ''Big data vs thick data; integrating qualitative research methods into data science practice; crowdsourcing'' | ||
;Assignments due | ;Assignments due | ||
* | * Reading reflection | ||
<!-- ;Agenda --> | <!-- ;Agenda --> | ||
Line 252: | Line 259: | ||
;Homework assigned | ;Homework assigned | ||
* | * Reading reflection | ||
* | <!-- * [[Human_Centered_Data_Science_(Fall_2018)/Assignments#A3:_Crowdwork_ethnography|A3: Crowdwork ethnography]] --> | ||
<!-- ;Qualitative research methods resources --> | <!-- ;Qualitative research methods resources --> | ||
<!-- * Ladner, S. (2016). ''[http://www.practicalethnography.com/ Practical ethnography: A guide to doing ethnography in the private sector]''. Routledge. --> | <!-- * Ladner, S. (2016). ''[http://www.practicalethnography.com/ Practical ethnography: A guide to doing ethnography in the private sector]''. Routledge. --> | ||
Line 260: | Line 269: | ||
<!-- * Usability.gov, ''[https://www.usability.gov/how-to-and-tools/methods/system-usability-scale.html System usability scale]''. --> | <!-- * Usability.gov, ''[https://www.usability.gov/how-to-and-tools/methods/system-usability-scale.html System usability scale]''. --> | ||
<!-- * Nielsen, Jakob (2000). ''[https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/ Why you only need to test with five users]''. nngroup.com. --> | <!-- * Nielsen, Jakob (2000). ''[https://www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/ Why you only need to test with five users]''. nngroup.com. --> | ||
<!-- ;Wikipedia gender gap research resources --> | <!-- ;Wikipedia gender gap research resources --> | ||
<!-- * Hill, B. M., & Shaw, A. (2013). ''[journals.plos.org/plosone/article?id=10.1371/journal.pone.0065782 The Wikipedia gender gap revisited: Characterizing survey response bias with propensity score estimation]''. PloS one, 8(6), e65782 --> | <!-- * Hill, B. M., & Shaw, A. (2013). ''[journals.plos.org/plosone/article?id=10.1371/journal.pone.0065782 The Wikipedia gender gap revisited: Characterizing survey response bias with propensity score estimation]''. PloS one, 8(6), e65782 --> | ||
Line 265: | Line 275: | ||
<!-- * Maximillian Klein. ''[http://whgi.wmflabs.org/gender-by-language.html Gender by Wikipedia Language]''. Wikidata Human Gender Indicators (WHGI), 2017. --> | <!-- * Maximillian Klein. ''[http://whgi.wmflabs.org/gender-by-language.html Gender by Wikipedia Language]''. Wikidata Human Gender Indicators (WHGI), 2017. --> | ||
<!-- * Source: Wagner, C., Garcia, D., Jadidi, M., & Strohmaier, M. (2015, April). ''[https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/viewFile/10585/10528 It's a Man's Wikipedia? Assessing Gender Inequality in an Online Encyclopedia]''. In ICWSM (pp. 454-463). --> | <!-- * Source: Wagner, C., Garcia, D., Jadidi, M., & Strohmaier, M. (2015, April). ''[https://www.aaai.org/ocs/index.php/ICWSM/ICWSM15/paper/viewFile/10585/10528 It's a Man's Wikipedia? Assessing Gender Inequality in an Online Encyclopedia]''. In ICWSM (pp. 454-463). --> | ||
<!-- * Benjamin Collier and Julia Bear. ''[https://static1.squarespace.com/static/521c8817e4b0dca2590b4591/t/523745abe4b05150ff027a6e/1379354027662/2012+-+Collier%2C+Bear+-+Conflict%2C+confidence%2C+or+criticism+an+empirical+examination+of+the+gender+gap+in+Wikipedia.pdf Conflict, criticism, or confidence: an empirical examination of the gender gap in wikipedia contributions]''. In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work (CSCW '12). DOI: https://doi.org/10.1145/2145204.2145265 --> | <!-- * Benjamin Collier and Julia Bear. ''[https://static1.squarespace.com/static/521c8817e4b0dca2590b4591/t/523745abe4b05150ff027a6e/1379354027662/2012+-+Collier%2C+Bear+-+Conflict%2C+confidence%2C+or+criticism+an+empirical+examination+of+the+gender+gap+in+Wikipedia.pdf Conflict, criticism, or confidence: an empirical examination of the gender gap in wikipedia contributions]''. In Proceedings of the ACM 2012 conference on Computer Supported Cooperative Work (CSCW '12). DOI: https://doi.org/10.1145/2145204.2145265 --> | ||
<!-- * Christina Shane-Simpson, Kristen Gillespie-Lynch, Examining potential mechanisms underlying the Wikipedia gender gap through a collaborative editing task, In Computers in Human Behavior, Volume 66, 2017, https://doi.org/10.1016/j.chb.2016.09.043. (PDF on Canvas) --> | <!-- * Christina Shane-Simpson, Kristen Gillespie-Lynch, Examining potential mechanisms underlying the Wikipedia gender gap through a collaborative editing task, In Computers in Human Behavior, Volume 66, 2017, https://doi.org/10.1016/j.chb.2016.09.043. (PDF on Canvas) --> | ||
<!-- * Amanda Menking and Ingrid Erickson. 2015. ''[https://upload.wikimedia.org/wikipedia/commons/7/77/The_Heart_Work_of_Wikipedia_Gendered,_Emotional_Labor_in_the_World%27s_Largest_Online_Encyclopedia.pdf The Heart Work of Wikipedia: Gendered, Emotional Labor in the World's Largest Online Encyclopedia]''. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15). https://doi.org/10.1145/2702123.2702514 --> | <!-- * Amanda Menking and Ingrid Erickson. 2015. ''[https://upload.wikimedia.org/wikipedia/commons/7/77/The_Heart_Work_of_Wikipedia_Gendered,_Emotional_Labor_in_the_World%27s_Largest_Online_Encyclopedia.pdf The Heart Work of Wikipedia: Gendered, Emotional Labor in the World's Largest Online Encyclopedia]''. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI '15). https://doi.org/10.1145/2702123.2702514 --> | ||
<!-- ;Crowdwork research resources --> | <!-- ;Crowdwork research resources --> | ||
<!-- * WeArDynamo contributors. ''[http://wiki.wearedynamo.org/index.php?title=Basics_of_how_to_be_a_good_requester How to be a good requester]'' and ''[http://wiki.wearedynamo.org/index.php?title=Guidelines_for_Academic_Requesters Guidelines for Academic Requesters]''. Wearedynamo.org --> | <!-- * WeArDynamo contributors. ''[http://wiki.wearedynamo.org/index.php?title=Basics_of_how_to_be_a_good_requester How to be a good requester]'' and ''[http://wiki.wearedynamo.org/index.php?title=Guidelines_for_Academic_Requesters Guidelines for Academic Requesters]''. Wearedynamo.org --> | ||
<br/> | <br/> | ||
<hr/> | <hr/> | ||
Line 282: | Line 296: | ||
;Assignments due | ;Assignments due | ||
* | * Reading reflection | ||
* A4: Final Project Plan | |||
<!-- ;Agenda --> | <!-- ;Agenda --> | ||
<!-- {{:HCDS (Fall 2018)/Day 6 plan}} --> | <!-- {{:HCDS (Fall 2018)/Day 6 plan}} --> | ||
;Readings assigned | ;Readings assigned | ||
* Hill, B. M., Dailey, D., Guy, R. T., Lewis, B., Matsuzaki, M., & Morgan, J. T. (2017). ''[https://mako.cc/academic/hill_etal-cdsw_chapter-DRAFT.pdf Democratizing Data Science: The Community Data Science Workshops and Classes].'' In N. Jullien, S. A. Matei, & S. P. Goggins (Eds.), Big Data Factories: Scientific Collaborative approaches for virtual community data collection, repurposing, recombining, and dissemination. | * Hill, B. M., Dailey, D., Guy, R. T., Lewis, B., Matsuzaki, M., & Morgan, J. T. (2017). ''[https://mako.cc/academic/hill_etal-cdsw_chapter-DRAFT.pdf Democratizing Data Science: The Community Data Science Workshops and Classes].'' In N. Jullien, S. A. Matei, & S. P. Goggins (Eds.), Big Data Factories: Scientific Collaborative approaches for virtual community data collection, repurposing, recombining, and dissemination. | ||
;Homework assigned | ;Homework assigned | ||
* | * Reading reflection | ||
<!-- ;Resources --> | <!-- ;Resources --> | ||
<!-- * Ethical OS ''[https://ethicalos.org/wp-content/uploads/2018/08/Ethical-OS-Toolkit-2.pdf Toolkit]'' and ''[https://ethicalos.org/wp-content/uploads/2018/08/EthicalOS_Check-List_080618.pdf Risk Mitigation Checklist]''. EthicalOS.org. --> | <!-- * Ethical OS ''[https://ethicalos.org/wp-content/uploads/2018/08/Ethical-OS-Toolkit-2.pdf Toolkit]'' and ''[https://ethicalos.org/wp-content/uploads/2018/08/EthicalOS_Check-List_080618.pdf Risk Mitigation Checklist]''. EthicalOS.org. --> | ||
Line 314: | Line 332: | ||
<!-- * Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner. ''[https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing Machine Bias: Risk Assessment in Criminal Sentencing]. Propublica, May 2018. --> | <!-- * Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner. ''[https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing Machine Bias: Risk Assessment in Criminal Sentencing]. Propublica, May 2018. --> | ||
<!-- * [https://www.perspectiveapi.com/#/ Google's Perspective API] --> | <!-- * [https://www.perspectiveapi.com/#/ Google's Perspective API] --> | ||
<br/> | <br/> | ||
<hr/> | <hr/> | ||
<br/> | <br/> | ||
<!-- === Week 7 === --> | <!-- === Week 7 === --> | ||
<!-- <\!-- [[HCDS_(Fall_2018)/Day_7_plan|Day 7 plan]] -\-> --> | <!-- <\!-- [[HCDS_(Fall_2018)/Day_7_plan|Day 7 plan]] -\-> --> | ||
Line 324: | Line 344: | ||
<!-- ;Assignments due --> | <!-- ;Assignments due --> | ||
<!-- * Reading reflection --> | <!-- * Reading reflection --> | ||
<!-- <\!-- * [[Human_Centered_Data_Science_(Fall_2018)/Assignments#A3:_Crowdwork_ethnography|A3: Crowdwork ethnography]] -\-> --> | |||
<!-- <\!-- ;Agenda -\-> --> | <!-- <\!-- ;Agenda -\-> --> | ||
<!-- <\!-- {{:HCDS (Fall 2018)/Day 7 plan}} -\-> --> | <!-- <\!-- {{:HCDS (Fall 2018)/Day 7 plan}} -\-> --> | ||
Line 341: | Line 362: | ||
<!-- <hr/> --> | <!-- <hr/> --> | ||
<!-- <br/> --> | <!-- <br/> --> | ||
=== Week 9 === | === Week 9 === | ||
<!-- [[HCDS_(Fall_2018)/Day_8_plan|Day 9 plan]] --> | <!-- [[HCDS_(Fall_2018)/Day_8_plan|Day 9 plan]] --> | ||
;Data science for social good: ''Community-based and participatory approaches to data science; Using data science for society's benefit'' | ;Data science for social good: ''Community-based and participatory approaches to data science; Using data science for society's benefit'' | ||
;Assignments due | ;Assignments due | ||
* | * Reading reflection | ||
* A4: Final project plan | |||
<!-- ;Agenda --> | <!-- ;Agenda --> | ||
<!-- {{:HCDS (Fall 2018)/Day 9 plan}} --> | <!-- {{:HCDS (Fall 2018)/Day 9 plan}} --> | ||
Line 353: | Line 379: | ||
;Homework assigned | ;Homework assigned | ||
* Reading reflection | * Reading reflection | ||
;Resources | ;Resources | ||
* Daniela Aiello, Lisa Bates, et al. [https://shelterforce.org/2018/08/22/eviction-lab-misses-the-mark/ Eviction Lab Misses the Mark], ShelterForce, August 2018. | * Daniela Aiello, Lisa Bates, et al. [https://shelterforce.org/2018/08/22/eviction-lab-misses-the-mark/ Eviction Lab Misses the Mark], ShelterForce, August 2018. | ||
<br/> | <br/> | ||
<hr/> | <hr/> | ||
Line 365: | Line 392: | ||
=== Week 10 === | === Week 10 === | ||
<!-- [[HCDS_(Fall_2018)/Day_10_plan|Day 10 plan]] --> | <!-- [[HCDS_(Fall_2018)/Day_10_plan|Day 10 plan]] --> | ||
<!-- [[:File:HCDS 2018 week 10 slides.pdf|Day 10 slides]] --> | <!-- [[:File:HCDS 2018 week 10 slides.pdf|Day 10 slides]] --> | ||
;User experience and big data: ''Design considerations for machine learning applications; human centered data visualization; data storytelling'' | ;User experience and big data: ''Design considerations for machine learning applications; human centered data visualization; data storytelling'' | ||
;Assignments due | ;Assignments due | ||
* | * Reading reflection | ||
<!-- ;Agenda --> | <!-- ;Agenda --> | ||
Line 378: | Line 407: | ||
;Homework assigned | ;Homework assigned | ||
* | * A5: Final presentation | ||
<!-- ;Resources --> | <!-- ;Resources --> | ||
<!-- *Fabien Girardin. ''[https://medium.com/@girardin/experience-design-in-the-machine-learning-era-e16c87f4f2e2 Experience design in the machine learning era].'' Medium, 2016. --> | <!-- *Fabien Girardin. ''[https://medium.com/@girardin/experience-design-in-the-machine-learning-era-e16c87f4f2e2 Experience design in the machine learning era].'' Medium, 2016. --> | ||
Line 392: | Line 422: | ||
<!-- * Megan Risdal, ''[http://blog.kaggle.com/2016/06/13/communicating-data-science-an-interview-with-a-storytelling-expert-tyler-byers/ Communicating data science: an interview with a storytelling expert].'' Kaggle blog, 2016. --> | <!-- * Megan Risdal, ''[http://blog.kaggle.com/2016/06/13/communicating-data-science-an-interview-with-a-storytelling-expert-tyler-byers/ Communicating data science: an interview with a storytelling expert].'' Kaggle blog, 2016. --> | ||
<!-- * Brent Dykes, ''[https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs/ Data Storytelling: The Essential Data Science Skill Everyone Needs].'' Forbes, 2016. --> | <!-- * Brent Dykes, ''[https://www.forbes.com/sites/brentdykes/2016/03/31/data-storytelling-the-essential-data-science-skill-everyone-needs/ Data Storytelling: The Essential Data Science Skill Everyone Needs].'' Forbes, 2016. --> | ||
<br/> | <br/> | ||
<hr/> | <hr/> | ||
Line 403: | Line 434: | ||
;Assignments due | ;Assignments due | ||
* | * A5: Final presentation | ||
<!-- ;Agenda --> | <!-- ;Agenda --> | ||
<!-- {{:HCDS (Fall 2018)/Day 11 plan}} --> | <!-- {{:HCDS (Fall 2018)/Day 11 plan}} --> | ||
Line 411: | Line 444: | ||
;Homework assigned | ;Homework assigned | ||
* | * A6: Final project report (by 11:59pm) | ||
<!-- ;Resources --> | <!-- ;Resources --> | ||
Line 422: | Line 455: | ||
=== Week 12: Finals Week (No Class Session) === | === Week 12: Finals Week (No Class Session) === | ||
* NO CLASS | * NO CLASS | ||
* | * A6: FINAL PROJECT REPORT DUE BY 11:59PM | ||
<!-- * LATE PROJECT SUBMISSIONS NOT ACCEPTED. --> | <!-- * LATE PROJECT SUBMISSIONS NOT ACCEPTED. --> | ||
Line 430: | Line 463: | ||
== Assignments | == Assignments == | ||
Your grade in this class will be assigned through: | |||
* 9 Reading reflections (25%) | |||
* 6 Project assignments (50%) | |||
* Participation (25%) | |||
Assignments comprise weekly in-class activities, weekly reading reflections, written assignments, and programming/data analysis assignments. Weekly in-class reading groups will discuss the assigned readings from the course, and students are expected to have read the material in advance. In-class activities each week are posted to Canvas and may require time outside of class to complete. | Assignments comprise weekly in-class activities, weekly reading reflections, written assignments, and programming/data analysis assignments. Weekly in-class reading groups will discuss the assigned readings from the course, and students are expected to have read the material in advance. In-class activities each week are posted to Canvas and may require time outside of class to complete. | ||
Project Assignments 1 and 2 are extensions of exercises from the Community Data Science Workshop and will get you started on y Project | |||
Unless otherwise noted, all assignments are due before 5pm on the day of the following week's class. | Unless otherwise noted, all assignments are due before 5pm on the day of the following week's class. | ||
Line 459: | Line 501: | ||
=== Project Assignments === | === Project Assignments === | ||
This section provides basic descriptions of all scheduled course assignments | This section provides basic descriptions of all scheduled course assignments. | ||
=== A1: Project proposal and data acquisition ===
Line 469: | Line 507: | ||
For this assignment you will propose a midterm project and use the skills you have learned in the CDSW to collect or present a dataset. You will turn in a one-page project description that | For this assignment you will propose a midterm project and use the skills you have learned in the CDSW to collect or present a dataset. You will turn in a one-page project description that | ||
* Identifies a dataset for analysis, and what makes it interesting to you. | |||
* Explains the source of the data and how you obtained it. | |||
* Describes 2-3 questions that you hope the data can help answer | |||
* Includes a table of summary statistics (minimum, maximum, median, and mean values) for variables in your dataset related to these questions | |||
I hope that you find a dataset related to your own interests, such as data from your workplace, community, or any other organization you may be involved in. If you have trouble finding a dataset related to your current interests, [[HCDS (Fall 2017)/Datasets|this page]] has examples of freely available datasets that you can use for this project. | |||
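The summary statistics requested above (minimum, maximum, median, and mean) can be computed with Python's standard library. The following is only a minimal sketch: the <code>summarize</code> helper and the <code>page_views</code> values are hypothetical and stand in for a variable from whatever dataset you choose.

```python
import statistics


def summarize(values):
    """Return the minimum, maximum, median, and mean of a list of numbers."""
    return {
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
    }


# Hypothetical example data: replace with one variable from your own dataset.
page_views = [120, 95, 143, 101, 88]

summary = summarize(page_views)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Calling <code>summarize</code> once per variable and collecting the results row by row gives the table of summary statistics for the project description.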
==== Rubric ==== | |||
'''Dataset identification:''' 25% | |||
'''Explanation of data source:''' 25% | |||
'''Example questions:''' 25% | |||
'''Summary statistics:''' 25% | |||
=== | === A2: Bias in data === | ||
The goal of this assignment is to explore the concept of bias through data on Wikipedia articles - specifically, articles on political figures from a variety of countries. For this assignment, you will combine a dataset of Wikipedia articles with a dataset of country populations, and use a machine learning service called ORES to estimate the quality of each article. | |||
You are expected to perform an analysis of how the ''coverage'' of politicians on Wikipedia and the ''quality'' of articles about politicians varies between countries. Your analysis will consist of a series of tables that show: | |||
# the countries with the greatest and least coverage of politicians on Wikipedia compared to their population. | |||
# the countries with the highest and lowest proportion of high quality articles about politicians. | |||
You are also expected to write a short reflection on the project that describes how this assignment helps you understand the causes and consequences of bias on Wikipedia. | |||
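One way to approach the two tables described above is to compute, for each country, the number of politician articles relative to population and the share of those articles that are high quality. The sketch below uses made-up rows and populations, and it assumes that "high quality" means a predicted class of FA or GA; that threshold is an assumption for illustration, not part of the assignment text.

```python
# Hypothetical rows, one per article: (country, predicted quality class).
articles = [
    ("Country A", "FA"),
    ("Country A", "Stub"),
    ("Country A", "GA"),
    ("Country B", "Stub"),
    ("Country B", "Start"),
]

# Hypothetical populations (number of people) keyed by country name.
populations = {"Country A": 1_000_000, "Country B": 4_000_000}

# Assumption: treat FA and GA predictions as "high quality".
HIGH_QUALITY = {"FA", "GA"}


def country_metrics(articles, populations):
    """Compute coverage and quality metrics for each country."""
    counts = {}
    high = {}
    for country, quality in articles:
        counts[country] = counts.get(country, 0) + 1
        if quality in HIGH_QUALITY:
            high[country] = high.get(country, 0) + 1

    metrics = {}
    for country, n in counts.items():
        if country not in populations:
            continue  # skip countries that have no population figure
        metrics[country] = {
            "articles_per_million": n * 1_000_000 / populations[country],
            "high_quality_share": high.get(country, 0) / n,
        }
    return metrics


metrics = country_metrics(articles, populations)

# Rank countries from greatest to least coverage relative to population.
by_coverage = sorted(
    metrics, key=lambda c: metrics[c]["articles_per_million"], reverse=True
)
print(by_coverage)
```

Sorting the same dictionary by <code>high_quality_share</code> gives the second table. Countries that appear in only one of the two datasets need a decision about how to handle them; this sketch simply skips them.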
'''A repository with a README framework and examples of querying the ORES datastore in R and Python can be found [https://github.com/Ironholds/data-512-a2 here]''' | |||
==== Getting the article and population data ==== | |||
The first step is getting the data, which lives in several different places. The Wikipedia dataset can be found [https://figshare.com/articles/Untitled_Item/5513449 on Figshare]. Read through the documentation for this repository, then download and unzip it. | |||
The population data is on [https://www.dropbox.com/s/5u7sy1xt7g0oi2c/WPDS_2018_data.csv?dl=0 Dropbox]. Download this data as a CSV file (hint: look for the 'Microsoft Excel' icon in the upper right). | |||
==== Getting article quality predictions ====
Now you need to get the predicted quality scores for each article in the Wikipedia dataset. For this step, we're using a Wikimedia API endpoint for a machine learning system called [https://www.mediawiki.org/wiki/ORES ORES] ("Objective Revision Evaluation Service"). ORES estimates the quality of an article (at a particular point in time), and assigns a series of probabilities that the article is in one of 6 quality categories. The options are, from best to worst:
# FA - Featured article
# GA - Good article
# B - B-class article
# C - C-class article
# Start - Start-class article
# Stub - Stub-class article
For context, these quality classes are a subset of the quality assessment categories developed by Wikipedia editors. If you're curious, you can read more about what these assessment classes mean on [https://en.wikipedia.org/wiki/Wikipedia:WikiProject_assessment#Grades English Wikipedia]. We will talk about what these categories mean, and how the ORES model predicts which category an article goes into, next week in class. For this assignment, you only need to know that these categories exist, and that ORES will assign one of these 6 categories to any article you send it.
The ORES API is configured fairly similarly to the pageviews API we used last assignment; documentation can be found [https://ores.wikimedia.org/v3/#!/scoring/get_v3_scores_context_revid_model here]. It expects a revision ID, which is the third column in the Wikipedia dataset, and a model, which is "wp10". The [https://github.com/Ironholds/data-512-a2 sample iPython notebooks for this assignment] provide examples of a correctly-structured API query that you can use to understand how to gather your data, and also to examine the query output.
In order to get predictions for each article in the Wikipedia dataset, you will need to read <tt>page_data.csv</tt> into Python (or R), and then read through the dataset line by line, using the value of the <tt>last_edit</tt> column in the API query. If you're working in Python, the [https://docs.python.org/3/library/csv.html CSV module] will help with this.
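As a rough illustration of the line-by-line reading described above, here is a minimal sketch using the CSV module. The helper name <tt>read_revision_ids</tt>, the <tt>page</tt> and <tt>country</tt> header names, and the sample rows are made up for this example; only the <tt>last_edit</tt> column name comes from the instructions, so check the header of your own copy of <tt>page_data.csv</tt> before relying on it.

```python
import csv
import io


def read_revision_ids(csv_file, column="last_edit"):
    """Yield (row, revision_id) pairs from an open CSV file object.

    `column` is the header of the revision ID column; "last_edit" follows
    the assignment text and may differ in the file you download.
    """
    reader = csv.DictReader(csv_file)
    for row in reader:
        yield row, row[column]


# Tiny made-up stand-in for page_data.csv (not real data).
sample = io.StringIO(
    "page,country,last_edit\n"
    "Example politician A,Exampleland,111111\n"
    "Example politician B,Otherland,222222\n"
)

revision_ids = [rev_id for _, rev_id in read_revision_ids(sample)]
print(revision_ids)
```

With the real file, you would pass <tt>open("page_data.csv", newline="", encoding="utf-8")</tt> in place of the in-memory sample.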
When you query the API, you will notice that ORES returns a <tt>prediction</tt> value that contains the name of one category, as well as <tt>probability</tt> values for each of the 6 quality categories. For this assignment, you only need to capture and use the value for <tt>prediction</tt>. We'll talk more about what the other values mean in class next week.
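The sketch below shows one way to build a query URL and pull out just the <tt>prediction</tt> value. It rests on several assumptions: the URL pattern is inferred from the documentation link above, <tt>enwiki</tt> is assumed as the context, the nesting of the response is assumed rather than confirmed here, and the sample response is hand-written with made-up numbers. The function names are hypothetical. Compare it against the sample notebooks and the API documentation before using it.

```python
def build_ores_url(revision_id, context="enwiki", model="wp10"):
    """Build a v3 scoring URL (assumed pattern: /v3/scores/{context}/{revid}/{model})."""
    return f"https://ores.wikimedia.org/v3/scores/{context}/{revision_id}/{model}"


def extract_prediction(response, revision_id, context="enwiki", model="wp10"):
    """Return the predicted quality class, or None if no score is present.

    Assumes the response nests as context -> "scores" -> revid -> model -> "score".
    """
    try:
        return response[context]["scores"][str(revision_id)][model]["score"]["prediction"]
    except KeyError:
        return None


# Hand-written illustration of the assumed response shape (numbers are made up).
sample_response = {
    "enwiki": {
        "scores": {
            "111111": {
                "wp10": {
                    "score": {
                        "prediction": "Stub",
                        "probability": {
                            "FA": 0.01, "GA": 0.02, "B": 0.03,
                            "C": 0.04, "Start": 0.2, "Stub": 0.7,
                        },
                    }
                }
            }
        }
    }
}

print(build_ores_url(111111))
print(extract_prediction(sample_response, 111111))
```

Returning <tt>None</tt> for a missing score lets you record which revisions could not be scored instead of stopping the whole run.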
==== Combining the datasets ====
Some processing of the data will be necessary. In particular, after retrieving the ORES prediction for each article, you will need to merge the Wikipedia data and the population data. Both have fields containing country names for just that purpose. After merging the data, you'll inevitably run into entries that ''cannot'' be merged: either the population dataset has no entry for a country that appears in the Wikipedia dataset, or vice versa. You will need to remove the rows that do not have matching data.
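One possible way to do the merge with plain Python is sketched below. The function name <tt>merge_on_country</tt>, the field names, and the sample rows are illustrative assumptions, not the actual headers or contents of either dataset. Because the loop runs over articles, countries that appear only in the population data drop out on their own, and articles without a population match are set aside.

```python
def merge_on_country(articles, populations):
    """Keep only articles whose country has a population entry.

    `articles` is a list of dicts with a "country" key; `populations` maps
    country name -> population. Returns (merged_rows, unmatched_articles).
    """
    merged, unmatched = [], []
    for article in articles:
        country = article["country"]
        if country in populations:
            merged.append({**article, "population": populations[country]})
        else:
            unmatched.append(article)
    return merged, unmatched


# Made-up rows for illustration only.
articles = [
    {"country": "Exampleland", "article_name": "Example politician A"},
    {"country": "Nowhereland", "article_name": "Example politician B"},
]
populations = {"Exampleland": 10000, "Otherland": 5000}

merged, unmatched = merge_on_country(articles, populations)
print(len(merged), len(unmatched))
```

Keeping the unmatched rows in a separate list makes it easy to report how many articles were removed and why.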
Consolidate the remaining data into a single CSV file which looks something like this:
{|class="wikitable"
|-
! Column
|-
|country
|-
|article_name
|-
|revision_id
|-
|article_quality
|-
|population
|}
Note: <tt>revision_id</tt> here is the same thing as <tt>last_edit</tt>, which you used to get scores from the ORES API.
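A minimal sketch of writing the consolidated file with the CSV module follows. The column names come from the table above; the helper name <tt>write_rows</tt> and the sample row are made up, and the example writes to an in-memory buffer rather than a real file.

```python
import csv
import io

# Column order taken from the table in the assignment text.
COLUMNS = ["country", "article_name", "revision_id", "article_quality", "population"]


def write_rows(rows, out_file):
    """Write a header line plus one CSV line per row dict."""
    writer = csv.DictWriter(out_file, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Made-up row for illustration only.
buffer = io.StringIO()
write_rows(
    [{
        "country": "Exampleland",
        "article_name": "Example politician A",
        "revision_id": "111111",
        "article_quality": "Stub",
        "population": 10000,
    }],
    buffer,
)
output = buffer.getvalue()
print(output)
```

To write an actual file, pass <tt>open("your_output.csv", "w", newline="", encoding="utf-8")</tt> instead of the buffer; the file name here is a placeholder.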
==== Analysis ====
Your analysis will consist of calculating, for each country, two percentages: articles-per-population and the share of high-quality articles. By "high quality" articles, in this case we mean the articles about politicians in a given country that ORES predicted would be in either the "FA" (featured article) or "GA" (good article) classes.
Examples:
* if a country has a population of 10,000 people, and you found 10 articles about politicians from that country, then the percentage of articles-per-population would be 0.1%.
* if a country has 10 articles about politicians, and 2 of them are FA or GA class articles, then the percentage of high-quality articles would be 20%.
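The two worked examples above can be reproduced with a short sketch like the following. The function names are hypothetical, and the list of predictions is made up to match the second example.

```python
HIGH_QUALITY = {"FA", "GA"}


def percentage(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100 * part / whole


def high_quality_percentage(predictions):
    """Percentage of predictions that are FA or GA."""
    high = sum(1 for p in predictions if p in HIGH_QUALITY)
    return percentage(high, len(predictions))


# First example: 10 articles for a population of 10,000.
articles_per_population = percentage(10, 10000)

# Second example: 2 of 10 articles are FA or GA.
quality_share = high_quality_percentage(["FA", "GA"] + ["Stub"] * 8)

print(articles_per_population, quality_share)
```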
==== Tables ====
The tables should be pretty straightforward. Produce four tables that show:
#10 highest-ranked countries in terms of number of politician articles as a proportion of country population
#10 lowest-ranked countries in terms of number of politician articles as a proportion of country population
#10 highest-ranked countries in terms of number of GA and FA-quality articles as a proportion of all articles about politicians from that country
#10 lowest-ranked countries in terms of number of GA and FA-quality articles as a proportion of all articles about politicians from that country
Embed them in the iPython notebook.
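One way to pick the highest- and lowest-ranked countries for these tables is sketched below. The function name <tt>top_and_bottom</tt> and the sample values are made up; the example uses <tt>n=2</tt> only to keep the illustration short, while the assignment asks for 10.

```python
def top_and_bottom(values_by_country, n=10):
    """Return (highest n, lowest n) as lists of (country, value) pairs.

    The highest list is ordered from largest to smallest; the lowest list
    is ordered from smallest to largest.
    """
    ranked = sorted(values_by_country.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n], ranked[-n:][::-1]


# Made-up percentages for illustration only.
values = {"A": 5.0, "B": 1.0, "C": 3.0, "D": 4.0}
highest, lowest = top_and_bottom(values, n=2)
print(highest)
print(lowest)
```

The same function can serve both the articles-per-population tables and the high-quality tables, since each is just a mapping from country to a percentage.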
==== Writeup ====
Write a few paragraphs, either in the README or in the notebook, reflecting on what you have learned, what you found, what (if anything) surprised you about your findings, and/or what theories you have about why any biases might exist (if you find they exist). You can also include any questions this assignment raised for you about bias, Wikipedia, or machine learning. Particular questions you might want to answer:
# What biases did you expect to find in the data, and why?
# What are the results?
# What theories do you have about why the results are what they are?
==== Submission instructions ====
#Complete your Notebook and datasets in Jupyter Hub.
#Create the data-512-a2 repository on GitHub with your code and data.
#Complete and add your README and LICENSE file.
#Submit the link to your GitHub repo to: https://canvas.uw.edu/courses/1244514/assignments/4376107
==== Required deliverables ====
A directory in your GitHub repository called <tt>data-512-a2</tt> that contains the following files:
:# 1 final data file in CSV format that follows the formatting conventions.
:# 1 Jupyter notebook named <tt>hcds-a2-bias</tt> that contains all code as well as information necessary to understand each programming step, as well as your writeup (if you have not included it in the README) and the tables.
:# 1 README file in .txt or .md format that contains information to reproduce the analysis, including data descriptions, attributions and provenance information, and descriptions of all relevant resources and documentation (inside and outside the repo) and hyperlinks to those resources, and your writeup (if you have not included it in the notebook). A prototype framework is included in the [https://github.com/Ironholds/data-512-a2 sample repository].
:# 1 LICENSE file that contains an [https://opensource.org/licenses/MIT MIT LICENSE] for your code.
==== Helpful tips ====
* Read all instructions carefully before you begin
* Read all API documentation carefully before you begin
* Experiment with queries in the sandbox of the technical documentation for the API to familiarize yourself with the schema and the data
* Explore the data a bit before starting to be sure you understand how it is structured and what it contains
* Ask questions on Slack if you're unsure about anything. Please email Os to set up a meeting, or come to office hours, if you want to! This time is set aside specifically for you - it is not an imposition.
* When documenting/describing your project, think: "If I found this GitHub repo, and wanted to fully reproduce the analysis, what information would I want? What information would I need?"
=== A3: Crowdwork ethnography ===
For this assignment, you will go undercover as a member of the Amazon Mechanical Turk community. You will preview or perform Mechanical Turk tasks (called "HITs"), lurk in Turk worker discussion forums, and write an ethnographic account of your experience as a crowdworker, and how this experience changes your understanding of the phenomenon of crowdwork.
The full assignment description is available [https://docs.google.com/document/d/16lZdTxkw1meUPMzA-BYl8TVtk0Jxv4Wh8mbZq_BursM/edit?usp=sharing as a Google doc] and [[:File:HCDS_Crowdwork_ethnography_instructions.pdf|as a PDF]].
=== A4: Final project plan ===
For this assignment, you will write up a study plan for your final class project. The plan will cover a variety of details about your final project. Identify the organization that you will work with, the data you will use, what you will do with the data (e.g. statistical analysis, train a model), what results you expect or intend, and most importantly, why your project is interesting or important (and to whom, besides yourself).
=== A5: Final project presentation ===
For this assignment, you will give an in-class presentation of your final project. The goal of this assignment is to demonstrate that you are able to effectively communicate your research questions, methods, conclusions, and implications to your target audience.
=== A6: Final project report ===
For this assignment, you will publish the complete code, data, and analysis of your final research project. The goal is to demonstrate that you can incorporate all of the human-centered design considerations you learned in this course and create research artifacts that are understandable, impactful, and reproducible.
== Policies ==
The following general policies apply to this course.
=== Attendance ===
As detailed in [[Teaching Assessment | my page on assessment]], attendance in class is expected of all participants. If you need to miss class for any reason, please contact a member of the teaching team ahead of time (email is best). Multiple unexplained absences will likely result in a lower grade or (in extreme circumstances) a failing grade. In the event of an absence, you are responsible for obtaining class notes, handouts, assignments, etc.
=== Respect ===
=== Disability and accommodations ===
As part of ensuring that the class is as accessible as possible, the instructors are entirely comfortable with you using whatever form of note-taking method or recording is most comfortable to you, including laptops and audio recording devices. The instructors will do their best to ensure that all slides and scripts/notes are immediately available online after a lecture has concluded. In addition, if asked ahead of time we can try to record the audio of individual lectures for students who have learning differences that make audiovisual notes preferable to written ones.
If you require additional accommodations, please contact Disabled Student Services: 448 Schmitz, 206-543-8924 (V/TTY). If you have a letter from DSS indicating that you have a disability which requires academic accommodations, please present the letter to the instructors so we can discuss the accommodations you might need in the class. If you have any questions about this policy, reach out to the instructors directly.
For more information on disability accommodations, and how to apply for one, please review [http://depts.washington.edu/uwdrs/current-students/accommodations/ UW's Disability Resources for Students].
=== Assignments and coursework ===
Grades will be determined as follows:
* 20% Participation
* 20% Reading reflections
* 20% Midterm project
* 40% Final project
You are expected to produce work in all of the assignments that reflects the highest standards of professionalism. For written documents, this means proper spelling, grammar, and formatting.
Late assignments will not be accepted; if your assignment is late, you will receive a zero score. Again, if you run into an issue that necessitates an extension, please reach out.
[[Category:Groceryheist drafts]] | |||