Editing Human Centered Data Science (Fall 2018)/Assignments

Latest revision Your text
Line 1: Line 1:
<noinclude>
<div style="font-family:Rockwell,'Courier Bold',Courier,Georgia,'Times New Roman',Times,serif; min-width:10em;">
<div style="float:left; width:100%; margin-right:2%;">
{{Link/Graphic/Main/2
|highlight color= 27666b
|color=460c40
|link=
|image=
|text-align=left
|top font-size= 1.1em
|top color=FFF
|line color=FFF
|top text=This page is a work in progress.
|bottom font-size= 1em
|bottom color= FFF
|bottom text=
|line= none
}}</div></div>
</noinclude>


__FORCETOC__
__FORCETOC__
Line 29: Line 10:
=== Assignment timeline ===
=== Assignment timeline ===
;Assignments due every week
;Assignments due every week
* '''In-class activities - 2 points''' (weekly): In-class activity output posted to Canvas (group or individual) within 24 hours of class session.
* '''In-class activities - 2 points''' (weekly): In-class activity output posted to Canvas (group or individual)
* '''Reading reflections - 2 points''' (weekly): Reading reflections posted to Canvas (individual) before following class session.
* '''Reading reflections - 2 points''' (weekly): Reading reflections posted to Canvas (individual)




;Scheduled assignments
;Scheduled assignments
* '''A1 - 5 points''' (due 10/18): Data curation (programming/analysis)
* '''A1 - 5 points''' (due Week 4): Data curation (programming/analysis)
* '''A2 - 10 points''' (due 11/1): Sources of bias in data (programming/analysis)
* '''A2 - 10 points''' (due Week 6): Sources of bias in data (programming/analysis)
* '''A3  - 10 points''' (due 11/8): Crowdwork Ethnography (written)
* '''A3  - 10 points''' (due Week 7): Final project plan (written)
* '''A4 - 10 points''' (due 11/22): Final project plan (written)
* '''A4 - 10 points''' (due Week 9): Crowdwork self-ethnography (written)
* '''A5 - 10 points''' (due 12/6): Final project presentation (oral, slides)
* '''A5 - 10 points''' (due Week 11): Final project presentation (oral, written)
* '''A6 - 15 points''' (due 12/9): Final project report (programming/analysis, written)
* '''A6 - 15 points''' (due by 11:59pm on Sunday, December 10): Final project report (programming/analysis, written)


[[Human Centered Data Science (Fall 2018)/Assignments|more information...]]
[[Human Centered Data Science (Fall 2018)/Assignments|more information...]]
Line 45: Line 26:


=== Weekly in-class activities ===
=== Weekly in-class activities ===
In each class session, one in-class activity will have a graded deliverable that is due the next day. The sum of these deliverables constitutes your participation grade for the course. The deliverable is intended to be something that you complete (and ideally, turn in, in class), but in rare cases may involve some work after class. It could be as simple as a picture of a design sketch you made, or notes from a group brainstorm. When you and/or your group complete the assigned activity, follow the instructions below to submit the activity and get full credit.  
Love it or hate it, teamwork is an integral part of data science practice (and work in general). During each class session, you will be asked to participate in one or more group activities. These activities may involve reading discussions, group brainstorming activities, collaborative coding or data analysis, working together on designs, or offering peer support.


Love it or hate it, teamwork is an integral part of data science practice (and work in general). During some class sessions, you will be asked to participate in one or more group activities. These activities may involve reading discussions, group brainstorming activities, collaborative coding or data analysis, working together on designs, or offering peer support.
In each class session, one in-class activity will have a graded deliverable that is due the next day. The sum of these deliverables constitutes your participation grade for the course. The deliverable is intended to be something that you complete (and ideally, turn in, in class), but in rare cases may involve some work after class. It could be as simple as a picture of a design sketch you made, or notes from a group brainstorm. When you and your group complete the assigned activity, follow the instructions below to submit the activity and get full credit.  


;Instructions (individual activity)
;Instructions
# Do the in-class activity
# Do the in-class activity
# Submit the deliverable via Canvas, in the format specified by the instructor, within 24 hours of class
# Submit the deliverable via Canvas, in the format specified by the instructor, within 24 hours of class
# If it is a group assignment:
:*Choose one group member to submit the deliverable for the whole group
:*'''Make sure to list the full names of all group members in the Canvas post'''


;Instructions (group activity)
Late deliverables will never be accepted, and everyone in the group will lose points. So make sure you choose someone reliable to turn the assignment in!
# Do the in-class activity
# Before the end of class, choose one group member to submit the deliverable for the whole group
# The designated group member will submit the deliverable via Canvas, in the format specified by the instructor within 24 hours of class
::*'''''Note: Make sure to list the full names of all group members in the Canvas post!'''''
 
Late deliverables will never be accepted, and in the case of group activities, everyone in the group will lose points. So make sure you choose someone reliable to turn the assignment in!


=== Weekly reading reflections ===
=== Weekly reading reflections ===
This course will introduce you to cutting edge research and opinion from major thinkers in the domain of human centered data science. By reading and writing about this material, you will have an opportunity to explore the complex intersections of technology, methodology, ethics, and social thought that characterize this budding field of research and practice.  
This course will introduce you to cutting edge research and opinion from major thinkers in the domain of human centered data science. By reading and writing about this material, you will have an opportunity to explore the complex intersections of technology, methodology, ethics, and social thought that characterize this budding field of research and practice. As a participant in the course, you are responsible for intellectually engaging with ''all assigned readings'' and developing an understanding of the ideas discussed in them.


As a participant in the course, you are responsible for intellectually engaging with ''all assigned readings'' and developing an understanding of the ideas discussed in them.
This assignment is designed to encourage you to reflect on these readings (or in some cases, viewings or listenings) and make connections during our class discussions. To this end, you will be responsible for posting reading reflections every week of the quarter (except for week 1).


The weekly reading reflections assignment is designed to encourage you to reflect on these works and make connections during our class discussions. To this end, you will be responsible for posting reflections on the previous week's assigned reading before the next class session.
There will generally be multiple readings assigned each week. You are responsible for reading ''all of them.'' However, you only need to write a reflection on '''one reading per week.''' Unless your instructor specifies otherwise, you can choose which reading you would like to write your reflection about.  


There will generally be multiple readings assigned each week. You are responsible for reading ''all of them.'' However, you only need to write a reflection on '''one reading per week.''' Unless your instructor specifies otherwise, you can choose which reading you would like to reflect on.
These reflections are meant to be brief but meaningful. Follow the instructions below, demonstrate that you engaged with the material, and turn the reflection in on time, and you will receive full credit. Late reading reflections will never be accepted.
 
These reflections are meant to be succinct but meaningful. Follow the instructions below, demonstrate that you engaged with the material, and turn the reflection in on time, and you will receive full credit. Late reading reflections will never be accepted.


;Instructions
;Instructions
Line 76: Line 52:
# Select a reading to reflect on.
# Select a reading to reflect on.
# In at least 2-3 full sentences, answer the question "How does this reading inform your understanding of human centered data science?"
# In at least 2-3 full sentences, answer the question "How does this reading inform your understanding of human centered data science?"
# Using full sentences, list ''at least 1 question'' that this reading raised in your mind, and say ''why'' the reading caused you to ask this question.
# Using full sentences, list ''at least 1 question'' that this reading raised in your mind.
# Post your reflection to Canvas before the next class session.
# Post your reflection to Canvas before the next class session.


You are encouraged, but not required, to make connections between different readings (from the current week, from previous weeks, or other relevant material you've read/listened to/watched) in your reflections.
You are encouraged, but not required, to make connections between different readings (from the current week, or previous weeks) in your reflections.


== Scheduled assignments ==
== Scheduled assignments ==
Line 85: Line 61:


=== A1: Data curation ===
=== A1: Data curation ===
[[File:En-wikipedia_traffic_200801-201709_thompson.png|300px|thumb|Your assignment is to create a graph that looks a lot like this one, starting from scratch, and following best practices for reproducible research.]]
The goal of this assignment is to construct, analyze, and publish a dataset of monthly traffic on English Wikipedia from January 1 2008 through September 30 2017.  
 
The goal of this assignment is to construct, analyze, and publish a dataset of monthly traffic on English Wikipedia from January 1 2008 through September 30 2018. All analysis should be performed in a single Jupyter notebook and all data, documentation, and code should be published in a single GitHub repository.


The purpose of the assignment is to demonstrate that you can follow best practices for open scientific research in designing and implementing your project, and make your project fully reproducible by others: from data collection to data analysis.
The purpose of the assignment is to demonstrate that you can follow best practices for open scientific research in designing and implementing your project, and make your project fully reproducible by others: from data collection to data analysis.


For this assignment, you will combine data about Wikipedia page traffic from two different [https://www.mediawiki.org/wiki/REST_API Wikimedia REST API] endpoints into a single dataset, perform some simple data processing steps on the data, and then analyze that data.
For this assignment, you will combine data about Wikipedia traffic from two different [https://www.mediawiki.org/wiki/REST_API Wikimedia REST API] endpoints into a single dataset, perform some simple data processing steps on the data, and then analyze that data. 
 
==== Step 0: Read about reproducibility ====
Read Chapter 2 [https://www.practicereproducibleresearch.org/core-chapters/2-assessment.html "Assessing Reproducibility"] and Chapter 3 [https://www.practicereproducibleresearch.org/core-chapters/3-basic.html "The Basic Reproducible Workflow Template"] from ''The Practice of Reproducible Research'' (University of California Press, 2018).


==== Step 1: Data acquisition ====
==== Step 1: Data acquisition ====
In order to measure Wikipedia traffic from 2008-2018, you will need to collect data from two different API endpoints, the Legacy Pagecounts API and the Pageviews API.
In order to measure Wikipedia traffic from 2008-2016, you will need to collect data from two different API endpoints, the Pagecounts API and the Pageviews API.
 
# The '''Legacy Pagecounts API''' ([https://wikitech.wikimedia.org/wiki/Analytics/AQS/Legacy_Pagecounts documentation], [https://wikimedia.org/api/rest_v1/#!/Pagecounts_data_(legacy)/get_metrics_legacy_pagecounts_aggregate_project_access_site_granularity_start_end endpoint]) provides access to desktop and mobile traffic data from December 2007 through July 2016.
#The '''Pageviews API''' ([https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews documentation], [https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_metrics_pageviews_aggregate_project_access_agent_granularity_start_end endpoint]) provides access to desktop, mobile web, and mobile app traffic data from July 2015 through last month.


For each API, you will need to collect data ''for all months where data is available'' and then save the raw results into 5 separate JSON source data files (one file per API query type) before continuing to step 2.
# The legacy '''Pagecounts API''' ([https://wikitech.wikimedia.org/wiki/Analytics/AQS/Legacy_Pagecounts documentation], [https://wikimedia.org/api/rest_v1/#!/Pagecounts_data_(legacy)/get_metrics_legacy_pagecounts_aggregate_project_access_site_granularity_start_end endpoint]) provides access to desktop and mobile traffic data from January 2008 through July 2016.
#The '''Pageviews API''' ([https://wikitech.wikimedia.org/wiki/Analytics/AQS/Pageviews documentation], [https://wikimedia.org/api/rest_v1/#!/Pageviews_data/get_metrics_pageviews_aggregate_project_access_agent_granularity_start_end endpoint]) provides access to desktop, mobile web, and mobile app traffic data from July 2015 through September 2017.


To get you started, you can refer to this example Notebook that contains sample code for API calls ([http://paws-public.wmflabs.org/paws-public/User:Jtmorgan/data512_a1_example.ipynb view the notebook], [http://paws-public.wmflabs.org/paws-public/User:Jtmorgan/data512_a1_example.ipynb?format=raw download the notebook]). This sample code is [https://creativecommons.org/share-your-work/public-domain/cc0/ licensed CC0] so feel free to re-use any of the code in that notebook without attribution.
You will need to collect data ''for all months'' from both APIs in a Jupyter Notebook and then save the raw results into 5 separate JSON source data files (one file per API query) before continuing to step 2.


Your JSON-formatted source data file must contain the complete and un-edited output of your API queries. The naming convention for the source data files is:  
Your JSON-formatted source data file must contain the complete and un-edited output of your API queries. The naming convention for the source data files is: 
  apiname_accesstype_firstmonth-lastmonth.json
  apiname_accesstype_firstmonth-lastmonth.json


For example, your filename for monthly page views on desktop should be:
For example, your filename for monthly page views on desktop should be:
  pagecounts_desktop-site_200712-201809.json
  pagecounts_desktop-site_200801-201607.json


'''Important notes:'''
'''Important notes:'''
# As much as possible, we're interested in ''organic'' (user) traffic, as opposed to traffic by web crawlers or spiders. The Pageviews API (but not the Pagecounts API) allows you to filter by <tt>agent=user</tt>. You should do that.
# As much as possible, we're interested in ''organic'' (user) traffic, as opposed to traffic by web crawlers or spiders. The Pageviews API (but not the Pagecounts API) allows you to filter by <tt>agent=user</tt>. You should do that.
# There was about 1 year of overlapping traffic data between the two APIs. You need to gather, and later graph, data from both APIs for this period of time.
# There is a ~13 month period in which both APIs provide traffic data. You need to gather, and later graph, data from both APIs for this period of time.
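
The acquisition step described above can be sketched roughly as follows. This is a minimal sketch, not the sample notebook's code: the helper names (<tt>endpoint_url</tt>, <tt>source_filename</tt>, <tt>fetch_and_save</tt>) and the User-Agent string are assumptions made for illustration, and the URL templates are written out from the two endpoints linked above.

```python
import urllib.request

# URL templates for the two endpoints linked above.
LEGACY_PAGECOUNTS = (
    "https://wikimedia.org/api/rest_v1/metrics/legacy/pagecounts/"
    "aggregate/{project}/{access}/{granularity}/{start}/{end}"
)
PAGEVIEWS = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/"
    "aggregate/{project}/{access}/{agent}/{granularity}/{start}/{end}"
)


def endpoint_url(apiname, access, start, end, project="en.wikipedia.org"):
    """Build a monthly query URL; start and end are YYYYMMDDHH timestamps."""
    if apiname == "pagecounts":
        return LEGACY_PAGECOUNTS.format(
            project=project, access=access,
            granularity="monthly", start=start, end=end)
    if apiname == "pageviews":
        # agent="user" filters out crawler and spider traffic.
        return PAGEVIEWS.format(
            project=project, access=access, agent="user",
            granularity="monthly", start=start, end=end)
    raise ValueError(f"unknown API name: {apiname}")


def source_filename(apiname, access, first_month, last_month):
    """Apply the apiname_accesstype_firstmonth-lastmonth.json convention."""
    return f"{apiname}_{access}_{first_month}-{last_month}.json"


def fetch_and_save(apiname, access, start, end, path):
    """Download one query result and write the response body unmodified."""
    request = urllib.request.Request(
        endpoint_url(apiname, access, start, end),
        # Placeholder User-Agent (an assumption); identify yourself here.
        headers={"User-Agent": "data-512-a1 (student project)"},
    )
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    with open(path, "wb") as f:
        f.write(raw)  # bytes are saved as received, so the file stays un-edited
    return path
```

Writing the response bytes directly, rather than parsing and re-serializing them, is one way to keep the source data files identical to what the API returned.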


==== Step 2: Data processing ====
==== Step 2: Data processing ====
Line 156: Line 125:


The final data file should be named:  
The final data file should be named:  
  en-wikipedia_traffic_200712-201809.csv
  en-wikipedia_traffic_200801-201709.csv


==== Step 3: Analysis ====
==== Step 3: Analysis ====
<!-- [[File:PlotPageviewsEN_overlap.png|200px|thumb|A sample visualization of pageview traffic data.]] -->
[[File:PlotPageviewsEN_overlap.png|200px|thumb|A sample visualization of pageview traffic data.]]
For this assignment, the "analysis" will be fairly straightforward: you will visualize the dataset you have created as a time series graph.  
For this assignment, the "analysis" will be fairly straightforward: you will visualize the dataset you have created as a time series graph.  


Your visualization will track three traffic metrics: mobile traffic, desktop traffic, and all traffic (mobile + desktop).
Your visualization will track three traffic metrics: mobile traffic, desktop traffic, and all traffic (mobile + desktop).


<!-- Your visualization should look similar to the example graph above, which is based on the same data you'll be using! The only big difference should be that your mobile traffic data will only go back to October 2014, since the API does not provide monthly traffic data going back to 2010. -->
Your visualization should look similar to the example graph above, which is based on the same data you'll be using! The only big difference should be that your mobile traffic data will only go back to October 2014, since the API does not provide monthly traffic data going back to 2010.


In order to complete the analysis correctly and receive full credit, your graph will need to be the right scale to view the data; all units, axes, and values should be clearly labeled; and the graph should possess a key and a title. You must also generate a .png or .jpeg formatted image of your final graph.  
In order to complete the analysis correctly and receive full credit, your graph will need to be the right scale to view the data; all units, axes, and values should be clearly labeled; and the graph should possess a key and a title. You must also generate a .png or .jpeg formatted image of your final graph.  


You should graph the data in Python or R, in your notebook.  
You may choose to graph the data in Python, in your notebook. If you decide to use Google Sheets or some other open, public data visualization platform to build your graph, link to it in the README, and make sure sharing settings allow anyone who clicks on the link to view the graph and download the data!
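
As an illustration of the graphing requirements above, here is a minimal sketch in Python. It assumes matplotlib is installed in your notebook environment, and the row keys (<tt>year</tt>, <tt>month</tt>, <tt>mobile</tt>, <tt>desktop</tt>) are hypothetical placeholders: substitute the column names of your own final data file.

```python
from datetime import date


def build_series(rows):
    """Turn monthly rows into parallel lists ready for plotting.

    Each row is a dict with the hypothetical keys year, month, mobile and
    desktop; values may be strings, as produced by csv.DictReader.
    """
    months, mobile, desktop, total = [], [], [], []
    ordered = sorted(rows, key=lambda r: (int(r["year"]), int(r["month"])))
    for row in ordered:
        months.append(date(int(row["year"]), int(row["month"]), 1))
        m, d = int(row["mobile"]), int(row["desktop"])
        mobile.append(m)
        desktop.append(d)
        total.append(m + d)  # all traffic = mobile + desktop
    return months, mobile, desktop, total


def plot_traffic(rows, out_path="traffic.png"):
    """Draw the three metrics as a labeled time series and save a PNG."""
    import matplotlib.pyplot as plt  # imported lazily; assumed to be installed

    months, mobile, desktop, total = build_series(rows)
    fig, ax = plt.subplots(figsize=(12, 5))
    ax.plot(months, desktop, label="Desktop")
    ax.plot(months, mobile, label="Mobile")
    ax.plot(months, total, label="All traffic (mobile + desktop)")
    ax.set_xlabel("Month")
    ax.set_ylabel("Monthly page views")
    ax.set_title("Monthly traffic on English Wikipedia")
    ax.legend()  # the key required for full credit
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
    return out_path
```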
 
<!-- If you decide to use Google Sheets or some other open, public data visualization platform to build your graph, link to it in the README, and make sure sharing settings allow anyone who clicks on the link to view the graph and download the data! -->


==== Step 4: Documentation ====
==== Step 4: Documentation ====
Follow best practices for documenting your project, as outlined in the Week 3 slides and in Chapter 2 [https://www.practicereproducibleresearch.org/core-chapters/2-assessment.html "Assessing Reproducibility"] and Chapter 3 [https://www.practicereproducibleresearch.org/core-chapters/3-basic.html "The Basic Reproducible Workflow Template"] from ''The Practice of Reproducible Research''.
Follow best practices for documenting your project, as outlined in the Week 3 slides (LINK). Your documentation will be done in your Jupyter Notebook, a README file, and a LICENSE file.
 
Your documentation will be done in your Jupyter Notebook, a README file, and a LICENSE file.


At minimum, your Jupyter Notebook should:
At minimum, your Jupyter Notebook should:
Line 182: Line 147:
At minimum, your README file should
At minimum, your README file should
* Describe the goal of the project.
* Describe the goal of the project.
* List the license of the source data and a link to the Wikimedia Foundation REST API terms of use: https://www.mediawiki.org/wiki/REST_API#Terms_and_conditions
* List the license of the source data and a link to the Wikimedia Foundation terms of use (LINK)
* Link to all relevant API documentation
* Link to all relevant API documentation
* Describe the values of all fields in your final data file.
* Describe the values of all fields in your final data file.
Line 188: Line 153:


==== Submission instructions ====
==== Submission instructions ====
#Complete your Notebook and datasets in Jupyter Hub.
#Download the data-512-a1 directory from Jupyter Hub.
#Create the data-512-a1 repository on GitHub w/ your code and data.
#Create the data-512-a1 repository on GitHub w/ your code and data.
#Complete and add your README and LICENSE file.
#Complete and add your README and LICENSE file.
#Submit the link to your GitHub repo to: https://canvas.uw.edu/courses/1244514/assignments/4376106
#Submit the link to your GitHub repo to: https://canvas.uw.edu/courses/1174178/assignments/3876066


==== Required deliverables ====
==== Required deliverables ====
Line 209: Line 176:


=== A2: Bias in data ===
=== A2: Bias in data ===
The goal of this assignment is to explore the concept of bias through data on Wikipedia articles - specifically, articles on political figures from a variety of countries. For this assignment, you will combine a dataset of Wikipedia articles with a dataset of country populations, and use a machine learning service called ORES to estimate the quality of each article.
 
The goal of this assignment is to explore the concept of 'bias' through data on Wikipedia articles - specifically, articles on political figures from a variety of countries. For this assignment, you will combine a dataset of Wikipedia articles with a dataset of country populations, and use a machine learning service called ORES to estimate the quality of each article.


You are expected to perform an analysis of how the ''coverage'' of politicians on Wikipedia and the ''quality'' of articles about politicians varies between countries. Your analysis will consist of a series of tables that show:
You are expected to perform an analysis of how the ''coverage'' of politicians on Wikipedia and the ''quality'' of articles about politicians varies between countries. Your analysis will consist of a series of tables that show:
Line 216: Line 184:


You are also expected to write a short reflection on the project that describes how this assignment helps you understand the causes and consequences of bias on Wikipedia.
You are also expected to write a short reflection on the project that describes how this assignment helps you understand the causes and consequences of bias on Wikipedia.
'''A repository with a README framework and examples of querying the ORES datastore in R and Python can be found [https://github.com/Ironholds/data-512-a2 here]'''


==== Getting the article and population data ====
==== Getting the article and population data ====
Line 223: Line 189:
The first step is getting the data, which lives in several different places. The Wikipedia dataset can be found [https://figshare.com/articles/Untitled_Item/5513449 on Figshare]. Read through the documentation for this repository, then download and unzip it. 
The first step is getting the data, which lives in several different places. The Wikipedia dataset can be found [https://figshare.com/articles/Untitled_Item/5513449 on Figshare]. Read through the documentation for this repository, then download and unzip it. 


The population data is on [https://www.dropbox.com/s/5u7sy1xt7g0oi2c/WPDS_2018_data.csv?dl=0 Dropbox]. Download this data as a CSV file (hint: look for the 'Microsoft Excel' icon in the upper right).
The population data is on the [http://www.prb.org/DataFinder/Topic/Rankings.aspx?ind=14 Population Reference Bureau website]. Download this data as a CSV file (hint: look for the 'Microsoft Excel' icon in the upper right). 


==== Getting article quality predictions ====
==== Getting article quality predictions ====
Line 238: Line 204:
For context, these quality classes are a sub-set of quality assessment categories developed by Wikipedia editors. If you're curious, you can read more about what these assessment classes mean on [https://en.wikipedia.org/wiki/Wikipedia:WikiProject_assessment#Grades English Wikipedia]. We will talk about what these categories mean, and how the ORES model predicts which category an article goes into, next week in class. For this assignment, you only need to know that these categories exist, and that ORES will assign one of these 6 categories to any article you send it.
For context, these quality classes are a sub-set of quality assessment categories developed by Wikipedia editors. If you're curious, you can read more about what these assessment classes mean on [https://en.wikipedia.org/wiki/Wikipedia:WikiProject_assessment#Grades English Wikipedia]. We will talk about what these categories mean, and how the ORES model predicts which category an article goes into, next week in class. For this assignment, you only need to know that these categories exist, and that ORES will assign one of these 6 categories to any article you send it.


The ORES API is configured fairly similarly to the pageviews API we used last assignment; documentation can be found [https://ores.wikimedia.org/v3/#!/scoring/get_v3_scores_context_revid_model here]. It expects a revision ID, which is the third column in the Wikipedia dataset, and a model, which is "wp10". The [https://github.com/Ironholds/data-512-a2 sample iPython notebooks for this assignment] provide examples of a correctly-structured API query that you can use to understand how to gather your data, and also to examine the query output.
The ORES API is configured fairly similarly to the pageviews API we used last assignment; documentation can be found [https://ores.wikimedia.org/v3/#!/scoring/get_v3_scores_context_revid_model here]. It expects a revision ID, which is the third column in the Wikipedia dataset, and a model, which is "wp10". The sample iPython notebook for this assignment provides an example of a correctly-structured API query that you can use to understand how to gather your data, and also to examine the query output.


In order to get article predictions for each article in the Wikipedia dataset, you will need to read <tt>page_data.csv</tt> into Python (or R), and then read through the dataset line by line, using the value of the <tt>last_edit</tt> column in the API query. If you're working in Python, the [https://docs.python.org/3/library/csv.html CSV module] will help with this.
In order to get article predictions for each article in the Wikipedia dataset, you will need to read <tt>page_data.csv</tt> into Python (or R), and then read through the dataset line by line, using the value of the <tt>last_edit</tt> column in the API query. If you're working in Python, the [https://docs.python.org/3/library/csv.html CSV module] will help with this.
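
The row-by-row lookup described above might be sketched like this. It is a hedged illustration rather than the sample notebook's code: the function names and the User-Agent string are assumptions, <tt>enwiki</tt> is assumed as the ORES context for English Wikipedia, and the response layout assumed in <tt>extract_prediction</tt> should be checked against the actual query output.

```python
import csv
import json
import urllib.request

# Path layout taken from the ORES v3 documentation linked above.
ORES_URL = "https://ores.wikimedia.org/v3/scores/{context}/{revid}/{model}"


def ores_url(revid, context="enwiki", model="wp10"):
    """Build the scoring URL for a single revision ID."""
    return ORES_URL.format(context=context, revid=revid, model=model)


def revision_ids(csv_file, column="last_edit"):
    """Yield the revision ID from each row of an open page_data.csv file."""
    for row in csv.DictReader(csv_file):
        yield row[column]


def extract_prediction(response, revid, context="enwiki", model="wp10"):
    """Pull the predicted class out of a parsed response.

    Assumes the layout response[context]["scores"][revid][model]["score"];
    returns None when no score is present (for example, an error entry).
    """
    result = response[context]["scores"][str(revid)][model]
    score = result.get("score")
    return score["prediction"] if score else None


def predict_quality(revid):
    """Query ORES for one revision and return its predicted quality class."""
    request = urllib.request.Request(
        ores_url(revid),
        # Placeholder User-Agent (an assumption); identify yourself here.
        headers={"User-Agent": "data-512-a2 (student project)"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_prediction(json.load(response), revid)
```

A typical loop would open <tt>page_data.csv</tt>, iterate over <tt>revision_ids(f)</tt>, and call <tt>predict_quality</tt> on each value, keeping track of revisions for which no prediction comes back.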
Line 285: Line 251:


==== Writeup ====
==== Writeup ====
Write a few paragraphs, either in the README or in the notebook, reflecting on what you have learned, what you found, what (if anything) surprised you about your findings, and/or what theories you have about why any biases might exist (if you find they exist). You can also include any questions this assignment raised for you about bias, Wikipedia, or machine learning. Particular questions you might want to answer:
Write a few paragraphs, either in the README or in the notebook, reflecting on what you have learned, what you found, what (if anything) surprised you about your findings, and/or what theories you have about why any biases might exist (if you find they exist). You can also include any questions this assignment raised for you about bias, Wikipedia, or machine learning.
 
# What biases did you expect to find in the data, and why?
# What are the results?
# What theories do you have about why the results are what they are?


==== Submission instructions ====
==== Submission instructions ====
Line 295: Line 257:
#Create the data-512-a2 repository on GitHub w/ your code and data.
#Create the data-512-a2 repository on GitHub w/ your code and data.
#Complete and add your README and LICENSE file.
#Complete and add your README and LICENSE file.
#Submit the link to your GitHub repo to: https://canvas.uw.edu/courses/1244514/assignments/4376107
#Submit the link to your GitHub repo to: https://canvas.uw.edu/courses/1174178/assignments/3876068


==== Required deliverables ====
==== Required deliverables ====
Line 301: Line 263:
:# 1 final data file in CSV format that follows the formatting conventions.
:# 1 final data file in CSV format that follows the formatting conventions.
:# 1 Jupyter notebook named <tt>hcds-a2-bias</tt> that contains all code as well as information necessary to understand each programming step, as well as your writeup (if you have not included it in the README) and the tables.
:# 1 Jupyter notebook named <tt>hcds-a2-bias</tt> that contains all code as well as information necessary to understand each programming step, as well as your writeup (if you have not included it in the README) and the tables.
:# 1 README file in .txt or .md format that contains information to reproduce the analysis, including data descriptions, attributions and provenance information, and descriptions of all relevant resources and documentation (inside and outside the repo) and hyperlinks to those resources, and your writeup (if you have not included it in the notebook). A prototype framework is included in the [https://github.com/Ironholds/data-512-a2 sample repository]
:# 1 README file in .txt or .md format that contains information to reproduce the analysis, including data descriptions, attributions and provenance information, and descriptions of all relevant resources and documentation (inside and outside the repo) and hyperlinks to those resources, and your writeup (if you have not included it in the notebook).
:# 1 LICENSE file that contains an [https://opensource.org/licenses/MIT MIT LICENSE] for your code.
:# 1 LICENSE file that contains an [https://opensource.org/licenses/MIT MIT LICENSE] for your code.


Line 309: Line 271:
* Experiment with queries in the sandbox of the technical documentation for the API to familiarize yourself with the schema and the data
* Experiment with queries in the sandbox of the technical documentation for the API to familiarize yourself with the schema and the data
* Explore the data a bit before starting to be sure you understand how it is structured and what it contains
* Explore the data a bit before starting to be sure you understand how it is structured and what it contains
* Ask questions on Slack if you're unsure about anything. Please email Os to set up a meeting, or come to office hours, if you want to! This time is set aside specifically for you - it is not an imposition.
* Ask questions on Slack if you're unsure about anything
* When documenting/describing your project, think: "If I found this GitHub repo, and wanted to fully reproduce the analysis, what information would I want? What information would I need?"
* When documenting/describing your project, think: "If I found this GitHub repo, and wanted to fully reproduce the analysis, what information would I want? What information would I need?"


=== A3: Crowdwork ethnography ===
=== A3: Final project plan ===
For this assignment, you will go undercover as a member of the Amazon Mechanical Turk community. You will preview or perform Mechanical Turk tasks (called "HITs"), lurk in Turk worker discussion forums, and write an ethnographic account of your experience as a crowdworker, and how this experience changes your understanding of the phenomenon of crowdwork.
 
The full assignment description is available [https://docs.google.com/document/d/16lZdTxkw1meUPMzA-BYl8TVtk0Jxv4Wh8mbZq_BursM/edit?usp=sharing as a Google doc] and [[:File:HCDS_Crowdwork_ethnography_instructions.pdf|as a PDF]].
 
=== A4: Final project plan ===
''For examples of datasets you may want to use for your final project, see [[HCDS_(Fall_2017)/Datasets]].''
''For examples of datasets you may want to use for your final project, see [[HCDS_(Fall_2017)/Datasets]].''


For this assignment, you will write up a study plan for your final class project. The plan will cover a variety of details about your final project, including what data you will use, what you will do with the data (e.g. statistical analysis, train a model), what results you expect or intend, and most importantly, why your project is interesting or important (and to whom, besides yourself).
For this assignment, you will write up a study plan for your final class project. The plan will cover a variety of details about your final project, including what data you will use, what you will do with the data (e.g. statistical analysis, train a model), what results you expect or intend, and most importantly, why your project is interesting or important (and to whom, besides yourself).


=== A4: Crowdwork ethnography ===
For this assignment, you will go undercover as a member of the Amazon Mechanical Turk community. You will preview or perform Mechanical Turk tasks (called "HITs"), lurk in Turk worker discussion forums, and write an ethnographic account of your experience as a crowdworker, and how this experience changes your understanding of the phenomenon of crowdwork.
The full assignment description is available in PDF form [[:File:HCDS_A4_Crowdwork_ethnography.pdf|here]].


=== A5: Final project presentation ===
=== A5: Final project presentation ===