Editing HCDS (Fall 2017)/Assignments
Latest revision | Your text | ||
Line 1: | Line 1: | ||
<noinclude> | |||
<div style="font-family:Rockwell,'Courier Bold',Courier,Georgia,'Times New Roman',Times,serif; min-width:10em;"> | |||
<div style="float:left; width:100%; margin-right:2%;"> | |||
{{Link/Graphic/Main/2 | |||
|highlight color= 27666b | |||
|color=460c40 | |||
|link= | |||
|image= | |||
|text-align=left | |||
|top font-size= 1.1em | |||
|top color=FFF | |||
|line color=FFF | |||
|top text=This page is a work in progress. | |||
|bottom font-size= 1em | |||
|bottom color= FFF | |||
|bottom text= | |||
|line= none | |||
}}</div></div> | |||
</noinclude> | |||
__FORCETOC__ | __FORCETOC__ | ||
Line 16: | Line 35: | ||
;Scheduled assignments | ;Scheduled assignments | ||
* '''A1 - 5 points''' (due Week 4): Data curation (programming/analysis) | * '''A1 - 5 points''' (due Week 4): Data curation (programming/analysis) | ||
* '''A2 - 10 points''' (due Week | * '''A2 - 10 points''' (due Week 5): Sources of bias in data (programming/analysis) | ||
* '''A3 - 10 points''' (due Week 7): Final project plan (written) | * '''A3 - 10 points''' (due Week 7): Final project plan (written) | ||
* '''A4 - 10 points''' (due Week 9): Crowdwork self-ethnography (written) | * '''A4 - 10 points''' (due Week 9): Crowdwork self-ethnography (written) | ||
Line 176: | Line 195: | ||
=== A2: Bias in data === | === A2: Bias in data === | ||
The goal of this assignment is to explore the concept of 'bias' through data on Wikipedia articles - specifically, articles on political figures from a variety of countries. You are expected to analyse how article quality (and article ''existence'') varies between countries, and report back with visualisations and your thoughts on what the exercise has taught you about bias (and/or Wikipedia!). | |||
For this assignment, you will combine a dataset of Wikipedia data with a dataset of population data, and use the Wikipedia 'ORES' system to gauge the quality of each article. | |||
==== Data acquisition ==== | |||
The first step is getting the data, which lives in several different places. The Wikipedia dataset can be found [https://figshare.com/articles/Untitled_Item/5513449 on Figshare]; the population data is on the [http://www.prb.org/DataFinder/Topic/Rankings.aspx?ind=14 Population Reference Bureau website]. | |||
To extract the ORES data, you will need to use the ORES API, which is configured much like the pageviews API we used in the last assignment; documentation can be found [https://ores.wikimedia.org/v3/#!/scoring/get_v3_scores_context_revid_model here]. It expects a revision ID, which is the third column in the Wikipedia dataset, and a model, which is "wp10". | |||
This model provides the predicted quality of the article; options are, from best to worst: | |||
* FA | |||
* GA | |||
* A | |||
* B | |||
* C | |||
* Start | |||
* Stub | |||
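A minimal sketch of the ORES lookup described above, in Python. The URL path follows the linked v3 documentation (<tt>/v3/scores/{context}/{revid}/{model}</tt>); the <tt>enwiki</tt> context, the helper names, and the exact nesting of the JSON response are assumptions for illustration, so check them against a real response before relying on them.

```python
from urllib.parse import quote

# Base path taken from the linked v3 documentation.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def ores_url(rev_id, context="enwiki", model="wp10"):
    """Build the scoring URL for one revision.

    The default context 'enwiki' is an assumption (English Wikipedia).
    """
    return f"{ORES_BASE}/{quote(context)}/{int(rev_id)}/{quote(model)}"


def predicted_quality(response, rev_id, context="enwiki", model="wp10"):
    """Pull the predicted class (FA, GA, ..., Stub) out of a decoded response.

    Assumes the nesting context -> 'scores' -> rev_id -> model -> 'score'.
    Returns None when the revision has no score (e.g. an error entry).
    """
    entry = (
        response.get(context, {})
        .get("scores", {})
        .get(str(rev_id), {})
        .get(model, {})
    )
    return entry.get("score", {}).get("prediction")
```

The decoded response would typically come from an HTTP client of your choice, for example by fetching <tt>ores_url(rev_id)</tt> and parsing the body as JSON; revisions that return <tt>None</tt> can be logged and skipped.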
=== | === Data processing === | ||
Some processing of the data will be necessary! In particular, you'll need to - after retrieving and including the ORES data for each article - merge the Wikipedia data and population data together. Both have fields containing country names for just that purpose. After merging the data, you'll invariably run into entries which ''cannot'' be merged. Either the population dataset does not have an entry for the equivalent Wikipedia country, or vice versa. You will need to remove the rows that do not have matching data. | Some processing of the data will be necessary! In particular, you'll need to - after retrieving and including the ORES data for each article - merge the Wikipedia data and population data together. Both have fields containing country names for just that purpose. After merging the data, you'll invariably run into entries which ''cannot'' be merged. Either the population dataset does not have an entry for the equivalent Wikipedia country, or vice versa. You will need to remove the rows that do not have matching data. | ||
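One way to sketch the merge step is an inner join on the country name, keeping track of which countries fail to match so you can document what was dropped. The keys <tt>country</tt> and <tt>population</tt> and the function name are hypothetical; substitute the column names from the actual datasets.

```python
def merge_on_country(articles, populations):
    """Inner-join article rows with population rows by country name.

    Both arguments are lists of dicts; the keys 'country' and 'population'
    are placeholder names, not the datasets' real column headers.
    Returns (merged_rows, unmatched_countries).
    """
    pop_by_country = {row["country"]: row["population"] for row in populations}

    # Keep only article rows whose country also appears in the population data.
    merged = [
        {**row, "population": pop_by_country[row["country"]]}
        for row in articles
        if row["country"] in pop_by_country
    ]

    # Countries present in exactly one of the two datasets are dropped.
    article_countries = {row["country"] for row in articles}
    unmatched = sorted(article_countries ^ set(pop_by_country))
    return merged, unmatched
```

Listing the unmatched countries in your notebook or README makes it clear which rows were removed and why.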
Consolidate the | Consolidate the data into a single CSV file which looks something like this: | ||
Line 232: | Line 237: | ||
|} | |} | ||
=== Analysis === | |||
The analysis should be pretty straightforward. Produce two visualisations which explore: | |||
# How article quality varies between countries; | |||
# How the number of articles a country has, relative to its population, varies between countries. | |||
In order to complete the analysis correctly and receive full credit, your graphs will need to use a scale appropriate to the data; all units, axes, and values should be clearly labeled; and each graph should have a key and a title. You must also generate a .png or .jpeg formatted image of your final graphs. | |||
You may choose to graph the data in Python, in your notebook. If you decide to use Google Sheets or some other open, public data visualization platform to build your graphs, link to them in the README, and make sure sharing settings allow anyone who clicks on the links to view the graphs and download the data! | |||
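Before plotting, the two quantities the visualisations explore can be computed per country. The sketch below is one possible approach, not a required method: the row keys (<tt>country</tt>, <tt>population</tt>, <tt>quality</tt>), the per-million scaling, and the choice to treat only FA and GA as "high quality" are all assumptions you may want to change.

```python
from collections import Counter

# Assumption: which ORES classes count as "high quality".
HIGH_QUALITY = {"FA", "GA"}


def country_summary(rows):
    """Return {country: (articles_per_million_people, high_quality_share)}.

    Each row is a dict with the hypothetical keys 'country', 'population'
    and 'quality' (the ORES prediction for that article).
    """
    totals, high, population = Counter(), Counter(), {}
    for row in rows:
        country = row["country"]
        totals[country] += 1
        high[country] += int(row["quality"] in HIGH_QUALITY)
        population[country] = row["population"]

    return {
        country: (
            totals[country] * 1_000_000 / population[country],
            high[country] / totals[country],
        )
        for country in totals
    }
```

Sorting the resulting dictionary by either value gives a ranking that can be passed straight to a bar chart.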
=== Writeup === | |||
Write a few paragraphs, either in the README or in the notebook, | Write a few paragraphs, either in the README or in the notebook, explaining your work and communicating what you have learned - about bias, or about Wikipedia - and what theories you have about why any biases might exist (if you find they exist). | ||
==== Submission instructions ==== | ==== Submission instructions ==== | ||
Line 257: | Line 255: | ||
#Create the data-512-a2 repository on GitHub w/ your code and data. | #Create the data-512-a2 repository on GitHub w/ your code and data. | ||
#Complete and add your README and LICENSE file. | #Complete and add your README and LICENSE file. | ||
#Submit the link to your GitHub repo to: https://canvas.uw.edu/courses/1174178/assignments/ | #Submit the link to your GitHub repo to: https://canvas.uw.edu/courses/1174178/assignments/3876066 | ||
==== Required deliverables ==== | ==== Required deliverables ==== | ||
A directory in your GitHub repository called <tt>data-512-a2</tt> that contains the following files: | A directory in your GitHub repository called <tt>data-512-a2</tt> that contains the following files: | ||
:# 1 final data file in CSV format that follows the formatting conventions. | :# 1 final data file in CSV format that follows the formatting conventions. | ||
:# 1 Jupyter notebook named <tt>hcds-a2-bias</tt> that contains all code as well as information necessary to understand each programming step, as well as your writeup (if you have not included it in the README) | :# 1 Jupyter notebook named <tt>hcds-a2-bias</tt> that contains all code as well as information necessary to understand each programming step, as well as your writeup (if you have not included it in the README). | ||
:# 1 README file in .txt or .md format that contains information to reproduce the analysis, including data descriptions, attributions and provenance information, and descriptions of all relevant resources and documentation (inside and outside the repo) and hyperlinks to those resources, and your writeup (if you have not included it in the notebook). | :# 1 README file in .txt or .md format that contains information to reproduce the analysis, including data descriptions, attributions and provenance information, and descriptions of all relevant resources and documentation (inside and outside the repo) and hyperlinks to those resources, and your writeup (if you have not included it in the notebook). | ||
:# 1 LICENSE file that contains an [https://opensource.org/licenses/MIT MIT LICENSE] for your code. | :# 1 LICENSE file that contains an [https://opensource.org/licenses/MIT MIT LICENSE] for your code. | ||
:# 1 .png or .jpeg image of your visualization. | |||
==== Helpful tips ==== | ==== Helpful tips ==== | ||
Line 275: | Line 274: | ||
=== A3: Final project plan === | === A3: Final project plan === | ||
For this assignment, you will write up a study plan for your final class project. The plan will cover a variety of details about your final project, including what data you will use, what you will do with the data (e.g. statistical analysis, train a model), what results you expect or intend, and most importantly, why your project is interesting or important (and to whom, besides yourself). | For this assignment, you will write up a study plan for your final class project. The plan will cover a variety of details about your final project, including what data you will use, what you will do with the data (e.g. statistical analysis, train a model), what results you expect or intend, and most importantly, why your project is interesting or important (and to whom, besides yourself). | ||
=== A4: Crowdwork ethnography === | === A4: Crowdwork self-ethnography === | ||
For this assignment, you will go undercover as a member of the Amazon Mechanical Turk community. You will | For this assignment, you will go undercover as a member of the Amazon Mechanical Turk community. You will perform assigned tasks, participate (or lurk) in Turker discussion forums, and write an ethnographic account of your experience as a human-in-the-loop of data science. | ||
=== A5: Final project presentation === | === A5: Final project presentation === |