DS4UX (Spring 2016)/Panama Papers

From CommunityData
In this project, we will explore a few ways to gather data using two Wikipedia APIs: one provides data related to edits, and the other provides data related to pageviews. Once we've done that, we will extend this code to create our own datasets of Wikipedia edits or other data that you can use as the basis of your Final Project.


This project is adapted from material being developed for the [[CDSW|Community Data Science Workshops]] by Ben Lewis and Morten Wang ([https://github.com/nettrom/wikipedia-session GitHub repo]).


== Overview ==
In this project we will look at the viewing and editing history of a recently created Wikipedia article about a breaking news event—''[[w:Panama_Papers| Panama Papers]]''. When events of global significance occur, Wikipedia is often among the first places that people look for information about these events. By examining both the editing and viewing history of this article, we can learn a lot about how people create ''and'' consume information on Wikipedia.
The process by which 'breaking news' articles are created on Wikipedia is [http://dgergle.soc.northwestern.edu/resources/KeeganGergleContractor_StayingInTheLoop_WikiSym2012.pdf a fascinating area of research] for data scientists who study how humans work together. For more links to interesting research on Wikipedia, see the [[DS4UX_(Spring_2016)/Wikipedia_API#Research_using_Wikipedia_data|Resources section]] of this page.


=== Goals ===
* [https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=revisions&list=&meta=&titles=Panama_Papers&rvprop=ids%7Ctimestamp%7Cflags%7Ccomment%7Cuser&rvlimit=1&rvdir=newer View query in sandbox]
* [https://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&list=&meta=&titles=Panama_Papers&rvprop=ids%7Ctimestamp%7Cflags%7Ccomment%7Cuser&rvlimit=1&rvdir=newer View result in browser]
* [https://www.mediawiki.org/wiki/API:Revisions View API:Revisions documentation]
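The same query can also be issued from Python. Here is a minimal sketch using the third-party <code>requests</code> library (assumed installed); it fetches metadata about the first revision of ''Panama Papers'':

```python
import requests

# The same "first revision" query as the sandbox link above.
ENDPOINT = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "format": "json",
    "prop": "revisions",
    "titles": "Panama_Papers",
    "rvprop": "ids|timestamp|flags|comment|user",
    "rvlimit": 1,
    "rvdir": "newer",   # "newer" means: start from the oldest revision
}

data = requests.get(ENDPOINT, params=params).json()

# Revisions are nested under a page-ID key we don't know in advance,
# so we loop over the values of the 'pages' dictionary.
for page in data["query"]["pages"].values():
    for rev in page["revisions"]:
        print(rev["user"], rev["timestamp"], rev["comment"])
```

Note that <code>requests</code> URL-encodes the parameters for you, so you can write <code>rvprop</code> with plain <code>|</code> characters instead of <code>%7C</code>.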




* [https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=revisions&list=&meta=&titles=Panama_Papers&rvprop=ids%7Ctimestamp%7Cflags%7Ccomment%7Cuser&rvlimit=1&rvdir=older View query in sandbox]
* [https://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&list=&meta=&titles=Panama_Papers&rvprop=ids%7Ctimestamp%7Cflags%7Ccomment%7Cuser&rvlimit=1&rvdir=older View result in browser]
* [https://www.mediawiki.org/wiki/API:Revisions View API:Revisions documentation]




; How many edits has the creator of Panama Papers made to Wikipedia?


* [https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&list=users&usprop=editcount%7Cregistration&ususers=Czar View query in sandbox]
* [https://en.wikipedia.org/w/api.php?action=query&format=json&list=users&usprop=editcount%7Cregistration&ususers=Czar View result in browser]
* [https://www.mediawiki.org/wiki/API:Users View API:Users documentation]
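As a sketch, the same user query can be made from Python with the third-party <code>requests</code> library (assumed installed):

```python
import requests

# Look up the edit count and registration date of the user "Czar",
# mirroring the API:Users query linked above.
ENDPOINT = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "format": "json",
    "list": "users",
    "ususers": "Czar",
    "usprop": "editcount|registration",
}

data = requests.get(ENDPOINT, params=params).json()

# 'users' is a list because ususers can name several users at once.
for user in data["query"]["users"]:
    print(user["name"], "has made", user["editcount"], "edits")
```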




* [https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=revisions&titles=Panama+Papers&rvprop=ids%7Ctimestamp%7Ccontent&rvstart=2016-04-04T17%3A58%3A00.000Z&rvend=2016-04-04T17%3A59%3A05.000Z&rvdir=newer View query in sandbox]
* [https://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&titles=Panama+Papers&rvprop=ids%7Ctimestamp%7Ccontent&rvstart=2016-04-04T17%3A58%3A00.000Z&rvend=2016-04-04T17%3A59%3A05.000Z&rvdir=newer View result in browser]
* [https://www.mediawiki.org/wiki/API:Revisions View API:Revisions documentation]
* [https://en.wikipedia.org/w/index.php?title=Panama_Papers&oldid=713548359 View the text of this revision on Wikipedia]
* [https://en.wikipedia.org/w/index.php?title=Panama_Papers&diff=713548359&oldid=713548357 View the "diff" version of revision] (shows what was changed between this edit and the previous one)
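Here is one way to make the same time-window query from Python (a minimal sketch using the third-party <code>requests</code> library; with <code>format=json</code> the wikitext of each revision comes back under the <code>*</code> key):

```python
import requests

# Fetch the full text of revisions made in a roughly one-minute window,
# mirroring the sandbox query above.
ENDPOINT = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "format": "json",
    "prop": "revisions",
    "titles": "Panama Papers",
    "rvprop": "ids|timestamp|content",
    "rvstart": "2016-04-04T17:58:00Z",
    "rvend": "2016-04-04T17:59:05Z",
    "rvdir": "newer",   # move forward in time from rvstart to rvend
}

data = requests.get(ENDPOINT, params=params).json()

for page in data["query"]["pages"].values():
    for rev in page.get("revisions", []):
        # rev["*"] holds the wikitext of this revision
        print(rev["revid"], rev["timestamp"], len(rev["*"]), "characters")
```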




Now that we're comfortable building API queries in the sandbox, we will focus on how we can access these APIs with Python. If you would like to review the steps involved in building an API query in Python, check out the resources listed below.
* [https://github.com/makoshark/wikipedia-cdsw/blob/master/building-a-query.md Querying APIs from Python] — a written lecture by Ben Lewis that walks you step-by-step through the process of building and executing an API query in Python. The 'companion script' <code>building_a_query_code.py</code> in the project directory executes all of the code shown in this lecture step by step. If you only want to run some of the code from the lecture, comment out everything below the blocks you want to execute before you run the script.
* <code>wikipedia-1.py</code> — This is the script you were asked to execute to 'test' your code when you downloaded the project. It's also a valid API request that gathers metadata about the first revision to Panama Papers, and prints it to your terminal. The JSON that this query returns can be seen here: https://en.wikipedia.org/w/api.php?action=query&prop=revisions&titles=Panama_Papers&rvdir=newer&rvlimit=1&format=jsonfm
* <code>introduce_while.py</code> — (in project directory) this script uses a while loop to roll two 'virtual dice' until they both come up 6's. This example doesn't make API calls or use Wikipedia data—the point is to help you understand that sometimes you will need to loop through an operation (like an API request) an indeterminate number of times. In these situations, a 'while' loop is more appropriate than a 'for' loop.
* <code>introduce_continue.py</code> — (in project directory) this script shows you two ways to use the value of the 'continue' key that is embedded inside the JSON returned by your API request. Each API request returns a chunk of data, but there may be more data available! By passing the value of 'continue' back in subsequent requests, you can pick up where the last request left off.
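The continuation pattern can be sketched like this (this is not the contents of <code>introduce_continue.py</code> itself, just an illustration of the same idea using the third-party <code>requests</code> library):

```python
import requests

# Page through ALL revisions of "Panama Papers" by feeding the 'continue'
# values from each response back into the next request.
ENDPOINT = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "format": "json",
    "prop": "revisions",
    "titles": "Panama_Papers",
    "rvprop": "timestamp|user",
    "rvlimit": 500,      # the maximum the API allows per request
    "rvdir": "newer",
    "continue": "",      # opt in to the API's continuation scheme
}

revisions = []
while True:
    data = requests.get(ENDPOINT, params=params).json()
    for page in data["query"]["pages"].values():
        revisions.extend(page.get("revisions", []))
    if "continue" not in data:
        break                        # no more data to fetch
    params.update(data["continue"])  # pick up where the last request left off

print(len(revisions), "revisions fetched")
```

A 'while' loop fits here because you can't know in advance how many requests you will need; you keep going until the response no longer contains a 'continue' key.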
   
   




;3. How many times was Panama Papers viewed in the first week? What proportion of those views came from mobile devices?
Completing this exercise also requires two API requests: one that gathers pageview data for ALL devices, and another that only gathers data about views from the [https://en.m.wikipedia.org/wiki/Main_Page Wikipedia mobile website].
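Pageview counts come from a different service than the edit data: the Wikimedia REST API's per-article pageviews endpoint. A sketch of the comparison in Python (the third-party <code>requests</code> library is assumed, and the 2016-04-03 to 2016-04-10 date range is an assumption about what counts as 'the first week'):

```python
import requests

# Per-article pageview counts from the Wikimedia REST API.
# URL pieces: project / access / agent / article / granularity / start / end
URL = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       "en.wikipedia/{access}/all-agents/Panama_Papers/daily/20160403/20160410")

# Wikimedia asks API clients to identify themselves with a User-Agent header.
HEADERS = {"User-Agent": "DS4UX example script"}

def total_views(access):
    """Sum the daily view counts for one access method."""
    data = requests.get(URL.format(access=access), headers=HEADERS).json()
    return sum(day["views"] for day in data["items"])

all_views = total_views("all-access")       # every device
mobile_views = total_views("mobile-web")    # Wikipedia mobile website only

print("total views:", all_views)
print("mobile share:", mobile_views / all_views)
```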




* [https://en.wikipedia.org/w/api.php?action=help&modules=query API documentation for the query module]
* [https://en.wikipedia.org/wiki/Special:ApiSandbox API Sandbox]
* [[Sample Wikipedia API queries|More sample Wikipedia API queries]]
 
* [https://github.com/ben-zen/wikipedia-session The session lecture notes (in Markdown) and python sources.]


=== Research using Wikipedia data ===
* [http://www.brianckeegan.com/papers/CSCW_2015.pdf ‘Is’ to ‘Was’: Coordination and Commemoration on Posthumous Wikipedia Biographies] — an exploration of editing patterns around Wikipedia articles about people who have recently died.
* [http://www.brianckeegan.com/papers/ICS_2015.pdf WikiWorthy: Judging a Candidate’s Notability in the Community] — a study that uses the editing activity on Wikipedia articles about political candidates as a predictor of election success.


=== Websites that use the MediaWiki API ===