Editing DS4UX (Spring 2016)/Panama Papers

In this project, we will explore a few ways to gather data using two Wikipedia APIs: one provides data related to edits, and the other provides data related to pageviews. Once we've done that, we will extend this code to create our own datasets of Wikipedia edits or other data that you can use as the basis of your Final Project.


This project is adapted from material being developed for the [[CDSW|Community Data Science Workshops]] by Ben Lewis and Morten Wang ([https://github.com/nettrom/wikipedia-session GitHub repo]).


== Overview ==
In this project we will look at the viewing and editing history of a recently created Wikipedia article about a breaking news event—''[[w:Panama_Papers| Panama Papers]]''. When events of global significance occur, Wikipedia is often among the first places that people look for information about these events. By examining both the editing and viewing history of this article, we can learn a lot about how people create ''and'' consume information on Wikipedia.  
The process by which 'breaking news' articles are created on Wikipedia is [http://dgergle.soc.northwestern.edu/resources/KeeganGergleContractor_StayingInTheLoop_WikiSym2012.pdf a fascinating area of research] for data scientists who study how humans work together. For more links to interesting research on Wikipedia, see the [[DS4UX_(Spring_2016)/Wikipedia_API#Research_using_Wikipedia_data|Resources section]] of this page.


=== Goals ===
* [https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=revisions&list=&meta=&titles=Panama_Papers&rvprop=ids%7Ctimestamp%7Cflags%7Ccomment%7Cuser&rvlimit=1&rvdir=newer View query in sandbox]
* [https://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&list=&meta=&titles=Panama_Papers&rvprop=ids%7Ctimestamp%7Cflags%7Ccomment%7Cuser&rvlimit=1&rvdir=newer View result in browser]
* [https://www.mediawiki.org/wiki/API:Revisions View API:Revisions documentation]
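The same request can be assembled in Python. The following is a minimal sketch, not the course's official solution: the helper names <code>first_revision_params</code> and <code>build_url</code> are hypothetical, and only the standard library is used so the URL can be inspected without making a network call.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def first_revision_params(title):
    """Return query parameters that ask for the oldest revision of a page.

    Hypothetical helper: the parameter values mirror the sandbox query above.
    """
    return {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        # '|' separates the revision properties we want returned.
        "rvprop": "ids|timestamp|flags|comment|user",
        "rvlimit": 1,
        # 'newer' lists revisions oldest first, so a limit of 1 gives the first edit.
        "rvdir": "newer",
    }


def build_url(params):
    """Encode the parameters into a full API request URL."""
    return API_ENDPOINT + "?" + urlencode(params)


url = build_url(first_revision_params("Panama_Papers"))
print(url)
```

Changing <code>rvdir</code> to <code>older</code> would request the most recent revision instead. To actually send the request, the same dictionary could be passed to an HTTP library of your choice.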




* [https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=revisions&list=&meta=&titles=Panama_Papers&rvprop=ids%7Ctimestamp%7Cflags%7Ccomment%7Cuser&rvlimit=1&rvdir=older View query in sandbox]
* [https://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&list=&meta=&titles=Panama_Papers&rvprop=ids%7Ctimestamp%7Cflags%7Ccomment%7Cuser&rvlimit=1&rvdir=older View result in browser]
* [https://www.mediawiki.org/wiki/API:Revisions View API:Revisions documentation]




; How many edits has the creator of Panama Papers made to Wikipedia?


* [https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&list=users&usprop=editcount%7Cregistration&ususers=Czar View query in sandbox]
* [https://en.wikipedia.org/w/api.php?action=query&format=json&list=users&usprop=editcount%7Cregistration&ususers=Czar View result in browser]
* [https://www.mediawiki.org/wiki/API:Users View API:Users documentation]
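Once the JSON result is loaded, the edit count can be read out of the <code>query</code> → <code>users</code> list. This is a rough sketch under stated assumptions: the function name <code>edit_counts</code> is hypothetical, and the sample response below uses placeholder values rather than real data for any account.

```python
import json

# Hypothetical sample shaped like a list=users response.
# The user name, id, edit count, and date are placeholders, not real data.
SAMPLE_RESPONSE = """
{
  "batchcomplete": "",
  "query": {
    "users": [
      {
        "userid": 1,
        "name": "ExampleUser",
        "editcount": 1234,
        "registration": "2005-01-01T00:00:00Z"
      }
    ]
  }
}
"""


def edit_counts(response):
    """Map each user name in the response to its edit count.

    Users without an 'editcount' field (for example, accounts the API
    reports as missing) are mapped to None instead of raising an error.
    """
    users = response.get("query", {}).get("users", [])
    return {user["name"]: user.get("editcount") for user in users}


counts = edit_counts(json.loads(SAMPLE_RESPONSE))
print(counts)
```

Because <code>ususers</code> accepts several names separated by <code>|</code>, returning a dictionary keeps the sketch usable when more than one user is requested.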




* [https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&prop=revisions&titles=Panama+Papers&rvprop=ids%7Ctimestamp%7Ccontent&rvstart=2016-04-04T17%3A58%3A00.000Z&rvend=2016-04-04T17%3A59%3A05.000Z&rvdir=newer View query in sandbox]
* [https://en.wikipedia.org/w/api.php?action=query&format=json&prop=revisions&titles=Panama+Papers&rvprop=ids%7Ctimestamp%7Ccontent&rvstart=2016-04-04T17%3A58%3A00.000Z&rvend=2016-04-04T17%3A59%3A05.000Z&rvdir=newer View result in browser]
* [https://www.mediawiki.org/wiki/API:Revisions View API:Revisions documentation]
* [https://en.wikipedia.org/w/index.php?title=Panama_Papers&oldid=713548359 View the text of this revision on Wikipedia]
* [https://en.wikipedia.org/w/index.php?title=Panama_Papers&diff=713548359&oldid=713548357 View the "diff" of this revision] (shows what was changed between this edit and the previous one)






;3. How many times was Panama Papers viewed in the first week? What proportion of those views came from mobile devices?
Completing this exercise also requires two API requests: one that gathers pageview data for all devices, and one that gathers data only for views made through the [https://en.m.wikipedia.org/wiki/Main_Page Wikipedia mobile website].
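The two requests and the final proportion can be sketched as follows. This assumes the pageview data comes from the Wikimedia REST pageviews endpoint, which this section does not name explicitly. The function names, the date range, and the sample responses are all hypothetical placeholders.

```python
from urllib.parse import quote

# Assumption: the Wikimedia REST API per-article pageviews endpoint.
PAGEVIEW_ENDPOINT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageview_url(article, access, start, end,
                 project="en.wikipedia", agent="user"):
    """Build a daily per-article pageview URL.

    access is 'all-access' for every device or 'mobile-web' for the
    mobile website; start and end are YYYYMMDD strings.
    """
    parts = [PAGEVIEW_ENDPOINT, project, access, agent,
             quote(article, safe=""), "daily", start, end]
    return "/".join(parts)


def total_views(response):
    """Sum the 'views' field over every item in a pageview response."""
    return sum(item["views"] for item in response.get("items", []))


def mobile_share(all_response, mobile_response):
    """Return mobile views as a fraction of all views (0.0 if there are none)."""
    total = total_views(all_response)
    if total == 0:
        return 0.0
    return total_views(mobile_response) / total


# Placeholder dates: substitute the article's actual first week.
all_url = pageview_url("Panama_Papers", "all-access", "20160403", "20160409")
mobile_url = pageview_url("Panama_Papers", "mobile-web", "20160403", "20160409")

# Hypothetical responses with made-up view counts, shaped like the API output.
all_sample = {"items": [{"views": 100}, {"views": 300}]}
mobile_sample = {"items": [{"views": 40}, {"views": 60}]}

share = mobile_share(all_sample, mobile_sample)
print(all_url)
print(mobile_url)
print(share)
```

Keeping URL construction separate from the arithmetic means the proportion logic can be checked against small hand-made samples before any real request is sent.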




* [https://en.wikipedia.org/w/api.php?action=help&modules=query API documentation for the query module]
* [https://en.wikipedia.org/wiki/Special:ApiSandbox API Sandbox]
* [[Sample Wikipedia API queries|More sample Wikipedia API queries]]
* [https://github.com/ben-zen/wikipedia-session The session lecture notes (in Markdown) and Python sources.]


=== Research using Wikipedia data ===
* [http://www.brianckeegan.com/papers/CSCW_2015.pdf ‘Is’ to ‘Was’: Coordination and Commemoration on Posthumous Wikipedia Biographies] — an exploration of editing patterns around Wikipedia articles about people who have recently died.
* [http://www.brianckeegan.com/papers/ICS_2015.pdf WikiWorthy: Judging a Candidate’s Notability in the Community] — A study that uses the editing activity on Wikipedia articles about political candidates as a predictor of election success.


=== Websites that use the MediaWiki API ===