Community Data Science Course (Spring 2016)/Day 5 Coding Challenges

Get the software

http://mako.cc/teaching/2015/community_data_science/wikipedia-data-examples.zip

Each of the challenges this week will ask you to modify and work with code in the zip file above.

As always, it's not essential that you solve or get through all of these; I'm not grading your answers. That said, being able to work through a good number of them is a sign that you have mastered the concepts for the week. It is always fine to collaborate or work together on these problem sets. The only thing I ask is that you do not broadcast answers before Sunday at midnight on Canvas.

Challenges

  1. Save the revision metadata printed in wikipedia1-2.py (i.e., the material already being printed out) to a file called "wikipedia_revisions.tsv".
  2. Print out the revision ids and edit summaries (i.e., the comment field) of each revision for the article on Python. Hint: modify example 1.
    1. Modified: Print out the id, the parent id, and the content of the revision.
  3. Find out what other data or metadata you can print out for a revision of an article. Hint: this isn't a coding question.
  4. Which article is in more categories: "Python (programming language)" or "R (programming language)"? Hint: modify question 2 (example 1). You'll want to investigate the titles key in the Wikipedia API.
  5. How many revisions to the article "Python (programming language)" were made by user "Peterl"? How about "Hfastedge"? Hint: modify example 1-2. You'll want to make sure you get the username from the API.
  6. How would you use the API to find out how many revisions/edits the user "Benjamin Mako Hill" has made to Wikipedia? Hint: coming
  7. Can you build a list of all of the articles edited by "Benjamin Mako Hill"? What is the article with the longest title that user Benjamin Mako Hill has edited? Hint: coming
  8. How many edits to the article "Python (programming language)" were made in 2014? Hint: example 1
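
Several of the challenges above share the same basic steps: ask the Wikipedia API for revision metadata, flatten the response, and then save or count the revisions. The sketch below is one possible way to organize those steps and is not the code from the zip file. It uses only the Python standard library, the function names are made up for illustration, and SAMPLE_RESPONSE is invented data shaped like an API response (the user names and revision ids in it are not real), so the example runs without a network connection.

```python
import csv
import io
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"
FIELDS = ["title", "revid", "parentid", "user", "timestamp", "comment"]


def build_revision_query(title, limit=500):
    """Return an API URL asking for revision metadata for one article."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def extract_revisions(response):
    """Flatten a parsed JSON response into a list of revision dicts."""
    revisions = []
    for page in response["query"]["pages"].values():
        for rev in page.get("revisions", []):
            revisions.append({
                "title": page["title"],
                "revid": rev["revid"],
                "parentid": rev.get("parentid", ""),
                "user": rev.get("user", ""),
                "timestamp": rev.get("timestamp", ""),
                "comment": rev.get("comment", ""),
            })
    return revisions


def write_tsv(revisions, handle):
    """Write the revisions to an open file handle as tab-separated values."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(revisions)


def count_by_user(revisions, username):
    """Count the revisions made by one user."""
    return sum(1 for rev in revisions if rev["user"] == username)


def count_in_year(revisions, year):
    """Count the revisions whose ISO timestamp starts with the given year."""
    return sum(1 for rev in revisions
               if rev["timestamp"].startswith(str(year)))


# Invented sample data for illustration only (not real Wikipedia revisions).
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "1": {
                "pageid": 1,
                "title": "Python (programming language)",
                "revisions": [
                    {"revid": 103, "parentid": 102, "user": "ExampleUserA",
                     "timestamp": "2015-01-05T09:30:00Z", "comment": "typo"},
                    {"revid": 102, "parentid": 101, "user": "ExampleUserB",
                     "timestamp": "2014-07-12T18:00:00Z", "comment": "add ref"},
                    {"revid": 101, "parentid": 100, "user": "ExampleUserA",
                     "timestamp": "2014-03-01T12:00:00Z", "comment": "cleanup"},
                ],
            }
        }
    }
}

revisions = extract_revisions(SAMPLE_RESPONSE)
buffer = io.StringIO()  # stands in for open("wikipedia_revisions.tsv", "w")
write_tsv(revisions, buffer)
print(buffer.getvalue())
print(count_by_user(revisions, "ExampleUserA"), count_in_year(revisions, 2014))
```

To run this against the live API you would fetch the URL from build_revision_query, parse the JSON, and pass the result to extract_revisions. A single request returns only a limited number of revisions, so for a long article history you would need to keep requesting with the continuation values that the API returns until none are left.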