Community Data Science Course (Spring 2016)/Day 5 Coding Challenges: Difference between revisions

From CommunityData
;Here's a much more complicated challenge, but a fun one that you know enough to solve: Check out the game [http://kevan.org/catfishing.php Catfishing], which shows you categories and has you guess an article. Write a version that uses the Wikipedia API. For example, pick 5 articles and write a program that randomly shows the categories for one of those articles and asks you to guess the article. Read the guess with <code>input()</code> and let the user know if they got it right or wrong!
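
If you are not sure how to structure the game, here is one possible skeleton for the game loop only. It is a sketch, not a solution: it does not call the Wikipedia API at all. The dictionary <code>SAMPLE_CATEGORIES</code> holds made-up placeholder categories, and the function names (<code>pick_article</code>, <code>is_correct</code>, <code>play_round</code>) are invented for this example. Replacing the placeholder data with categories fetched from the API is the part left for you.

```python
import random

# Hypothetical placeholder data: in the real challenge these lists would be
# fetched from the Wikipedia API rather than typed in by hand.
SAMPLE_CATEGORIES = {
    "Article A": ["Placeholder category 1", "Placeholder category 2"],
    "Article B": ["Placeholder category 3"],
    "Article C": ["Placeholder category 4", "Placeholder category 5"],
}


def pick_article(categories_by_title, rng=random):
    """Return one randomly chosen article title."""
    return rng.choice(sorted(categories_by_title))


def is_correct(guess, answer):
    """Compare a guess to the answer, ignoring case and surrounding spaces."""
    return guess.strip().lower() == answer.strip().lower()


def play_round(categories_by_title, read_guess=input, rng=random):
    """Show the categories of a random article and check one guess.

    read_guess defaults to input(); passing another function makes the
    round easy to try out without typing anything.
    """
    answer = pick_article(categories_by_title, rng)
    print("Categories:")
    for category in categories_by_title[answer]:
        print("  -", category)
    guess = read_guess("Which article is this? ")
    correct = is_correct(guess, answer)
    if correct:
        print("You got it right!")
    else:
        print("Wrong, the article was: " + answer)
    return correct


# To play interactively, call:
# play_round(SAMPLE_CATEGORIES)
```

Keeping the comparison in its own small function means you can decide later how forgiving to be about capitalization or spacing without touching the rest of the game.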

Revision as of 18:46, 28 April 2016

Get the software

http://mako.cc/teaching/2015/community_data_science/wikipedia-data-examples.zip

Each of the challenges this week will ask you to modify and work with code in the zip file above.

As always, it's not essential that you solve or get through all of these; I'm not grading your answers. That said, being able to work through many of them is a good sign that you have mastered the concepts for the week. It is always fine to collaborate or work together on these problem sets. The only thing I ask is that you do not broadcast answers before Sunday at midnight on Canvas.

Challenges

  1. Save the revision metadata printed in wikipedia1-2.py (i.e., the material already being printed out) to a file called "wikipedia_revisions.tsv".
  2. Print out the revision ids and edit summaries (i.e., comment) of each revision for the article on Python. Hint: modify example 1
    1. (Modified) Print out the id, the parent id, and the content of each revision.
  3. Find out what other data or metadata you can print out for a revision for an article. Hint: this isn't a coding question
  4. Which article is in more categories? Python (programming language) or R (programming language)? Hint: modify question 2 (example 1). You'll want to investigate the titles key in the Wikipedia API
  5. Find out how many revisions to the article on "Python (programming language)" were made by user "Peterl"? How about "Hfastedge"? Hint: modify example 1-2. You'll want to make sure you get the username from the API
  6. How would you use the API to find out how many revisions/edits the user "Benjamin Mako Hill" has made to Wikipedia? Hint: coming
  7. Can you build a list of all of the articles edited by "Benjamin Mako Hill"? What is the article with the longest title that user Benjamin Mako Hill has edited? Hint: coming
  8. How many edits to the article "Python (programming language)" were made in 2014? Hint: example 1
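
If you get stuck on the revision challenges (1, 5, and 8), the sketch below shows one way the pieces might fit together. It is not the code from the zip file, and it makes several assumptions: it uses only the standard library, it requests the <code>ids|timestamp|user|comment</code> fields through the <code>rvprop</code> parameter, and all function names and the sample data are invented for illustration. The <code>fetch_revisions</code> function needs a network connection and only retrieves a single batch of revisions, so counting over an article's full history would also require following the API's continuation mechanism, which is not shown here.

```python
import csv
import io
import json
import urllib.parse
import urllib.request
from collections import Counter

API_URL = "https://en.wikipedia.org/w/api.php"
FIELDS = ["revid", "parentid", "user", "timestamp", "comment"]


def fetch_revisions(title, limit=50):
    """Fetch one batch of revisions for an article (needs network access)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|comment",
        "rvlimit": limit,
        "format": "json",
    }
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # The User-Agent string here is a made-up example value.
    request = urllib.request.Request(url, headers={"User-Agent": "cdsw-example/0.1"})
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    revisions = []
    for page in data["query"]["pages"].values():
        revisions.extend(page.get("revisions", []))
    return revisions


def write_revisions_tsv(revisions, out_file):
    """Write one tab-separated row per revision to an open file object."""
    writer = csv.writer(out_file, delimiter="\t")
    writer.writerow(FIELDS)
    for rev in revisions:
        writer.writerow([rev.get(field, "") for field in FIELDS])


def count_by_user(revisions):
    """Count revisions per username."""
    return Counter(rev.get("user", "") for rev in revisions)


def count_in_year(revisions, year):
    """Count revisions whose ISO timestamp starts with the given year."""
    return sum(1 for rev in revisions
               if rev.get("timestamp", "").startswith(str(year)))


# Hypothetical sample data shaped like the API's revision records,
# so the helpers can be tried without a network connection.
SAMPLE_REVISIONS = [
    {"revid": 3, "parentid": 2, "user": "ExampleUserA",
     "timestamp": "2015-01-05T10:00:00Z", "comment": "fix typo"},
    {"revid": 2, "parentid": 1, "user": "ExampleUserB",
     "timestamp": "2014-06-01T08:30:00Z", "comment": "add section"},
    {"revid": 1, "parentid": 0, "user": "ExampleUserA",
     "timestamp": "2014-02-11T19:45:00Z", "comment": "create page"},
]

buffer = io.StringIO()
write_revisions_tsv(SAMPLE_REVISIONS, buffer)
print(buffer.getvalue())
print(count_by_user(SAMPLE_REVISIONS))
print(count_in_year(SAMPLE_REVISIONS, 2014))

# With a network connection, the same helpers could be used like this:
# revisions = fetch_revisions("Python (programming language)")
# with open("wikipedia_revisions.tsv", "w", newline="", encoding="utf-8") as f:
#     write_revisions_tsv(revisions, f)
```

Separating the download step from the counting and saving steps lets you test your logic on a small hand-made list before pointing it at the real API.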