Community Data Science Course (Spring 2016)/Day 5 Coding Challenges

From CommunityData

# Can you build a list of all of the articles edited by "Benjamin Mako Hill"? What is the article with the longest title that user Benjamin Mako Hill has edited? ''Hint: coming''
# How many edits to the article "Python (programming language)" were made in 2014? ''Hint: example 1''
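As a possible starting point for the question about edits made in 2014: timestamps from the API arrive as strings such as '2016-02-03T16:53:02Z' (the format visible in the category output of the helpful script). The sketch below assumes you have already collected a list of such strings and only shows how to count the ones that fall in a given year. The function name <code>count_in_year</code> and the sample timestamps are made up for illustration; it does not fetch anything from Wikipedia.

```python
from datetime import datetime


def count_in_year(timestamps, year):
    """Count timestamps like '2016-02-03T16:53:02Z' that fall in the given year."""
    count = 0
    for ts in timestamps:
        # Parse the string into a datetime so we can compare the year as a number.
        parsed = datetime.strptime(ts, '%Y-%m-%dT%H:%M:%SZ')
        if parsed.year == year:
            count += 1
    return count


# Made-up timestamps for illustration only; these are not real revision data.
example_timestamps = [
    '2013-12-31T23:59:59Z',
    '2014-01-01T00:00:00Z',
    '2014-06-15T12:30:00Z',
    '2015-01-01T00:00:00Z',
]

print(count_in_year(example_timestamps, 2014))
```

Getting the real list of revision timestamps out of the JSON response is still up to you.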
== Helpful script ==
This script walks through our exploration of a query for categories on a page, which we did right at the end of class on Wednesday.
<source lang="python">
# Import the requests lib.
import requests
# Set up a query that grabs categories for the python page in json format.
request_dict = {
'action': 'query',
'format': 'json',
'prop': 'categories',
'titles': 'Python_(programming_language)',
'clprop': 'timestamp'
}
# Make a call to the wikipedia api, passing the query as URL parameters.
wp_call = requests.get('https://en.wikipedia.org/w/api.php', params=request_dict)
# Create a dict from json.
response = wp_call.json()
# Let's just print it!
print(response)
# Whoa... big dictionary here. [Question: how did I know it was a dictionary from printing it?]
type(response)
# Out[7]: dict
# Ok, confirmed... it's a dictionary.
# If something is a dictionary, then check its keys.
response.keys()
# Out[8]: dict_keys(['continue', 'query'])
# I told you not to worry much about continue, so let's look at query.
# Q1: what type is it?
type(response['query'])
# Out[9]: dict
print(response['query'])
# Whoops... still huge. Let's explore more.
# Ok, so response['query'] is a dict. Which means it has keys!
response['query'].keys()
# Out[10]: dict_keys(['pages', 'normalized'])
response['query']['normalized']
# Out[11]:
# [{'from': 'Python_(programming_language)',
#  'to': 'Python (programming language)'}]
# Ok, so normalized is a small list [HOW DID I KNOW?]. I can pretty much see what it's listing: ways the API rewrote the query.
# In this case, it changed the underscores (_) in the title to spaces.
response['query']['pages']
# Whoa... still huge. Let's explore more.
type(response['query']['pages'])
# Out[13]: dict
# Ok, it's a dict. Let's look at keys!
response['query']['pages'].keys()
# dict_keys(['23862'])
# One key. This is the page id! [WHAT IF YOU CHANGE titles IN THE INPUT TO QUERY TWO PAGES?]
response['query']['pages']['23862']
# Still big, so let's keep going.
response['query']['pages']['23862'].keys()
# Out[16]: dict_keys(['categories', 'pageid', 'ns', 'title'])
# Let's look at each key.
response['query']['pages']['23862']['title']
# Out[17]: 'Python (programming language)'
# That one makes sense...
response['query']['pages']['23862']['ns']
# Out[18]: 0
# I don't know what it is but it doesn't seem useful right now. I'll keep exploring.
response['query']['pages']['23862']['pageid']
# Out[19]: 23862
# This is an int (how did I know?) that apparently corresponds to the key in response['query']['pages']
response['query']['pages']['23862']['categories']
# It's a list [HOW DID I KNOW from the printout?] Still kind of long... let's keep going.
type(response['query']['pages']['23862']['categories'])
# Out[20]: list
# Ok, confirmed it's a list. I got the same info when the printout above started with '['
len(response['query']['pages']['23862']['categories'])
# Out[21]: 10
# Ten categories. The docs say that's the default limit (cllimit).
response['query']['pages']['23862']['categories'][0]
# Out[22]:
# {'ns': 14,
#  'timestamp': '2016-02-03T16:53:02Z',
#  'title': 'Category:Articles with DMOZ links'}
# Now I've learned something: the elements of categories are DICTs (note the '{', '}' in output or use type)
# I've also learned that each category dict has a 'title' key.
# What's next?
# REPEAT THIS EXERCISE but query wikipedia for revisions not categories. Walk through the json output, which is
# composed of lists and dictionaries.
</source>
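The script above repeats the same three moves at every level: check the type, list the keys of a dict, and look at the length and first element of a list. As a rough sketch of how that routine could be automated, the hypothetical helper below (<code>describe</code> is not part of requests or the Wikipedia API) walks a nested structure and reports what it finds. It runs on a trimmed-down, hand-copied stand-in for the response so that no network call is needed.

```python
def describe(value, name='response', max_depth=4):
    """Return lines describing the type and shape of nested JSON-style data."""
    if isinstance(value, dict):
        lines = [f"{name}: dict with keys {sorted(value)}"]
        if max_depth > 0:
            for key in sorted(value):
                lines += describe(value[key], f"{name}[{key!r}]", max_depth - 1)
        return lines
    if isinstance(value, list):
        lines = [f"{name}: list of length {len(value)}"]
        if value and max_depth > 0:
            # Look only at the first element, as the walkthrough does with [0].
            lines += describe(value[0], f"{name}[0]", max_depth - 1)
        return lines
    # Anything else (str, int, ...) is a leaf: show its type and value.
    return [f"{name}: {type(value).__name__} = {value!r}"]


# A trimmed-down stand-in for the API response, so this runs without a network call.
sample = {
    'query': {
        'pages': {
            '23862': {
                'pageid': 23862,
                'ns': 0,
                'title': 'Python (programming language)',
                'categories': [
                    {'ns': 14,
                     'timestamp': '2016-02-03T16:53:02Z',
                     'title': 'Category:Articles with DMOZ links'},
                ],
            }
        }
    }
}

summary = describe(sample, max_depth=6)
for line in summary:
    print(line)
```

To try it on live data, pass the real <code>response</code> from the script in place of <code>sample</code>. Walking through the output by hand first is still the better way to learn how the lists and dictionaries nest.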