Community Data Science Course (Spring 2016)/Day 5 Coding Challenges
From CommunityData
== Helpful script ==

This script walks through our exploration of a query for categories on a page, which we did right at the end of class on Wednesday.

<source lang="python">
# The Out[...] lines show what IPython printed when each expression was run interactively.

# Import the requests lib.
import requests

# Set up a query that grabs categories for the python page in json format.
request_dict = {
    'action': 'query',
    'format': 'json',
    'prop': 'categories',
    'titles': 'Python_(programming_language)',
    'clprop': 'timestamp'
}

# Make a call to the wikipedia api.
wp_call = requests.get('https://en.wikipedia.org/w/api.php', params=request_dict)

# Create a dict from json.
response = wp_call.json()

# Let's just print it!
print(response)
# Whoa... big dictionary here. [Question: how did I know it was a dictionary from printing it?]

type(response)
# Out[7]: dict
# Ok, confirmed... it's a dictionary.

# If something is a dictionary then check its keys.
response.keys()
# Out[8]: dict_keys(['continue', 'query'])

# I told you not to worry much about continue, so let's look at query.
# Q1: what type is it?
type(response['query'])
# Out[9]: dict

print(response['query'])
# Whoops... still huge. Let's explore more.

# Ok, so response['query'] is a dict. Which means it has keys!
response['query'].keys()
# Out[10]: dict_keys(['pages', 'normalized'])

response['query']['normalized']
# Out[11]:
# [{'from': 'Python_(programming_language)',
#   'to': 'Python (programming language)'}]
# Ok, so normalized is a small list [HOW DID I KNOW?]. I can pretty much see what it's listing: ways of rewriting the query.
# In this case, it changed the underscores (_) in the title to spaces.

response['query']['pages']
# Whoa... still huge. Let's explore more.

type(response['query']['pages'])
# Out[13]: dict

# Ok, it's a dict. Let's look at keys!
response['query']['pages'].keys()
# dict_keys(['23862'])
# One key. This is the page id! [WHAT IF YOU CHANGE titles IN THE INPUT TO QUERY TWO PAGES?]

response['query']['pages']['23862']
# Still big, so let's keep going.

response['query']['pages']['23862'].keys()
# Out[16]: dict_keys(['categories', 'pageid', 'ns', 'title'])

# Let's look at each key.
response['query']['pages']['23862']['title']
# Out[17]: 'Python (programming language)'
# That one makes sense...

response['query']['pages']['23862']['ns']
# Out[18]: 0
# I don't know what it is but it doesn't seem useful right now. I'll keep exploring.

response['query']['pages']['23862']['pageid']
# Out[19]: 23862
# This is an int (how did I know?) that apparently corresponds to the key in response['query']['pages'].

response['query']['pages']['23862']['categories']
# It's a list [HOW DID I KNOW from the printout?] Still kind of long... let's keep going.

type(response['query']['pages']['23862']['categories'])
# Out[20]: list
# Ok, confirmed it's a list. I got the same info when the printout above started with '['.

len(response['query']['pages']['23862']['categories'])
# Out[21]: 10
# Ten categories. The docs say 10 is the default number returned (the cllimit parameter).

response['query']['pages']['23862']['categories'][0]
# Out[22]:
# {'ns': 14,
#  'timestamp': '2016-02-03T16:53:02Z',
#  'title': 'Category:Articles with DMOZ links'}
# Now I've learned something: the elements of categories are DICTs (note the '{', '}' in the output, or use type).
# I've learned that there are titles in every category.

# What's next?
# REPEAT THIS EXERCISE but query wikipedia for revisions, not categories. Walk through the json output, which is
# composed of lists and dictionaries.
</source>
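The walkthrough above hard-codes the page id '23862'. As a minimal sketch (not the method we used in class), here is one way to let Python do the exploring for you and to pull out the category titles for every page in a response. The function names show_structure and category_titles are hypothetical, made up for this example. The sample dictionary is a trimmed copy of the output shown in the script (only the first category is kept and the 'continue' key is left out), so the code runs without calling the API.

<source lang="python">
# Sketch only: show_structure and category_titles are hypothetical helper names.


def show_structure(obj, path="response", max_depth=6):
    """Print the type of everything reachable from obj, along with the
    indexing expression you would type to get there."""
    if max_depth < 0:
        return
    if isinstance(obj, dict):
        print(f"{path}: dict with keys {list(obj.keys())}")
        for key, value in obj.items():
            show_structure(value, f"{path}[{key!r}]", max_depth - 1)
    elif isinstance(obj, list):
        print(f"{path}: list of length {len(obj)}")
        if obj:
            # Only look at the first element, assuming the list items share a shape.
            show_structure(obj[0], f"{path}[0]", max_depth - 1)
    else:
        print(f"{path}: {type(obj).__name__} = {obj!r}")


def category_titles(response):
    """Map each page title to its list of category titles, without hard-coding the page id."""
    result = {}
    for page in response["query"]["pages"].values():
        # A page with no categories has no 'categories' key, so default to an empty list.
        result[page["title"]] = [cat["title"] for cat in page.get("categories", [])]
    return result


# Trimmed copy of the response shown in the script above (first category only, no 'continue').
sample = {
    "query": {
        "normalized": [{"from": "Python_(programming_language)",
                        "to": "Python (programming language)"}],
        "pages": {
            "23862": {
                "pageid": 23862,
                "ns": 0,
                "title": "Python (programming language)",
                "categories": [
                    {"ns": 14,
                     "timestamp": "2016-02-03T16:53:02Z",
                     "title": "Category:Articles with DMOZ links"},
                ],
            }
        },
    }
}

show_structure(sample)
print(category_titles(sample))
# {'Python (programming language)': ['Category:Articles with DMOZ links']}
</source>

To query two pages at once, the API accepts several titles separated by |, for example 'titles': 'Python_(programming_language)|Ruby_(programming_language)'. Because category_titles loops over response['query']['pages'].values() instead of using '23862', it returns one entry per page it finds. show_structure should also work on a revisions response, but walking through that one by hand is still the exercise.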
All contributions to CommunityData are considered to be released under the Attribution-Share Alike 3.0 Unported license (see CommunityData:Copyrights for details).