Community Data Science Course (Spring 2023)/Week 4 lecture notes

From CommunityData
* Zillow's API: https://www.zillow.com/howto/api/APIOverview.htm


=== Questions to consider when choosing an API ===

# Where is the documentation? Are there examples or code samples?
# What kinds of information can I request?
# How do I request information from this API?
# What format does it give me data back in?
# Are there any rate limits or restrictions on use? For instance, Twitter doesn't want you downloading tweets. Zillow forbids storing bulk results. (Why?)
# Is there a Python package that will help me? For instance, Twitter has a great Python package called tweepy that will simplify access.
# All the things on the checklist below!


=== Checklist: How do we use an API to fetch datasets? ===
* What do you get back?
** Structured data (usually in the JSON format).
*** JSON is ''JavaScript Object Notation''. JSON data looks like Python lists and dictionaries, and we'll see that it's easy to turn it into a Python variable that is a list or dictionary. Here's a sample:
* How do you understand (i.e., parse) the data?
** We can display it in Firefox, which formats JSON automatically.
** We can lay it out with https://jsonformatter.curiousconcept.com/
** When it's time to do it in Python, we can use the <code>.json()</code> method of the response object that the requests module gives us!
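As a minimal sketch of that last step, the standard library's <code>json</code> module does the same kind of conversion that <code>.json()</code> performs on a response. The JSON text below is made up for illustration and is not real API output.

```python
import json

# Made-up JSON text for illustration; a real API would send something similar.
sample_text = '[{"name": "Example Bakery", "tags": ["bakery", "cafe"], "rating": 4.5}]'

# json.loads turns JSON text into ordinary Python lists and dictionaries.
data = json.loads(sample_text)

print(type(data))          # the outer JSON array becomes a Python list
print(type(data[0]))       # each JSON object becomes a Python dictionary
print(data[0]["name"])     # look up values by key, just like any dictionary
print(data[0]["tags"][1])  # nested arrays become nested lists
```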
=== How do we write Python programs that make web requests? ===
To use APIs to build a dataset we will need:
* all our tools from last session: variables, etc. [DONE!]
* the ability to open URLs on the web
* the ability to create custom URLs
* the ability to understand (i.e., parse) JSON data that APIs usually give us
* the ability to save to files [DONE!]


== Our first API: Bored API ==
** <code>import requests</code>
** <code>response = requests.get(URL, params={})</code>
** <code>print(response.status_code)</code>. The response also contains <code>.url</code>, which is pretty useful!
** <code>data = response.json()</code>; now we can check its type and poke around in it (I typically use tab completion!)
* e.g., let's work through a quick example
** Let's put it into a Python program to print out one activity for 1 through 5 people!
** Let's add the type of activity to what we print out
** Let's add another parameter (maybe a price range?)
** Let's show how to add parameters via dictionaries
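The steps above can be sketched roughly as follows. The URL and the parameter names (<code>participants</code>, <code>type</code>) are assumptions about the Bored API, so check its documentation before relying on them. To keep the sketch runnable without a network connection, it only builds and prints the kind of URL that <code>requests.get(URL, params=...)</code> would request.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm it in the Bored API documentation.
URL = "https://www.boredapi.com/api/activity"


def activity_url(participants, activity_type=None):
    """Build the request URL from a dictionary of parameters."""
    params = {"participants": participants}
    if activity_type is not None:
        params["type"] = activity_type  # assumed parameter name
    return URL + "?" + urlencode(params)


# One URL for each group size from 1 through 5 people.
urls = [activity_url(n) for n in range(1, 6)]
for url in urls:
    print(url)
```

With requests, the same dictionary goes straight into the call, e.g. <code>requests.get(URL, params={"participants": 3})</code>, and <code>response.url</code> shows the URL that was actually built.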


== Introducing the OSM Nominatim API ==
We're going to spend today looking at OpenStreetMap's API called [http://nominatim.openstreetmap.org/ Nominatim].


* Visit the website to play around with it first: let's search for "bakery"
** Let's pull up the documentation!
* These query strings have a particular form and often have multiple parts: in, near, etc.
** bakery in seattle; bakery in snohomish; bakery in bellevue
** Passing in <code>[]</code> brackets for amenities, e.g. <code>[bakery]</code>
* If we want to do it with Python, we will just reproduce the URL the same way
* Let's do it with Python!
* What if we want to have spaces? Uh-oh. URLs can't have spaces...
** Instead, we can use parameters to query the API
** If we go back to the Bored API, it turns out we can do that too
** Let's turn the URL into a variable too!
* Understanding the output and extracting information
** Go to the formatter
* Using <code>bounded</code> and <code>viewbox</code> to limit where we search
** Looking up latitude and longitude
** Passing in viewbox data from the website
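To see why parameters solve the spaces problem, here is a small sketch using the standard library's <code>urllib.parse</code>, which encodes a dictionary into a query string much as requests does when it is given <code>params</code>. The query text is only an example.

```python
from urllib.parse import urlencode, parse_qs

params = {"q": "bakery in seattle", "format": "json"}

# urlencode replaces characters that are not allowed in URLs:
# each space becomes "+", so the query is safe to send.
query_string = urlencode(params)
print(query_string)  # q=bakery+in+seattle&format=json

# Decoding the query string gives back the original text, spaces included.
decoded = parse_qs(query_string)
print(decoded["q"][0])  # bakery in seattle
```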
 
=== Details on the Nominatim API ===
 
Simple request:


<syntaxhighlight lang="python">
import requests  # this imports a new package called requests

response = requests.get('http://nominatim.openstreetmap.org/', params={'q': '[bakery] near seattle wa', 'format': 'json'})
print(response.status_code)  # 200 means it worked.
data = response.json()
print(type(data))
</syntaxhighlight>


'''Do this:'''  
Go to <code>http://nominatim.openstreetmap.org/?q=[bakery]+near+seattle&format=json</code> to see the same query in JSON format.


Let's break down each line:


* <code>import requests</code> imports the library so we can use it.
* <code>response = requests.get('http://nominatim.openstreetmap.org/', params={'q': '[bakery] near seattle wa', 'format': 'json'})</code>
This is the most important line! Here, we "get" information from the web server. Note that we pass the URL up to the "?" character as the first argument. Compare the dictionary in the second argument to the query we did above in our browser. How do they differ? How are they the same?
* <code>print(response.status_code)</code>: the response is a Python object that contains the actual contents of the web page as well as some status information. Here, we're getting the status_code, which tells us whether the call succeeded. 200 is "good", and you will sometimes see 404 for "not found" or 500 for "server error".
Now let's break down the result:


<syntaxhighlight lang="json">
[
  {
    ...
  }
]
</syntaxhighlight>


Things to realize:


* We get a list of multiple dictionaries, and some of the values for their keys are lists!
* We're given latitude and longitude. It's important to be able to find these!
** Right clicking on https://openstreetmap.org seems to work
** Google Maps: (1) On your computer, open Google Maps. (2) Right-click the place or area on the map. This will open a pop-up window. You can find your latitude and longitude in decimal format at the top. (3) To copy the coordinates automatically, left click on the latitude and longitude.
** This website: https://www.latlong.net/
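A sketch of pulling names and coordinates out of a result like this one. The entry below is invented for illustration; it assumes each result dictionary has <code>display_name</code>, <code>lat</code>, and <code>lon</code> keys, with the coordinates given as strings.

```python
# Invented sample shaped like a Nominatim result list (not real data).
data = [
    {
        "display_name": "Example Bakery, Seattle, Washington, United States",
        "lat": "47.6600",
        "lon": "-122.3100",
        "boundingbox": ["47.6599", "47.6601", "-122.3101", "-122.3099"],
    }
]


def extract_places(results):
    """Return (name, latitude, longitude) tuples from a list of result dictionaries."""
    places = []
    for result in results:
        # Coordinates arrive as strings, so convert them to numbers.
        places.append((result["display_name"], float(result["lat"]), float(result["lon"])))
    return places


for name, lat, lon in extract_places(data):
    print(name, lat, lon)
```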
Additional examples:
Let's look at the [https://nominatim.org/release-docs/develop/api/Search/#result-limitation result limitation] section of the documentation and try a few things:
# Let's read the documentation
# Let's write a program to ask for more bakeries than the ones we've been given.
# Let's ask for a list of bakeries that are within the University District
# Let's plug the whole thing into Python
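One way these exercises might come together, as a sketch. The parameter names <code>limit</code>, <code>viewbox</code>, and <code>bounded</code> come from the Nominatim search documentation; the coordinates are rough guesses at a box around the University District, and the helper name <code>bakery_params</code> is made up for this example.

```python
def bakery_params(limit=20, viewbox=None):
    """Build the parameter dictionary for a bakery search.

    viewbox is (left_lon, top_lat, right_lon, bottom_lat); when given,
    results are restricted to that box.
    """
    params = {"q": "[bakery]", "format": "json", "limit": limit}
    if viewbox is not None:
        # Nominatim expects the four numbers joined by commas.
        params["viewbox"] = ",".join(str(value) for value in viewbox)
        params["bounded"] = 1  # only return results inside the viewbox
    return params


# Rough, hand-picked box around the University District (an assumption).
u_district = (-122.32, 47.67, -122.29, 47.65)
params = bakery_params(limit=30, viewbox=u_district)
print(params)
```

The dictionary can then be passed along in the request, e.g. <code>requests.get(url, params=params)</code>.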
== Introduce the problem set ==


== FAQ ==


; What if there's no API?:  Sometimes, the only way to get data is to extract it from potentially messy HTML. This is called scraping, and Python has a library called BeautifulSoup to help with that.