Community Data Science Course (Spring 2017)/Day 6 Notes: Difference between revisions

From CommunityData
Revision as of 02:57, 4 May 2017

Downloading data from the internet

API (Application Programming Interface): a structured way for two programs to communicate. Think of it like a contract or a secret handshake.

Examples:

What to look for when looking at an API:

  1. Where is the documentation?
  2. What kinds of information can I request?
  3. How do I request information from this API?
  4. Are there any rate limits or restrictions on use? For instance, Twitter doesn't want you downloading tweets. Zillow forbids storing bulk results. (Why?)
  5. Is there a Python package that will help me? For instance, tweepy is a great Python package that simplifies access to Twitter.

Example: We're going to spend today looking at OpenStreetMap's API, called Nominatim.


Structured data and JSON

  • HTML is the markup language your browser uses to display information.

Do this: Go to [[bakery+seattle+wa]] and view source to see the raw HTML. See if you can find the structured data embedded in there somewhere. It's there, but it's often difficult to teach computers to find it.

Sometimes, the only way to get data is to extract it from potentially messy HTML. This is called scraping.
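
To give a feel for what pulling data out of HTML involves, here is a minimal sketch using the HTMLParser class from Python's standard library. The HTML fragment and the class names in it ("result", "name", "type") are made up for illustration; real pages are messier, and this is not the markup Nominatim actually serves.

```python
from html.parser import HTMLParser

# Hypothetical HTML fragment, invented for this example.
PAGE = """
<div class="result">
  <span class="name">The Confectional</span>
  <span class="type">bakery</span>
</div>
"""


class NameScraper(HTMLParser):
    """Collect the text inside every <span class="name"> element."""

    def __init__(self):
        super().__init__()
        self.in_name = False  # True while we are inside a matching span
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs
        if tag == "span" and ("class", "name") in attrs:
            self.in_name = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_name = False

    def handle_data(self, data):
        if self.in_name:
            self.names.append(data.strip())


scraper = NameScraper()
scraper.feed(PAGE)
print(scraper.names)
```

Notice how much code it takes just to find one piece of text, and how easily it would break if the page layout changed.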

Often, data providers make it easier for computers to extract information from their API by providing it in a structured format.

Do this: Go to [[view-source:http://nominatim.openstreetmap.org/?q=[bakery]+seattle+wa&format=json]] to see the same query in JSON format.
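
The query above is just a base address plus a few parameters. As a rough sketch, the same URL can be assembled in Python with urlencode from the standard library; the variable names here are our own choices, not part of the Nominatim API.

```python
from urllib.parse import urlencode

# Base address taken from the link above.
base_url = "http://nominatim.openstreetmap.org/"

# The two parameters that appear in the query: what to search for,
# and which format we want the answer in.
params = {"q": "[bakery] seattle wa", "format": "json"}

# urlencode turns spaces into "+" and escapes the square brackets.
url = base_url + "?" + urlencode(params)
print(url)
```

The printed address escapes the square brackets as %5B and %5D, so it looks a little different from what you typed in the browser, but it describes the same query.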

JSON: JavaScript Object Notation. JSON data looks like Python lists and dictionaries, and we'll see that it's easy to turn it into a Python variable that is a list or dictionary.

    [
      {
        "place_id":"21583441",
        "licence":"Data © OpenStreetMap contributors, ODbL 1.0. http:\/\/www.openstreetmap.org\/copyright",
        "osm_type":"node",
        "osm_id":"2131716956",
        "boundingbox":[
          "47.6248735",
          "47.6249735",
          "-122.3207478",
          "-122.3206478"
        ],
        "lat":"47.6249235",
        "lon":"-122.3206978",
        "display_name":"The Confectional, 618, Broadway East, Eastlake, Capitol Hill, Seattle, King County, Washington, 98102, United States of America",
        "class":"shop",
        "type":"bakery",
        "importance":0.201,
        "icon":"http:\/\/nominatim.openstreetmap.org\/images\/mapicons\/shopping_bakery.p.20.png"
      }
    ]
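
Here is a small sketch of turning JSON text like the result above into Python variables with the standard json module. The string below is a shortened copy of that result, keeping only a few of its keys so the example stays short.

```python
import json

# A shortened copy of the Nominatim result, stored as plain text.
raw = """
[
  {
    "place_id": "21583441",
    "lat": "47.6249235",
    "lon": "-122.3206978",
    "type": "bakery",
    "importance": 0.201
  }
]
"""

# json.loads reads JSON text and returns ordinary Python objects.
places = json.loads(raw)   # the outer [ ] becomes a list
first = places[0]          # each { } becomes a dictionary

print(type(places).__name__, type(first).__name__)
print(first["type"])

# Values that were quoted in the JSON arrive as strings,
# so convert them before doing any arithmetic.
lat = float(first["lat"])
lon = float(first["lon"])
print(lat, lon)
```

Note that "lat" and "lon" are quoted in the JSON, so they come back as strings, while "importance" is unquoted and comes back as a number.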



The python requests library