Community Data Science Course (Spring 2017)/Day 6 Notes

From CommunityData
Revision as of 01:38, 4 May 2017 by Guyrt

Downloading data from the internet

API (Application Programming Interface): a structured way for two programs to communicate. Think of it like a contract or a secret handshake.

Examples:

What to look for when looking at an API:

  1. Where is the documentation?
  2. What kinds of information can I request?
  3. How do I request information from this API?
  4. Are there any rate limits or restrictions on use? For instance, Twitter doesn't want you downloading tweets. Zillow forbids storing bulk results. (Why?)
  5. Is there a Python package that will help me? For instance, tweepy is a great Python package that simplifies access to Twitter.
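Question 4 in the checklist above (rate limits) usually translates into code that pauses between requests. The following is a minimal sketch, not something taken from any particular API's documentation: the `Throttle` class and its `min_interval` parameter are hypothetical names, and the right interval has to come from the documentation of the API you are using.

```python
import time


class Throttle:
    """Make sure successive calls are at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        # clock and sleep are injectable so the class can be tested without waiting
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # time of the previous call, None before the first

    def wait(self):
        """Pause if the previous call was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            elapsed = now - self._last_call
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        # Record when this call is considered to have happened
        self._last_call = now + delay
        return delay
```

To use it, create one `Throttle` (for example `Throttle(1.0)` for one request per second) and call its `wait()` method right before each request.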

Example: We're going to spend today looking at Nominatim, an API from OpenStreetMap.
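To make "requesting information" concrete, here is a rough sketch of what a Nominatim search request looks like as a URL. It assumes the public server at nominatim.openstreetmap.org and its `q`, `format`, and `limit` query parameters; none of these are spelled out in these notes, so check the Nominatim documentation before relying on them. The function name `build_search_url` is made up for this example.

```python
from urllib.parse import urlencode

# Assumed address of the public Nominatim search endpoint
NOMINATIM_SEARCH_URL = "https://nominatim.openstreetmap.org/search"


def build_search_url(query, limit=5):
    """Build the URL for a free-text search that returns JSON."""
    params = {"q": query, "format": "json", "limit": limit}
    # urlencode escapes the values, e.g. spaces become '+'
    return NOMINATIM_SEARCH_URL + "?" + urlencode(params)


print(build_search_url("bakery seattle wa"))
```

Pasting the printed URL into a browser is a quick way to see what the API sends back before writing any more code.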


Structured data and JSON

  • HTML is the markup language your browser uses to display information.

Do this: Go to [[bakery+seattle+wa]] and view source to see the raw HTML. See if you can find the structured data embedded in there somewhere. It's there, but it's often difficult to teach computers to find it.

Sometimes, the only way to get data is to extract it from potentially messy HTML. This is called scraping.
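As a small illustration of scraping, the sketch below pulls link targets and link text out of a page using only Python's standard library. The HTML snippet is invented for the example (it is not from any real site), and `LinkTextParser` and `extract_links` are hypothetical names. Real pages are usually much messier than this.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the <a> tag we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    parser = LinkTextParser()
    parser.feed(html)
    return parser.links


# Made-up HTML, standing in for the source of a search results page
SAMPLE_HTML = """
<div class="result"><a href="/place/1">Example Bakery</a> <span>Seattle</span></div>
<div class="result"><a href="/place/2"><b>Another</b> Bakery</a></div>
"""

print(extract_links(SAMPLE_HTML))
```

Notice how much code it takes just to recover two names and two links. A small change to the page layout could break the parser, which is why scraping is a last resort.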

Sometimes, data providers make it easier to



The Python requests library
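A minimal sketch of using requests to ask Nominatim for places and decode the JSON it returns. The endpoint, the query parameters, and the User-Agent string are assumptions for illustration and are not taken from these notes; `search_places` is a made-up function name. The `get` argument exists only so the function can be tried out without touching the network.

```python
import requests

# Assumed address of the public Nominatim search endpoint
NOMINATIM_SEARCH_URL = "https://nominatim.openstreetmap.org/search"


def search_places(query, limit=5, get=requests.get):
    """Return the decoded JSON results for a free-text place search."""
    response = get(
        NOMINATIM_SEARCH_URL,
        # requests builds and escapes the query string from this dictionary
        params={"q": query, "format": "json", "limit": limit},
        # Hypothetical application name; identify your own script here
        headers={"User-Agent": "cdsc-day6-notes-example"},
        timeout=10,
    )
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.json()       # turn the JSON text into Python lists and dicts
```

Calling `search_places("bakery seattle wa")` sends one request and should return a list of dictionaries, one per place. Print one of them to see which fields the API provides.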