Community Data Science Course (Spring 2017)/Day 6 Notes
Downloading data from the internet
API (Application Programming Interface): a structured way for two programs to communicate. Think of it like a contract or a secret handshake.
Examples:
- The API for Twitter describes how to read tweets, write tweets, and follow people. See details here: https://dev.twitter.com/
- Yelp has an API described here: https://www.yelp.com/developers
- Zillow's API: https://www.zillow.com/howto/api/APIOverview.htm
What to look for when looking at an API:
- Where is the documentation?
- What kinds of information can I request?
- How do I request information from this API?
- Are there any rate limits or restrictions on use? For instance, Twitter doesn't want you downloading tweets. Zillow forbids storing bulk results. (Why?)
- Is there a Python package that will help me? For instance, tweepy is a great Python package that simplifies access to Twitter's API.
Example: We're going to spend today looking at OpenStreetMap's API, called Nominatim.
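To make "requesting information from an API" concrete, here is a minimal sketch of how a Nominatim search request can be written as a URL. It assumes the public search endpoint at https://nominatim.openstreetmap.org/search with its `q` and `format` parameters. The function name `build_search_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint: the public Nominatim search service.
NOMINATIM_SEARCH_URL = "https://nominatim.openstreetmap.org/search"


def build_search_url(query, result_format="json"):
    """Return a search URL for a free-text query (hypothetical helper)."""
    # "q" carries the search text; "format" asks for a structured response.
    params = {"q": query, "format": result_format}
    # urlencode escapes the values and turns spaces into "+".
    return NOMINATIM_SEARCH_URL + "?" + urlencode(params)


url = build_search_url("bakery seattle wa")
print(url)
# prints https://nominatim.openstreetmap.org/search?q=bakery+seattle+wa&format=json
```

Everything after the `?` is the query string. Changing `q` changes what you are asking for, and that is the "contract" part of the API.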
Structured data and JSON
- HTML is the markup language your browser uses to display information.
Do this: Go to [[bakery+seattle+wa]] and view source to see the raw HTML. See if you can find the structured data embedded in there somewhere. It's there, but it's often difficult to teach computers to find it.
Sometimes, the only way to get data is to extract it from potentially messy HTML. This is called scraping.
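As a rough illustration of what scraping involves, the sketch below pulls names out of a small, made-up HTML snippet using Python's built-in `html.parser` module. The HTML, the `name` class, and the `NameScraper` and `scrape_names` names are all invented for this example. A real page will be messier and will need different rules.

```python
from html.parser import HTMLParser

# Made-up HTML standing in for a real search results page.
SAMPLE_HTML = """
<html><body>
  <h1>Search results</h1>
  <ul>
    <li class="result"><span class="name">Example Bakery</span></li>
    <li class="result"><span class="name">Another Bakery</span></li>
  </ul>
  <div class="ad">Buy more bread!</div>
</body></html>
"""


class NameScraper(HTMLParser):
    """Collect the text inside every <span class="name"> element."""

    def __init__(self):
        super().__init__()
        self.in_name = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) pairs.
        if tag == "span" and ("class", "name") in attrs:
            self.in_name = True

    def handle_data(self, data):
        # Only keep text that appears while we are inside a name span.
        if self.in_name:
            self.names.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_name = False


def scrape_names(html):
    scraper = NameScraper()
    scraper.feed(html)
    return scraper.names


print(scrape_names(SAMPLE_HTML))
# prints ['Example Bakery', 'Another Bakery']
```

Notice how much of the code is about the page's layout rather than the data. If the site changes its HTML, the scraper breaks.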
Sometimes, data providers make it easier to
The Python requests library
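A minimal sketch of using requests to ask Nominatim for places and decode the JSON it sends back. It assumes the public endpoint https://nominatim.openstreetmap.org/search and that each result carries `display_name`, `lat`, and `lon` fields. The function name `search_places`, the `http_get` parameter, and the User-Agent string are made up for this example.

```python
import requests

# Assumed endpoint: the public Nominatim search service.
NOMINATIM_SEARCH_URL = "https://nominatim.openstreetmap.org/search"


def search_places(query, http_get=requests.get):
    """Return the decoded JSON results for a free-text search.

    http_get is a hypothetical hook so the function can be tried out
    without touching the network; by default it is requests.get.
    """
    params = {"q": query, "format": "json"}
    # Public servers generally expect requests to identify themselves.
    headers = {"User-Agent": "cdsw-day6-example"}
    response = http_get(
        NOMINATIM_SEARCH_URL, params=params, headers=headers, timeout=10
    )
    # Raise an error if the server answered with a 4xx or 5xx status.
    response.raise_for_status()
    # .json() turns the JSON text into Python lists and dictionaries.
    return response.json()


if __name__ == "__main__":
    for place in search_places("bakery seattle wa"):
        print(place["display_name"], place["lat"], place["lon"])
```

requests builds the query string from `params` for you, so there is no need to escape spaces or glue the URL together by hand.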