Community Data Science Course (Spring 2016)/Day 4 Lecture
From CommunityData
Latest revision as of 00:49, 21 April 2016
Lecture Outline
- Introduction and context
- You can write some tools in Python now. Congratulations!
- Today we'll learn how to find and create datasets
- Next week we'll get into data science (asking and answering questions)
- Outline
- What is an API?
- How do we use one to fetch interesting datasets?
- How do we write programs that use the internet?
- How can we use the placecage API to fetch pictures?
- Introduction to structured data (JSON)
- How do we use APIs in general?
- What is a (web) API?
- API: a structured way for programs to talk to each other (aka an interface for programs)
- Web APIs: like a website your programs can visit (you:a website::your program:a web API)
- How do we use an API to fetch datasets?
Basic idea: your program sends a request, the API sends data back
- Where do you direct your request? The site's API endpoint.
- For example: Wikipedia's web API endpoint is http://en.wikipedia.org/w/api.php
- How do you write your request? Put together a URL; the format differs from one web API to the next.
- Check the documentation, look for code samples
- How do you send a request?
- Python has modules you can use, like requests (they make HTTP requests)
- What do you get back?
- Structured data (usually in the JSON format)
- How do you understand (i.e. parse) the data?
- There's a module for that: Python's built-in json module
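The request-and-response cycle above can be sketched in Python. This is a minimal sketch, not the course's own code. The query parameters (`action`, `prop`, `titles`, `format`) are an assumption drawn from the MediaWiki API documentation, not from this lecture. `fetch_page_info` needs network access and the third-party requests module, so only the URL-building and parsing helpers run offline.

```python
import json
from urllib.parse import urlencode

# HTTPS form of Wikipedia's API endpoint from the outline above
ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_request_url(params):
    """Put together a request URL: the endpoint, a '?', then the encoded parameters."""
    return ENDPOINT + "?" + urlencode(params)


def parse_response(text):
    """Turn the JSON text an API sends back into Python dictionaries and lists."""
    return json.loads(text)


def fetch_page_info(title):
    """Send a real request to Wikipedia and return the parsed JSON reply.

    Assumed parameters (from the MediaWiki API docs, not the lecture):
    action=query, prop=info, titles=<title>, format=json.
    """
    import requests  # third-party; imported here so the helpers above work without it

    params = {"action": "query", "prop": "info", "titles": title, "format": "json"}
    response = requests.get(ENDPOINT, params=params)
    return response.json()
```

For example, `fetch_page_info("Nicolas Cage")` should return nested dictionaries that you can index like any other Python data.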
- How do we write Python programs that make web requests?
To use APIs to build a dataset we will need:
- all our tools from last session: variables, etc.
- the ability to open URLs on the web
- the ability to create custom URLs
- the ability to save to files
- the ability to understand (i.e., parse) JSON data that APIs usually give us
- New programming concepts
- interpolate variables into a string with the % operator, using %s and %(name)s placeholders
- the requests module
- open files and write to them
- A little bit about URLs
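You can try string interpolation and URL structure without touching the network. The variable names and values here are for illustration only. `urlsplit` from the standard library pulls a URL apart into its pieces.

```python
from urllib.parse import urlsplit

name = "Nicolas Cage"
size = 200

# Positional interpolation: each %s is filled from the tuple, in order
message = "Fetching a %s pixel picture of %s" % (size, name)

# Named interpolation: each %(key)s is looked up in the dictionary
named_message = "Fetching a %(size)s pixel picture of %(name)s" % {"size": size, "name": name}

# A URL is made of parts: scheme, host (netloc), path, and query string
parts = urlsplit("https://en.wikipedia.org/w/api.php?action=query&format=json")
print(parts.scheme)  # https
print(parts.netloc)  # en.wikipedia.org
print(parts.path)    # /w/api.php
print(parts.query)   # action=query&format=json
```

Named placeholders are easier to read when a string has many variables, because the order of the values no longer matters.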
- How do we use an API to fetch Nicolas Cage pictures?
- placecage.com (http://placecage.com/): an API that takes specially crafted URLs and returns appropriately sized pictures of Nicolas Cage
- Exploring placecage in a browser:
- visit the API documentation
- Cages of different sizes
- Cages in greyscale or color
- Crazy Cage
- Cage GIF
- Exercise: write a small program that grabs an arbitrary square from placecage, asking the user for the size on standard input.
- Hint: file_handle = open("local_file.jpg", "wb")
- Hint: file_handle.write()
- Hint: file_handle.close()
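Here is one possible solution to the exercise, written as a sketch. It assumes placecage URLs have the form `http://placecage.com/WIDTH/HEIGHT`. It also assumes optional `g/` (greyscale), `c/` (crazy) and `gif/` prefixes for the variants explored in the browser. This URL scheme is an assumption, so check the site's documentation; the service may also no longer be online. `main()` needs the third-party requests module and network access, so it is not called automatically.

```python
# Assumed URL scheme for placecage; check the site's documentation
BASE_URL = "http://placecage.com"
VARIANTS = {"color": "", "grey": "g/", "crazy": "c/", "gif": "gif/"}  # assumed prefixes


def placecage_url(width, height, variant="color"):
    """Build a placecage URL for the given size and (assumed) variant prefix."""
    return "%s/%s%s/%s" % (BASE_URL, VARIANTS[variant], width, height)


def save_bytes(data, filename):
    """Write raw bytes (like image data) to a local file, following the hints."""
    file_handle = open(filename, "wb")  # "wb": write in binary mode
    file_handle.write(data)
    file_handle.close()


def main():
    """Ask for a size on standard input, download a square picture, and save it."""
    import requests  # third-party; only needed when actually downloading

    size = int(input("What size square would you like? "))
    response = requests.get(placecage_url(size, size))
    save_bytes(response.content, "local_file.jpg")  # .content holds the raw bytes
    print("Saved a %sx%s picture to local_file.jpg" % (size, size))
```

Call `main()` to run it. Writing `with open(filename, "wb") as file_handle:` would close the file automatically. The explicit `close()` call above follows the hints instead.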
- Introduction to structured data (JSON, JavaScript Object Notation)
- what is JSON: a text format for data with more structure than plain text
- import json; json.loads()
- looks like Python (except strings must use double quotes, never single quotes)
- simple lists, dictionaries
- can reflect more complicated data structures
- Example file at http://mako.cc/cdsw.json
- You can parse data directly by calling .json() on the response from a requests call, e.g. requests.get(url).json()
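Here is a small offline example of `json.loads()`. The JSON text is made up for illustration and is not the contents of the example file at mako.cc. It shows how JSON types become Python types:

- objects become dictionaries
- arrays become lists
- `true`/`false`/`null` become `True`/`False`/`None`

It also shows that single-quoted strings are rejected. Calling `.json()` on a requests response does the same parsing on the response body.

```python
import json

# Hypothetical JSON text, made up for this example
text = '{"course": "CDSW", "sessions": [1, 2, 3, 4], "online": true, "room": null}'

data = json.loads(text)              # parse JSON text into Python objects
print(data["course"])                # CDSW
print(data["sessions"][-1])          # 4
print(data["online"], data["room"])  # True None

# JSON strings must use double quotes, so this is not valid JSON
try:
    json.loads("{'course': 'CDSW'}")
except json.JSONDecodeError as error:
    print("Not valid JSON:", error)
```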
- Using other APIs
- every API is different, so read the documentation!
- If the documentation isn't helpful, search online
- for popular APIs, there are Python modules that help you make requests and parse JSON
Possible issues:
- rate limiting
- authentication
- text encoding issues
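Two of the issues above, rate limiting and text encoding, can be handled in a few lines. This sketch makes two assumptions:

- A one-second pause between requests respects the rate limit. The real limit is in each API's documentation.
- `fetch` stands in for whatever function actually makes the request, such as `requests.get`.

```python
import time


def fetch_all_politely(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, pausing between calls to stay under a rate limit.

    The one-second default is an assumption; check the API's documented limit.
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # wait before every request after the first
        results.append(fetch(url))
    return results


def decode_body(raw_bytes, encoding="utf-8"):
    """Turn raw response bytes into text.

    errors="replace" swaps undecodable bytes for U+FFFD instead of raising an error.
    """
    return raw_bytes.decode(encoding, errors="replace")
```

With requests, `response.text` already decodes the body using `response.encoding`, which requests guesses from the HTTP headers. If that guess is wrong, set `response.encoding = "utf-8"` before reading `.text`.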