Community Data Science Course (Spring 2016)/Day 4 Lecture

Lecture Outline

 * Introduction and context


 * You can write some tools in Python now. Congratulations!
 * Today we'll learn how to find/create data sets
 * Next week we'll get into data science (asking and answering questions)


 * Outline:


 * What is an API?
 * How do we use one to fetch interesting datasets?
 * How do we write programs that use the internet?
 * How can we use the placecage API to fetch pictures?
 * Introduction to structured data (JSON)
 * How do we use APIs in general?


 * What is a (web) API?


 * API: a structured way for programs to talk to each other (aka an interface for programs)
 * Web APIs: like a website your programs can visit (you:a website::your program:a web API)


 * How do we use an API to fetch datasets?

Basic idea: your program sends a request, the API sends data back
 * Where do you direct your request? The site's API endpoint.
 * For example: Wikipedia's web API endpoint is http://en.wikipedia.org/w/api.php
 * How do I write my request? Put together a URL; it will be different for different web APIs.
 * Check the documentation, look for code samples
 * How do you send a request?
 * Python has modules you can use, like requests (they make HTTP requests)
 * What do you get back?
 * Structured data (usually in the JSON format)
 * How do you understand (i.e. parse) the data?
 * There's a module for that!
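
The request cycle above can be sketched as follows. This is a minimal illustration, not the only way to do it: the query parameters (action, prop, titles, format) are assumptions chosen for the example, and the helper names build_request_url and fetch_page_info are hypothetical. Check the API documentation for what the endpoint really accepts.

```python
from urllib.parse import urlencode

# Endpoint named in the outline above.
ENDPOINT = "http://en.wikipedia.org/w/api.php"


def build_request_url(title):
    """Put together the URL for a request about one page title."""
    # These parameter names are assumptions for illustration; check the
    # API documentation for what a given endpoint accepts.
    parameters = {
        "action": "query",
        "prop": "info",
        "titles": title,
        "format": "json",
    }
    return ENDPOINT + "?" + urlencode(parameters)


def fetch_page_info(title):
    """Send the request and return the parsed JSON (needs network access)."""
    # Third-party module, imported here so the URL helper works without it.
    import requests

    response = requests.get(build_request_url(title))
    return response.json()


url = build_request_url("Nicolas Cage")
print(url)
```

Building the URL is kept separate from sending the request, so the URL can be printed and pasted into a browser before any program depends on it.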


 * How do we write Python programs that make web requests?

To use APIs to build a dataset we will need:
 * all our tools from last session: variables, etc
 * the ability to open urls on the web
 * the ability to create custom URLs
 * the ability to save to files
 * the ability to understand (i.e., parse) JSON data that APIs usually give us


 * New programming concepts:


 * interpolate variables into a string using % and %s
 * the requests module, for making HTTP requests
 * open files and write to them
 * A little bit about URLs
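
A short sketch of two of these concepts, string interpolation with % and %s and writing to a file. The /width/height URL layout and the file name cdsw_urls.txt are assumptions made up for the example.

```python
import os
import tempfile

width = 200
height = 300

# %s is replaced, in order, by the values in the tuple after the % operator.
# The /width/height path layout is an assumption for illustration.
url = "http://placecage.com/%s/%s" % (width, height)
print(url)

# Open a file for writing text, write one line, and close it.
# The file name is hypothetical; the temporary directory keeps things tidy.
file_name = os.path.join(tempfile.gettempdir(), "cdsw_urls.txt")
file_handle = open(file_name, "w")
file_handle.write(url + "\n")
file_handle.close()

# Read the file back to check what was saved.
file_handle = open(file_name)
saved = file_handle.read()
file_handle.close()
print(saved)
```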


 * How do we use an API to fetch Nicolas Cage pictures?

placecage.com
 * An API that takes specially crafted URLs and returns appropriately sized pictures of Nicolas Cage
 * Exploring placecage in a browser:
 * visit the API documentation
 * Cages of different sizes
 * Cages in greyscale or color
 * Crazy Cage
 * Cage GIF
 * Exercise: write a small program that grabs an arbitrary square picture from placecage, asking for the size on standard input.
 * Hint: file_handle = open("local_file.jpg", "wb")
 * Hint: file_handle.write
 * Hint: file_handle.close
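
One possible shape for the exercise, as a sketch rather than a reference solution. It assumes that placecage URLs look like http://placecage.com/width/height (confirm this in the API documentation), and the function names are hypothetical.

```python
def build_cage_url(size):
    """Return a placecage URL for a square picture of the given size."""
    # Assumed URL layout: http://placecage.com/<width>/<height>
    return "http://placecage.com/%s/%s" % (size, size)


def save_bytes(data, file_name):
    """Write raw bytes to a local file, following the hints above."""
    file_handle = open(file_name, "wb")  # "wb" means write in binary mode
    file_handle.write(data)
    file_handle.close()


def download_square(size, file_name):
    """Fetch one square picture and save it (needs network access)."""
    # Third-party module, imported here so the helpers above work without it.
    import requests

    response = requests.get(build_cage_url(size))
    save_bytes(response.content, file_name)
```

A complete program would read the size with input("Size: ") and then call download_square(size, "local_file.jpg").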


 * Introduction to structured data (JSON, JavaScript Object Notation)


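 * What is JSON? A text format for structured data
 * In Python: import json, then use json.loads to parse a JSON string
 * Looks like Python lists and dictionaries (except strings use double quotes, never single quotes)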
 * simple lists, dictionaries
 * can reflect more complicated data structures
 * Example file at http://mako.cc/cdsw.json
 * You can parse data directly by calling .json() on the response from a requests.get() call
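
A minimal sketch of parsing JSON with the json module. The JSON text is made up for illustration and is not taken from the example file.

```python
import json

# A JSON document as a string. The contents are made up for illustration.
text = '{"name": "placecage", "sizes": [200, 300], "greyscale": true}'

# json.loads turns JSON text into Python dictionaries, lists, and values.
data = json.loads(text)

print(data["name"])
print(data["sizes"][1])
print(data["greyscale"])  # JSON true becomes Python True

# json.dumps goes the other way: Python data to a JSON string.
print(json.dumps(data["sizes"]))
```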


 * Using other APIs


 * every API is different, so read the documentation!
 * If the documentation isn't helpful, search online
 * for popular APIs, there are Python modules that help you make requests and parse JSON

Possible issues:
 * rate limiting
 * authentication
 * text encoding issues
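
Two of these issues can be sketched briefly. For rate limiting, one simple approach (an assumption here, since each API documents its own limits) is to pause between requests. The helper fetch_all is hypothetical and takes the fetching function as an argument. For text encoding, the point is that data arrives as bytes and has to be decoded into a string.

```python
import time


def fetch_all(urls, fetch, pause_seconds=1.0):
    """Call fetch(url) for each URL, pausing between requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(pause_seconds)  # wait before every request after the first
        results.append(fetch(url))
    return results


# Text encoding: data arrives from the network as bytes, and turning it
# into a string means choosing an encoding (UTF-8 is common for JSON).
raw = "café".encode("utf-8")
text = raw.decode("utf-8")
print(text)
```

Authentication is not sketched here because it differs for every API.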