Community Data Science Course (Spring 2016)/Day 4 Lecture

From CommunityData
Revision as of 00:49, 21 April 2016 by Guyrt

Lecture Outline

Introduction and context
  • You can write some tools in Python now. Congratulations!
  • Today we'll learn how to find/create data sets
  • Next week we'll get into data science (asking and answering questions)


Outline
  • What is an API?
  • How do we use one to fetch interesting datasets?
  • How do we write programs that use the internet?
  • How can we use the placecage API to fetch pictures?
  • Introduction to structured data (JSON)
  • How do we use APIs in general?


What is a (web) API?
  • API: a structured way for programs to talk to each other (aka an interface for programs)
  • Web APIs: like a website your programs can visit (you:a website::your program:a web API)


How do we use an API to fetch datasets?

Basic idea: your program sends a request, the API sends data back

  • Where do you direct your request? The site's API endpoint.
  • How do you write your request? Put together a URL; it will be different for different web APIs.
    • Check the documentation, look for code samples
  • How do you send a request?
    • Python has modules you can use, like requests (they make HTTP requests)
  • What do you get back?
    • Structured data (usually in the JSON format)
  • How do you understand (i.e. parse) the data?
    • There's a module for that: json
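
The request cycle above can be sketched roughly as follows. This is only a minimal sketch: the endpoint https://api.example.com/search and the parameters q and limit are made up for illustration, and build_url and fetch_data are hypothetical helper names, not part of any real API.

```python
from urllib.parse import urlencode


def build_url(endpoint, params):
    """Join an API endpoint and a dictionary of query parameters into one URL."""
    return endpoint + "?" + urlencode(params)


def fetch_data(url):
    """Send an HTTP GET request and return the parsed JSON reply."""
    # requests is a third-party module (pip install requests). It is imported
    # here so that build_url() can be tried even if requests is not installed.
    import requests

    response = requests.get(url)
    response.raise_for_status()  # raise an error for replies such as 404 or 500
    return response.json()       # parse the JSON body into Python objects


# Hypothetical endpoint and parameters, for illustration only.
url = build_url("https://api.example.com/search", {"q": "community data", "limit": 5})
print(url)
```

Calling fetch_data(url) would only work against a real endpoint; the documentation for the API you pick will tell you which URL and parameters to use.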


How do we write Python programs that make web requests?

To use APIs to build a dataset we will need:

  • all our tools from last session: variables, etc.
  • the ability to open URLs on the web
  • the ability to create custom URLs
  • the ability to save to files
  • the ability to understand (i.e., parse) JSON data that APIs usually give us


New programming concepts
  • interpolate variables into a string using % and %(name)s
  • requests
  • open files and write to them
  • A little bit about URLs
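
As a small illustration of string interpolation, here is one way the two styles might be used to build a custom URL. The width/height URL layout is an assumption about how placecage.com arranges its addresses; check the site's documentation before relying on it.

```python
# Positional interpolation: each %d is replaced, in order, by a value
# from the tuple on the right of the % operator.
width = 200
height = 300
url = "https://www.placecage.com/%d/%d" % (width, height)
print(url)

# Named interpolation: each %(name)s looks up "name" in a dictionary.
values = {"site": "www.placecage.com", "width": 200, "height": 300}
named_url = "https://%(site)s/%(width)s/%(height)s" % values
print(named_url)
```

The named form tends to be easier to read once a URL has more than two or three variable parts.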


How do we use an API to fetch Nicolas Cage pictures?

placecage.com

  • API that takes specially crafted URLs and gives appropriately sized pictures of Nicolas Cage
  • Exploring placecage in a browser:
    • visit the API documentation
    • Cages of different sizes
    • Cages in greyscale or color
    • Crazy Cage
    • Cage GIF
  • Exercise: write a small program that asks for a size on standard input and grabs a square picture of that size from placecage.
    • Hint: file_handle = open("local_file.jpg", "wb")
    • Hint: file_handle.write()
    • Hint: file_handle.close()
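
One possible shape for the exercise, built from the hints above. This is a sketch rather than the official solution: cage_url, save_bytes, and main are hypothetical names, and the size/size URL layout is an assumption about placecage.com.

```python
def cage_url(size):
    """Build the URL for a square picture that is `size` pixels on each side."""
    return "https://www.placecage.com/%d/%d" % (size, size)


def save_bytes(data, filename):
    """Write raw bytes to a file; "wb" means write in binary mode."""
    file_handle = open(filename, "wb")
    file_handle.write(data)
    file_handle.close()


def main():
    # requests is a third-party module (pip install requests).
    import requests

    size = int(input("Size in pixels: "))  # input() reads a line from standard input
    response = requests.get(cage_url(size))
    response.raise_for_status()            # raise an error if the download failed
    # response.content holds the raw bytes of the picture.
    save_bytes(response.content, "cage_%d.jpg" % size)
```

Calling main() runs the whole thing. Pictures are binary data, which is why the file is opened with "wb" rather than "w".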


Introduction to structured data (JSON, JavaScript Object Notation)
  • What is JSON? A text format that is useful for more structured data
  • import json; json.loads()
  • Looks like Python (except strings take double quotes, not single quotes)
  • Simple lists and dictionaries
  • Can reflect more complicated data structures
  • Example file at http://mako.cc/cdsw.json
  • You can parse data directly by calling .json() on the response that requests returns
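
A short sketch of parsing JSON with the json module. The sample text is invented for illustration and is not the content of the example file linked above.

```python
import json

# A small JSON document stored in a Python string. JSON strings use double
# quotes, and JSON writes true/false/null where Python writes True/False/None.
text = '{"course": "Community Data Science", "sessions": [1, 2, 3, 4], "online": false, "room": null}'

data = json.loads(text)        # JSON text -> Python dictionaries and lists
print(data["course"])          # look up a key, as with any dictionary
print(data["sessions"][0])     # index into a list inside the dictionary
print(data["online"], data["room"])

round_trip = json.dumps(data)  # Python objects -> JSON text
print(round_trip)
```

Once the text is parsed, the result is made of ordinary Python dictionaries and lists, so everything from earlier sessions (loops, indexing, and so on) applies to it.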


Using other APIs
  • Every API is different, so read the documentation!
  • If the documentation isn't helpful, search online
  • For popular APIs, there are Python modules that help you make requests and parse JSON

Possible issues:

  • rate limiting
  • authentication
  • text encoding issues
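
For rate limiting in particular, one common approach is to wait and try again when the server replies with HTTP status 429 ("Too Many Requests"). The helper below is a hypothetical sketch of that idea, not something a particular API requires: get_with_retry and its parameters are made-up names, and the waiting times are arbitrary.

```python
import time


def get_with_retry(send, url, max_tries=3, wait_seconds=1.0, sleep=time.sleep):
    """Call send(url) until the reply is not HTTP 429, pausing between tries.

    `send` is any function that takes a URL and returns an object with a
    status_code attribute (requests.get is one such function).
    `max_tries` should be at least 1.
    """
    for attempt in range(max_tries):
        response = send(url)
        if response.status_code != 429:  # 429 means "Too Many Requests"
            return response
        if attempt < max_tries - 1:
            # Double the wait after every failed try: 1s, 2s, 4s, ...
            sleep(wait_seconds * (2 ** attempt))
    return response  # still rate limited after max_tries attempts
```

With requests installed, a call might look like get_with_retry(requests.get, some_url). Many APIs document their own limits, so the documentation is the best guide for how long to wait.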