Community Data Science Course (Spring 2023)/Week 6 lecture notes

There are links to notebooks and similar back on the relevant section of the syllabus page.

Goals

Three goals for today's lecture:

  1. Talk about projects
  2. Walk through example code that grabs data from the MediaWiki API and introduces a small number of new concepts
  3. Walk through example code that grabs data from the Yelp API (and uses a module and authentication)

Final Projects

Your next major milestone is May 15, when you will turn in your Final Project Proposal. I'm hoping these proposals will be a milestone at which everybody has: (a) a clear description of your questions, (b) a clear sense of how you are going to get data to answer these questions (and maybe even the data itself), and (c) confidence that your project will be doable.

A few points to talk through in class:

  • What are the components of a successful project proposal? (e.g., text! dummy figures, etc.)
  • A strong sense of whether your work is going to be doable.
  • Class assignments will continue to shift toward project work.

Wikipedia Edit Data from the API

Walk through some code and introduce some new concepts:

  • MediaWiki: The software that runs many wikis including basically every website on https://fandom.com
  • MediaWiki API with documentation in various places [1] [2]
  • Walk through some example code that I've written in these notebooks: [3] [4]

This introduces a few new concepts:

  • continuations (i.e., what do you do when you don't know how much data you have before you start?)
  • while True loops
  • updating your parameters to "get the next chunk"
  • time.sleep()
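Here is a minimal sketch of how these four ideas fit together. It is not the code from the class notebooks. It assumes the English Wikipedia endpoint and the `requests` library. It also relies on the standard MediaWiki convention that a response which has more data waiting includes a `continue` block, and that you merge that block into your next request's parameters. The function names and the `User-Agent` string are hypothetical placeholders.

```python
import time

# Assumption: English Wikipedia; any MediaWiki site's api.php works the same way.
API_URL = "https://en.wikipedia.org/w/api.php"
# Placeholder: Wikimedia asks API clients to identify themselves with a User-Agent.
HEADERS = {"User-Agent": "community-data-science-example (you@example.com)"}


def get_json_from_api(params):
    """Send one GET request to the MediaWiki API and return the parsed JSON."""
    import requests  # third-party; only needed when talking to the real API

    response = requests.get(API_URL, params=params, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.json()


def fetch_all_chunks(params, get_json=get_json_from_api, pause=1.0):
    """Keep requesting chunks until the API stops sending a 'continue' block."""
    params = dict(params)  # copy so the caller's dictionary is left unchanged
    chunks = []
    while True:  # we don't know ahead of time how many chunks there will be
        data = get_json(params)
        chunks.append(data)
        if "continue" not in data:  # no continuation: this was the last chunk
            break
        # Update our parameters to "get the next chunk" (e.g., adds 'rvcontinue').
        params.update(data["continue"])
        time.sleep(pause)  # wait between requests to be polite to the server
    return chunks
```

For example, `fetch_all_chunks({"action": "query", "prop": "revisions", "titles": "Python (programming language)", "rvprop": "timestamp|user", "rvlimit": 50, "format": "json"})` would return a list of response dictionaries, one per chunk. Passing `get_json` in as a parameter is just a convenience so the loop can be tried with fake responses before you hit the real API.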

Yelp API

I also want to walk through an example of an API that (a) is authenticated and (b) requires interacting with it through a Python module.

  • Finding new Python modules
  • Installing new Python modules with %pip install <PACKAGE> in a notebook cell

The Yelp API is authenticated. Authentication can come in one of several forms, including:

  • keys that are embedded into your normal parameters (like {'api-key' : 'SOMETHING'})
  • OAuth authentication, bearer tokens, and so on...
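The difference between the two styles is mostly about where the secret travels: in the query string, or in an HTTP header. This is a minimal sketch using plain `requests` rather than the Python module covered in class. The `api-key` parameter name comes from the example above; real APIs name it differently. The Yelp helper assumes Yelp Fusion's business-search endpoint, which takes the API key as a bearer token. All function names are hypothetical.

```python
def params_with_key(params, api_key):
    """Style 1: the key is just another query parameter."""
    # 'api-key' is illustrative; each API names this parameter differently.
    return {**params, "api-key": api_key}


def bearer_headers(token):
    """Style 2: the token travels in the HTTP Authorization header."""
    return {"Authorization": f"Bearer {token}"}


def yelp_business_search(api_key, term, location):
    """Hypothetical helper: one Yelp Fusion business search with a bearer token."""
    import requests  # third-party; only needed for real requests

    response = requests.get(
        "https://api.yelp.com/v3/businesses/search",  # assumption: Fusion v3 endpoint
        headers=bearer_headers(api_key),
        params={"term": term, "location": location},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```

Keeping the token out of `params` also means it does not show up in the request URL, which is one reason header-based schemes are common for APIs that let you do more than read.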

Yelp is the latter kind, as is typical of any API that lets you post and/or interact in ways that are non-passive.

That means you need to sign up for an API key. Doing so at Yelp (and many other places) requires:

Some things to keep in mind include:

  • Keeping your API keys outside of your notebook:
    • in a JSON file in your directory!
    • or in a separate Python module
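Here is a minimal sketch of the JSON-file approach. The file name `api_keys.json` and the `"yelp"` field are hypothetical; use whatever layout you like, and keep that file out of anything you share or commit. The separate-module approach works the same way, just with an `import` instead of a file read.

```python
import json
from pathlib import Path


def load_api_key(path="api_keys.json", name="yelp"):
    """Read one API key from a JSON file stored outside the notebook itself."""
    # Hypothetical file layout: {"yelp": "YOUR-KEY-HERE"}
    with Path(path).open() as key_file:
        keys = json.load(key_file)
    return keys[name]


# Alternative (separate Python module): put YELP_API_KEY = "..." in a file
# such as my_keys.py (hypothetical name) and use `from my_keys import YELP_API_KEY`.
```

Your notebook then only ever contains something like `api_key = load_api_key()`, so you can share the notebook without leaking the key.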

Now let's end by walking through two examples:

  • Yelp example notebook #1 [Forthcoming]
  • Yelp example notebook #2 [Forthcoming]