Community Data Science Course (Spring 2023)/Week 6 lecture notes

From CommunityData
Revision as of 23:16, 1 May 2023 by Benjamin Mako Hill (draft lecture material)

Three goals for today's lecture:

  1. Talk about projects
  2. Walk through example code that grabs data from the MediaWiki API and introduces a small number of new concepts
  3. Walk through example code that grabs data from the Yelp API (and uses a module and authentication)

Final Projects

Your next major milestone is May 15, when your final project proposal is due. I'm hoping that you will have a clear description of your questions and a clear sense of how you are going to get data to answer them.

A few points to talk through:

  • What are the components of a successful project proposal? (e.g., dummy figures)
  • A strong sense of whether your work is going to be doable.
  • Class assignments will continue to shift toward project work.

Wikipedia Edit Data from the API

Walk through some code and introduce some new concepts:

  • MediaWiki: the software that runs many wikis, including basically every website on Fandom.com
  • MediaWiki API with documentation in various places [1] [2]
  • Walk through some example code that I've written in this notebook [Forthcoming]

This introduces a few new concepts:

  • while True loops
  • continuations (i.e., what do you do when you don't know how much data you have?)
  • time.sleep()
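The three concepts above fit together in one loop. What follows is a minimal sketch, not the code from the lecture notebook: the function names (fetch_json, get_all_revisions), the User-Agent string, and the choice of the English Wikipedia endpoint are assumptions made for illustration. The MediaWiki API signals that more data is available by including a "continue" object in its response, and the values in that object are added to the parameters of the next request.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumed endpoint for illustration; any MediaWiki wiki exposes a similar api.php.
API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_json(params):
    """Send one GET request to the API and return the parsed JSON response."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # The User-Agent value is a made-up placeholder; use one that identifies you.
    request = urllib.request.Request(
        url, headers={"User-Agent": "cdsc-lecture-example/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def get_all_revisions(title, fetch=fetch_json, delay=1.0):
    """Collect every revision of a page, following continuations until done."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "timestamp|user",
        "rvlimit": 50,
        "format": "json",
    }
    revisions = []
    while True:  # we don't know in advance how many requests we will need
        data = fetch(params)
        for page in data["query"]["pages"].values():
            revisions.extend(page.get("revisions", []))
        if "continue" not in data:
            break  # no continuation means we have everything
        # Copy the continuation values into the next request's parameters.
        params.update(data["continue"])
        time.sleep(delay)  # pause between requests to be polite to the server
    return revisions
```

Calling get_all_revisions("Seattle") would make real requests to the API. Passing a different fetch function makes it possible to try the loop logic without touching the network.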

Yelp API

I also want to walk through an example of an API that (a) is authenticated and (b) requires interacting through a Python module.

  • Finding new Python modules
  • Installing new Python modules with %pip install <PACKAGE>
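After installing a module, it can be handy to confirm that Python can actually find it before running the rest of a notebook. Here is a small sketch using the standard library; the helper name is_installed is made up for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python, so this check should succeed on any installation.
print(is_installed("json"))
```

If the check returns False for a package you just installed, restarting the notebook kernel is usually the first thing to try.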

The Yelp API is authenticated. Authentication can come in one of several forms including:

  • keys that are embedded into your normal parameters, e.g., {'api-key': 'SOMETHING'}

  • OAUTH authentication, bearer tokens, and so on...

Yelp is the latter kind, as is typically any API that lets you post and/or interact in ways that are non-passive.
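To make the difference between the two forms concrete, here is a rough sketch that builds (but does not send) one request of each kind using only the standard library. The URL https://api.example.com/search, the function names, and the key and token values are all placeholders invented for illustration; a real API documents its own parameter and header names.

```python
import urllib.parse
import urllib.request


def request_with_key_param(base_url, params, api_key):
    """Form 1: the key travels alongside the normal query parameters."""
    all_params = dict(params)
    all_params["api-key"] = api_key  # parameter name varies from API to API
    return urllib.request.Request(
        base_url + "?" + urllib.parse.urlencode(all_params)
    )


def request_with_bearer_token(base_url, params, token):
    """Form 2: the token travels in an Authorization header, not in the URL."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url, headers={"Authorization": "Bearer " + token}
    )


# Build one request of each kind; nothing is sent over the network here.
key_style = request_with_key_param(
    "https://api.example.com/search", {"q": "pizza"}, "SOMETHING"
)
token_style = request_with_bearer_token(
    "https://api.example.com/search", {"q": "pizza"}, "SOMETHING"
)
print(key_style.full_url)
print(token_style.full_url, token_style.get_header("Authorization"))
```

One practical consequence of the header form is that the secret does not show up in the URL, so it is less likely to end up in logs or in a link you paste somewhere.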

Because the Yelp API is authenticated, you need to sign up for an API key. Doing so at Yelp (and many other places) requires:

  • creating an App ("wait... I'm creating an app?!")

Some things to keep in mind include:

  • Keeping your API keys outside of your notebook:
    • e.g., in a JSON file in your directory
    • e.g., in a separate Python module
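As one way to do the JSON option, the sketch below reads a key from a file that sits next to the notebook. The file name yelp_credentials.json and the field name "api_key" are assumptions chosen for this example, not anything Yelp requires.

```python
import json


def load_api_key(path="yelp_credentials.json"):
    """Read an API key from a JSON file shaped like {"api_key": "..."}."""
    with open(path) as f:
        credentials = json.load(f)
    return credentials["api_key"]
```

In a notebook you would then write something like api_key = load_api_key() and never type the key itself into a cell. If the project lives in a git repository, adding the credentials file to .gitignore keeps it from being shared by accident.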

Now let's end by walking through two examples:

  • Yelp example notebook #1 [Forthcoming]
  • Yelp example notebook #2 [Forthcoming]