Community Data Science Course (Spring 2023)/Week 6 lecture notes

From CommunityData
 
Latest revision as of 02:27, 10 May 2023

Note: There are links to notebooks and similar materials back on the relevant section (Week 6: May 1) of the syllabus page.

Goals

Three goals for today's lecture:

  1. Talk about projects
  2. Walk through example code that grabs data from the MediaWiki API and introduces a small number of new concepts
  3. Walk through example code that grabs data from the Yelp API (and uses a module and authentication)

Final Projects

Your next major milestone is May 15, when your Final Project Proposal is due. I'm hoping that by this milestone everybody will have: (a) a clear description of your questions, (b) a clear sense of how you are going to get data to answer these questions (and maybe even the data itself), and (c) confidence that your project will be doable.

A few points to talk through in class:

  • What are the components of a successful project proposal? (e.g., text!, dummy figures, etc.)
  • A strong sense of whether your work is going to be doable.
  • Class assignments will continue to shift toward project work.

Wikipedia Edit Data from the API

Walk through some code and introduce some new concepts:

  • MediaWiki: The software that runs many wikis including basically every website on https://fandom.com
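  • MediaWiki API, with documentation in various places:
    • https://www.mediawiki.org/wiki/API:Main_page
    • https://en.wikipedia.org/w/api.php
  • Walk through some example code that I've written in these notebooks:
    • https://github.com/kayleachampion/spr23_CDSW/blob/main/curriculum/week6/01-collect_rock_bands_json.ipynb
    • https://github.com/kayleachampion/spr23_CDSW/blob/main/curriculum/week6/02-analyze_rock_band_data.ipynb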

This introduces a few new concepts:

  • continuations (i.e., what do you do when you don't know how much data you have before you start?)
  • while True loops
  • updating your parameters to "get the next chunk"
  • time.sleep()
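The four concepts above fit together in a single loop. What follows is a minimal sketch, not the code from the notebooks, that collects every revision of one Wikipedia article. It assumes the standard library's urllib (the notebooks may use a different HTTP library), the English Wikipedia endpoint, and a placeholder User-Agent string; the helper names get_json and get_all_revisions are hypothetical. The fetch argument exists only so the loop can be tested without network access.

```python
import json
import time
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def get_json(params):
    """Send one GET request to the MediaWiki API and return the parsed JSON."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Wikimedia asks clients to identify themselves; this User-Agent is a placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "cdsw-week6-example/0.1"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def get_all_revisions(title, fetch=get_json, pause=1.0):
    """Collect every revision of `title`, following API continuations."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|size",
        "rvlimit": "max",
        "format": "json",
        "formatversion": "2",  # version 2 returns "pages" as a list
    }
    revisions = []
    while True:  # we don't know in advance how many chunks there will be
        data = fetch(dict(params))  # hand fetch a copy so it cannot alter our params
        for page in data["query"]["pages"]:
            revisions.extend(page.get("revisions", []))
        if "continue" not in data:  # no continuation: this was the last chunk
            break
        # Merge the continuation values (e.g. rvcontinue) into our parameters
        # so the next request picks up where this one stopped.
        params.update(data["continue"])
        time.sleep(pause)  # be polite to the server between requests
    return revisions
```

Calling get_all_revisions("Nirvana (band)") would keep requesting chunks until the API stops sending a "continue" block. The loop never needs to know the total number of revisions up front, which is exactly the problem continuations solve.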

Yelp API

I also want to walk through an example of an API that is both (a) authenticated and (b) one we can interact with through a Python module:

  • Finding new Python modules
  • Installing new Python modules with %pip install <PACKAGE>

The Yelp API is authenticated. Authentication can come in one of several forms including:

  • keys that are embedded into your normal parameters (like {'api-key' : 'SOMETHING'})
  • OAuth authentication, bearer tokens, and so on...

Yelp is the latter kind, as is typically any API that lets you post and/or interact in ways that are not passive.
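The two forms of authentication listed above differ mainly in where the key travels: inside the URL's query string, or in an HTTP header. The sketch below builds (but does not send) one request of each kind with the standard library. The parameter name "api-key" and the example.org URL are placeholders; real APIs name their key parameter in their own documentation.

```python
import urllib.parse
import urllib.request


def request_with_key_param(base_url, params, api_key):
    """Form 1: the key is just another query parameter (placeholder name "api-key")."""
    query = {**params, "api-key": api_key}
    return urllib.request.Request(base_url + "?" + urllib.parse.urlencode(query))


def request_with_bearer_token(base_url, params, token):
    """Form 2: the key goes in an Authorization header, not in the URL."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

One practical consequence: with the header form, the key does not show up in the URL, so it is less likely to end up in a printed URL or a log.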

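That means you need to sign up for an API key. Doing so at Yelp (and many other places) requires:

  • creating an App ("wait... I'm creating an app?!")
  • My app: https://www.yelp.com/developers/v3/manage_app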

Some things to keep in mind include:

  • Keeping your API keys outside of your notebook, e.g.:
    • in a JSON file in your directory!
    • in a separate Python module
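A minimal sketch of the JSON-file option, assuming a file named yelp_key.json with a single field called api_key. Both names are hypothetical; use whatever you actually saved.

```python
import json


def load_api_key(path="yelp_key.json", field="api_key"):
    """Read an API key from a small JSON file kept next to the notebook.

    The default file name and field name are hypothetical placeholders.
    """
    with open(path) as f:
        config = json.load(f)
    return config[field]
```

The file itself would contain something like {"api_key": "YOUR-KEY-HERE"}. The separate-module option works the same way: put API_KEY = "..." in its own .py file and import it. In both cases the notebook can be shared without the key, and if the project lives in a git repository, listing the key file in .gitignore keeps it from being committed.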

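Now let's end by walking through two examples:

  • Yelp example notebook #1 (direct nonmodule version): https://github.com/kayleachampion/spr23_CDSW/blob/main/curriculum/week6/yelp_business_search-direct.ipynb
  • Yelp example notebook #2 (versions using the yelpapi module): https://github.com/kayleachampion/spr23_CDSW/blob/main/curriculum/week6/yelp_business_search-module.ipynb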
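For orientation, here is roughly what a direct (non-module) Yelp business search could look like using only the standard library. It is a sketch, not the notebook's code. The endpoint URL and the "Authorization: Bearer <key>" header follow Yelp's Fusion API documentation as I understand it, so check the current docs before relying on them. The function names are hypothetical, and rating and review_count are read with .get in case a result lacks them.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_search_request(api_key, term, location, limit=20):
    """Build (but do not send) a business search request with a bearer token."""
    params = {"term": term, "location": location, "limit": limit}
    url = SEARCH_URL + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def search_businesses(api_key, term, location, limit=20):
    """Send the search and return the parsed JSON response (needs a real key)."""
    request = build_search_request(api_key, term, location, limit)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(result):
    """Pull (name, rating, review_count) out of each business in a response."""
    return [
        (b["name"], b.get("rating"), b.get("review_count"))
        for b in result.get("businesses", [])
    ]
```

Splitting request building from sending makes it possible to inspect the URL and headers before spending an API call. The module-based notebook wraps this same kind of request in the yelpapi package instead.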