Community Data Science Course (Spring 2023)/Week 6 lecture notes
Latest revision as of 00:27, 10 May 2023
There are links to notebooks and similar back on the relevant section of the syllabus page.
Goals
Three goals for today's lecture:
- Talk about projects
- Walk through example code that grabs data from the MediaWiki API and introduces a small number of new concepts
- Walk through example code that grabs data from the Yelp API (and uses a module and authentication)
Final Projects
Your next major milestone is May 15, when you will turn in your Final Project Proposal. I'm hoping that these proposals will be a milestone at which everybody has: (a) a clear description of your questions, (b) a clear sense of how you are going to get the data to answer those questions (and maybe even the data itself), and (c) confidence that your project will be doable.
A few points to talk through in class:
- What are the components of a successful project proposal? (e.g., text, dummy figures, etc.)
- A strong sense of whether your work is going to be doable.
- Class assignments will continue to shift toward project work.
Wikipedia Edit Data from the API
Walk through some code and introduce some new concepts:
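- MediaWiki: the software that runs many wikis, including basically every website on https://fandom.com
- MediaWiki API, with documentation in various places: https://www.mediawiki.org/wiki/API:Main_page and https://en.wikipedia.org/w/api.php
- Walk through some example code that I've written in these notebooks: https://github.com/kayleachampion/spr23_CDSW/blob/main/curriculum/week6/01-collect_rock_bands_json.ipynb and https://github.com/kayleachampion/spr23_CDSW/blob/main/curriculum/week6/02-analyze_rock_band_data.ipynb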
This introduces a few new concepts:
- continuations (i.e., what do you do when you don't know how much data you have before you start?)
- while True loops
- updating your parameters to "get the next chunk"
- time.sleep()
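The concepts above fit together into one loop, sketched below in Python. This is a minimal sketch, not the code from the course notebooks: it assumes the standard-library urllib rather than whichever HTTP library the notebooks use, and it assumes the MediaWiki API's current continuation style, where a response that has more data includes a "continue" dictionary whose values are merged into the original parameters for the next request. The User-Agent string and EXAMPLE_PARAMS (the revision history of one article) are hypothetical placeholders.

```python
import json
import time
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"

# Hypothetical example query: timestamps and users for every revision of one page.
EXAMPLE_PARAMS = {
    "action": "query",
    "format": "json",
    "formatversion": "2",
    "prop": "revisions",
    "titles": "The Beatles",
    "rvprop": "timestamp|user",
    "rvlimit": "max",
}


def get_json(params):
    """Send one GET request to the MediaWiki API and decode the JSON reply."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    # Wikimedia asks clients to identify themselves; this User-Agent is a placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "cdsw-example/0.1 (hypothetical)"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def query_all(params, fetch=get_json, pause=1.0):
    """Yield the 'query' part of every chunk, following 'continue' until done."""
    request_params = dict(params)
    while True:  # we don't know in advance how many chunks there are
        result = fetch(request_params)
        if "query" in result:
            yield result["query"]
        if "continue" not in result:
            break  # no continuation means this was the last chunk
        # "Get the next chunk": original parameters plus the continuation values.
        request_params = dict(params)
        request_params.update(result["continue"])
        time.sleep(pause)  # pause between requests to be polite to the server
```

For example, `for chunk in query_all(EXAMPLE_PARAMS): ...` visits each batch of revisions in turn; with formatversion=2, each chunk's "pages" entry is a list of page dictionaries. The fetch argument exists only so the loop can be tried without a network connection.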
Yelp API
I also want to walk through an example of an API that is both (a) authenticated and (b) accessed through a Python module. That involves:
- Finding new Python modules
- Installing new Python modules from inside a notebook with %pip install <PACKAGE>
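Before installing anything, it can help to check whether a module is already importable in the notebook's environment. The sketch below uses the standard library's importlib.util.find_spec for that; it assumes the installable package and the importable module share a name (here yelpapi, the module used in the second Yelp notebook), which is not true of every package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


for name in ["json", "yelpapi"]:
    if is_installed(name):
        print(name, "is installed")
    else:
        # In Jupyter, the %pip magic installs into the kernel's own environment.
        print(name, "is missing; try running: %pip install", name)
```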
The Yelp API is authenticated. Authentication can come in one of several forms including:
- keys that are embedded into your normal parameters (like {'api-key' : 'SOMETHING'})
- OAuth authentication, bearer tokens, and so on...
Yelp is the latter kind, as is typical of any API that lets you post or otherwise interact in non-passive ways.
That means you need to sign up for an API key. At Yelp (and many other places), doing so requires:
- creating an App ("wait... I'm creating an app?!")
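To make the two styles concrete, here is a minimal sketch with the standard library's urllib that only builds the requests and sends nothing. The example.org URL and the 'api-key' parameter name are hypothetical; the bearer-token version uses the "Authorization: Bearer <token>" header, which is how Yelp's API expects its key to be sent.

```python
import urllib.parse
import urllib.request


def request_with_key_param(base_url, params, api_key):
    """Style 1: the key rides along as an ordinary query parameter."""
    query = dict(params)
    query["api-key"] = api_key  # hypothetical parameter name; each API picks its own
    return urllib.request.Request(base_url + "?" + urllib.parse.urlencode(query))


def request_with_bearer_token(base_url, params, token):
    """Style 2: the token goes in the Authorization header, not the URL."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + token})
```

One practical difference: with a bearer token the secret never appears in the URL, so it is less likely to leak into logs or copied links.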
- My app: https://www.yelp.com/developers/v3/manage_app
Some things to keep in mind include:
- Keeping your API keys outside of your notebook, e.g.:
  - in a JSON file in your directory!
  - in a separate Python module
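A minimal sketch of the JSON-file approach follows. The file name yelp_key.json and the field name api_key are hypothetical choices; the point is only that the key lives in a separate file, so the notebook can be shared without it.

```python
import json


def load_api_key(path="yelp_key.json", field="api_key"):
    """Read an API key from a small JSON file kept outside the notebook.

    The file is assumed to look like: {"api_key": "YOUR-KEY-HERE"}
    """
    with open(path) as key_file:
        return json.load(key_file)[field]
```

In a notebook cell you would then write something like api_key = load_api_key() instead of pasting the key itself. The separate-module approach works the same way: put the key in its own .py file and import it.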
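Now let's end by walking through two examples:
- Yelp example notebook #1 (direct, non-module version): https://github.com/kayleachampion/spr23_CDSW/blob/main/curriculum/week6/yelp_business_search-direct.ipynb
- Yelp example notebook #2 (versions using the yelpapi module): https://github.com/kayleachampion/spr23_CDSW/blob/main/curriculum/week6/yelp_business_search-module.ipynb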
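For orientation, here is a minimal sketch of what a "direct" (non-module) call to Yelp's business search could look like using only the standard library. It is not the notebook's code. It assumes the endpoint https://api.yelp.com/v3/businesses/search with term, location, and limit query parameters, and a JSON reply whose "businesses" list holds dictionaries with "name" and (usually) "rating" fields.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def build_search_request(api_key, term, location, limit=20):
    """Build (but do not send) a business-search request authenticated with a bearer token."""
    params = {"term": term, "location": location, "limit": limit}
    url = SEARCH_URL + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + api_key})


def search_businesses(api_key, term, location, limit=20):
    """Send the request and return the decoded JSON reply as a dict."""
    request = build_search_request(api_key, term, location, limit)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize(result):
    """Return (name, rating) pairs; rating is None if a business lacks one."""
    return [(b["name"], b.get("rating")) for b in result.get("businesses", [])]
```

In practice you would pass in a key read from outside the notebook, as discussed above, rather than typing it into a cell. The yelpapi module used in the second notebook wraps this same kind of request, so you don't have to build the URL and header yourself.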