Community Data Science Course (Spring 2023)/Week 5 coding challenges
From CommunityData
* [[../Week 5 lecture notes]]
* The [Week 5 lecture notebook]
* The [Week 5 lecture video]
* The [
== #1 Wikipedia Page View API ==
# Identify a famous person that you are interested in and collect page view data on that person. Generate a time-series visualization and include a link to it in your notebook.
# Identify 2 other language editions of Wikipedia that have articles on that person. Collect page view data on the article in those languages and create a single visualization that shows how the dynamics are similar and/or different.
# Collect page view data on [https://en.wikipedia.org/wiki/Marvel_Comics Marvel Comics] and [https://en.wikipedia.org/wiki/DC_Comics DC Comics] in Wikipedia. (If you'd rather replace these examples with some other comparison of popular rivals, that's fine.)
## Which has more total page views in 2022?
## Can you draw a visualization of this?
## Were there years since 2015 when the less viewed page was viewed more? How many and which ones?
## Were there any months when this was true? How many and which ones?
## How about any days? How many?
# I've made a file available that contains a list of several hundred titles of Wikipedia articles about Harry Potter {{forthcoming}}.[*] I think it's all of them! Download this file, read it in, and request monthly page view data for all of them.
## Once you've done this, sum up all of the page views from all of the pages and print out a TSV file with these total numbers.
## Make a time series graph of these numbers and include a link in your notebook.
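
The data-handling steps in the challenges above can be sketched roughly as follows. This is a minimal sketch, not a reference solution: it assumes the Wikimedia REST API's per-article page view endpoint with <code>all-access</code> traffic and <code>user</code> agents, and every function name, the default date range, and the contact address are hypothetical placeholders.

```python
import csv
from collections import defaultdict
from urllib.parse import quote

# ASSUMPTION: the Wikimedia REST API per-article page view endpoint.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, project="en.wikipedia", granularity="monthly",
                  start="20220101", end="20221231"):
    """Build a request URL; article titles use underscores instead of spaces."""
    article = quote(title.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/all-access/user/{article}/{granularity}/{start}/{end}"


def fetch_items(url, contact="your-email@example.edu"):
    """Download one result set and return its list of per-period records."""
    import requests  # imported here so the helpers below work without it
    # A descriptive User-Agent with contact details is expected by Wikimedia.
    headers = {"User-Agent": f"cdsc-week5-challenge ({contact})"}
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()["items"]


def totals_by_period(items, prefix_length=4):
    """Sum views by timestamp prefix: 4 characters = year, 6 = month, 8 = day."""
    totals = defaultdict(int)
    for item in items:
        totals[item["timestamp"][:prefix_length]] += item["views"]
    return dict(totals)


def periods_where_trailing_page_led(leader, trailer):
    """Given two {period: views} dicts, list periods where `trailer` had more views."""
    return sorted(p for p in leader if trailer.get(p, 0) > leader[p])


def write_tsv(totals, path):
    """Write {period: views} totals as a two-column, tab-separated file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["period", "views"])
        for period in sorted(totals):
            writer.writerow([period, totals[period]])
```

For the Harry Potter list, one approach is to call <code>fetch_items</code> once per title, concatenate the resulting lists, and pass the combined list to <code>totals_by_period</code> with <code>prefix_length=6</code> before calling <code>write_tsv</code>.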
== #2 Starting on your projects ==
{{notice|If you are planning on collecting data from Reddit, please look into using the [https://pushshift.io Pushshift API] instead of the default Reddit API. The Pushshift API is not as up-to-date, but it is targeted toward data scientists, not app-makers, and is much better suited to our needs in the class.}}
Many of these challenges will not involve code. Feel free to just write "markdown" code into your notebook.
# Identify an API you will (or might!) want to use for your project.
# Find documentation for that API and include links.
# What are the endpoints you plan to use? What are the parameters you will need to use?
# Is there a Python module that helps make contact with the API? (See if you can find example code on how to use it.)
# If so, download it, install it, and import it into your notebook.
# Does the API require authentication? Does it need to be approved? If so, sign up for a developer account and get your keys.
# Does the API list rate limits?
# Make a single API call, either directly using requests or using the Python module you installed above. It doesn't matter for what. The goal is that you can make technical contact.
# '''IMPORTANT:''' If you have included any API keys in your notebook, ''make a copy of your notebook, delete the cell where you include the keys, and then upload the copy of the notebook.'' We'll show you some tricks for hiding this information going forward.
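
As one possible illustration of the last two steps, the sketch below reads a key from an environment variable instead of typing it into a cell, then makes a single request. It is only a sketch under stated assumptions: the variable name <code>MY_API_KEY</code>, the function names, the contact address, and the bearer-token header are all hypothetical, and many APIs expect keys in a different header or as a query parameter, so check your API's documentation.

```python
import os


def load_api_key(variable="MY_API_KEY"):
    """Read a key from the environment so it never appears in a notebook cell."""
    key = os.environ.get(variable)
    if not key:
        raise RuntimeError(f"Set the {variable} environment variable first.")
    return key


def build_headers(key, contact="your-email@example.edu"):
    """Build request headers for an API that uses bearer-token authentication."""
    # ASSUMPTION: bearer-token auth. Your API may use another scheme entirely.
    return {
        "Authorization": f"Bearer {key}",
        "User-Agent": f"cdsc-project ({contact})",
    }


def make_contact(url, headers, params=None):
    """Make one GET request and return the HTTP status code."""
    import requests  # imported here so the helpers above work without it
    response = requests.get(url, headers=headers, params=params, timeout=30)
    response.raise_for_status()
    return response.status_code
```

A status code of 200 from <code>make_contact</code> is enough to show that technical contact works. Because the key lives outside the notebook, there is no cell to delete before uploading.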
== Notes ==
[*] You will probably not be shocked to hear that I collected this data from an API! I've included a Jupyter Notebook with that API online here. {{forthcoming}}