Community Data Science Course (Spring 2023)/Week 5 coding challenges


Revision as of 01:49, 25 April 2023

There's actually nothing to download this time, so you can simply start with a fresh Jupyter notebook! Be sure to give it a nice descriptive name, as always.

Although there's nothing to download, you will likely want to look at the following resources when working through the first half of these challenges:

  * The Week 5 lecture notes
  * The Week 5 lecture notebook
  * The Week 5 lecture video

#1 Wikipedia Page View API

  1. Identify a famous person that you are interested in and collect page view data on that person. Generate a time-series visualization and include a link to it in your notebook.
  2. Identify two other language editions of Wikipedia that have articles on that person. Collect page view data on the article in those languages and create a single visualization that shows how the dynamics are similar and/or different.
  3. Collect page view data on Marvel Comics and DC Comics in Wikipedia. (If you'd rather replace these examples with some other comparison of popular rivals, that's fine.)
    1. Which has more total page views in 2022?
    2. Can you draw a visualization of this?
    3. Were there years since 2015 when the less viewed page was viewed more? How many and which ones?
    4. Were there any months when this was true? How many and which ones?
    5. How about any days? How many?
  4. I've made a file available which contains a list of several hundred titles of Wikipedia articles about Harry Potter [Forthcoming].[*] Can you download this file, read it in, and request monthly page view data for all of them?
    1. Once you've done this, sum up all of the page views from all of the pages and print out a TSV file with these total numbers.
    2. Make a time series graph of these numbers and include a link in your notebook.
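
As a starting point for item 4 above, here is a minimal sketch of one way to build a page view request, sum views across several articles, and write the totals as a TSV. It assumes the Wikimedia REST API's per-article page view endpoint, and the "all-access" and "user" settings are just one reasonable choice. The function names and the User-Agent string are hypothetical placeholders, and it uses only the standard library (the `requests` library would work equally well). Treat it as a sketch rather than the expected solution.

```python
import csv
import json
from collections import defaultdict
from urllib.parse import quote
from urllib.request import Request, urlopen

# Assumed endpoint: the Wikimedia REST API per-article page view route.
API_ROOT = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, start, end, project="en.wikipedia", granularity="monthly"):
    """Build a per-article page view URL; start and end are YYYYMMDD strings."""
    # Titles use underscores for spaces; other special characters are percent-encoded.
    article = quote(title.replace(" ", "_"), safe="")
    return f"{API_ROOT}/{project}/all-access/user/{article}/{granularity}/{start}/{end}"


def fetch_pageviews(title, start, end, **kwargs):
    """Request page views for one article and return the list under 'items'."""
    # Identify yourself to the API; this User-Agent value is a placeholder.
    request = Request(
        pageviews_url(title, start, end, **kwargs),
        headers={"User-Agent": "cdsc-week5-example (your-email@example.com)"},
    )
    with urlopen(request) as response:
        return json.load(response)["items"]


def total_views_by_timestamp(items_per_article):
    """Sum 'views' across articles, keyed by each item's 'timestamp'."""
    totals = defaultdict(int)
    for items in items_per_article:
        for item in items:
            totals[item["timestamp"]] += item["views"]
    return dict(sorted(totals.items()))


def write_totals_tsv(totals, path):
    """Write a header row, then one timestamp/views row per period."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["timestamp", "views"])
        writer.writerows(totals.items())
```

With a list of titles read from the file (called `titles` here for illustration), you could collect `fetch_pageviews(t, "20220101", "20221231")` for each title, pass the resulting list to `total_views_by_timestamp`, and hand the totals to `write_totals_tsv`. Pausing briefly between requests is a polite habit when looping over several hundred articles.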


Notes

[*] You will probably not be shocked to hear that I collected this data from an API! I've put a Jupyter notebook with the code that uses that API online here. [Forthcoming]