Community Data Science Course (Spring 2023)/Week 5 coding challenges: Difference between revisions
From CommunityData
Revision as of 23:50, 24 April 2023
There's actually nothing to download this time, so you can simply start with a fresh Jupyter notebook! Be sure to give it a nice descriptive name, as always.
Although there's nothing to download, you will likely want to look at the following resources when working through the first half of these challenges:
- Community Data Science Course (Spring 2023)/Week 5 lecture notes
- The [Week 5 lecture notebook]
- The [Week 5 lecture video]
#1 Wikipedia Page View API
- Identify a famous person that you are interested in and collect page view data on that person. Generate a time-series visualization and include a link to it in your notebook.
- Identify two other language editions of Wikipedia that have articles on that person. Collect page view data on the article in those languages and create a single visualization that shows how the dynamics are similar and/or different.
- Collect page view data on Marvel Comics and DC Comics in Wikipedia. (If you'd rather replace these examples with some other comparison of popular rivals, that's fine.)
- Which has more total page views in 2022?
- Can you draw a visualization of this?
- Were there years since 2015 when the less viewed page was viewed more? How many and which ones?
- Were there any months when this was true? How many and which ones?
- How about any days? How many?
- I've made a file available which contains a list of several hundred titles of Wikipedia articles about Harry Potter [Forthcoming].[*] Can you download this file, read it in, and request monthly page view data for all of them?
- Once you've done this, sum up all of the page views from all of the pages and print out a TSV file with these total numbers.
- Make a time series graph of these numbers and include a link in your notebook.
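As a rough sketch of the last task above, assuming the Wikimedia REST API's per-article pageviews endpoint and using only the Python standard library (the lecture notebook may well do this differently), the request URL and the cross-page totals could be built as follows. The function names, the default 2022 date range, and the User-Agent string are illustrative assumptions, not part of the course materials.

```python
import json
import urllib.parse
import urllib.request
from collections import defaultdict

# Per-article endpoint of the Wikimedia pageviews REST API.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(title, project="en.wikipedia", granularity="monthly",
                  start="20220101", end="20221231"):
    """Build the request URL for one article (dates are YYYYMMDD strings)."""
    # Titles use underscores instead of spaces and must be URL-encoded.
    article = urllib.parse.quote(title.replace(" ", "_"), safe="")
    return (f"{BASE}/{project}/all-access/user/"
            f"{article}/{granularity}/{start}/{end}")


def fetch_items(url):
    """Request one URL and return the list under the "items" key."""
    # Hypothetical User-Agent; replace it with your own contact details.
    headers = {"User-Agent": "week5-coding-challenge (your-email@example.org)"}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["items"]


def total_views_by_timestamp(item_lists):
    """Sum the "views" field across many pages, grouped by timestamp."""
    totals = defaultdict(int)
    for items in item_lists:
        for item in items:
            totals[item["timestamp"]] += item["views"]
    return dict(sorted(totals.items()))


def write_tsv(totals, path):
    """Write the summed totals as a two-column TSV file with a header."""
    with open(path, "w", encoding="utf-8") as output_file:
        output_file.write("timestamp\tviews\n")
        for timestamp, views in totals.items():
            output_file.write(f"{timestamp}\t{views}\n")
```

With a list of titles read from the file, something like `totals = total_views_by_timestamp(fetch_items(pageviews_url(t)) for t in titles)` followed by `write_tsv(totals, "totals.tsv")` would produce the TSV. For several hundred titles it is polite to pause briefly between requests.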
#2 Starting on your projects
{{notice|If you are planning on collecting data, please look into using the Pushshift API instead of the default Reddit one. The Pushshift API is not as up-to-date, but it is targeted toward data scientists, not app-makers, and is much better suited to our needs in the class.}}
Notes
[*] You will probably not be shocked to hear that I collected this data from an API! I've included a Jupyter Notebook with that API online here. [Forthcoming]