Community Data Science Course (Spring 2023)/Week 5 lecture notes

New concepts for the day:

Stage 0: Coming up with a plan

I want to download page view data for three universities and present the total for each.

I'm going to split the work into three steps:

collect the data from the web and write the raw JSON "payload" to a file
read the data from the file and do whatever data extraction, cleaning, counting, etc. is needed; then write a TSV file
open the TSV file and make a graph
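
To make the first two steps concrete, here is a minimal sketch of writing a raw payload to a file and then turning that file into a TSV. The function names, the file paths you would pass in, and the shape of the payload (an "items" list with "article", "timestamp", and "views" fields) are assumptions for illustration, not the code from class.

```python
import csv
import json


def save_payload(payload, json_path):
    """Step 1: write the raw JSON payload to a file without changing it."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(payload, f)


def payload_to_tsv(json_path, tsv_path):
    """Step 2: read the raw file back, keep a few fields, and write a TSV."""
    with open(json_path, encoding="utf-8") as f:
        payload = json.load(f)

    with open(tsv_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["article", "timestamp", "views"])
        # Assumed payload shape: {"items": [{"article": ..., "timestamp": ..., "views": ...}]}
        for item in payload["items"]:
            writer.writerow([item["article"], item["timestamp"], item["views"]])
```

Keeping the raw JSON on disk means the cleaning step can be rerun and changed as often as I like without going back to the web each time.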

I want to build data on how popular something is using the MediaWiki page views API. First I went searching and found two places:

I chose the second option.

The documentation suggested I should set up a unique user-agent. Searching for how to do that brought me to this Stack Overflow post: https://stackoverflow.com/questions/10606133/sending-user-agent-using-requests-library-in-python which I followed to set up the headers appropriately.
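
Here is a rough sketch of what setting a user-agent with requests can look like. The user-agent string and contact address are made up; use something that identifies you. The request is only prepared here, not sent, so the header can be inspected without touching the network.

```python
import requests

# Hypothetical user-agent: a short script name plus a way to contact you.
headers = {"User-Agent": "cdsc-week5-pageviews/0.1 (student@example.edu)"}

url = "https://wikimedia.org/api/rest_v1/"

# Build the request without sending it, so we can check what would go out.
prepared = requests.Request("GET", url, headers=headers).prepare()
print(prepared.headers["User-Agent"])

# To actually send it, pass the same dictionary to requests.get:
# response = requests.get(url, headers=headers)
```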

Between that and the interactive material in the Wikimedia REST API documentation, I was able to construct a URL.
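
A URL of this kind can be sketched as follows, assuming the per-article pageviews endpoint of the Wikimedia REST API with daily counts from English Wikipedia across all access methods and agents. The function name and the example article title are placeholders, not necessarily what we used in class.

```python
from urllib.parse import quote

BASE_URL = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def build_pageviews_url(article, start, end, project="en.wikipedia"):
    """Return a daily pageviews URL for one article.

    start and end are date strings in YYYYMMDD format.
    """
    # Titles use underscores instead of spaces; quote() escapes anything else.
    title = quote(article.replace(" ", "_"), safe="")
    return f"{BASE_URL}/{project}/all-access/all-agents/{title}/daily/{start}/{end}"


print(build_pageviews_url("Example University", "20230101", "20230131"))
```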

We will build up something like file 1, version 1:

Walk through building file 2, version 1, with a focus on:

Let's build a couple of functions. Maybe one for dates? Maybe one for getting page view data? Let's refactor the old code to use these.
Let's build in waiting for a second with time.sleep(1)
Let's count with a dictionary
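
One possible shape for these three ideas is sketched below. All of the names are made up for illustration, and the URL pattern and the "items"/"views" fields are assumed to match the Wikimedia REST API pageviews endpoint. The function that actually downloads a URL is passed in as fetch_json so the sketch does not depend on any one way of making requests.

```python
import time
from datetime import date

BASE_URL = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def date_string(d):
    """Turn a date object into the YYYYMMDD string used in the URL."""
    return d.strftime("%Y%m%d")


def get_pageview_data(article, start, end, fetch_json):
    """Build the URL for one article and return the parsed JSON payload.

    fetch_json is any function that takes a URL and returns parsed JSON.
    """
    url = (
        f"{BASE_URL}/en.wikipedia/all-access/all-agents/"
        f"{article}/daily/{date_string(start)}/{date_string(end)}"
    )
    return fetch_json(url)


def count_views(articles, start, end, fetch_json, delay=1):
    """Return a dictionary mapping each article to its total views."""
    totals = {}
    for article in articles:
        totals[article] = 0
        payload = get_pageview_data(article, start, end, fetch_json)
        for item in payload["items"]:
            totals[article] = totals[article] + item["views"]
        # Wait between requests so we are polite to the server.
        time.sleep(delay)
    return totals
```

With requests, fetch_json could be a small function that calls requests.get with your user-agent headers and returns response.json(). The article names would be the three university page titles.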