Community Data Science Course (Spring 2023)/Week 5 lecture notes

New concepts for the day:


 * Defining functions
 * json.dumps and json.loads
 * Reading from files
 * Breaking projects into multiple notebooks and steps
 * Waiting...

Stage 0: Coming up with a plan
I want to download page view data for three universities and present the sum total for each.

I'm going to split the work into three steps:


 * collect the data from the web and write the raw JSON "payload" to a file
 * read the data from the file and do whatever data extraction, cleaning, counting, etc. is needed; then write a TSV file
 * open the TSV file and make a graph

Stage 1: Getting data
I want to build data on how popular something is using the MediaWiki views API. When I first went searching, I found two places:


 * MediaWiki API
 * Wikimedia REST API

I chose the second option.

The documentation suggested I should set up a unique user-agent. Searching for how to do that brought me to this StackOverflow post: https://stackoverflow.com/questions/10606133/sending-user-agent-using-requests-library-in-python which I followed to set up headers appropriately.

Between that and the interactive material in the Wikimedia REST API documentation, I was able to construct a URL.
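The pieces fit together roughly like this. This is a sketch, not the exact code from class: the article title, date range, and user-agent string are placeholders you'd swap for your own.

```python
import requests

# URL built from the interactive Wikimedia REST API documentation;
# the article title and date range here are hypothetical examples
URL = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
    "en.wikipedia/all-access/all-agents/University_of_Washington/daily/"
    "20230101/20230131"
)

# a unique User-Agent header, as the documentation suggests
# (the name and email address are placeholders -- use your own)
HEADERS = {"User-Agent": "cdsc-week5-notes <example@uw.edu>"}

def get_pageview_payload():
    # requests.get() accepts a headers= dictionary,
    # which is the approach from the StackOverflow post
    response = requests.get(URL, headers=HEADERS)
    return response.text

# payload = get_pageview_payload()  # run this when you're online
```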

We will build up something like file 1, version 1:


 * setting the header
 * json.dumps [mention that I'll skip this until we have an error]
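Writing the payload out might look something like this sketch. The payload dictionary here is a tiny made-up stand-in for what the API actually returns:

```python
import json

# hypothetical payload standing in for the API response data
payload = {"items": [{"timestamp": "2023010100", "views": 1234},
                     {"timestamp": "2023010200", "views": 5678}]}

# json.dumps() turns the Python data into a JSON string so we can
# write the raw payload to a file for the next notebook to read
with open("university_views.json", "w") as output_file:
    output_file.write(json.dumps(payload))
```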

Stage 2: Reading in data
Walk through building file 2, version 1, with a focus on:


 * opening files with open()
 * .read(), which reads the whole file in as one string
 * json.loads
 * outputting days and views
 * trying to graph... we'll hit an error when we try to graph
 * writing some new code to create better formatted date strings...
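A sketch of how those pieces line up. The first few lines create a small made-up data file so the example runs on its own; in the real project, file 1 would have written it already:

```python
import json

# hypothetical sample data standing in for what file 1 wrote out
sample = {"items": [{"timestamp": "2023010100", "views": 1234},
                    {"timestamp": "2023010200", "views": 5678}]}
with open("university_views.json", "w") as f:
    f.write(json.dumps(sample))

# open the file and .read() the whole thing in as one string
with open("university_views.json") as input_file:
    raw_text = input_file.read()

# json.loads() turns the JSON string back into Python data
data = json.loads(raw_text)

days = []
views = []
for item in data["items"]:
    ts = item["timestamp"]  # e.g. "2023010100"
    # better formatted date string: "2023-01-01" instead of "2023010100"
    days.append(ts[0:4] + "-" + ts[4:6] + "-" + ts[6:8])
    views.append(item["views"])

print(days)   # ['2023-01-01', '2023-01-02']
print(views)  # [1234, 5678]
```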

Stages 3 and 4: let's extend to multiple things

 * let's build a couple of functions: maybe one for dates? maybe one for getting pageview data? let's refactor the old code to use these
 * let's build in waiting for a second with time.sleep()
 * let's count with a dictionary
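One way those pieces could fit together. The get_pageview_data() helper and its numbers here are made-up stand-ins; a real version would use requests.get() with the headers from stage 1:

```python
import time

def format_date(timestamp):
    """Turn an API timestamp like '2023010100' into '2023-01-01'."""
    return timestamp[0:4] + "-" + timestamp[4:6] + "-" + timestamp[6:8]

def get_pageview_data(university):
    # placeholder standing in for the real API call from file 1
    fake_payloads = {
        "University_of_Washington": [1234, 5678],
        "Purdue_University": [100, 200],
        "University_of_North_Carolina": [40, 2],
    }
    return fake_payloads[university]

# count total views per university with a dictionary
total_views = {}
for university in ["University_of_Washington", "Purdue_University",
                   "University_of_North_Carolina"]:
    total_views[university] = sum(get_pageview_data(university))
    time.sleep(1)  # wait a second between requests to be polite to the server

print(total_views)
```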