Community Data Science Course (Spring 2023)/Week 5 lecture notes
From CommunityData
New concepts for the day:
* Defining functions
* <code>import json</code>, <code>json.loads()</code>, and <code>json.dumps()</code>
* Reading ''from'' files
* Breaking projects into multiple notebooks and steps
* Waiting... <code>time.sleep(1)</code>

== Stage 0: Coming up with a plan ==

I want to download page view data for three universities and present the sum total for each. I'm going to split the work into three steps:
* collect the data from the web and write the raw JSON "payload" to a file
* read the data from the file and do whatever data extraction, cleaning, counting, etc. is needed; then write a TSV file
* open the TSV file and make a graph

== Stage 1: Getting data ==

I want to build data on how popular something is using the MediaWiki views API. First I went [https://www.google.com/search?q=wikipedia%20page%20view%20api searching] and found two places:
* [https://www.mediawiki.org/wiki/API:Query MediaWiki API]
* [https://www.mediawiki.org/wiki/Wikimedia_REST_API Wikimedia REST API]

I chose the second option. The documentation suggested I should set a unique user-agent. Searching for how to do that brought me to this StackOverflow post: https://stackoverflow.com/questions/10606133/sending-user-agent-using-requests-library-in-python which I followed to set up headers appropriately. Between that and the interactive material in the [https://www.mediawiki.org/wiki/Wikimedia_REST_API Wikimedia REST API] documentation, I was able to construct a URL.

We will '''build up something like file 1, version 1''':
* setting the header
* json.dumps() [mention that I'll skip this until we have an error]

== Stage 2: Reading in data ==

Walk through building '''file 2, version 1''' with a focus on:
* opening files with <code>open(filename, 'r')</code>
* <code>f.read()</code>, which reads the whole file in
* json.loads()
* outputting days and views
* trying to graph... we'll have an error when we try to graph
* writing some new code to create better-formatted date strings...
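The Stage 1 and Stage 2 steps above can be sketched end-to-end. This is a minimal sketch, not code from the lecture: the payload shape is an assumption based on the Wikimedia REST API's pageviews responses, and the filename is made up. The real Stage 1 would fetch the payload over the network with a custom User-Agent header, which is shown only in comments here.

```python
import json

# A payload shaped like a Wikimedia REST pageviews response. The
# "items" structure is an assumption based on that API's documented
# format; in the real project the text would come from something like:
#   headers = {"User-Agent": "class-project-bot (your@email.example)"}
#   payload_text = requests.get(url, headers=headers).text
payload = {
    "items": [
        {"timestamp": "2023010100", "views": 120},
        {"timestamp": "2023010200", "views": 98},
    ]
}

# Stage 1: write the raw JSON "payload" to a file with json.dumps()
with open("pageviews_raw.json", "w") as f:
    f.write(json.dumps(payload))

# Stage 2: read the whole file back in with f.read() and parse it
# with json.loads()
with open("pageviews_raw.json", "r") as f:
    raw = f.read()
data = json.loads(raw)

# Output days and views, reformatting "2023010100" into "2023-01-01"
# so that graphing code treats the dates sensibly.
for item in data["items"]:
    ts = item["timestamp"]
    date_string = f"{ts[0:4]}-{ts[4:6]}-{ts[6:8]}"
    print(date_string, item["views"])
```

Splitting the write and the read across two notebooks means the slow, network-bound step only ever has to run once.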
== Stages 3 and 4: let's extend to multiple things ==
* let's build a couple of functions. Maybe one for dates? Maybe one for getting pageview data? Let's refactor the old code to use these.
* let's build in waiting for a second with <code>time.sleep(1)</code>
* let's count with a dictionary
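One way the Stage 3 and 4 refactor could look, as a sketch: the function names and the sample data below are hypothetical stand-ins, not from the lecture, and the real version would replace the fake responses with one API call per university.

```python
import time

def format_date(timestamp):
    """Turn an API timestamp like '2023010100' into '2023-01-01'."""
    return f"{timestamp[0:4]}-{timestamp[4:6]}-{timestamp[6:8]}"

def get_pageview_total(items):
    """Sum the 'views' field across a list of API response items."""
    return sum(item["views"] for item in items)

# Hypothetical stand-in for per-university API responses; in the real
# project each list would come from a separate web request.
fake_responses = {
    "University of Washington": [
        {"timestamp": "2023010100", "views": 500},
        {"timestamp": "2023010200", "views": 700},
    ],
    "Purdue University": [
        {"timestamp": "2023010100", "views": 300},
    ],
}

# Count with a dictionary: one running total per university.
totals = {}
for name, items in fake_responses.items():
    totals[name] = get_pageview_total(items)
    time.sleep(1)  # wait a second between requests, to be polite to the API

print(totals)
```

Pulling the date formatting and the summing into functions means the same code works unchanged whether we ask for one university or ten.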