Community Data Science Course (Spring 2023)/Week 5 lecture notes
Revision as of 01:34, 25 April 2023
New concepts for the day:
- Defining functions
- <code>import json</code>, <code>json.loads()</code>, and <code>json.dumps()</code>
- Reading ''from'' files
- Breaking projects into multiple notebooks and steps
- Waiting... <code>time.sleep(1)</code>
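The concepts above can be sketched in a few lines; the function name and numbers here are made-up examples, not anything from the course materials:

```python
import json
import time

# Tiny demo of the day's new concepts: defining a function, then a
# JSON round trip, then waiting.
def double_views(x):
    return x * 2

payload = json.dumps({"views": 120})   # Python dict -> JSON string
data = json.loads(payload)             # JSON string -> Python dict

time.sleep(1)                          # waiting... pause for one second

print(double_views(data["views"]))     # -> 240
```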
Stage 0: Coming up with a plan
I want to download page view data for three universities and present the sum total for each.
I'm going to split the work into three steps:
- collect the data from the web and write the raw JSON "payload" to a file
- read the data from the file; do whatever data extraction, cleaning, counting, etc. is needed; then write a TSV file
- open the TSV file and make a graph
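The TSV step in the plan could look something like this minimal sketch; the university names, totals, and file name are all made up for illustration (the actual graphing would happen in the final notebook):

```python
# Write a TSV of summed views, then read it back to check the format.
totals = {"Example_University_A": 215, "Example_University_B": 140}

with open("totals.tsv", "w") as f:
    f.write("university\ttotal_views\n")        # header row
    for name, views in totals.items():
        f.write(f"{name}\t{views}\n")           # one tab-separated row each

with open("totals.tsv", "r") as f:
    lines = f.read().splitlines()
print(lines[0])
```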
Stage 1: Getting data
I want to collect data on how popular something is using the MediaWiki views API. First I went searching and found two places:
I chose the second option.
The documentation suggested I should set up a unique user-agent. Searching for how to do that brought me to this StackOverflow post: https://stackoverflow.com/questions/10606133/sending-user-agent-using-requests-library-in-python which I followed to set up headers appropriately.
Between that and the interactive material in Wikimedia Rest API, I was able to construct a URL.
We will build up something like file 1, version 1:
- setting the header
- <code>json.dumps()</code> [mention that I'll skip this until we have an error]
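A sketch of how file 1 might start, based on the per-article pageviews endpoint documented in the Wikimedia REST API; the article name, date range, and contact address are made-up examples, and the actual request (using the <code>requests</code> library, as in the StackOverflow post) is left as a comment:

```python
import json

# Build the pageviews URL and a unique User-Agent header.
def build_pageview_url(article, start, end, project="en.wikipedia.org"):
    base = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
    return f"{base}/{project}/all-access/all-agents/{article}/daily/{start}/{end}"

headers = {"User-Agent": "CDSC course bot (example@uw.edu)"}  # unique user-agent

url = build_pageview_url("University_of_Washington", "20230101", "20230131")
print(url)

# The request and raw-payload write would then be:
# response = requests.get(url, headers=headers)
# with open("pageviews.json", "w") as f:
#     f.write(json.dumps(response.json()))
```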
Stage 2: Reading in data
Walk through building file 2, version 1 with a focus on:
- opening files with <code>open(filename, 'r')</code>
- <code>f.read()</code>, which reads the whole file in
- <code>json.loads()</code>
- outputting days and views
- try to graph... we'll have an error when we try to graph
- write some new code to create better formatted date strings...
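The Stage 2 steps above might be sketched as follows; the payload shape mirrors the pageviews API's "items" list, but the file name, timestamps, and view counts here are made up so the sketch runs on its own:

```python
import json

# Write a small fake payload so the read-back steps are self-contained.
fake_payload = {"items": [
    {"timestamp": "2023010100", "views": 120},
    {"timestamp": "2023010200", "views": 95},
]}
with open("pageviews.json", "w") as f:
    f.write(json.dumps(fake_payload))

# Stage 2 proper: open the file, read it all in, parse the JSON.
with open("pageviews.json", "r") as f:
    raw = f.read()              # reads the whole file in as one string
data = json.loads(raw)          # JSON string -> Python data structure

for item in data["items"]:
    ts = item["timestamp"]      # e.g. "2023010100"
    # better formatted date string: YYYY-MM-DD slices from YYYYMMDDHH
    date = f"{ts[0:4]}-{ts[4:6]}-{ts[6:8]}"
    print(date, item["views"])
```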
Stage 3: let's extend to multiple things
- let's build a couple of functions. maybe one for dates? maybe one for getting pageview data? let's refactor the old code to use these
- let's build in waiting for a second with <code>time.sleep(1)</code>
- let's count with a dictionary
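One way the Stage 3 refactor could come together, as a hedged sketch: the helper names (<code>format_date</code>, <code>get_pageview_data</code>) and the article names and numbers are invented here, and <code>get_pageview_data()</code> returns canned data instead of calling the API so the sketch runs on its own:

```python
import time

def format_date(timestamp):
    """Turn the API's 'YYYYMMDDHH' timestamp into 'YYYY-MM-DD'."""
    return f"{timestamp[0:4]}-{timestamp[4:6]}-{timestamp[6:8]}"

def get_pageview_data(article):
    # The real version would request the views API with our User-Agent header.
    canned = {"Article_A": [120, 95], "Article_B": [80, 60], "Article_C": [50, 40]}
    return canned[article]

total_views = {}   # count with a dictionary: article -> summed views
for article in ["Article_A", "Article_B", "Article_C"]:
    total_views[article] = sum(get_pageview_data(article))
    time.sleep(1)  # wait a second between (would-be) requests

print(total_views)
```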