Community Data Science Course (Spring 2023)/Week 5 lecture notes
From CommunityData
New concepts for the day:
* Defining functions
* <code>import json</code>, <code>json.loads()</code>, and <code>json.dumps()</code>
* Reading ''from'' files
* Breaking projects into multiple notebooks and steps
* Waiting... <code>time.sleep(1)</code>

== Stage 0: Coming up with a plan ==

I want to download page view data for three universities and present the sum total for each. I'm going to split the work into three steps:
* collect the data from the web and write the raw JSON "payload" to a file
* read the data from the file and do whatever data extraction, cleaning, counting, etc. is needed; then write a TSV file
* open the TSV file and make a graph

== Stage 1: Getting data ==

I want to build data on how popular something is using the MediaWiki views API. First I went [https://www.google.com/search?q=wikipedia%20page%20view%20api searching] and found two places:
* [https://www.mediawiki.org/wiki/API:Query MediaWiki API]
* [https://www.mediawiki.org/wiki/Wikimedia_REST_API Wikimedia REST API]

I chose the second option. The documentation suggested I should set a unique user-agent. Searching for how to do that brought me to this StackOverflow post: https://stackoverflow.com/questions/10606133/sending-user-agent-using-requests-library-in-python which I followed to set up headers appropriately. Between that and the interactive material in the [https://www.mediawiki.org/wiki/Wikimedia_REST_API Wikimedia REST API] documentation, I was able to construct a URL.

We will '''build up something like file 1, version 1''':
* setting the header
* json.dumps() [mention that I'll skip this until we have an error]

== Stage 2: Reading in data ==

Walk through building '''file 2, version 1''' with a focus on:
* opening files with <code>open(filename, 'r')</code>
* <code>f.read()</code>, which reads the whole file in
* json.loads()
* outputting days and views
* trying to graph... we'll have an error when we try to graph
* writing some new code to create better-formatted date strings...
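The Stage 1 and Stage 2 steps above can be sketched end-to-end. This is a minimal sketch, not code from the lecture: the payload shape is an assumption based on the Wikimedia REST API's pageviews responses, and the filename is made up. The real Stage 1 would fetch the payload over the network with a custom User-Agent header, which is shown only in comments here.

```python
import json

# A payload shaped like a Wikimedia REST pageviews response. The
# "items" structure is an assumption based on that API's documented
# format; in the real project the text would come from something like:
#   headers = {"User-Agent": "class-project-bot (your@email.example)"}
#   payload_text = requests.get(url, headers=headers).text
payload = {
    "items": [
        {"timestamp": "2023010100", "views": 120},
        {"timestamp": "2023010200", "views": 98},
    ]
}

# Stage 1: write the raw JSON "payload" to a file with json.dumps()
with open("pageviews_raw.json", "w") as f:
    f.write(json.dumps(payload))

# Stage 2: read the whole file back in with f.read() and parse it
# with json.loads()
with open("pageviews_raw.json", "r") as f:
    raw = f.read()
data = json.loads(raw)

# Output days and views, reformatting "2023010100" into "2023-01-01"
# so that graphing code treats the dates sensibly.
for item in data["items"]:
    ts = item["timestamp"]
    date_string = f"{ts[0:4]}-{ts[4:6]}-{ts[6:8]}"
    print(date_string, item["views"])
```

Splitting the write and the read across two notebooks means the slow, network-bound step only ever has to run once.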
== Stages 3 and 4: let's extend to multiple things ==
* let's build a couple of functions. Maybe one for dates? Maybe one for getting pageview data? Let's refactor the old code to use these.
* let's build in waiting for a second with <code>time.sleep(1)</code>
* let's count with a dictionary
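One way the Stage 3 and 4 refactor could look, as a sketch: the function names and the sample data below are hypothetical stand-ins, not from the lecture, and the real version would replace the fake responses with one API call per university.

```python
import time

def format_date(timestamp):
    """Turn an API timestamp like '2023010100' into '2023-01-01'."""
    return f"{timestamp[0:4]}-{timestamp[4:6]}-{timestamp[6:8]}"

def get_pageview_total(items):
    """Sum the 'views' field across a list of API response items."""
    return sum(item["views"] for item in items)

# Hypothetical stand-in for per-university API responses; in the real
# project each list would come from a separate web request.
fake_responses = {
    "University of Washington": [
        {"timestamp": "2023010100", "views": 500},
        {"timestamp": "2023010200", "views": 700},
    ],
    "Purdue University": [
        {"timestamp": "2023010100", "views": 300},
    ],
}

# Count with a dictionary: one running total per university.
totals = {}
for name, items in fake_responses.items():
    totals[name] = get_pageview_total(items)
    time.sleep(1)  # wait a second between requests, to be polite to the API

print(totals)
```

Pulling the date formatting and the summing into functions means the same code works unchanged whether we ask for one university or ten.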