Editing Community Data Science Course (Spring 2023)/Week 5 coding challenges



* [[../Week 5 lecture notes]]
* Notebooks from the Week 5 lecture including:
** [https://github.com/kayleachampion/spr23_CDSW/blob/main/curriculum/week5/week_5_lecture_part_1-data_collection.ipynb Week 5 lecture notebook part 1] - Data Collection
** [https://github.com/kayleachampion/spr23_CDSW/blob/main/curriculum/week5/week_5_lecture_part_2-data_processing.ipynb Week 5 lecture notebook part 2] - Data Processing
** [https://github.com/kayleachampion/spr23_CDSW/blob/main/curriculum/week5/week-5-lecture_pre-baked_workthrough-20230424.ipynb Week 5 lecture notebook (prebaked)] - A combination of the three notebooks above, with versions of the code that I wrote as notes for myself before the class.
* The [https://uw.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=8133566f-217d-4e45-b3c9-afef0152528b Week 5 lecture video]


## Which has more total page views in 2022?
## Can you draw a visualization in a spreadsheet that shows this? (Again, provide a link.)
## Were there any years when 2022's more popular page was instead the less popular of the two? How many and which ones?
## Were there any months when this reversal of relative popularity occurred? How many and which ones?
## How about any days? How many?
# I've made [https://github.com/kayleachampion/spr23_CDSW/blob/main/curriculum/week5/list_of_washington_alternative_rocks_bands_wikipedia-2023-04-25.jsonl this file available], which includes a list of more than 100 Wikipedia articles about alternative rock bands from Washington state that I built from [https://en.wikipedia.org/wiki/Category:Alternative_rock_groups_from_Washington_(state) this category in Wikipedia].[*] It's a <code>.jsonl</code> file. Download the file (click "raw" and then save the file onto your drive). Now read it in, and request monthly page view data for all of them. If you need some help with loading it in, I've included some sample code at the bottom of this page.
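As a starting point for the monthly requests, here is a minimal sketch of building one request URL per article. It assumes the Wikimedia REST per-article pageviews endpoint and the <code>all-access</code> / <code>user</code> filters; the function name <code>build_monthly_pageviews_url</code> and the date strings are hypothetical choices for illustration, so check them against the lecture notebooks before relying on them.

```python
from urllib.parse import quote

# Assumed base of the Wikimedia REST per-article pageviews endpoint.
BASE_URL = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def build_monthly_pageviews_url(page_title, start, end, project="en.wikipedia"):
    """Build a monthly pageviews URL for one article (hypothetical helper).

    start and end are timestamps such as "2022010100" (YYYYMMDDHH).
    """
    # Wikipedia titles use underscores instead of spaces; other special
    # characters (for example "/") need to be percent-encoded.
    encoded_title = quote(page_title.replace(" ", "_"), safe="")
    return f"{BASE_URL}/{project}/all-access/user/{encoded_title}/monthly/{start}/{end}"


if __name__ == "__main__":
    print(build_monthly_pageviews_url("Pearl Jam", "2022010100", "2022123100"))
```

You would call this once per title from the file and pass each URL to whatever request code you used in the lecture. The Wikimedia APIs generally expect a descriptive <code>User-Agent</code> header on requests.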
== #2 Starting on your projects ==


{{notice|If you are planning on collecting data from Reddit, please look into using the [https://pushshift.io Pushshift API] instead of the default Reddit API. The Pushshift API is not as up-to-date but it is targeted toward data scientists, not app-makers, and is likely much better suited to our needs in the class. That said, take a look at both!}}


In this section, you will take your first steps towards working with your project API. Many of these questions will not involve code, so just mark down your answers in cells in your notebook.  
[*] You will probably not be shocked to hear that I collected this data from an API! I've included a Jupyter Notebook with the code to grab that data from [https://petscan.wmflabs.org/ the PetScan API] [https://github.com/kayleachampion/spr23_CDSW/blob/main/curriculum/week5/get_washington_alternative_rock_bands_list-20230425.ipynb in the form of this GitHub notebook].


If you just want to read in the file, remember that it's just a JSONL file, so you can modify the code from the lecture and it should work (e.g., something with <code>open()</code> and the <code>.readlines()</code> method of file objects). Here's code that loads it in and prints out each line:
 
<syntaxhighlight lang="python">
import json

# Each line of a .jsonl file is one JSON object, so parse the lines one at a time.
with open("list_of_washington_alternative_rocks_bands_wikipedia-2023-04-25.jsonl", 'r') as input_file:
    for line in input_file.readlines():
        line_dict = json.loads(line)
        print(line_dict['page_title'])
</syntaxhighlight>
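If you would rather keep the titles around for later requests instead of printing them, a small variation collects them into a list. This is only a sketch: the function name <code>load_page_titles</code> is hypothetical, and it assumes every non-blank line has a <code>page_title</code> key, as in the file above.

```python
import json


def load_page_titles(path):
    """Return the 'page_title' value from each line of a JSONL file."""
    titles = []
    with open(path, "r", encoding="utf-8") as input_file:
        for line in input_file:
            line = line.strip()
            if not line:
                # Skip blank lines, such as a trailing newline at the end of the file.
                continue
            titles.append(json.loads(line)["page_title"])
    return titles
```

Calling <code>load_page_titles("list_of_washington_alternative_rocks_bands_wikipedia-2023-04-25.jsonl")</code> should then give you a list you can loop over when requesting page views.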