DS4UX (Spring 2016)/Day 4 coding challenge: Difference between revisions

From CommunityData

Latest revision as of 18:41, 25 April 2016

Each of the challenges this week will ask you to modify and work with code in the Seattle traffic project, which you should have installed and begun working with in class.

This week, you will be required to attempt the first 4 challenges in this list, and upload your solution scripts via Canvas. You will NOT be graded on whether your solutions are correct, efficient, or even functional—just on whether you turn in an attempt at a solution that shows you tried. You WILL be marked down if you don't submit your solutions—so be sure to spend time attempting these challenges!

You do NOT need to complete and turn in your answers to the bonus challenges (#5 and #6). You will not be graded on these. But if you do attempt them, I'd love to see your solutions!

Being able to work through most of these challenges is a very good sign that you have mastered the important Python concepts we've covered so far. As always, it is fine to collaborate or work together on these problem sets, as long as you submit your solutions separately. And this week, please don't broadcast your responses via Canvas before Sunday night.

Challenges (required)

1. What day between January 1, 2014 and March 31, 2016 saw the most total traffic on the Burke-Gilman trail?
2. What was the busiest hour of any day for northbound bike traffic? How about southbound pedestrian traffic?
3. How much southbound traffic does the Burke-Gilman get, on average, during morning commute hours? How much does it get during evening commute hours?
  • Note: for this question, assume morning commute hours start at 7am and end at 10am, and that evening commute hours start at 4pm and end at 7pm. You can treat these as "commute hours" even on weekends, since our data doesn't contain days of the week.
4. Write a program that generates a CSV file called march_2016_daily_ped_counts.csv of the daily north- and south-bound pedestrian counts for March 2016, in chronological order. Your file should contain column headers and it should be possible to open it in a spreadsheet program like Microsoft Excel or Google Sheets.
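
For challenge 4, here is a minimal sketch of one way to roll hourly counts up into daily totals and write them out with column headers. It is not the official solution. The row layout, the ISO-style date strings, and the names hourly_rows, daily_ped_totals, and write_daily_csv are all assumptions made for illustration; the real bgt_bike_and_peds.csv may use different columns and a different date format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical hourly rows: (date "YYYY-MM-DD", hour, ped_north, ped_south).
# The real dataset may be laid out differently; adjust the unpacking below.
hourly_rows = [
    ("2016-03-02", 8, 4, 6),
    ("2016-03-01", 8, 10, 12),
    ("2016-03-01", 17, 7, 9),
    ("2016-02-29", 8, 3, 5),
]


def daily_ped_totals(rows, month_prefix):
    """Sum hourly pedestrian counts per day for dates in the given month."""
    totals = defaultdict(lambda: {"ped_north": 0, "ped_south": 0})
    for day, _hour, north, south in rows:
        if day.startswith(month_prefix):
            totals[day]["ped_north"] += north
            totals[day]["ped_south"] += south
    # ISO-style dates sort chronologically when sorted as plain strings.
    return [{"date": day, **totals[day]} for day in sorted(totals)]


def write_daily_csv(daily, out_file):
    """Write one header row, then one row per day."""
    writer = csv.DictWriter(
        out_file,
        fieldnames=["date", "ped_north", "ped_south"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(daily)


daily = daily_ped_totals(hourly_rows, "2016-03")

# Write to an in-memory buffer here so the sketch runs without touching disk.
buffer = io.StringIO()
write_daily_csv(daily, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To produce the actual file, you could pass a real file object instead of the buffer, for example open("march_2016_daily_ped_counts.csv", "w", newline=""). Writing a header row first is what lets a spreadsheet program label the columns.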


Bonus challenges (optional)

You don't need to complete these to get a "complete" grade for this assignment, but you should attempt them anyway!

5. What day of the week does the Burke-Gilman experience the most overall traffic, on average? The most pedestrian traffic?
  • Hint: it will help to know that January 1, 2014 was a Wednesday!
6. Which gets more inbound bike traffic per day, on average: the Burke-Gilman trail or the Mountains to Sound trail?
  • By 'inbound' I mean towards central Seattle. For BGT, inbound means southbound; for MTS, inbound means westbound.
  • To answer this question, you will need to alter bgt_traffic.py so that it takes BOTH bgt_bike_and_peds.csv and mts_bike_and_peds.csv as input, and converts them into TWO separate dictionaries. mts_bike_and_peds.csv is also included in the bgt-traffic.zip file you downloaded for this week's challenges.
  • The two files are in the same format, BUT there are a few differences that you need to account for (Hint: make sure to check the column titles and the order of the rows in each dataset!)
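
For challenge 5, one possible approach (a sketch, not the official solution) is to let Python's datetime module work out the weekday for each date instead of counting forward from January 1, 2014 by hand. The daily_totals dictionary, its (year, month, day) keys, and the function name average_by_weekday are hypothetical; the dictionary built by bgt_traffic.py may be shaped differently.

```python
from collections import defaultdict
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Hypothetical daily totals keyed by (year, month, day); made-up numbers.
daily_totals = {
    (2014, 1, 1): 100,
    (2014, 1, 2): 80,
    (2014, 1, 8): 140,
    (2014, 1, 9): 60,
}


def average_by_weekday(totals):
    """Return the mean daily count for each weekday that appears in totals."""
    sums = defaultdict(int)
    days_seen = defaultdict(int)
    for (year, month, day), count in totals.items():
        name = WEEKDAY_NAMES[date(year, month, day).weekday()]
        sums[name] += count
        days_seen[name] += 1
    return {name: sums[name] / days_seen[name] for name in sums}


averages = average_by_weekday(daily_totals)
busiest = max(averages, key=averages.get)

# Sanity check against the hint: January 1, 2014 should be a Wednesday.
first_day_name = WEEKDAY_NAMES[date(2014, 1, 1).weekday()]
print(first_day_name)
print(busiest, averages[busiest])
```

If you would rather not import datetime, the hint gives you an alternative: number the days starting from January 1, 2014 and use the remainder after dividing by 7 to find the weekday.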

Solutions

Download the solutions to this week's coding challenges: http://jtmorgan.net/ds4ux/week4/bgt-traffic-solutions.zip