Community Data Science Course (Spring 2023)/Week 7 coding challenges

From CommunityData

# As is always the case, spend some time poking around the website and reading documentation to get a sense of what kind of data this is, where it is coming from, who generated it, and so on.
# Put the CSV file into a directory and create a new Jupyter notebook in the same directory, then load the file into Python as a pandas DataFrame (remember that it is comma-separated, not tab-separated).
# Show some parts of the DataFrame and make sure your load command worked.
# Print out the number of rows and the number of columns to get a sense of how much data you're working with.
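The following is a minimal sketch of the loading steps. It assumes you saved the download as <code>complaints.csv</code>, but that file name is hypothetical, so use whatever you called your copy. The helper names <code>load_complaints</code> and <code>summarize</code> are also made up for illustration.

```python
import pandas as pd

def load_complaints(path):
    """Read the comma-separated complaints file into a DataFrame."""
    # read_csv splits on commas by default, which matches this file
    return pd.read_csv(path)

def summarize(df):
    """Print the first few rows and the size of the data; return (rows, columns)."""
    print(df.head())  # first five rows, to check the load worked
    n_rows, n_cols = df.shape
    print(f"{n_rows} rows, {n_cols} columns")
    return n_rows, n_cols
```

In the notebook, you could run <code>complaints = load_complaints("complaints.csv")</code> and then <code>summarize(complaints)</code>.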




# Take a look at the "RecordType" column, which describes the kinds of complaints that come in. What categories are there? How many complaints are in each category? Show both with numbers and with a simple visualization (a bar chart, perhaps?). For each category, print out the "Description" of several examples. What kinds of things are included?
# Build a new dataset that includes only the "RecordType", "OriginalZip", and "Description" columns.
# Use this new dataset and filter it down to just the rows from your zip code. If you don't live in Seattle, you can use my zip code (98112), which covers north Capitol Hill and Montlake, or you can pick an area you think is interesting from [https://www.usmapguide.com/washington/seattle-zip-code-map/ this map].
## Now look at the number and proportion of different types of records in this subset.
## Be ready to explain: is the distribution in this zip code different from the distribution in Seattle overall? If so, how is it different?
## Once again, print out the "Description" of several examples from each category. What kinds of things are included?
# Use pandas to write out the three-column dataset to TSV (with ''tabs'' instead of commas).
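Below is one possible pandas approach to the steps above, written as small helper functions whose names are hypothetical. It assumes the column names given in the instructions and that zip codes are plain five-digit values. Values such as ZIP+4 codes would not match the filter. The plotting helper also needs matplotlib to be installed.

```python
import pandas as pd

def record_type_counts(df):
    """Count how many complaints fall into each RecordType category."""
    return df["RecordType"].value_counts()

def plot_record_types(df):
    """Bar chart of the category counts (requires matplotlib)."""
    return record_type_counts(df).plot(kind="bar")

def example_descriptions(df, n=3):
    """Return up to n example Descriptions for each RecordType."""
    return {
        record_type: group["Description"].head(n).tolist()
        for record_type, group in df.groupby("RecordType")
    }

def filter_zip(df, zip_code):
    """Keep the three columns of interest, then only rows from one zip code."""
    subset = df[["RecordType", "OriginalZip", "Description"]]
    # Compare numerically so the filter works whether pandas read zips as ints,
    # floats (when some values are missing), or strings; unparseable values never match
    zips = pd.to_numeric(subset["OriginalZip"], errors="coerce")
    return subset[zips == int(zip_code)]

def record_type_proportions(df):
    """Share of records in each RecordType (the values sum to 1)."""
    return df["RecordType"].value_counts(normalize=True)

def write_tsv(df, path):
    """Write a DataFrame to a tab-separated file, leaving out the index column."""
    df.to_csv(path, sep="\t", index=False)
```

For example, <code>local = filter_zip(complaints, 98112)</code> followed by <code>record_type_proportions(local)</code> gives shares that you can set next to <code>record_type_proportions(complaints)</code> for the whole city. Calling <code>write_tsv(local, "complaints_98112.tsv")</code> saves the subset. That output file name is only an example.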


== It's about time ==


First, let's return to the full dataset rather than the three-column subset.


# Create a new time series (a pandas Series) that contains the zip code and uses the "OpenDate" column as its index. Be sure to check the type of the "OpenDate" column and make sure it's in the pandas datetime format.
# Use the <code>.resample()</code> function associated with your pandas time series to count the number of complaints per week overall, and visualize this with a time series plot.
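Here is a sketch of the time series steps. It assumes that "OpenDate" holds date strings that <code>pd.to_datetime()</code> can parse without an explicit format. If it can't, pass <code>format=</code>. The names <code>zip_timeseries</code> and <code>weekly_counts</code> are hypothetical helpers. Using <code>.resample("W").size()</code> counts rows per week, so complaints with a missing zip code are still included.

```python
import pandas as pd

def zip_timeseries(df):
    """Series of zip codes indexed by OpenDate, with the index as pandas datetimes."""
    # Convert OpenDate to datetime64 in case read_csv left it as plain strings
    open_dates = pd.to_datetime(df["OpenDate"])
    # Use the raw values so pandas does not try to align on the old integer index
    return pd.Series(df["OriginalZip"].to_numpy(), index=open_dates, name="OriginalZip")

def weekly_counts(ts):
    """Number of complaints per week ("W" bins end on Sundays by default)."""
    # Each row is one complaint, so the size of each weekly bin is complaints per week
    return ts.sort_index().resample("W").size()
```

Calling <code>weekly_counts(zip_timeseries(complaints)).plot()</code> should draw the weekly counts as a line plot. This also needs matplotlib.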


== You've got questions, you've got answers ==
# Explicitly state the question
# Include the pandas code to answer it
# Write a sentence or two explaining what you found and interpret the finding.
All contributions to CommunityData are released under the Attribution-Share Alike 3.0 Unported license (see CommunityData:Copyrights for details).