Community Data Science Course (Spring 2023)/Week 7 coding challenges
This week the coding challenges are limited to playing around with data in pandas. Although you could likely do some of these in Excel, I want you to do them in pandas.

== Getting started with pandas and Seattle "complaint" data ==

# Go to [https://data.seattle.gov data.seattle.gov] and download [https://data.seattle.gov/Community/Code-Complaints-and-Violations/ez4a-iug7 the dataset on code complaints and violations]. (It is, by the way, also available through an API!) In this case, just go to Export→CSV and download the file. It should be about 80 megabytes.
# As is always the case, spend some time poking around the website and reading documentation to get a sense of what kind of data this is, where it is coming from, who generated it, and so on.
# Put the CSV file into a directory and create a new Jupyter notebook in the same directory (remember that the file is comma-separated, not tab-separated). Load it into Python as a pandas DataFrame.
# Show some parts of the DataFrame and make sure your load command worked.
# Print out the number of rows and the number of columns to get a sense of how much data you're working with.

== You know the type ==

# Take a look at the "RecordType" column, which describes the kinds of complaints that come in. What are the categories? How many records are in each category? Show both with numbers and with a simple visualization (a histogram, perhaps?). For each category, print out the "Description" of several examples. What kinds of things are included?
# Build a new dataset that includes only the "RecordType", "OriginalZip", and "Description" columns.
# Use this second dataset to filter the data down to just the rows from your zip code. If you don't live in Seattle, you can just use my zip code (98112), which covers north Capitol Hill and Montlake, or you can pick an area you think is interesting from [https://www.usmapguide.com/washington/seattle-zip-code-map/ this map].
## Now look at the number and proportion of different types of records in this subset.
## Be ready to explain whether the distribution in this zip code is different from the distribution in Seattle overall. If so, how is it different?
## Once again, print out the "Description" of several examples from each category. What kinds of things are included?
# Use pandas to write out the three-column dataset to TSV (with ''tabs'' instead of commas).

== It's about time ==

First, let's return to the full dataset and not the three-column subset.

# Create a new time series (use a pandas Series) that contains the zip code and that uses the "OpenDate" column as the index. Be sure to check the type of the "OpenDate" column and make sure it's in the pandas datetime format.
# Use the <code>.resample()</code> method associated with your pandas time series so that it shows the number of complaints per week overall, and visualize this with a time series plot.

== You've got questions, you've got answers ==

Ask and answer a question not on this list using this data. Be sure to:

# Explicitly state the question.
# Include the pandas code to answer it.
# Write a sentence or two explaining what you found and interpreting the finding.
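
The loading, counting, filtering, export, and resampling steps described in the sections above can be sketched roughly as follows. This is only a minimal illustration and not a worked solution: the five rows in <code>SAMPLE_CSV</code> are made up so that the snippet runs without the real download, and the variable names are arbitrary. Only the column names ("RecordType", "OriginalZip", "Description", "OpenDate") come from the assignment text. Reading the zip code as a string and parsing "OpenDate" at load time are assumptions that may need adjusting for the real file.

```python
import io

import pandas as pd

# Made-up stand-in rows (hypothetical values); the real data is the CSV
# exported from data.seattle.gov.
SAMPLE_CSV = """RecordType,OriginalZip,Description,OpenDate
Complaint,98112,Overgrown vegetation blocking sidewalk,2023-01-02
Complaint,98112,Junk storage in yard,2023-01-04
Case,98112,Construction without permit,2023-01-10
Complaint,98105,Noise from construction,2023-01-11
Case,98105,Vacant building not secured,2023-01-17
"""

# With the real download, pass the path of your CSV file instead of StringIO.
# Assumption: zip codes are kept as strings and OpenDate is parsed as a date.
df = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    dtype={"OriginalZip": str},
    parse_dates=["OpenDate"],
)

# Number of rows and columns.
n_rows, n_cols = df.shape

# Counts and proportions per record type; type_counts.plot(kind="bar")
# would draw a simple chart in a notebook.
type_counts = df["RecordType"].value_counts()
type_props = df["RecordType"].value_counts(normalize=True)

# Three-column subset, then a boolean filter on one zip code.
subset = df[["RecordType", "OriginalZip", "Description"]]
my_zip = subset[subset["OriginalZip"] == "98112"]
zip_props = my_zip["RecordType"].value_counts(normalize=True)

# Tab-separated output; pass a filename as the first argument to write a file.
tsv_text = subset.to_csv(sep="\t", index=False)

# Series of zip codes indexed by OpenDate, counted per week.
zips_by_date = df.set_index("OpenDate")["OriginalZip"]
weekly = zips_by_date.resample("W").count()

print(n_rows, n_cols)
print(type_counts)
print(weekly)
```

In a notebook, <code>weekly.plot()</code> would give a basic time series plot. If "OpenDate" loads as plain text from the real file, <code>pd.to_datetime()</code> is one way to convert it before setting the index.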