Community Data Science Course (Spring 2017)/Day 7 Exercise
From CommunityData
Revision as of 05:05, 11 May 2017
The purpose of this exercise is to go "end to end" on a data problem. We're going to download a dataset, explore it in Excel, use Python to extract a subset and save it, and then plot it in Excel. If you can do this, you are well on your way to having the skills you need for your final project!
This data describes economic conditions in each Congressional District. The details can be found at: http://proximityone.com/cd11414dp3.htm
1. Download the data from here: http://proximityone.com/countytrends/cd11414dp3.csv
2. Using the data description above, see if you can figure out which columns in the raw data correspond to which rows in the description. Identify the columns for the construction, manufacturing, and finance workforce. Also identify the columns for median and mean income. To compute percentages in step 5, you will also need the column for the total employed workforce.
3. Open the file in Python, split each line, and read the fields you identified in step 2 into a list.
4. Remove Puerto Rico and Washington, D.C.
5. Compute the percentage of workers in each of the industries above and add it to your data.
6. Write the data to a new CSV file, including a header row.
7. Open the new file in Excel. Try to identify whether there is a relationship between the percentage of a district's workers in each industry and its median or mean income.
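The Python part of the exercise (steps 3 to 6) could be sketched roughly as follows. This is a minimal sketch, not a reference solution. The column positions are hypothetical placeholders that you should replace with what you found in step 2. The way Puerto Rico and D.C. are recognized (a substring match on a name column) is also an assumption. The sketch uses Python's `csv` module instead of splitting each line on commas by hand, because a plain `split(",")` breaks as soon as a field contains a comma inside quotes.

```python
import csv

# HYPOTHETICAL 0-based column positions: placeholders only, NOT the real
# layout of cd11414dp3.csv. Replace them with what you found in step 2.
NAME_COL = 0       # district name (used to spot Puerto Rico and D.C.)
TOTAL_COL = 1      # total employed workers (denominator for percentages)
INDUSTRY_COLS = {"construction": 2, "manufacturing": 3, "finance": 4}
MEDIAN_COL = 5     # median income
MEAN_COL = 6       # mean income

# Assumption: the rows to drop can be recognized by these strings in the name.
EXCLUDE = ("Puerto Rico", "District of Columbia")


def num(text):
    """Convert a numeric field to float, tolerating thousands separators."""
    return float(text.replace(",", "").strip())


def read_rows(path):
    """Step 3: read the selected fields of every data row into a list of dicts."""
    rows = []
    with open(path, newline="") as f:
        reader = csv.reader(f)   # handles commas inside quoted fields
        next(reader)             # assumes exactly one header row
        for fields in reader:
            row = {
                "name": fields[NAME_COL],
                "total": num(fields[TOTAL_COL]),
                "median": num(fields[MEDIAN_COL]),
                "mean": num(fields[MEAN_COL]),
            }
            for industry, col in INDUSTRY_COLS.items():
                row[industry] = num(fields[col])
            rows.append(row)
    return rows


def drop_excluded(rows):
    """Step 4: keep only rows whose name mentions none of EXCLUDE."""
    return [r for r in rows if not any(x in r["name"] for x in EXCLUDE)]


def add_percentages(rows):
    """Step 5: add pct_<industry> = 100 * industry workers / total workers."""
    for r in rows:
        for industry in INDUSTRY_COLS:
            r["pct_" + industry] = 100.0 * r[industry] / r["total"]
    return rows


def write_csv(rows, path):
    """Step 6: write the rows to a new CSV file, header row first."""
    fieldnames = ["name", "total", *INDUSTRY_COLS, "median", "mean",
                  *("pct_" + i for i in INDUSTRY_COLS)]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Example usage, once the column constants above are filled in
# (the output file name is arbitrary):
# rows = add_percentages(drop_excluded(read_rows("cd11414dp3.csv")))
# write_csv(rows, "district_industries.csv")
```

The sketch converts values to numbers while reading, so the percentage step can do arithmetic directly. If a column index is wrong and points at text or a blank field, `float` raises a `ValueError`, which is a quick hint to recheck step 2.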
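For step 7, you might also compute a Pearson correlation coefficient for each industry/income pair as a rough numeric check alongside your Excel plots. Excel's `CORREL` function computes the same statistic. The sketch below assumes that the CSV you wrote in step 6 has a header row. The file and column names in the usage comment are hypothetical.

```python
import csv
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a sequence with zero variance has no correlation")
    return sxy / math.sqrt(sxx * syy)


def correlations(path, pct_cols, income_cols):
    """Return {(pct_col, income_col): r} for every pair of named columns."""
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))  # uses the header row for column names
    results = {}
    for p in pct_cols:
        xs = [float(r[p]) for r in rows]
        for inc in income_cols:
            ys = [float(r[inc]) for r in rows]
            results[(p, inc)] = pearson(xs, ys)
    return results


# Example usage (HYPOTHETICAL file and column names):
# results = correlations("district_industries.csv",
#                        ["pct_construction", "pct_manufacturing", "pct_finance"],
#                        ["median", "mean"])
# for (pct, income), r in results.items():
#     print(f"{pct} vs {income}: r = {r:.2f}")
```

A value of r near 0 suggests no *linear* relationship, but it can miss curved patterns, so a scatter plot in Excel is still worth making.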