Community Data Science Course (Spring 2017)/Day 7 Exercise
Revision as of 05:06, 11 May 2017
The purpose of this exercise is to go "end to end" on a data problem. We're going to download the data, explore it in Excel, use Python to extract a subset and save it, then plot it in Excel. If you can do this, you are well on your way to what you need for your final project!
This data describes economic conditions in each Congressional District. The details can be found at: http://proximityone.com/cd11414dp3.htm
- Download the data from here.
- Using the data description above, see if you can figure out which columns in the raw data correspond to which items in the data description. Identify the columns for construction, manufacturing, and finance workforce. Also, identify the columns for median and mean income.
- Open the file in Python, split each line, and read the fields you identified in the previous step into a list.
This is a good source of help: Community_Data_Science_Course_(Spring_2017)/Day_4_Notes
- Remove Puerto Rico and Washington D.C.
- Compute the percent of workers in each of the industries above and add it to the list of data.
- Output the data to a new CSV file, with a header row.
- Open this data in Excel. Try to identify whether there is a relationship between the percent of a district's workers in each industry and median or mean income.
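The Python steps above can be sketched roughly as follows. This is only one possible approach, not the official solution. Every column position here is a placeholder: the real positions have to come from your own reading of the data description. The sketch also assumes that the file is comma-separated with one header line, that no field contains a comma inside quotes, that there is a column holding the total number of workers to divide by, and that the district name field contains the text "Puerto Rico" or "District of Columbia" for the rows to drop. The function names and output column names are made up for illustration.

```python
import csv

# PLACEHOLDER column positions: replace with the ones you identified
# from the data description. These are NOT the real layout of the file.
NAME_COL = 0
TOTAL_WORKERS_COL = 1
CONSTRUCTION_COL = 2
MANUFACTURING_COL = 3
FINANCE_COL = 4
MEDIAN_INCOME_COL = 5
MEAN_INCOME_COL = 6

# Assumption: these strings appear in the name field of the rows to remove.
EXCLUDED = ("Puerto Rico", "District of Columbia")

# Hypothetical header names for the output file.
HEADER = ["district", "pct_construction", "pct_manufacturing",
          "pct_finance", "median_income", "mean_income"]


def parse_lines(lines):
    """Turn raw text lines into a list of rows with industry percentages."""
    rows = []
    for line in lines[1:]:  # assumes the first line is a header
        fields = line.strip().split(",")
        name = fields[NAME_COL]

        # Skip Puerto Rico and Washington D.C.
        if any(place in name for place in EXCLUDED):
            continue

        total = float(fields[TOTAL_WORKERS_COL])
        row = [name]
        # Percent of workers in each industry.
        for col in (CONSTRUCTION_COL, MANUFACTURING_COL, FINANCE_COL):
            row.append(100 * float(fields[col]) / total)
        row.append(float(fields[MEDIAN_INCOME_COL]))
        row.append(float(fields[MEAN_INCOME_COL]))
        rows.append(row)
    return rows


def write_csv(rows, path):
    """Write the header and the rows to a new CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)
```

To use it, open the downloaded file, read its lines into a list (for example with `readlines()`), pass that list to `parse_lines`, and pass the result to `write_csv` along with an output file name of your choice. If the real file has quoted fields that contain commas, splitting on "," will break those lines, and reading the file with `csv.reader` would be the safer choice.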