Community Data Science Course (Spring 2017)/Day 7 Exercise

The purpose of this exercise is to go "end to end" on a data problem. We're going to download a dataset, explore it in Excel, use Python to extract a subset and save it, then plot that subset in Excel. If you can do this, you are well on your way to what you need for your final project!

This data describes economic conditions in each Congressional District. The details can be found at:

You are welcome to work together on this!

  1. Download the data from here.
  2. Using the data description above, see if you can figure out which columns in the raw data contain which variables. Identify the columns for the construction, manufacturing, and finance workforce. Also identify the columns for median and mean income.
  3. Open the file in Python, split each line, and read the fields you identified in step 2 into a list. This is a good source of help: Community_Data_Science_Course_(Spring_2017)/Day_4_Notes
  4. Remove the rows for Puerto Rico and Washington, D.C.
  5. For each district, compute the percent of workers in each of the industries above and add those percentages to your list of data.
  6. Output the data to a new CSV file (add a header row).
  7. Open this data in Excel. Try to identify whether there is a relationship between the percent of a district's workers in each industry and median or mean income.
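
If you get stuck on steps 3 through 6, here is one possible sketch of the Python part. It is not the only way to do it. Everything specific in it is an assumption made for illustration: the column positions (COL_NAME, COL_TOTAL, and so on), the existence of a total-workers column to divide by, the district names, and the sample numbers are all invented, so replace them with what you found in step 2. It also assumes no field contains a comma; if the real file has quoted fields with commas inside, Python's csv.reader is a safer way to split lines.

```python
import csv
import io

# Hypothetical column positions; replace them with the ones found in step 2.
COL_NAME, COL_TOTAL = 0, 1
COL_CONSTRUCTION, COL_MANUFACTURING, COL_FINANCE = 2, 3, 4
COL_MEDIAN, COL_MEAN = 5, 6

# Invented sample lines standing in for the real download.
SAMPLE_LINES = [
    "name,total,construction,manufacturing,finance,median,mean",
    "Alabama District 1,1000,100,200,50,40000,55000",
    "Puerto Rico At Large,500,50,25,25,20000,30000",
    "District of Columbia At Large,800,40,8,80,70000,100000",
    "Washington District 2,2000,100,300,200,60000,80000",
]

# Match on "District of Columbia" so Washington state is not dropped by mistake.
EXCLUDED = ("Puerto Rico", "District of Columbia")

HEADER = ["name", "total_workers", "construction", "manufacturing", "finance",
          "median_income", "mean_income",
          "pct_construction", "pct_manufacturing", "pct_finance"]


def read_rows(lines):
    """Steps 3 and 4: split each line, keep chosen fields, drop PR and DC."""
    rows = []
    for line in lines[1:]:  # skip the header line
        fields = line.strip().split(",")
        name = fields[COL_NAME]
        if any(place in name for place in EXCLUDED):
            continue
        rows.append([
            name,
            float(fields[COL_TOTAL]),
            float(fields[COL_CONSTRUCTION]),
            float(fields[COL_MANUFACTURING]),
            float(fields[COL_FINANCE]),
            float(fields[COL_MEDIAN]),
            float(fields[COL_MEAN]),
        ])
    return rows


def add_percents(rows):
    """Step 5: append the percent of workers in each industry to every row."""
    for row in rows:
        total = row[1]
        for count in row[2:5]:  # construction, manufacturing, finance
            row.append(100.0 * count / total)
    return rows


def write_csv(rows, out_file):
    """Step 6: write a header row followed by the data rows."""
    writer = csv.writer(out_file)
    writer.writerow(HEADER)
    writer.writerows(rows)


rows = add_percents(read_rows(SAMPLE_LINES))
buffer = io.StringIO()  # stands in for a real output file
write_csv(rows, buffer)
print(buffer.getvalue())
```

To run this on the real data, you would read the lines from the file you downloaded (for example with open(filename).readlines()) instead of using SAMPLE_LINES, and pass a file opened for writing with newline="" to write_csv instead of the StringIO buffer. The resulting CSV is what you open in Excel for step 7.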