Community Data Science Course (Spring 2016)/Day 4 Notes

'''We will be discussing [https://data.seattle.gov/Transportation/SDOT-Collisions/v7k9-7dn4 this data set].'''

* One of the most important qualities of the Scientific Revolution was that results were broadly shared, so new results could build on top of existing knowledge.
* Repeatability is the key to science (even data science): your results are only scientific if they are repeatable by a third party.

'''Today's Lecture'''
Let's go end to end on a data question: are there factors that predict injuries and fatalities in automobile accidents?
* Download data
* Explore the data: find missing values, identify categorical, numerical, ordinal data fields
* Transform (filter, project)
* Analyze (see todo for prompts)
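
The explore step above can be sketched as a histogram built with a Python dictionary. This is a minimal sketch, not the course's official solution: the function name <code>count_values</code> and the tiny in-memory sample (including its column names) are made up for illustration. It uses Python's <code>csv</code> module rather than <code>split(',')</code>, on the assumption that some fields in the real file may contain quoted commas.

```python
import csv
import io


def count_values(file_handle, column):
    """Return a dictionary mapping each value found in `column` to its count.

    Hypothetical helper for illustration; `file_handle` is any open text file
    whose first row is a header.
    """
    counts = {}
    reader = csv.DictReader(file_handle)          # header row supplies the column names
    for row in reader:
        value = (row.get(column) or "").strip()   # treat absent fields as empty
        if value == "":
            value = "(missing)"                   # make missing values visible
        counts[value] = counts.get(value, 0) + 1  # start at 0 the first time we see a value
    return counts


# A made-up sample standing in for the downloaded file (assumed column names).
sample = io.StringIO(
    "INCKEY,WEATHER,INJURIES\n"
    "1,Raining,0\n"
    "2,Clear,1\n"
    "3,,0\n"
    "4,Raining,2\n"
)

weather_counts = count_values(sample, "WEATHER")
for value, count in sorted(weather_counts.items()):
    print(value, count)
```

To try this on the real download, replace the sample with an opened file, for example <code>open('sdot_collisions_seattle.csv', newline=<nowiki>''</nowiki>)</code>, assuming that is the name you saved the file under.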

# Find data. Let's start at [https://data.seattle.gov Seattle Data].
## brief aside: Socrata
# Download it. 
# Write exploratory scripts
## Using <code>open</code> to open a file in Python.
# Write transformation script
# In groups, answer the todo prompts.
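
The transformation step (filter rows, project columns) might look like the sketch below. It is only one possible approach under stated assumptions: the function name <code>transform</code>, the sample data, and the column names are hypothetical, and the filter rule (keep rows whose count column is greater than zero) is chosen purely for illustration.

```python
import csv
import io


def transform(in_handle, out_handle, keep_columns, count_column):
    """Write rows whose `count_column` is > 0, keeping only `keep_columns`.

    Hypothetical helper for illustration. Returns the number of rows written.
    """
    reader = csv.DictReader(in_handle)
    writer = csv.DictWriter(
        out_handle,
        fieldnames=keep_columns,
        extrasaction="ignore",     # project: silently drop columns we did not ask for
        lineterminator="\n",
    )
    writer.writeheader()           # keep a header row in the output file
    rows_written = 0
    for row in reader:
        try:
            count = int(row[count_column])
        except (TypeError, ValueError):
            continue               # skip rows where the value is missing or not an integer
        if count > 0:              # filter: keep only rows with a positive count
            writer.writerow(row)
            rows_written += 1
    return rows_written


# A made-up sample standing in for the downloaded file (assumed column names).
source = io.StringIO(
    "INCKEY,WEATHER,INJURIES\n"
    "1,Raining,0\n"
    "2,Clear,1\n"
    "3,Overcast,\n"
    "4,Raining,2\n"
)
result = io.StringIO()
kept = transform(source, result, ["WEATHER", "INJURIES"], "INJURIES")
print(result.getvalue())
```

With real files, pass handles from <code>open(..., newline=<nowiki>''</nowiki>)</code> instead of the in-memory strings. Remember that opening the output file in <code>'w'</code> mode overwrites any existing file of that name.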
@@ Line 9: / Line 9: @@
 * Explore the data: find missing values, identify categorical, numerical, ordinal data fields
 * Transform (filter, project)
-* Analyze
+* Analyze (see todo for prompts)
 # Find data. Let's start at [https://data.seattle.gov Seattle Data].
@@ Line 16: / Line 16: @@
 # Write exploratory scripts
 ## Using <code>open</code> to open a file in Python.
-## In groups, explore one of these questions by building a histogram with a python dictionary:
+# Write transformation script
-### What kinds of values occur in <code>COLLISIONTYPE</code>?
+# In groups, answer the todo prompts.
-### What kinds of values occur in <code>ADDRTYPE</code>?
-### What kinds of values occur in <code>JUNCTIONTYPE</code>?
-### What kinds of values occur in <code>SDOT_COLDESC</code>?
-### What kinds of values occur in <code>WEATHER</code>?
-### What kinds of values occur in <code>SEVERITYDESC</code>?
-### (Challenge) Make a histogram of collisions by day in the data. Notice anything odd?
-# Write transformation script and make a conclusion. You can work in groups. Example conclusions:
-## Are incidents involving pedestrians or cyclists more likely to result in fatalities?
-## Are incidents more likely to occur in rainy or wet conditions?
-'''Code to open a file'''
- file_handle = open('sdot_collisions_seattle.csv', 'r')   # open the csv file
- for line in file_handle:                                 # loop through the file one line at a time.
-     line_clean = line.strip()                            # remove the newline character at end of line
-     line_clean_list = line_clean.split(',')              # split the line into parts using split
-     print(line_clean_list[0])                            # print the first column of data for this row.
-'''Code to open a file, select a subset of rows and columns, and write to a new file'''
- file_handle = open('sdot_collisions_seattle.csv', 'r')   # open the csv file
- header = file_handle.readline()
- output_handle = open('sdot_collisions_transformed.csv', 'w')      # NOTE: this will overwrite any existing file
- for line in file_handle:                                 # loop through the file one line at a time.
-     line_clean = line.strip()                            # remove the newline character at end of line
-     line_clean_list = line_clean.split(',')              # split the line into parts using split
-     if int(line_clean_list[8]) > 0:                      # If the integer value in column 8 is greater than zero then...
-         output_handle.write(line)                        # write that line to the output.
- output_handle.close()                                    # Close the output file after the loop.