Latest revision |
Your text |
Line 9: |
Line 9: |
| * Explore the data: find missing values, identify categorical, numerical, ordinal data fields | | * Explore the data: find missing values, identify categorical, numerical, ordinal data fields |
| * Transform (filter, project) | | * Transform (filter, project) |
| * Analyze | | * Analyze (see todo for prompts) |
|
| |
|
| # Find data. Let's start at [https://data.seattle.gov Seattle Data]. | | # Find data. Let's start at [https://data.seattle.gov Seattle Data]. |
Line 16: |
Line 16: |
| # Write exploratory scripts | | # Write exploratory scripts |
| ## Using <code>open</code> to open a file in Python. | | ## Using <code>open</code> to open a file in Python. |
| ## In groups, explore one of these questions by building a histogram with a Python dictionary:
| | # Write transformation script |
| ### What kinds of values occur in <code>COLLISIONTYPE</code>?
| | # In groups, answer the todo prompts. |
| ### What kinds of values occur in <code>ADDRTYPE</code>?
| |
| ### What kinds of values occur in <code>JUNCTIONTYPE</code>?
| |
| ### What kinds of values occur in <code>SDOT_COLDESC</code>?
| |
| ### What kinds of values occur in <code>WEATHER</code>?
| |
| ### What kinds of values occur in <code>SEVERITYDESC</code>?
| |
| ### (Challenge) Make a histogram of collisions by day in the data. Notice anything odd?
| |
| # Write transformation script and make a conclusion. You can work in groups. Example conclusions: | |
| ## Are incidents involving pedestrians or cyclists more likely to result in fatalities? | |
| ## Are incidents more likely to occur in rainy or wet conditions?
| |
| | |
| | |
| | |
| '''Code to open a file'''
| |
|
| |
| file_handle = open('sdot_collisions_seattle.csv', 'r') # open the csv file
| |
| for line in file_handle: # loop through the file one line at a time.
| |
|     line_clean = line.strip() # remove the newline character at end of line
| |
|     line_clean_list = line_clean.split(',') # split the line into parts using split
| |
|     print(line_clean_list[0]) # print the first column of data for this row.
| |
| | |
| | |
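The histogram prompts can be approached by extending the file-reading loop with a dictionary that maps each value to the number of times it occurs. The following is a minimal sketch, not the course's reference solution: <code>build_histogram</code> is a hypothetical helper name, and the sample rows are made up so the snippet runs without the real file. With the actual CSV, you would pass the opened file handle and the index of the column you are exploring.

```python
import io

def build_histogram(lines, column_index):
    """Count how often each value occurs in one column of comma-separated lines."""
    counts = {}
    for line in lines:
        fields = line.strip().split(',')  # same naive split as the loop above
        if column_index >= len(fields):
            continue  # skip lines that are too short to have this column
        value = fields[column_index]
        counts[value] = counts.get(value, 0) + 1  # start at 0 the first time a value is seen
    return counts

# Made-up sample rows standing in for the real file (hypothetical values).
sample = io.StringIO(
    "WEATHER,ADDRTYPE\n"
    "Raining,Block\n"
    "Clear,Intersection\n"
    "Raining,Block\n"
    ",Block\n"
)
header = sample.readline()  # read the header row so it is not counted
weather_counts = build_histogram(sample, 0)

# Print the most frequent values first.
for value, count in sorted(weather_counts.items(), key=lambda item: item[1], reverse=True):
    print(repr(value), count)
```

An empty-string key in the result is a quick way to spot missing values in that column.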
| '''Code to open a file, select a subset of rows and columns, and write to a new file'''
| |
|
| |
| file_handle = open('sdot_collisions_seattle.csv', 'r') # open the csv file
| |
| header = file_handle.readline() # read the header row first so the loop below only sees data rows
| |
| output_handle = open('sdot_collisions_transformed.csv', 'w') # NOTE this will overwrite an existing file with the same name
| |
| for line in file_handle: # loop through the file one line at a time.
| |
|     line_clean = line.strip() # remove the newline character at end of line
| |
|     line_clean_list = line_clean.split(',') # split the line into parts using split
| |
|     if int(line_clean_list[8]) > 0: # If the integer value at index 8 (the ninth column) is greater than zero then...
| |
|         output_handle.write(line) # write that line to the output.
| |
| | |
| output_handle.close() # Close the output file after the loop.
| |
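Splitting on <code>','</code> miscounts columns whenever a field itself contains a comma inside quotes, which is common in address-like fields. As an alternative sketch, Python's standard <code>csv</code> module handles quoting and also makes it easy to keep only some columns. The helper name <code>filter_rows</code>, the column names <code>PEDCOUNT</code> and <code>LOCATION</code>, and the sample rows are assumptions for illustration; check them against the header of the real file before relying on them.

```python
import csv
import io

def filter_rows(in_handle, out_handle, count_column, keep_columns):
    """Copy rows whose count_column is a positive integer, keeping only keep_columns."""
    reader = csv.DictReader(in_handle)  # uses the header row as field names
    writer = csv.DictWriter(out_handle, fieldnames=keep_columns, extrasaction='ignore')
    writer.writeheader()
    kept = 0
    for row in reader:
        try:
            count = int(row[count_column])
        except (ValueError, TypeError):
            continue  # skip rows where the count is missing or not a number
        if count > 0:
            writer.writerow(row)  # columns not in keep_columns are dropped
            kept += 1
    return kept

# Made-up sample data (hypothetical column names and values).
source = io.StringIO(
    'LOCATION,PEDCOUNT,WEATHER\n'
    '"5TH AVE NE, NE 103RD ST",1,Raining\n'
    'AURORA AVE N,0,Clear\n'
    'RAINIER AVE S,,Overcast\n'
)
out = io.StringIO()
kept = filter_rows(source, out, 'PEDCOUNT', ['LOCATION', 'PEDCOUNT'])
print(kept)
print(out.getvalue())
```

With real files, open the input and output with <code>newline=''</code>, as the <code>csv</code> documentation recommends, and pass those handles in place of the <code>io.StringIO</code> objects.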