# Community Data Science Course (Spring 2019)/Day 7 Notes

We will be discussing this data set.

• One of the most important qualities of the Scientific Revolution was that results were broadly shared, so new results could build on top of existing knowledge.
• Repeatability is the key to science (even data science): your results are only scientific if they are repeatable by a third party.

Today's Lecture

Let's go end to end on a data question: are there factors that predict injuries and fatalities in automobile accidents?

• Explore the data: find missing values; identify categorical, numerical, and ordinal data fields
• Transform (filter, project)
• Analyze

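```
import os
import requests

url = 'https://data-seattlecitygis.opendata.arcgis.com/datasets/5b5c745e0f1f48e7a53acec63a0022ab_0.csv'
response = requests.get(url)

# open() does not expand '~', so build the full path to the Desktop first
path = os.path.expanduser('~/Desktop/collisions.csv')

# response.content is an attribute (not a method) holding bytes, so write in binary mode
filehandle = open(path, 'wb')
filehandle.write(response.content)
filehandle.close()
```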

Opening a file is new. Note that "open" can open a file for reading or for writing. Be careful: opening an existing file for writing erases its contents, and you cannot get them back.
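The effect of the mode argument can be seen in a small sketch. The file name `demo.txt` and the temporary directory are assumptions made for illustration, chosen so that no real file on your computer is touched.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is overwritten.
# 'demo.txt' is a made-up name used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# 'w' creates the file (or erases it if it already exists).
filehandle = open(path, 'w')
filehandle.write('first version\n')
filehandle.close()

# Opening with 'w' again wipes the old contents before writing.
filehandle = open(path, 'w')
filehandle.write('second version\n')
filehandle.close()

# 'a' appends to the end of the file instead of erasing it.
filehandle = open(path, 'a')
filehandle.write('an extra line\n')
filehandle.close()

# 'r' opens for reading only and never changes the file.
filehandle = open(path, 'r')
contents = filehandle.read()
filehandle.close()

print(contents)
```

The text "first version" is gone after the second `open(path, 'w')`, while the line added in append mode is kept.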

Explore

Open the file in Excel. Which columns are sometimes missing values?

Find a categorical, numerical, and ordinal data field.
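The same exploration can also be sketched in Python. The example below is only an illustration: the column names and rows are made up and are not taken from the real collisions file. It counts empty cells per column and how many distinct values each column has, which is a rough hint for spotting categorical fields (few distinct values) versus numerical ones.

```python
import csv
import io

# Made-up stand-in for the collisions file: these column names and rows are
# invented for illustration and are not taken from the real data set.
sample = io.StringIO(
    "id,weather,injuries\n"
    "1,Rain,0\n"
    "2,,2\n"
    "3,Clear,\n"
    "4,Rain,1\n"
)

missing = {}    # column name -> number of empty cells
distinct = {}   # column name -> set of non-empty values seen

for row in csv.DictReader(sample):
    for column, value in row.items():
        if value == '':
            missing[column] = missing.get(column, 0) + 1
        else:
            distinct.setdefault(column, set()).add(value)

for column in distinct:
    print(column, 'missing:', missing.get(column, 0),
          'distinct values:', len(distinct[column]))
```

To try this on the downloaded data, `sample` could be replaced with a file handle from `open(...)` on your own copy of the CSV file.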

In this section, we will read and transform the data.

Code to open a file and print the first column

```
file_handle = open('sdot_collisions_seattle.csv', 'r')   # open the csv file
for line in file_handle:                                 # loop through the file one line at a time.
    line_clean = line.strip()                            # remove the newline character at end of line
    line_clean_list = line_clean.split(',')              # split the line into parts using split
    print(line_clean_list[0])                            # print the first column of data for this row.
file_handle.close()                                      # close the file when the loop is done
```
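One caveat with splitting on commas: if a field itself contains a comma inside quotes, `split(',')` cuts it into two pieces. A minimal sketch of the difference, using Python's built-in `csv` module, is below. The sample line is made up for illustration and is not from the real file.

```python
import csv
import io

# A made-up line for illustration: the second field has a comma inside quotes.
line = '1001,"5TH AVE, SEATTLE",2\n'

# Plain split breaks the quoted field into two pieces.
naive = line.strip().split(',')

# csv.reader understands the quotes and keeps the field whole.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)
print(len(parsed), parsed)
```

The simple `split` approach is fine for learning how files and loops work; `csv.reader` is one option to reach for if the columns start to look shifted.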

Code to open a file, select a subset of rows and columns, and write to a new file. Figure out what this code does!

```file_handle = open('sdot_collisions_seattle.csv', 'r')   # open the csv file