Community Data Science Course (Spring 2016)/Day 4 Notes

From CommunityData
Revision as of 07:06, 20 April 2017 by Guyrt.

We will be discussing this data set: SDOT Collisions (https://data.seattle.gov/Transportation/SDOT-Collisions/v7k9-7dn4).

  • One of the most important qualities of the Scientific Revolution was that results were broadly shared, so new results could build on top of existing knowledge.
  • Repeatability is the key to science (even data science): your results are only scientific if they are repeatable by a third party.

Today's Lecture

Let's go end to end on a data question: are there factors that predict injuries and fatalities in automobile accidents?
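To make "factors that predict injuries" concrete, one starting point is to compare, for each value of a candidate factor, the share of collisions with at least one injury. A minimal sketch in Python, assuming the rows have already been read into dictionaries the way csv.DictReader produces them and that the file has columns named WEATHER and INJURIES. Both column names are assumptions; check them against the header of the file you download.

```python
from collections import defaultdict

def injury_rate_by(rows, factor, outcome="INJURIES"):  # "INJURIES" is an assumed column name
    """Return {factor value: share of collisions whose outcome count is > 0}.

    Rows whose outcome is blank or not a whole number are skipped.
    """
    totals = defaultdict(int)   # collisions seen for each factor value
    injured = defaultdict(int)  # collisions with at least one injury
    for row in rows:
        raw = (row.get(outcome) or "").strip()
        if not raw.isdigit():
            continue  # missing or malformed outcome: leave the row out
        level = (row.get(factor) or "").strip() or "(missing)"
        totals[level] += 1
        if int(raw) > 0:
            injured[level] += 1
    return {level: injured[level] / totals[level] for level in totals}

# Made-up rows shaped like csv.DictReader output (hypothetical column names).
rows = [
    {"WEATHER": "Clear", "INJURIES": "0"},
    {"WEATHER": "Clear", "INJURIES": "1"},
    {"WEATHER": "Raining", "INJURIES": "2"},
    {"WEATHER": "Raining", "INJURIES": "1"},
    {"WEATHER": "", "INJURIES": "0"},
]
print(injury_rate_by(rows, "WEATHER"))
# {'Clear': 0.5, 'Raining': 1.0, '(missing)': 0.0}
```

A gap between rates is a prompt for further questions (how many collisions sit behind each rate, and could another factor explain the gap?), not evidence that the factor causes injuries.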

  • Download data
  • Explore the data: find missing values; identify categorical, numerical, and ordinal fields
  • Transform (filter, project)
  • Analyze (see todo for prompts)
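The explore step can start with a quick profile of every column. A minimal sketch using only Python's csv module: it counts blank cells as missing, counts distinct values, and guesses "numerical" when every non-blank value parses as a number and "categorical" otherwise. The guess is rough. An identifier such as an incident key parses as a number but is not a quantity, and ordinal fields (a severity scale, for example) cannot be told apart from other fields by a script, so those calls need a look at the dataset's documentation. The column names in the sample are made up for illustration.

```python
import csv
import io

def profile_columns(csv_file):
    """Summarize each column: missing count, distinct values, and a rough type guess."""
    reader = csv.DictReader(csv_file)
    names = reader.fieldnames or []
    missing = {name: 0 for name in names}
    numeric = {name: True for name in names}
    distinct = {name: set() for name in names}
    for row in reader:
        for name in names:
            value = (row.get(name) or "").strip()
            if not value:
                missing[name] += 1  # blank cell counts as missing
                continue
            distinct[name].add(value)
            try:
                float(value)
            except ValueError:
                numeric[name] = False  # one non-number is enough to treat it as text
    summary = {}
    for name in names:
        if not distinct[name]:
            kind = "empty"
        elif numeric[name]:
            kind = "numerical"
        else:
            kind = "categorical"
        summary[name] = {"missing": missing[name], "distinct": len(distinct[name]), "kind": kind}
    return summary

# Made-up sample standing in for the downloaded file (hypothetical columns).
SAMPLE = (
    "INCKEY,WEATHER,PERSONCOUNT,INJURIES\n"
    "1,Clear,2,0\n"
    "2,,3,1\n"
    "3,Raining,2,\n"
)

for name, info in profile_columns(io.StringIO(SAMPLE)).items():
    print(name, info)
```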
  1. Find data. Let's start at Seattle's open data portal (data.seattle.gov).
    1. Brief aside: Socrata, the platform that hosts the portal.
  2. Download it.
  3. Write exploratory scripts
    1. Using open() to open a file in Python.
  4. Write transformation script
  5. In groups, answer the todo prompts.
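Steps 3 and 4 both come down to opening a file and walking over its rows. A minimal sketch of a transformation script, assuming the download is a CSV file with a header row: it uses the built-in open() with Python's csv module to filter (drop rows with no injury count) and project (keep a handful of columns). The file names, the sample rows, and the column names in KEEP are hypothetical placeholders; replace them with the ones in your download.

```python
import csv
import os
import tempfile

# Hypothetical column names; compare against the header of your download.
KEEP = ["INCDATE", "WEATHER", "INJURIES", "FATALITIES"]

def transform(in_path, out_path, keep=KEEP, required="INJURIES"):
    """Filter and project a CSV file; return the number of rows written.

    Filter: skip rows where the `required` column is blank.
    Project: write only the columns listed in `keep`, in that order.
    """
    written = 0
    # The csv module documentation recommends opening its files with newline="".
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=keep)
        writer.writeheader()
        for row in reader:
            if not (row.get(required) or "").strip():
                continue  # filter: no recorded injury count
            writer.writerow({name: row.get(name) or "" for name in keep})  # project
            written += 1
    return written

# Demo on a tiny made-up file written to a temporary directory.
workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "collisions_raw.csv")
clean_path = os.path.join(workdir, "collisions_clean.csv")
with open(raw_path, "w", newline="", encoding="utf-8") as f:
    f.write("INCKEY,INCDATE,WEATHER,INJURIES,FATALITIES\n")
    f.write("1,2015-01-02,Clear,0,0\n")
    f.write("2,2015-01-03,Raining,,0\n")
    f.write("3,2015-01-04,Overcast,2,0\n")

count = transform(raw_path, clean_path)
print(count)  # 2
with open(clean_path, newline="", encoding="utf-8") as f:
    print(f.read())
```

Writing the cleaned rows to a new file, rather than overwriting the download, keeps the raw data intact so a third party can rerun the same script and check the result.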