Community Data Science Course (Spring 2017)/Day 7 Notes

Revision as of 05:13, 11 May 2017 by Guyrt

A bit of vocabulary: types of data

  • Categorical: data that takes a qualitative value. For instance, city, weather condition (icy, rainy, sunny).
  • Ordinal: data that can be used to order things. The order of the values is meaningful, but the size of the gap between them ('scale') is not. For instance, finishing place in a race (1st, 2nd, 3rd).
  • Cardinal: data that counts or measures something. Examples: age, temperature, GDP.
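
As a rough sketch of how the three kinds behave differently in practice, the Python below uses only the standard library and made-up values (not the SDOT data). The severity labels and their ordering are assumptions chosen purely for illustration.

```python
from collections import Counter

# Made-up example values; none of these come from the SDOT crash data.
weather = ["rainy", "sunny", "icy", "rainy"]          # categorical
severity = ["minor", "severe", "moderate", "minor"]   # ordinal
ages = [34, 19, 52, 41]                               # cardinal

# Categorical: the natural summary is a count per label.
weather_counts = Counter(weather)

# Ordinal: values can be sorted, but only with an explicit order,
# and the gap between neighbouring levels has no fixed size.
SEVERITY_ORDER = ["minor", "moderate", "severe"]  # assumed ordering
sorted_severity = sorted(severity, key=SEVERITY_ORDER.index)

# Cardinal: arithmetic such as a mean is meaningful.
mean_age = sum(ages) / len(ages)

print(weather_counts["rainy"], sorted_severity, mean_age)
```

Note that averaging the severity labels would not make sense, while averaging the ages does; that difference is what separates ordinal from cardinal data.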

Do this: Find two examples of each kind of variable in the SDOT crash data.

Plotting:

  1. Pick your x-axis variable.
  2. Pick your plot type.
    1. 'cardinal' and 'cardinal': Scatter
    2. 'categorical' and 'cardinal': Bar or histogram.
    3. 'categorical', 'categorical', and 'cardinal': We'll see some examples of this in Python, but a pivot table with color is often useful.
    4. 'ordinal' and 'cardinal': Bar, line chart, or histogram.
  3. Pick your y-axis variable.
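
The plot-type choices listed above can be written down as a small lookup table. This is only a sketch: the name `suggest_plot` and the dictionary layout are hypothetical, and the suggestions are copied from the list rather than being a complete rule.

```python
# Plot suggestions keyed by the kinds of variables being plotted.
# Keys are sorted tuples so the order of the arguments does not matter.
PLOT_TYPES = {
    ("cardinal", "cardinal"): "scatter",
    ("cardinal", "categorical"): "bar or histogram",
    ("cardinal", "categorical", "categorical"): "pivot table with color",
    ("cardinal", "ordinal"): "bar, line chart, or histogram",
}


def suggest_plot(*kinds):
    """Return the suggested plot for the given variable kinds, or None."""
    return PLOT_TYPES.get(tuple(sorted(kinds)))


print(suggest_plot("categorical", "cardinal"))
print(suggest_plot("ordinal", "cardinal"))
```

Combinations that the list does not cover, such as two ordinal variables, return `None` here instead of a guess.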

Do this: Make the following plots (for each, figure out which kind of plot to use)

  1. Count accidents by road conditions
  2. Build a plot of the number of people involved against the injury count.
  3. Build a plot of accidents by date.
  4. Use a stacked plot to see if speeding is more associated with some weather conditions.
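
Before drawing a bar or stacked plot, the underlying counts have to be computed. The sketch below shows one way to do that in Python with the standard library. The rows are made up, and the field names `road_condition`, `weather`, and `speeding` are hypothetical stand-ins, not the actual SDOT column names.

```python
from collections import Counter, defaultdict

# Made-up rows standing in for the crash data; the field names
# "road_condition", "weather", and "speeding" are hypothetical.
crashes = [
    {"road_condition": "wet", "weather": "rainy", "speeding": True},
    {"road_condition": "dry", "weather": "sunny", "speeding": False},
    {"road_condition": "wet", "weather": "rainy", "speeding": False},
    {"road_condition": "ice", "weather": "icy", "speeding": True},
    {"road_condition": "dry", "weather": "sunny", "speeding": False},
]

# Categorical vs. cardinal: one count per road condition (a bar chart).
by_road = Counter(row["road_condition"] for row in crashes)

# Two categorical variables vs. a count: crashes per weather condition,
# split by speeding. Each inner Counter holds the segments of one stacked bar.
by_weather = defaultdict(Counter)
for row in crashes:
    by_weather[row["weather"]][row["speeding"]] += 1

# Share of crashes that involved speeding, per weather condition.
speeding_share = {
    weather: counts[True] / sum(counts.values())
    for weather, counts in by_weather.items()
}

print(by_road)
print(speeding_share)
```

Comparing shares as well as raw counts helps with the stacked plot, because a weather condition with few crashes overall can still have a high proportion of speeding.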

General Advice

  • Don't overthink it! Excel has dozens of fancy plots, and each of them has a good use. However, don't introduce a more complicated plot where a simple one will do.