Community Data Science Course (Spring 2017)/Day 7 Notes

A bit of vocabulary: types of data


 * Categorical: data that takes one of a set of qualitative values. Examples: city, weather condition (icy, rainy, sunny).
 * Ordinal: data that has a natural order, so it can be used to rank things. The key property of ordinal data is that the scale (the distance between values) isn't meaningful. For instance, a rating of low/medium/high.
 * Cardinal: numeric data that counts or measures a quantity, so arithmetic on it makes sense. Examples: age, temperature, GDP.
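To make the three types concrete, here is a small sketch in Python using a few made-up crash records. The field names (`weather`, `severity`, `persons`) and their values are hypothetical and are not the actual SDOT column names. The point is which operations make sense for each type: counting for categorical data, sorting by an explicit order for ordinal data, and arithmetic for cardinal data.

```python
from collections import Counter

# Hypothetical crash records; field names and values are invented for
# illustration and are NOT taken from the real SDOT crash data.
crashes = [
    {"weather": "Rainy", "severity": "Injury", "persons": 2},
    {"weather": "Sunny", "severity": "Property damage", "persons": 3},
    {"weather": "Rainy", "severity": "Fatality", "persons": 1},
    {"weather": "Icy", "severity": "Injury", "persons": 4},
]

# Categorical: the only meaningful operations are "is it the same?" and counting.
weather_counts = Counter(c["weather"] for c in crashes)

# Ordinal: the values have an order, which we must spell out ourselves,
# but the "distance" between two levels means nothing.
SEVERITY_ORDER = ["Property damage", "Injury", "Fatality"]
by_severity = sorted(crashes, key=lambda c: SEVERITY_ORDER.index(c["severity"]))

# Cardinal: numbers where sums and averages make sense.
mean_persons = sum(c["persons"] for c in crashes) / len(crashes)

print(weather_counts)
print([c["severity"] for c in by_severity])
print(mean_persons)
```

Note that averaging the severity levels, or sorting the weather values, would run without error but would not mean anything; that is exactly the distinction the vocabulary is meant to capture.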

Do this: Find two examples of each kind of variable in the SDOT crash data.

Plotting:


 * 1) Pick your x-axis variable.
 * 2) Pick your y-axis variable.
 * 3) Pick your plot type, based on the kinds of variables involved:
    * 'cardinal' and 'cardinal': scatter plot.
    * 'categorical' and 'cardinal': bar chart or histogram.
    * 'categorical' and 'categorical' and 'cardinal': we'll see some examples of this in Python, but a color-coded pivot table is often useful.
    * 'ordinal' and 'cardinal': bar chart, line chart, or histogram.
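The plot-type rules above are essentially a lookup table keyed on the kinds of variables. The sketch below writes them down that way in Python. The names `PLOT_RULES` and `suggest_plot` are hypothetical helpers, not part of any library. In this sketch the order of the arguments matters (x-axis variable first), and any combination not covered by the rules falls back to the general advice of using the simplest plot that works.

```python
# Hypothetical helper encoding the plotting rules as a lookup table.
# Keys list the variable kinds, x-axis variable first.
PLOT_RULES = {
    ("cardinal", "cardinal"): "scatter",
    ("categorical", "cardinal"): "bar or histogram",
    ("categorical", "categorical", "cardinal"): "pivot table with color",
    ("ordinal", "cardinal"): "bar, line chart, or histogram",
}

KINDS = {"categorical", "ordinal", "cardinal"}


def suggest_plot(*kinds: str) -> str:
    """Suggest a plot type for the given variable kinds (x-axis first)."""
    for kind in kinds:
        if kind not in KINDS:
            raise ValueError(f"unknown variable kind: {kind!r}")
    # Combinations not in the table get the "keep it simple" fallback.
    return PLOT_RULES.get(tuple(kinds), "no rule: try the simplest plot that works")


print(suggest_plot("categorical", "cardinal"))  # bar or histogram
```

Spelling the rules out like this makes the first step, deciding what kind each variable is, the part that actually requires thought.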

Do this: Make the following plots (for each one, first figure out which kind of plot fits):
 * 1) Count accidents by road condition.
 * 2) Plot the number of people involved against the injury count.
 * 3) Plot accidents by date.
 * 4) Use a stacked plot to see whether speeding is more associated with some weather conditions than with others.
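Before drawing plots 1 and 4, the data has to be aggregated: a count per category for the bar chart, and a two-way pivot table for the stacked plot. Here is a sketch of that aggregation in plain Python, so it runs without extra libraries. The rows and the column names (`road_cond`, `weather`, `speeding`) are made up and stand in for the real SDOT crash data.

```python
from collections import Counter, defaultdict

# Made-up rows standing in for the SDOT crash data; the column names
# "road_cond", "weather" and "speeding" are hypothetical.
rows = [
    {"road_cond": "Wet", "weather": "Raining", "speeding": True},
    {"road_cond": "Dry", "weather": "Clear", "speeding": False},
    {"road_cond": "Wet", "weather": "Raining", "speeding": False},
    {"road_cond": "Ice", "weather": "Snowing", "speeding": True},
    {"road_cond": "Dry", "weather": "Clear", "speeding": True},
]

# Plot 1: categorical x-axis, count on the y-axis -> one bar per road condition.
accidents_by_road = Counter(r["road_cond"] for r in rows)

# Plot 4: two categorical variables plus a count -> a pivot table.
# Each weather condition becomes one stacked bar, split by speeding or not.
pivot = defaultdict(Counter)
for r in rows:
    pivot[r["weather"]][r["speeding"]] += 1

for weather, counts in sorted(pivot.items()):
    total = counts[True] + counts[False]
    print(f"{weather:8} speeding={counts[True]} not={counts[False]} "
          f"share={counts[True] / total:.0%}")
```

Comparing the speeding share per weather condition, rather than raw counts alone, is what makes the stacked plot useful here, since some weather conditions simply have many more accidents. If you are using pandas, `pd.crosstab` builds the same pivot table, and calling `.plot(kind="bar", stacked=True)` on the result draws the stacked bars.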

General Advice


 * Don't overthink it! Excel has dozens of fancy plot types, and each of them has a good use. However, don't use a more complicated plot where a simple one will do.