Community Data Science Course (Spring 2017)/Day 7 Notes
A bit of vocabulary: types of data
- Categorical: data that takes a qualitative value. For instance, city, weather condition (icy, rainy, sunny).
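- Ordinal: data that can be used to order things. The important thing about ordinal data is that only the order matters; the 'scale' (the distance between values) does not. For instance, a survey rating (poor, fair, good).
- Cardinal: data that counts or measures something, so arithmetic on the values is meaningful. Examples: age, temperature, GDP.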
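One way to see the difference between the three types is to look at what you can sensibly do with each of them in Python. The sketch below uses a few made-up records (the field names and values are invented for illustration and are not taken from the SDOT data): categorical values can only be counted, ordinal values can be sorted once you spell out their order, and cardinal values support arithmetic.

```python
from collections import Counter

# Made-up records for illustration only; not real SDOT fields or values.
records = [
    {"weather": "rainy", "severity": "minor", "people": 2},
    {"weather": "sunny", "severity": "serious", "people": 3},
    {"weather": "rainy", "severity": "none", "people": 1},
    {"weather": "icy", "severity": "minor", "people": 4},
]

# Categorical: values are labels, so the natural operation is counting.
weather_counts = Counter(r["weather"] for r in records)

# Ordinal: the order is meaningful, but we have to state it ourselves.
# Alphabetical sorting would give minor < none < serious, which is wrong.
SEVERITY_ORDER = ["none", "minor", "serious"]
rank = {level: i for i, level in enumerate(SEVERITY_ORDER)}
by_severity = sorted(records, key=lambda r: rank[r["severity"]])

# Cardinal: values are numbers, so sums and averages make sense.
mean_people = sum(r["people"] for r in records) / len(records)

print(weather_counts.most_common())
print([r["severity"] for r in by_severity])
print(mean_people)
```

Note that averaging the severity ranks would not be meaningful: the gap between "none" and "minor" is not necessarily the same size as the gap between "minor" and "serious".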
Do this: Find two examples of each kind of variable in the SDOT crash data.
Plotting:
- Pick your x-axis variable.
- Pick your plot type.
  - 'cardinal' and 'cardinal': Scatter.
  - 'categorical' and 'cardinal': Bar or histogram.
  - 'categorical' and 'categorical' and 'cardinal': We'll see some examples of this in Python, but a pivot table with color is often useful.
  - 'ordinal' and 'cardinal': Bar, line chart, or histogram.
- Pick your y-axis variable.
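The rules of thumb in the list above can be written down as a small lookup table. This is only a sketch: `PLOT_SUGGESTIONS` and `suggest_plot` are hypothetical names made up for this example, and the table contains nothing beyond the combinations listed in these notes.

```python
# Lookup table that mirrors the rules of thumb in the list above.
# Keys are the types of the variables being plotted, in order.
PLOT_SUGGESTIONS = {
    ("cardinal", "cardinal"): "scatter",
    ("categorical", "cardinal"): "bar or histogram",
    ("categorical", "categorical", "cardinal"): "pivot table with color",
    ("ordinal", "cardinal"): "bar, line chart, or histogram",
}


def suggest_plot(*variable_types):
    """Return the suggested plot for the given variable types (hypothetical helper)."""
    key = tuple(t.lower() for t in variable_types)
    return PLOT_SUGGESTIONS.get(key, "no suggestion listed")


print(suggest_plot("categorical", "cardinal"))
print(suggest_plot("Cardinal", "Cardinal"))
print(suggest_plot("categorical", "categorical", "cardinal"))
```

A combination that is not in the table returns "no suggestion listed" rather than guessing.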
Do this: Make the following plots (for each, figure out which kind of plot to use).
- Count accidents by road conditions.
- Build a plot of the number of people involved against the injury count.
- Build a plot of accidents by date.
- Use a stacked plot to see whether speeding is more strongly associated with some weather conditions than with others.
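Most of these plots start from a table of counts, whether you build it with a pivot table in a spreadsheet or in code. As a rough sketch of the same idea in Python, the example below counts crashes per road condition (what a bar chart would show) and cross-tabulates weather against speeding (what a stacked plot would show). The records and the keys `road`, `weather`, and `speeding` are made up for illustration; they are not the actual SDOT column names.

```python
from collections import Counter, defaultdict

# Made-up crash records; the keys are hypothetical, not actual SDOT column names.
crashes = [
    {"road": "dry", "weather": "clear", "speeding": True},
    {"road": "wet", "weather": "rain", "speeding": False},
    {"road": "wet", "weather": "rain", "speeding": True},
    {"road": "dry", "weather": "clear", "speeding": False},
    {"road": "ice", "weather": "snow", "speeding": False},
    {"road": "dry", "weather": "clear", "speeding": False},
]

# One bar per road condition: count the crashes in each category.
road_counts = Counter(c["road"] for c in crashes)

# Stacked plot: for each weather condition, one segment per speeding value.
stacks = defaultdict(Counter)
for c in crashes:
    stacks[c["weather"]][c["speeding"]] += 1

# Raw counts can mislead when some conditions are more common than others,
# so also compute the share of crashes that involved speeding.
speeding_share = {w: seg[True] / sum(seg.values()) for w, seg in stacks.items()}

# Crude text bar chart, just to check the numbers before plotting them.
for road, n in road_counts.most_common():
    print(f"{road:<5} {'#' * n} ({n})")
```

Comparing `speeding_share` across weather conditions, rather than the raw counts, is what lets the stacked plot answer the question about association.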
General Advice
- Don't overthink it! Excel has dozens of fancy plots, and there is a good reason to use each of them. However, don't introduce a more complicated plot where a simple one will do.