Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 7


 * correlations
 * cor: works with two variables, or with more!
 * cor(..., method="spearman") is useful if you have non-normally distributed data, because it simply computes rank correlations
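A minimal sketch of these calls, using the built-in mtcars data (the same dataset used below):

```r
# Pearson correlation between two variables
cor(mtcars$mpg, mtcars$hp)

# cor() also accepts a whole data frame (or matrix) and returns a correlation matrix
cor(mtcars[, c("mpg", "hp", "wt")])

# Spearman: correlate the ranks instead of the raw values
cor(mtcars$mpg, mtcars$hp, method = "spearman")
```

The Spearman version gives the same answer as running Pearson's cor() on rank()-transformed variables, which is why it is robust to non-normal data.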
 * fitting a linear model with one variable: lm
 * model formulae, which we've already seen!
 * looking at model objects: summary; m$ or names(m)
 * m$fitted.values; m$residuals
 * also functions: coefficients(m) (or coef), predict(m), residuals(m) (or resid); confint(m)
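A quick sketch of fitting a one-variable model and poking at the resulting object (mpg ~ hp is an illustrative choice from mtcars):

```r
# fit a simple linear model: mpg as a function of hp
m <- lm(mpg ~ hp, data = mtcars)

summary(m)        # coefficients, standard errors, R^2, ...
names(m)          # the components you can reach with m$
coef(m)           # same as coefficients(m)
head(predict(m))  # predictions for the original data (here, the fitted values)
head(resid(m))    # same as residuals(m), same as m$residuals
confint(m)        # 95% confidence intervals for the coefficients
```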
 * we can also do these by hand:
 * residuals: mtcars$mpg - m$fitted.values
 * confint: est + 1.96 * c(-1, 1) * se
 * plotting residuals:
 * hist(residuals(m))
 * plot against our x: plot(mtcars$hp, residuals(m))
 * QQ-plots with qqnorm(residuals(m))
 * doing a plot with ggplot just involves making a dataset: d.fig <- data.frame(hp=mtcars$hp, resids=residuals(m))
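The residual-plotting steps above, sketched together (the pdf(NULL) / dev.off() pair just sends plots to a null device so the code runs non-interactively; drop those lines at the console):

```r
m <- lm(mpg ~ hp, data = mtcars)

pdf(NULL)                        # null graphics device; omit when working interactively
hist(residuals(m))               # roughly bell-shaped?
plot(mtcars$hp, residuals(m))    # any pattern against our x?
qqnorm(residuals(m))             # points near a straight line suggest normality
dev.off()

# for ggplot, just put the pieces into a data frame first
d.fig <- data.frame(hp = mtcars$hp, resids = residuals(m))
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  ggplot(d.fig, aes(x = hp, y = resids)) + geom_point()
}
```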
 * adding controls: just make our formula more complex
 * update.formula
 * or just write a new one
 * adding logical variables: no problem!
 * adding categorical variables: no problem! (I'll explain interpretation later, but I want you to see that this works!)
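A sketch of all three moves, again with mtcars (wt as an illustrative control; cyl as the categorical variable):

```r
m1 <- lm(mpg ~ hp, data = mtcars)

# add a control by writing a more complex formula...
m2 <- lm(mpg ~ hp + wt, data = mtcars)

# ...or by updating the old model's formula (this uses update.formula internally)
m2b <- update(m1, . ~ . + wt)

# logical variables: no problem! (R adds a TRUE indicator column)
m3 <- lm(mpg ~ hp + wt + I(am == 1), data = mtcars)

# categorical variables: no problem! factor() creates one indicator per level
m4 <- lm(mpg ~ hp + wt + factor(cyl), data = mtcars)
coef(m4)  # includes factor(cyl)6 and factor(cyl)8 terms
```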
 * generating nice regression tables:
 * one of many options: stargazer(m1, m2, type="text") or type="html"
 * interpreting linear models with anova: I'm not going to walk through the details, but the important thing to keep in mind is that although the test statistics are different, the p-values are identical!
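For a single-df predictor this is easy to verify: the anova F statistic is the square of the coefficient's t statistic, so the two p-values agree exactly.

```r
m <- lm(mpg ~ hp, data = mtcars)

# t test from the coefficient table...
summary(m)$coefficients["hp", "Pr(>|t|)"]

# ...and F test from the anova table: different statistic (F = t^2), same p-value
anova(m)["hp", "Pr(>F)"]
```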