Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 2

Material not covered last week that we'll want to cover this time:


 * adding comments: lines that start with # (or anything after a #)
 * more advanced variable types:
 * factors: for categorical data
 * make with factor(c("mako", "mika", "mako"))
 * you can create factors from characters with as.factor
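A quick sketch of comments and factors, assuming nothing beyond base R. The variable names names.chr and names.fac are made up for illustration:

```r
# Everything after a # is a comment and is ignored by R
names.chr <- c("mako", "mika", "mako")   # a plain character vector

# factor() takes a single vector, so the values are wrapped in c()
names.fac <- factor(names.chr)

levels(names.fac)    # the distinct categories
table(names.fac)     # how many times each category appears

# as.factor() converts an existing character vector
names.fac2 <- as.factor(names.chr)
is.factor(names.fac2)
```

Note that the factor stores each distinct value once as a level, which is what makes it a good fit for categorical data.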
 * lists: like vectors but can contain objects of any kind
 * let's say we have two vectors: short.rivers (rivers * 0.5) and normal.rivers (rivers)
 * construct lists: rivers.list <- list(normal.rivers, short.rivers)
 * named lists: list(foo=foo, bar=bar), or add names with names
 * index into lists: use double square brackets like rivers.list[[1]]; otherwise they work like vectors
 * index recursively: rivers.list$short.rivers[1] (this form needs a named list)
 * some functions work on lists: boxplot(rivers.list); some don't: hist(rivers.list)
 * matrix: let's create the table from the homework as a matrix
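The list steps above could look roughly like this. It is only a sketch: it uses the built-in rivers dataset and adds names with names() so that the $ form works.

```r
normal.rivers <- rivers          # rivers ships with base R
short.rivers <- rivers * 0.5

# Build the list, then name its elements
rivers.list <- list(normal.rivers, short.rivers)
names(rivers.list) <- c("normal.rivers", "short.rivers")

rivers.list[[1]][1:3]         # double brackets give back the vector itself
class(rivers.list[1])         # single brackets give back a (shorter) list
rivers.list$short.rivers[1]   # pick by name, then index into that vector

boxplot(rivers.list)          # draws one box per list element
```

Calling hist() on the whole list fails because hist() wants a numeric vector, so pull out one element first, e.g. hist(rivers.list[[1]]).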
 * create from vectors: start with 1:9, then add real numbers: matrix(x, ncol=3)
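A minimal matrix sketch, using 1:9 as stand-in values since the homework table itself is not reproduced here:

```r
x <- 1:9
m <- matrix(x, ncol = 3)    # values are filled in column by column
m
dim(m)                      # rows, then columns

m[2, 3]                     # row 2, column 3
m[, 1]                      # the whole first column

# byrow = TRUE fills row by row instead
m.byrow <- matrix(x, ncol = 3, byrow = TRUE)
m.byrow
```

Swapping x for a vector of real numbers works the same way, as long as its length fits the requested shape.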
 * data.frames: the most important data structure in R. we will be using them constantly
 * let's explore the faithful data.frame first
 * head(faithful); colnames(faithful); nrow(faithful); ncol(faithful)
 * work with the columns faithful$eruptions and faithful$waiting (mean, boxplot, hist)
 * but the real power is doing bivariate analysis: plot
 * dataframes can have more than two columns: mtcars
 * indexing by numbers: faithful[1,]; faithful[,2]; faithful[1,2]; faithful[1:2, 1:2]; etc.
 * how do we plot things in that space? we use the formula "~" symbol
 * plot(var1 ~ var2, data=dataframe); boxplot works too
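Here is one way the exploration and formula plotting above might go, using only the built-in faithful and mtcars datasets:

```r
head(faithful)                 # first few rows
colnames(faithful)
nrow(faithful); ncol(faithful)

# Single columns behave like ordinary vectors
mean(faithful$eruptions)
hist(faithful$waiting)

# Index by number: [row, column]
faithful[1, ]                  # first row
first.wait <- faithful[1, 2]   # first row, second column
faithful[1:2, 1:2]             # faithful only has two columns

# Formula interface: y ~ x, with data= saying where to look
plot(waiting ~ eruptions, data = faithful)
boxplot(mpg ~ gear, data = mtcars)
```

Read the formula as "waiting as a function of eruptions". The variable on the left of the ~ goes on the y axis.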
 * making/modifying new dataframes: let's work on a copy of mtcars (call it mako.cars)
 * several ways: data.frame is the basic one:
 * modification/building up: cbind; rbind; as.data.frame
 * modifying values: d[1,2] <- NA
 * removing rows and columns: d <- d[-1,] drops the first row; d[[2]] <- NULL drops the second column
 * recoding/transforming data: let's log a column
 * changing types: let's turn a number into a factor (e.g., gear)
 * creating new data.frames as subsets using logical vectors
 * useful functions with data.frames:
 * is.na
 * complete.cases
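The data.frame steps above, strung together as a rough sketch on a copy of mtcars. The new column name log.hp and the helper objects (no.first, powerful, complete.cars) are made up for this example:

```r
mako.cars <- mtcars                      # work on a copy, leave mtcars alone

mako.cars[1, 2] <- NA                    # modify one value (row 1, column 2)
mako.cars$log.hp <- log(mako.cars$hp)    # transform: add a logged column
mako.cars$gear <- as.factor(mako.cars$gear)   # number -> factor
mako.cars$qsec <- NULL                   # drop a column
no.first <- mako.cars[-1, ]              # drop the first row

# Subset rows with a logical vector
powerful <- mako.cars[mako.cars$hp > 100, ]

# Find and handle missing values
sum(is.na(mako.cars$cyl))                            # how many NAs in cyl
complete.cars <- mako.cars[complete.cases(mako.cars), ]
nrow(mako.cars); nrow(complete.cars)
```

complete.cases() returns one TRUE/FALSE per row, so it drops straight into the row position of the square brackets.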
 * apply functions: super useful!
 * sapply, lapply: let's work on the rivers dataset
 * apply: more complicated, but can be very useful with matrices
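A small sketch of the apply family. The list and its element names (normal, short) are invented here, and the matrix is just 1:9 again:

```r
rivers.list <- list(normal = rivers, short = rivers * 0.5)

# lapply runs the function on each element and returns a list
list.means <- lapply(rivers.list, mean)

# sapply does the same but simplifies the result to a vector when it can
vec.means <- sapply(rivers.list, mean)
vec.means

# apply works over the rows (1) or columns (2) of a matrix
m <- matrix(1:9, ncol = 3)
row.sums <- apply(m, 1, sum)   # 12 15 18
col.sums <- apply(m, 2, sum)   # 6 15 24
```

The second argument to apply is the thing to remember: 1 means "once per row", 2 means "once per column".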
 * graphing with ggplot2: this is what I use so it's what we'll use moving forward
 * first, install the package and load it with install.packages and library
 * let's just play around with examples from mtcars
 * philosophy: a graphics grammar. you start out by using ggplot
 * ggplot(data=mtcars) + aes(x=hp, y=mpg, color=gear, size=carb) + geom_point()
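The ggplot2 example above as a runnable sketch. It assumes the package is already installed; the install line is commented out because it only needs to run once:

```r
# install.packages("ggplot2")   # run once per machine
library(ggplot2)

# Start with the data, add the aesthetic mapping, then add a layer
p <- ggplot(data = mtcars) +
  aes(x = hp, y = mpg, color = gear, size = carb) +
  geom_point()                  # note the parentheses: it is a function call
p                               # printing the object draws the plot

# Wrapping gear in factor() gives distinct colors instead of a gradient
p.factor <- ggplot(data = mtcars) +
  aes(x = hp, y = mpg, color = factor(gear)) +
  geom_point()
p.factor
```

Each + adds one more piece to the plot description, which is the "grammar" idea: nothing is drawn until the finished object is printed.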
 * read data from a CSV file: read.csv; read.delim can be useful as well! options can be helpful!
 * library(foreign) can be very helpful: read.dta for Stata files; read.spss for SPSS .sav files; etc.
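To try read.csv without needing a file on hand, this sketch first writes mtcars out to a temporary CSV and then reads it back. The temporary file and the name cars are only for illustration:

```r
# Write a throwaway CSV so there is something to read
tmp <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp, row.names = FALSE)

# Read it back; options like header and stringsAsFactors control parsing
cars <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
head(cars)
nrow(cars); ncol(cars)

# read.delim is the same idea but defaults to tab-separated files
# The foreign package reads other formats (file name below is hypothetical):
# library(foreign)
# d <- read.dta("survey.dta")

unlink(tmp)   # clean up the temporary file
```

With a real file, replace tmp with the path to the CSV. The other options are listed in ?read.csv.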