Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 2
From CommunityData
Latest revision as of 21:49, 17 January 2017
material not covered last week we'll want to cover this time:
- adding comments: lines that start with # (or anything after a #)
- more advanced variable types:
- factors: for categorical data
- make with factor(c("mako", "mika", "mako"))
- you can create factors from characters with as.factor()
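A minimal sketch of the two ways to build a factor mentioned above. The variable names (names.chr, names.fct, same.fct) are made up for illustration; note that factor() takes a single vector, so the strings are wrapped in c().

```r
# A character vector with a repeated value
names.chr <- c("mako", "mika", "mako")

# factor() takes one vector and works out the levels (the categories)
names.fct <- factor(names.chr)
levels(names.fct)      # the distinct categories
table(names.fct)       # how many times each category appears

# as.factor() converts an existing character vector
same.fct <- as.factor(names.chr)
identical(names.fct, same.fct)
```

Under the hood a factor stores integer codes plus the level labels, which is why as.integer(names.fct) gives numbers rather than names.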
- lists: like vectors but can contain objects of any kind
- let's say we have two vectors: short.rivers (rivers * 0.5) and normal.rivers (rivers)
- construct lists: rivers.list <- list(normal.rivers, short.rivers)
- named lists: list(foo=foo, bar=bar), or add names with names()
- index into lists: use double square brackets like rivers.list[[1]]; with single brackets you get back a list
- index recursively: rivers.list$short.rivers[1]
- some functions work on lists: boxplot(rivers.list); some don't: hist(rivers.list)
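The list steps above could be sketched roughly like this, using the built-in rivers dataset. The names are added with names() here so that the $ indexing works; that ordering is just one reasonable choice.

```r
# Two numeric vectors built from the built-in rivers dataset
normal.rivers <- rivers
short.rivers <- rivers * 0.5

# Construct a list, then add names so we can index with $
rivers.list <- list(normal.rivers, short.rivers)
names(rivers.list) <- c("normal.rivers", "short.rivers")

# Double brackets pull out the element itself (a numeric vector)
first.element <- rivers.list[[1]]

# Single brackets give back a list of length one
first.sublist <- rivers.list[1]

# Index recursively: first value of the short.rivers element
first.short <- rivers.list$short.rivers[1]

# boxplot() accepts a list and draws one box per element
boxplot(rivers.list)

# hist() does not accept a list; try() catches the error so the script continues
hist.result <- try(hist(rivers.list), silent = TRUE)
```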
- matrix: let's create the table from the homework as a matrix
- create from vectors: start with 1:9, then add real numbers: matrix(x, ncol=3)
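A quick sketch of the 1:9 starting point (the homework table itself is not reproduced here). The byrow=TRUE variant is an extra shown only to make the default fill order visible.

```r
x <- 1:9

# By default matrix() fills column by column
m <- matrix(x, ncol = 3)
m

# byrow = TRUE fills row by row instead
m.byrow <- matrix(x, ncol = 3, byrow = TRUE)
m.byrow

dim(m)     # rows, then columns
m[2, 3]    # row 2, column 3
```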
- data.frames: the most important data structure in R. we will be using them constantly
- let's explore the faithful data.frame first
- head(faithful); colnames(faithful); nrow(faithful); ncol(faithful)
- work with the columns faithful$eruptions and faithful$waiting (mean, boxplot, hist)
- but the real power is doing bivariate analysis: plot()
- dataframes can have more than two columns: mtcars
- indexing by numbers: faithful[1,]; faithful[,2]; faithful[1,2]; mtcars[1:2, 2:3], etc
- how do we plot things in that space? we use the formula "~" symbol
- plot(var1 ~ var2, data=dataframe); boxplot works too
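Here is one way the exploration above might look in a session. The choice of waiting ~ eruptions and mpg ~ gear for the formula examples is an assumption made for illustration; any pair of columns would do.

```r
# First look at the built-in faithful data.frame
head(faithful)
colnames(faithful)
n.rows <- nrow(faithful)
n.cols <- ncol(faithful)

# Work with single columns
mean(faithful$eruptions)
boxplot(faithful$eruptions)
hist(faithful$waiting)

# Indexing by numbers: [row, column]
first.row <- faithful[1, ]         # a one-row data.frame
second.col <- faithful[, 2]        # a plain vector
one.cell <- faithful[1, 2]
corner <- mtcars[1:2, 2:3]         # mtcars has enough columns for 2:3

# Bivariate analysis with the formula interface: y ~ x
plot(waiting ~ eruptions, data = faithful)
boxplot(mpg ~ gear, data = mtcars)
```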
- making/modifying new dataframes: let's work on a copy of mtcars (call it mako.cars)
- several ways: data.frame() is the basic one
- modification/building up: cbind(); rbind(); as.data.frame()
- modifying values: d[1,2] <- NA
- removing lines: d <- d[-1,]; removing columns: d$colname <- NULL
- recoding/transforming data: let's log a column
- changing types (let's turn a number into a factor) (e.g., gear)
- creating subsets of new data.frames using logical vectors
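A rough sketch of these modifications on a copy of mtcars. Which row and columns get dropped, the log.hp column name, and the mpg > 25 cutoff are all arbitrary choices for illustration.

```r
# Work on a copy so the built-in mtcars stays untouched
mako.cars <- mtcars

# Remove the first row with a negative index
mako.cars <- mako.cars[-1, ]

# Remove a column by assigning NULL to it
mako.cars$carb <- NULL

# Modify a single value: row 1, column 2 becomes missing
mako.cars[1, 2] <- NA

# Recode/transform: add a logged version of hp as a new column
mako.cars$log.hp <- log(mako.cars$hp)

# Change type: turn the numeric gear column into a factor
mako.cars$gear <- as.factor(mako.cars$gear)

# Build up with cbind(): add a column made from the row names
mako.cars <- cbind(mako.cars, model = rownames(mako.cars))

# Subset using a logical vector: keep only rows where mpg > 25
efficient.cars <- mako.cars[mako.cars$mpg > 25, ]
```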
- useful functions with data.frames:
- is.na()
- complete.cases()
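A small sketch of how these two functions behave. The data.frame d and the two cells set to NA are invented for the example.

```r
# Copy mtcars and knock out two values in different rows
d <- mtcars
d[1, 2] <- NA
d[3, 4] <- NA

# is.na() returns TRUE wherever a value is missing
is.na(d$cyl)
n.missing <- sum(is.na(d))

# complete.cases() returns one TRUE/FALSE per row:
# TRUE only when the row has no missing values at all
complete.cases(d)

# Use it as a logical index to keep only the complete rows
d.complete <- d[complete.cases(d), ]
n.complete <- nrow(d.complete)
```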
- apply functions: super useful!
- sapply, lapply: let's work on the rivers dataset
- apply: more complicated, but can be very useful with matrices
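A minimal sketch of the three functions. The rivers.list and the 1:9 matrix are rebuilt here so the example stands alone, and using mean and sum is just one choice of function to apply.

```r
# A named list built from the rivers dataset
rivers.list <- list(normal.rivers = rivers, short.rivers = rivers * 0.5)

# lapply() runs a function on each element and always returns a list
means.list <- lapply(rivers.list, mean)

# sapply() does the same but simplifies the result to a vector when it can
means.vec <- sapply(rivers.list, mean)

# apply() works over the rows (1) or columns (2) of a matrix
m <- matrix(1:9, ncol = 3)
row.sums <- apply(m, 1, sum)
col.sums <- apply(m, 2, sum)
```

The second argument to apply() is the part people tend to trip over: 1 means "once per row", 2 means "once per column".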
- graphing with ggplot2: this is what I use so it's what we'll use moving forward
- first, install the package and load it with install.packages() and library()
- let's just play around with examples from mtcars
- philosophy: a graphics grammar. you start out by using ggplot
- ggplot(data=mtcars) + aes(x=hp, y=mpg, color=gear, size=carb) + geom_point()
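The grammar idea can be sketched by building the plot from the line above one piece at a time. This assumes ggplot2 is already installed; the second plot, which wraps gear in factor() to get discrete colors, is an added variation and not part of the original outline.

```r
# install.packages("ggplot2")   # run once if the package is not installed yet
library(ggplot2)

# Start with the data, add the aesthetic mappings, then add a layer of points
p <- ggplot(data = mtcars) +
  aes(x = hp, y = mpg, color = gear, size = carb) +
  geom_point()
print(p)

# Variation (an assumption for illustration): treat gear as categorical
# so each gear count gets its own distinct color
p.discrete <- ggplot(data = mtcars) +
  aes(x = hp, y = mpg, color = factor(gear), size = carb) +
  geom_point()
print(p.discrete)
```

Each + adds another piece to the same plot object, so nothing is drawn until the object is printed.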
- read data from a CSV file: read.csv(); read.delim() can be useful as well! options can be helpful!
- the foreign library can be very helpful: read.dta(); read.spss(); etc
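A self-contained sketch of reading delimited files. Since no data file ships with this outline, the example first writes mtcars out to temporary files and then reads them back; the file paths and the particular options shown are assumptions for illustration.

```r
# Write mtcars to a temporary CSV file so there is something to read
csv.path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv.path, row.names = FALSE)

# read.csv() expects comma-separated values with a header row
cars.csv <- read.csv(csv.path)

# The same data as a tab-separated file, read with read.delim()
tsv.path <- tempfile(fileext = ".tsv")
write.table(mtcars, tsv.path, sep = "\t", row.names = FALSE, quote = FALSE)
cars.tsv <- read.delim(tsv.path)

# Options can help: keep strings as characters and
# treat empty cells as missing values
cars.opts <- read.csv(csv.path, stringsAsFactors = FALSE,
                      na.strings = c("", "NA"))

head(cars.csv)
```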