Statistics and Statistical Programming (Winter 2017)/R lecture outline: Week 1
Lecture Outline
Intro to R and basic variable types:
- using R as a calculator:
- addition: 2 + 2
- subtraction: 2 - 3
- multiplication: 5 * 4
- division: 5/2
- more complicated stuff: use parentheses!
- powers: 2^2; 2^3
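A minimal sketch of the calculator examples above, typed at the R prompt. The names no.parens and with.parens are made up for illustration.

```r
# R evaluates arithmetic typed at the prompt
2 + 2      # addition
2 - 3      # subtraction
5 * 4      # multiplication
5 / 2      # division
2^2; 2^3   # powers; a semicolon separates two expressions on one line

# Parentheses control the order of operations
no.parens   <- 2 + 3 * 4     # multiplication happens first: 14
with.parens <- (2 + 3) * 4   # parentheses force the addition first: 20
no.parens
with.parens
```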
- variables
- the basic concept and how they work
- syntax for assignment: use <- (although = works too, it's not idiomatic R)
- what makes a valid variable name: starts with a letter, contains letters and numbers; case is important; instead of spaces, use "." (not _ as in Python, although _ will usually work too)
- saving numbers to variables: cups.of.flour <- 2
- special variables built in: pi (we'll see many more)
- variables can be set to anything!
- there's also one special thing: NA (no quotes!) which means missing
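The variable basics above might look like this in practice. This is only a sketch: Cups.of.flour and circle.area are hypothetical names, and is.na() is a base R function that the outline does not mention.

```r
cups.of.flour <- 2        # assign with <-
cups.of.flour             # typing a name prints its value
cups.of.flour * 3         # a variable works anywhere a number would

Cups.of.flour <- 5        # case matters: this is a different variable
circle.area <- pi * 3^2   # pi is built in

cups.of.flour <- NA       # NA (no quotes) marks a missing value
is.na(cups.of.flour)      # TRUE
cups.of.flour + 1         # arithmetic with NA gives NA
```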
- types of variables
- numeric: we've already seen, with or without the decimal point
- character: name <- "mako" (uses single or double quotes)
- logical: TRUE or FALSE (all caps)
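One way to check the three basic types is with class(), which is introduced with the other built-in functions. A small sketch:

```r
name <- "mako"
class(name)             # "character"
class(2)                # "numeric", with or without a decimal point
class(2.5)              # "numeric"
class(TRUE)             # "logical"

# Single and double quotes create the same character value
identical('mako', "mako")   # TRUE
```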
- functions: called with parentheses right after the function name
- functions take some input (called arguments) and provide some output (called the output or sometimes the return value); both are optional!
- some arguments are named (meaning that they have "foo=" or similar before them); mostly, names are optional
- the most important function: help()
- there are many built in functions including:
- sqrt()
- log()
- log1p() — super useful!
- class() — tells you what type of variable you have
- ls()
- check your reference card for many, many more
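A short sketch of calling some of the built-in functions listed above. The base= argument of log() is a real named argument; wrapping help() in interactive() is just a choice made here so the snippet also runs as a script.

```r
sqrt(16)              # one argument; returns 4
log(100)              # natural log by default
log(100, base = 10)   # a named argument changes the base; returns 2
log1p(0)              # computes log(1 + x); returns 0
class(sqrt(16))       # "numeric"
ls()                  # no arguments needed: lists the variables you have made

# help() opens the documentation for a function; ?log is a shortcut
if (interactive()) help(log)
```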
- vectors: you can think of a vector as like a list of things that are all the same type (lists, which we will come to later, actually refer to lists of things that might be of different types!)
- in R, all variables are vectors! although many have just one thing in them! that's why it prints out [1] next to every number
- you can make vectors with a special function: c(), like ages <- c(36, 4, 35)
- vectors can be of any type but they have to be one type: c("mako", "mika")
- if you mix types together in one vector, they will be "coerced" to a single type(!)
- slicing or indexing:
- basic syntax: ages[1]; ages[2]
- more complex: ages[1:2]
- assignment through indexing: ages[1] <- 20
- most math operators operate on vectors with recycling: ages * 2; ages - 3
- vectors can have names for elements! we can set those with names():
- names(ages)
- names(ages) <- c("mako", "atom", "mika")
- once we do that, we can index with names: ages["mako"]
- many functions are particularly useful on vectors with multiple elements:
- some functions return a single item: sum(); mean(); sd(); median(); var(); length()
- some return vectors: sort(); head(); range()
- some functions return other things: table(); summary()
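The vector steps above can be strung together roughly as follows. This is a sketch using the ages example from the outline; the vector mixed is a made-up name used to show coercion.

```r
ages <- c(36, 4, 35)      # build a vector with c()
ages[1]; ages[2]          # indexing starts at 1
ages[1:2]                 # the first two elements
ages[1] <- 20             # assignment through indexing
ages * 2                  # the 2 is recycled across every element

names(ages) <- c("mako", "atom", "mika")
ages["mako"]              # index by name

sum(ages); mean(ages); length(ages)   # single values
sort(ages); range(ages)               # vectors
summary(ages)                         # something else: a summary table

mixed <- c(1, "mako", TRUE)   # mixing types coerces everything
class(mixed)                  # "character"
```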
- more advanced variable types:
- factors: for categorical data
- make with factor(c("mako", "mika", "mako"))
- you can create factors from characters with as.factor()
- also think about: dates with as.POSIXct(); ordered(), which is really just a type of factor for ordinal data
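A small sketch of factors, ordered factors and dates. The variable names (friends, sizes, when) and the example date are invented for illustration; note that factor() takes a single vector, so the values are wrapped in c().

```r
friends <- factor(c("mako", "mika", "mako"))
levels(friends)           # the distinct categories: "mako" "mika"
table(friends)            # counts per category

letters.f <- as.factor(c("a", "b", "a"))   # convert characters to a factor
class(letters.f)                           # "factor"

# ordered() makes a factor whose levels have a rank
sizes <- ordered(c("small", "large", "medium"),
                 levels = c("small", "medium", "large"))
sizes < "large"           # comparisons respect the ordering

# as.POSIXct() turns text into a date-time value
when <- as.POSIXct("2017-01-05 12:00:00", tz = "UTC")
class(when)
```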
- using logical vectors to index and recode data:
- comparison operators will return logical variables: rivers > 300; rivers < 300; rivers <= 320; rivers == 210; rivers != 210
- indexing with logicals: rivers[rivers > 300]
- recoding data: my.rivers <- rivers; my.rivers[my.rivers < 300] <- NA
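A sketch of the comparison, indexing and recoding steps above, using the built-in rivers dataset. The recoding is done on the copy my.rivers so that rivers itself stays untouched; the names long and long.rivers, and the na.rm = TRUE argument to mean(), are additions for illustration.

```r
long <- rivers > 300          # a logical vector, one TRUE/FALSE per river
head(long)
class(long)                   # "logical"

long.rivers <- rivers[rivers > 300]   # keep only the elements where TRUE
length(long.rivers)

my.rivers <- rivers                   # work on a copy
my.rivers[my.rivers < 300] <- NA      # recode short rivers as missing
sum(is.na(my.rivers))                 # how many were recoded
mean(my.rivers, na.rm = TRUE)         # na.rm = TRUE skips the missing values
```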
- basic plotting and visualization:
- boxplot() — boxplots
- hist() — draw histograms
- plot(density()) for density plots; density() on its own only computes the estimate
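The plotting functions above could be tried on the built-in rivers dataset like this. It is only a sketch; the main= title is an optional argument added here, and h and d are made-up names.

```r
# Each call draws a new plot (written to Rplots.pdf when run as a script)
boxplot(rivers, main = "Lengths of major North American rivers")

h <- hist(rivers)        # draws the histogram and returns its bins
d <- density(rivers)     # computes a density estimate, draws nothing
plot(d)                  # draw the estimate
```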
- comments: lines that start with # (or anything after a #)
- installing new packages and loading new datasets:
- the simplest way to load a saved dataset is with load()
- install.packages("UsingR")
- install.packages("openintro")
- library(UsingR) (no quotes!)
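A hedged sketch of these steps. The install line is commented out because it needs an internet connection and only has to be run once; requireNamespace(), save() and tempfile() are base R functions not mentioned in the outline, used here so the example runs whether or not UsingR is installed. The file name saved.file is hypothetical.

```r
# install.packages("UsingR")   # run once; quotes are required here

# Load the package only if it is installed
if (requireNamespace("UsingR", quietly = TRUE)) {
  library(UsingR)              # no quotes needed
}

# load() reads variables back from a file written by save()
saved.file <- tempfile(fileext = ".RData")
cups.of.flour <- 2
save(cups.of.flour, file = saved.file)
rm(cups.of.flour)              # the variable is gone...
load(saved.file)               # ...and now it is back
cups.of.flour
```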
- other sources of help:
- built in documentation
- StackOverflow
- R reference card