Statistics and Statistical Programming (Fall 2020)/pset1

Revision as of 01:06, 21 September 2020 by Aaronshaw

For this problem set, the programming challenges ask you to apply some of the concepts from OpenIntro Chapters 1 and 2 using the R and RStudio fundamentals that were introduced in the recommended R tutorials for Weeks 1 and 3. If you're feeling lost at any point, I recommend you review some of those materials and/or come ask for help!

The topics and skills covered here include loading/importing datasets; performing basic data management, summarization, and arithmetic operations; calculating summary/descriptive statistics for a few different kinds of variables; and creating univariate tables and visualizations. The problem set is also structured to model the workflow you might pursue whenever you encounter a new dataset: start with basic inspection and description of the variables of interest before moving on to more sophisticated analysis.

Programming Challenges

Working with a dataset provided by R

PC0. Open up RStudio, create a new file for this assignment (likely an R Markdown script), add relevant metadata (maybe your name, the date, and a title so that you/we know it is Problem Set 1 for this class?), and save it.
PC1. Install and load the openintro package so that it's available to you. Load the counties dataset so that it is available to you. Find out the type or class of the dataset.
PC. Get to know your data a bit! Find out how many rows and how many columns are in the counties dataset. Find the names for all of the variables (columns).
PC2. Calculate the range (minimum and maximum) and mean for at least one continuous or discrete numeric variable in the dataset.
PC3. Create a tabular summary for at least one categorical variable in the dataset.
PC4. Plot a visual summary (maybe a boxplot or a histogram?) for at least one numeric variable in the dataset.
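
The inspection steps above can be sketched on a different dataset. The example below is only an illustration: it uses mtcars, which ships with base R, rather than the openintro data, and the choice of the mpg and cyl variables is an assumption made for this sketch. The commented lines at the top show the usual way to install and load a package such as openintro.

```r
# For the assignment itself you would first install and load the package:
# install.packages("openintro")
# library(openintro)

# Illustration only: mtcars is a built-in dataset (datasets package).
data(mtcars)

# Type/class of the dataset and its dimensions
class(mtcars)
n_rows <- nrow(mtcars)
n_cols <- ncol(mtcars)
names(mtcars)          # variable (column) names

# Range (minimum and maximum) and mean for a numeric variable
mpg_range <- range(mtcars$mpg)
mpg_mean  <- mean(mtcars$mpg)

# Tabular summary for a categorical variable
# (cyl is stored as a number but takes only a few distinct values)
cyl_table <- table(mtcars$cyl)
cyl_table

# Visual summaries for a numeric variable
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")
boxplot(mtcars$mpg, ylab = "mpg")
```

The same functions (class, nrow, ncol, names, range, mean, table, hist, boxplot) should apply to any data frame, so the pattern carries over once the assignment's dataset is loaded.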


Working with a dataset from the course website