Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 6
From CommunityData
Programming Challenges
Let's re-evaluate some data from this paper:
- Lagakos, S., & Mosteller, F. (1981). A case study of statistics in the regulatory process: the FD&C Red No. 40 experiments. Journal of the National Cancer Institute, 66(1), 197–212.
I found a copy of the dataset at this link.
- PC0. Download the dataset from this webpage. You'll find that it's not in an ideal format: it's an Excel file (XLS) with a series of columns labeled X1 to X4, and the layout is not exactly tabular. Take a look.
- PC1. Load the data into R. Now get to work on reshaping the dataset. I think a good format would be a data frame with two columns: group and time of death (i.e., lifespan).
- PC2. Create summary statistics and visualizations for each group. Write code that gives you a useful way to get a visual sense of (a) the shape of the data and its relationships and (b) the degree to which the assumptions of t-tests and ANOVA hold. What is the global mean of your dependent variable?
- PC3. Do a t-test between mice without any RD40 and mice with at least a small amount. Run a t-test between the high-dosage group and the control group. How would you go about doing it using formula notation? Be ready to report, interpret, and discuss the results in substantive terms.
- PC4. Run an ANOVA using aov() to see if there is a difference between the groups. Be ready to report, interpret, and discuss the results in substantive terms.
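Here is a minimal sketch of the PC1 to PC4 workflow in R. It runs on a small made-up data frame, not the real dataset. It assumes, hypothetically, that each column X1..X4 holds one group's lifespans padded with NA, and that the columns map to control/low/medium/high dosage groups. The actual XLS file is laid out differently, so check its real structure (for example, after reading it with read_excel() from the readxl package) before adapting anything here. The names `wide`, `group.labels`, `d`, `hc`, and `fit` are illustrative only.

```r
# Made-up lifespans, NOT the Lagakos & Mosteller data. Hypothetical layout:
# one column per group, padded with NA where a group has fewer mice.
wide <- data.frame(
  X1 = c(90, 95, 101, 88, NA),
  X2 = c(85, 92, 97, 80, 84),
  X3 = c(78, 91, 86, NA, NA),
  X4 = c(70, 83, 79, 75, 81)
)

# Hypothetical column-to-group mapping; check it against the real file.
group.labels <- c(X1 = "control", X2 = "low", X3 = "medium", X4 = "high")

# PC1: stack() turns the wide frame into (values, ind) pairs;
# then drop the NA padding and relabel the groups.
long <- stack(wide)
long <- long[!is.na(long$values), ]
d <- data.frame(
  group    = factor(unname(group.labels[as.character(long$ind)]),
                    levels = unname(group.labels)),
  lifespan = long$values
)

# PC2: per-group summaries and spreads, plus the global mean.
tapply(d$lifespan, d$group, summary)
tapply(d$lifespan, d$group, sd)   # compare spreads (equal-variance check)
global.mean <- mean(d$lifespan)
global.mean

# Draw plots only in an interactive session (avoids writing Rplots.pdf in batch runs).
if (interactive()) {
  boxplot(lifespan ~ group, data = d, ylab = "Lifespan (made-up units)")
}

# PC3a: no dye vs. any dose, via a two-level indicator and formula notation.
d$any.dose <- factor(ifelse(d$group == "control", "none", "some"))
t.test(lifespan ~ any.dose, data = d)

# PC3b: high dose vs. control; subset first so the factor has two levels.
hc <- droplevels(subset(d, group %in% c("control", "high")))
t.test(lifespan ~ group, data = hc)

# PC4: one-way ANOVA across all four groups.
fit <- aov(lifespan ~ group, data = d)
summary(fit)
```

Note that t.test() runs Welch's unequal-variance test by default. Pass var.equal = TRUE to get the classic Student's t-test, which is the two-group special case of the one-way ANOVA that aov() fits.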
Statistical Questions from OpenIntro §6
- Q0. Any questions or clarifications from the OpenIntro text or lecture notes?
- Q1. Exercise 6.12 on public opinion about cannabis legalization
- Q2. Exercise 6.20, a continuation of 6.12
- Q3. Exercise 6.38 on translating a problem in English into statistical tests
- Q4. Exercise 6.50, another voter/public opinion question
Questions on the Empirical Paper
Let's just go back to the Buechley and Hill paper on LilyPad Arduino:
- Q5. For Study 1, let's focus on the statistical test:
- (a) What is the unit of analysis? What is the dependent variable? The independent variable? What groups are being compared in the test? Is it a one-way or two-way design?
- (b) What is the null hypothesis being tested? What is the alternative hypothesis?
- (c) Summarize or restate the results in statistical terms. What do these results mean in substantive terms? How convincing do you find these results? What should we be taking away?
- (d) Why weren't we happy just leaving it where we did in week 2? Why bother with the statistical test?
- Q6. Do the same as above but for Study 2.
Questions on Gelman and Loken
- Q7. Be ready to summarize the main point of, and share some reflections on, the paper. There are no specific questions.