Statistics and Statistical Programming (Fall 2020)/pset6
Programming challenges
We're going to evaluate and replicate the analysis done in this paper:
- Lagakos, S., & Mosteller, F. (1981). A case study of statistics in the regulatory process: the FD&C Red No. 40 experiments. Journal of the National Cancer Institute, 66(1), 197–212. [PDF]
PC1. Download, import, and reshape the data
- Download the dataset by clicking through on the "Red Dye Number 40" link on this webpage. You'll find that it's not in an ideal setup. It's an Excel file (XLS) with a series of columns labeled X1 to X4, and the format is not exactly tabular. If you look at the website with the data and/or Table 1 in the paper, you should be able to figure out what each column stands for.
- Import the data into R and get to work on reshaping the dataset. I think a good format would be a data frame with two columns: group and weeks_alive.
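One way to approach the reshape is sketched below. It assumes the import step has already produced a data frame with one column per group (X1 to X4) padded with NA where groups have fewer mice. The toy values and the group labels are made up for illustration and are not the study data; check the real column meanings against Table 1.

```r
# Toy stand-in for the imported spreadsheet: four columns of unequal length,
# padded with NA. These numbers are placeholders, NOT the Red Dye 40 data.
wide <- data.frame(
  X1 = c(70, 77, 83, 87, 92),
  X2 = c(49, 60, 63, 67, NA),
  X3 = c(30, 37, 56, NA, NA),
  X4 = c(34, 36, 48, 65, 91)
)

# stack() collapses the columns into a 'values' column and an 'ind' factor
long <- stack(wide)
names(long) <- c("weeks_alive", "group")

# Drop the NA padding and order the columns as group, weeks_alive
rd40 <- long[!is.na(long$weeks_alive), c("group", "weeks_alive")]
rownames(rd40) <- NULL

# Hypothetical labels for X1..X4; verify the mapping before relying on it
levels(rd40$group) <- c("control", "low", "medium", "high")

str(rd40)
```

The same result can be reached with tidyr's pivot_longer() if you prefer the tidyverse; base R is used here only to keep the sketch dependency-free.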
PC2. Summarize the data
Create summary statistics and visualizations for each group. These visualizations should (a) give you a visual sense of the shape of the data and the relationships between groups and (b) show the degree to which the assumptions for t-tests and ANOVA hold. What is the global mean of your dependent variable?
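A minimal sketch of the summaries and plots, assuming a long-format data frame named rd40 with columns group and weeks_alive. The data here are simulated placeholders so that the block runs on its own; substitute the reshaped study data.

```r
# Simulated placeholder data (NOT the study data) in the suggested long format
set.seed(42)
dose_levels <- c("control", "low", "medium", "high")
rd40 <- data.frame(
  group = factor(rep(dose_levels, each = 10), levels = dose_levels),
  weeks_alive = round(rnorm(40, mean = 75, sd = 15))
)

# Per-group sample size, mean, and standard deviation
group_n    <- tapply(rd40$weeks_alive, rd40$group, length)
group_mean <- tapply(rd40$weeks_alive, rd40$group, mean)
group_sd   <- tapply(rd40$weeks_alive, rd40$group, sd)
print(cbind(n = group_n, mean = group_mean, sd = group_sd))

# Global (grand) mean of the dependent variable
global_mean <- mean(rd40$weeks_alive)
print(global_mean)

# Boxplots compare centers and spreads across groups
boxplot(weeks_alive ~ group, data = rd40,
        xlab = "Group", ylab = "Weeks alive")

# One histogram per group to eyeball shape (skew, outliers)
op <- par(mfrow = c(2, 2))
for (g in levels(rd40$group)) {
  hist(rd40$weeks_alive[rd40$group == g], main = g, xlab = "Weeks alive")
}
par(op)
```

Roughly similar spreads across the boxplots and roughly symmetric histograms are the things to look for when judging the equal-variance and normality assumptions.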
PC3. Replicate the ANOVA analysis
Estimate an ANOVA using aov() to see if there is a difference between the groups. Be ready to report, interpret, and discuss the results in substantive terms.
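The call might look like the sketch below. It again assumes a data frame rd40 with columns group and weeks_alive, and uses simulated placeholder values rather than the study data.

```r
# Simulated placeholder data (NOT the study data)
set.seed(42)
dose_levels <- c("control", "low", "medium", "high")
rd40 <- data.frame(
  group = factor(rep(dose_levels, each = 10), levels = dose_levels),
  weeks_alive = round(rnorm(40, mean = 75, sd = 15))
)

# One-way ANOVA: does mean weeks_alive differ across the groups?
fit <- aov(weeks_alive ~ group, data = rd40)

# The summary table reports degrees of freedom, sums of squares,
# the F statistic, and its p-value
fit_table <- summary(fit)
print(fit_table)
```

The null hypothesis here is that all group means are equal, so a small p-value says only that at least one group differs, not which one.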
PC4. Estimate differences in means
After performing an ANOVA, people sometimes do t-tests between the groups. Do a t-test between mice that received no RD40 and mice that received any (i.e., at least a small amount). Next, run a t-test between the high-dosage group and the control group. How would you go about doing it using formula notation? Be ready to report, interpret, and discuss the results in substantive terms. How should you interpret p-values if you do these tests after an ANOVA?
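Both comparisons can be written with formula notation, as in the sketch below. The group labels ("control", "low", "medium", "high") and the helper column any_dose are assumptions made for illustration, and the data are simulated placeholders rather than the study data.

```r
# Simulated placeholder data (NOT the study data)
set.seed(42)
dose_levels <- c("control", "low", "medium", "high")
rd40 <- data.frame(
  group = factor(rep(dose_levels, each = 10), levels = dose_levels),
  weeks_alive = round(rnorm(40, mean = 75, sd = 15))
)

# Comparison 1: no RD40 versus any RD40.
# any_dose is a hypothetical helper column with exactly two levels.
rd40$any_dose <- factor(ifelse(rd40$group == "control", "none", "any"))
t_any <- t.test(weeks_alive ~ any_dose, data = rd40)
print(t_any)

# Comparison 2: high dosage versus control.
# Subset to the two groups and drop the unused factor levels.
high_vs_control <- droplevels(subset(rd40, group %in% c("control", "high")))
t_high <- t.test(weeks_alive ~ group, data = high_vs_control)
print(t_high)
```

By default t.test() runs Welch's test, which does not assume equal variances; passing var.equal = TRUE gives the pooled-variance version that matches the ANOVA assumption more closely.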
Statistical questions
Empirical paper questions
We'll continue our apparent focus on blogs with the following questions about the Sweetser and Metzgar paper.
EQ1. Interpret the results re: RQ4
(a) What is the unit of analysis? What is the dependent variable? The independent variable? What are the levels or groups being compared in the ANOVA? (b) Clearly state the null hypothesis being tested. What is the alternative hypothesis? (c) Summarize or restate the results in statistical terms. Explain what these results mean in substantive terms. (d) How convincing do you find these results? What should we be taking away?
EQ2. Interpret the results re: RQ5
Answer the same (a)-(d) questions as you did for RQ4 above, but with RQ5.
EQ3. Interpret the results re: RQ6
Answer the same (a)-(d) questions as you did for RQs 4-5 above, but with RQ6.
Notes
- Red dye study for ANOVA and t-tests
- Blogs paper for example of interpreting ANOVA
- Multiple comparisons? Implement Benjamini-Hochberg
- Some observational analysis t-test re: interpretation
- Example code: t.test(), aov(), summary(), p.adjust()
- Data dino and/or Anscombe's quartet
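For the multiple-comparisons note, the sketch below shows the Benjamini-Hochberg adjustment two ways: with the built-in p.adjust() and with a small hand-written function (bh_manual is a hypothetical name) that follows the same step-up rule. The p-values are made up for illustration.

```r
# Made-up p-values standing in for several follow-up tests
p_raw <- c(0.001, 0.008, 0.039, 0.041, 0.27)

# Built-in Benjamini-Hochberg adjustment
p_bh <- p.adjust(p_raw, method = "BH")

# Hand-rolled version: scale each p-value by m / rank, then enforce
# monotonicity by taking running minima from the largest p-value down
bh_manual <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)          # largest p-value first
  adj <- pmin(1, cummin(p[o] * m / (m:1)))  # ranks run m, m-1, ..., 1
  adj[order(o)]                             # restore the original order
}

print(cbind(raw = p_raw, BH = p_bh, manual = bh_manual(p_raw)))
```

Adjusted values at or below the chosen false discovery rate (for example 0.05) are the ones to treat as discoveries.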