Editing Statistics and Statistical Programming (Fall 2020)/pset6
Latest revision | Your text | ||
Line 1: | Line 1: | ||
<small>[[Statistics_and_Statistical_Programming_(Fall_2020)#Week_9_.2811.2F10.2C_11.2F12.29|← Back to Week 9]]</small> | <small>[[Statistics_and_Statistical_Programming_(Fall_2020)#Week_9_.2811.2F10.2C_11.2F12.29|← Back to Week 9]]</small> | ||
== Programming challenges ( | == Programming challenges (Part I) == | ||
This week's programming challenges are all about analyzing continuous data. For | This week's programming challenges are all about analyzing continuous data. For the first few challenges, we're going to replicate some of the analysis done in this paper (note: I do not think you need to read it deeply to answer the questions below): | ||
: Lagakos, S., & Mosteller, F. (1981). A case study of statistics in the regulatory process: the FD&C Red No. 40 experiments. ''Journal of the National Cancer Institute'', 66(1), 197–212. [https://www.gwern.net/docs/statistics/1981-lagakos.pdf PDF] | |||
=== PC1. Download, import, and reshape the data === | === PC1. Download, import, and reshape the data === | ||
* Download the dataset by clicking through on the "Red Dye Number 40" link on [http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/owan/frames/frame.html this webpage]. You'll find that it's not in an ideal setup. It's an Excel file (XLS) with a series of columns labeled X1.. X4 | * Download the dataset by clicking through on the "Red Dye Number 40" link on [http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/owan/frames/frame.html this webpage]. You'll find that it's not in an ideal setup. It's an Excel file (XLS) with a series of columns labeled X1.. X4. The format is not exactly tabular. If you look at the website with the data and/or Table 1 in the paper you should be able to figure out what each column stands for. | ||
* Import the data into R and get to work on reshaping the dataset. I think a good format would be a data frame with two columns: <code>group</code> and <code>weeks_alive</code>. | * Import the data into R and get to work on reshaping the dataset. I think a good format would be a data frame with two columns: <code>group</code> and <code>weeks_alive</code>. | ||
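As a rough illustration of the reshaping step, here is a minimal sketch in base R. It assumes the imported spreadsheet ends up as a data frame with one column per group, padded with <code>NA</code> where the groups have different sizes. The object name <code>wide</code> and all of the values are made up for illustration (this is not the real dataset), and the commented-out import line uses a hypothetical file name.

```r
# Hypothetical import step (file name is an assumption, not the real one):
# wide <- readxl::read_excel("red_dye_40.xls")

# Toy stand-in for the imported spreadsheet: one column per group,
# padded with NA where groups have different sizes.
# These values are invented for illustration, NOT the real data.
wide <- data.frame(
  X1 = c(90, 95, 100, 105),
  X2 = c(60, 70, 80, NA),
  X3 = c(50, 55, NA, NA),
  X4 = c(40, 45, 50, 55)
)

# stack() collapses the columns into one long column of values
# plus a factor recording which column each value came from.
long <- stack(wide)
names(long) <- c("weeks_alive", "group")

# Drop the NA padding and put the columns in the suggested order.
long <- long[!is.na(long$weeks_alive), c("group", "weeks_alive")]
rownames(long) <- NULL

head(long)
```

Other routes (for example <code>tidyr::pivot_longer()</code>) would work just as well. You would still want to recode <code>X1</code> through <code>X4</code> into meaningful group labels once you have worked out what each column stands for.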
=== PC2. Summarize the data === | === PC2. Summarize the data === | ||
Create summary statistics and visualizations for the dataset as a whole and within each group. These visualizations should give you both (a) a visual sense of the shape of the data and the relationships between groups and (b) a sense of the degree to which the assumptions for t-tests and ANOVA hold. | |||
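One possible way to approach this, sketched in base R on simulated data. The data frame <code>toy</code>, its group labels, and its values are all invented for illustration. Only the column names <code>group</code> and <code>weeks_alive</code> come from the format suggested in PC1.

```r
# Simulated stand-in data (NOT the real dataset).
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 10)),
  weeks_alive = c(rnorm(10, 90, 10), rnorm(10, 70, 10), rnorm(10, 65, 10))
)

# Summary of the outcome for the dataset as a whole.
summary(toy$weeks_alive)

# Per-group sample size, mean, and standard deviation.
group_n     <- tapply(toy$weeks_alive, toy$group, length)
group_means <- tapply(toy$weeks_alive, toy$group, mean)
group_sds   <- tapply(toy$weeks_alive, toy$group, sd)
cbind(n = group_n, mean = group_means, sd = group_sds)

# Overall shape of the outcome.
hist(toy$weeks_alive, main = "All groups", xlab = "weeks_alive")

# Side-by-side boxplots: compare centers and spreads across groups,
# and look for skew or outliers.
boxplot(weeks_alive ~ group, data = toy)
```

Comparing the per-group standard deviations and the boxplot spreads gives a rough read on the equal-variance assumption. The histogram (or a per-group version of it) speaks to skew and outliers.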
=== PC3. Replicate the ANOVA analysis === | === PC3. Replicate the ANOVA analysis === | ||
Estimate an ANOVA analysis using <code>aov()</code> to | Estimate an ANOVA analysis using <code>aov()</code> to see if there is a difference between the groups. Be ready to report, interpret, and discuss the results in substantive terms. | ||
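A minimal sketch of a one-way ANOVA with <code>aov()</code>, again on simulated data. The group labels and values in <code>toy</code> are assumptions made up for illustration, so the numbers it produces say nothing about the real study.

```r
# Simulated stand-in data (NOT the real dataset); group labels are assumed.
set.seed(42)
dose_levels <- c("control", "low", "medium", "high")
toy <- data.frame(
  group = factor(rep(dose_levels, each = 10), levels = dose_levels),
  weeks_alive = rnorm(40, mean = rep(c(90, 75, 70, 65), each = 10), sd = 12)
)

# One-way ANOVA: does mean weeks_alive differ across groups?
fit <- aov(weeks_alive ~ group, data = toy)
summary(fit)

# The ANOVA table is the first element of the summary object,
# so individual pieces can be pulled out by column name.
anova_table <- summary(fit)[[1]]
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]
```

The F statistic compares between-group to within-group variability. A small p-value suggests that at least one group mean differs, but not which one.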
=== PC4. Estimate differences in means === | === PC4. Estimate differences in means === | ||
After performing an ANOVA, people sometimes do t-tests between | After performing an ANOVA analysis, people sometimes do t-tests between the groups. Do a t-test between mice given ''no'' RD40 and mice given ''any'' (i.e., at least a small amount). Next, run a t-test between the high-dosage group and the control group. How would you go about doing it using formula notation? Be ready to report, interpret, and discuss the results in substantive terms. How should you interpret p-values if you do these tests after an ANOVA analysis? | ||
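Here is one way the two comparisons could be set up with formula notation, sketched on simulated data. The group labels (<code>control</code>, <code>low</code>, <code>medium</code>, <code>high</code>), the helper column <code>any_dye</code>, and all values are assumptions for illustration only.

```r
# Simulated stand-in data (NOT the real dataset); group labels are assumed.
set.seed(7)
dose_levels <- c("control", "low", "medium", "high")
toy <- data.frame(
  group = factor(rep(dose_levels, each = 10), levels = dose_levels),
  weeks_alive = rnorm(40, mean = rep(c(90, 75, 70, 65), each = 10), sd = 12)
)

# Comparison 1: no dye vs. any dye.
# Recode the four groups into a two-level variable, then use a formula.
toy$any_dye <- toy$group != "control"
t_any <- t.test(weeks_alive ~ any_dye, data = toy)
t_any

# Comparison 2: high dosage vs. control.
# Keep only the two groups of interest and drop the unused factor levels,
# since the formula interface needs exactly two groups.
high_vs_control <- droplevels(subset(toy, group %in% c("control", "high")))
t_high <- t.test(weeks_alive ~ group, data = high_vs_control)
t_high
```

By default <code>t.test()</code> runs Welch's test, which does not assume equal variances. Passing <code>var.equal = TRUE</code> gives the pooled-variance version instead.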
== Statistical questions == | |||
== Empirical paper questions == | == Empirical paper questions == | ||
We'll continue our apparent focus on blogs with | We'll continue our apparent focus on blogs with the following questions about the Sweetser and Metzgar paper. | ||
=== EQ1. Interpret the results re: RQ4 === | === EQ1. Interpret the results re: RQ4 === | ||
Line 57: | Line 37: | ||
=== EQ3. Interpret the results re: RQ6 === | === EQ3. Interpret the results re: RQ6 === | ||
Answer the same (a)-(d) questions as you did for RQs 4-5 above, but with RQ6. | Answer the same (a)-(d) questions as you did for RQs 4-5 above, but with RQ6. | ||
== Notes == | |||
* Red dye study for ANOVA and t-tests | |||
* Blogs paper for example of interpreting ANOVA | |||
* Multiple comparisons? Implement Benjamini-Hochberg | |||
* Some observational analysis t-test re: interpretation | |||
* Example code: t.test(), aov(), summary(), p.adjust() |
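For the multiple-comparisons note, here is a hedged sketch of the Benjamini-Hochberg adjustment, computed by hand and checked against <code>p.adjust()</code>. The p-values are made up for illustration and the variable names are my own.

```r
# Made-up p-values from several hypothetical tests (not real results).
p <- c(0.041, 0.001, 0.20, 0.008, 0.039)
m <- length(p)

# Benjamini-Hochberg by hand:
# 1. sort the p-values in ascending order;
# 2. scale the i-th smallest by m / i;
# 3. enforce monotonicity by taking a running minimum from the largest down;
# 4. cap at 1 and restore the original order.
o <- order(p)
scaled <- p[o] * m / seq_len(m)
adj_sorted <- pmin(1, rev(cummin(rev(scaled))))
bh_manual <- numeric(m)
bh_manual[o] <- adj_sorted

# Built-in equivalent.
bh_builtin <- p.adjust(p, method = "BH")

# The two versions should agree.
all.equal(bh_manual, bh_builtin)
```

Adjusted values below your chosen threshold (say, 0.05) are the comparisons you would flag while controlling the false discovery rate at that level.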