Editing Statistics and Statistical Programming (Fall 2020)/pset6

From CommunityData

Latest revision Your text
Line 1: Line 1:
<small>[[Statistics_and_Statistical_Programming_(Fall_2020)#Week_9_.2811.2F10.2C_11.2F12.29|← Back to Week 9]]</small>


== Programming challenges (and statistical questions) ==
== Programming challenges (Part I) ==


This week's programming challenges are all about analyzing continuous data. For most of this, I'd like you to replicate some of the analysis done in this paper (note: I do not think you need to read it deeply to answer the questions below):
This week's programming challenges are all about analyzing continuous data. For the first few challenges, we're going to replicate some of the analysis done in this paper (note: I do not think you need to read it deeply to answer the questions below):


:: Lagakos, S., & Mosteller, F. (1981). A case study of statistics in the regulatory process: the FD&C Red No. 40 experiments. ''Journal of the National Cancer Institute'', 66(1), 197–212. [[https://www.gwern.net/docs/statistics/1981-lagakos.pdf PDF]]
: Lagakos, S., & Mosteller, F. (1981). A case study of statistics in the regulatory process: the FD&C Red No. 40 experiments. ''Journal of the National Cancer Institute'', 66(1), 197–212. [[https://www.gwern.net/docs/statistics/1981-lagakos.pdf PDF]]
 
Overall, the goal of this research was to understand whether/how doses of red dye number 40 affect the survival of mice (and, by extension, humans).


=== PC1. Download, import, and reshape the data ===


* Download the dataset by clicking through on the "Red Dye Number 40" link on [http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/owan/frames/frame.html this webpage]. You'll find that it's not in an ideal setup. It's an Excel file (XLS) with a series of columns labeled X1 through X4 (the format is not particularly "tidy"). If you look at the website with the data and/or Table 1 in the paper you should be able to figure out what each column stands for.
* Download the dataset by clicking through on the "Red Dye Number 40" link on [http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/owan/frames/frame.html this webpage]. You'll find that it's not in an ideal setup. It's an Excel file (XLS) with a series of columns labeled X1 through X4. The format is not exactly tabular. If you look at the website with the data and/or Table 1 in the paper you should be able to figure out what each column stands for.
* Import the data into R and get to work on reshaping the dataset. I think a good format would be a data frame with two columns: <code>group</code> and <code>weeks_alive</code>.  
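
One possible way to approach the reshaping step is sketched below. It is only an illustration: the numbers are made up (they are not the real data), the mapping of X1 through X4 onto dose groups is an assumption you should check against Table 1, and the import itself is left out (a package such as <code>readxl</code> with <code>read_excel()</code> is one option for XLS files).

```r
# Toy stand-in for the imported spreadsheet: made-up values, NOT the real data.
# Columns of unequal length are padded with NA, as often happens on import.
wide <- data.frame(
  X1 = c(70, 77, 83, 87, 92),
  X2 = c(49, 60, 63, 67, NA),
  X3 = c(30, 37, 56, 65, 76),
  X4 = c(34, 36, 48, 48, NA)
)

# ASSUMED mapping of columns to dose groups; verify against Table 1.
names(wide) <- c("control", "low", "medium", "high")

# stack() turns the wide columns into two columns: values and ind.
long <- stack(wide)
names(long) <- c("weeks_alive", "group")

# Drop the NA padding and put the columns in the suggested order.
long <- long[!is.na(long$weeks_alive), c("group", "weeks_alive")]

head(long)
```

The result has one row per mouse, which is the shape that formula-based functions such as <code>aov()</code> and <code>t.test()</code> expect.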


=== PC2. Summarize the data ===  
Using the two columns you just created, create summary statistics and visualizations for the dataset as a whole and for each of the groups. These descriptive analyses should give you a sense of the shape of the data and relationships across groups.
Create summary statistics and visualizations for the dataset as a whole and within each group. These visualizations should (a) give you a visual sense of the shape of the data and relationships between groups and (b) indicate the degree to which the assumptions for t-tests and ANOVA hold.
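
A minimal sketch of the kind of descriptive analysis this asks for, using base R only. The data frame <code>d</code> and its values are made up for illustration and the group labels are assumptions; substitute the data frame you built in PC1.

```r
# Made-up values for illustration only, NOT the real data.
d <- data.frame(
  group = factor(rep(c("control", "low", "medium", "high"), each = 4),
                 levels = c("control", "low", "medium", "high")),
  weeks_alive = c(70, 77, 83, 92, 49, 60, 63, 67,
                  30, 37, 56, 76, 34, 36, 48, 65)
)

# Whole-dataset summary.
summary(d$weeks_alive)

# Per-group sample size, mean, and standard deviation.
group_n     <- table(d$group)
group_means <- tapply(d$weeks_alive, d$group, mean)
group_sds   <- tapply(d$weeks_alive, d$group, sd)

# Side-by-side boxplots show center and spread across groups.
boxplot(weeks_alive ~ group, data = d, ylab = "Weeks alive")

# A histogram gives a rough sense of the overall shape.
hist(d$weeks_alive, main = "All groups", xlab = "Weeks alive")
```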
 
==== SQ1. Discuss your descriptive analysis ====
Be sure to interpret anything noteworthy.
 
==== SQ2. State hypotheses ====
The plan here is to use ANOVA to evaluate whether there is a difference in survival time between the groups and then t-tests to compare the average survival times across some specific groups (see PC4 below for more details on which groups). State null and alternative hypotheses that correspond to these tests.
 
==== SQ3. Address assumptions for the tests ====
Identify any assumptions you may need to make to conduct the ANOVA analysis and t-tests. Do these tests seem appropriate here? Why (not)?
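
If you want something more than eyeballing the plots, the sketch below shows a couple of base R checks that are sometimes used for the normality and equal-variance assumptions. It is not a required part of the answer, the values are made up, and the group labels are assumptions.

```r
# Made-up values for illustration only, NOT the real data.
d <- data.frame(
  group = factor(rep(c("control", "low", "medium", "high"), each = 4),
                 levels = c("control", "low", "medium", "high")),
  weeks_alive = c(70, 77, 83, 92, 49, 60, 63, 67,
                  30, 37, 56, 76, 34, 36, 48, 65)
)

# Compare the within-group variances directly.
group_vars <- tapply(d$weeks_alive, d$group, var)

# Bartlett's test of equal variances across groups.
bt <- bartlett.test(weeks_alive ~ group, data = d)

# Normal Q-Q plot of the ANOVA residuals.
fit <- aov(weeks_alive ~ group, data = d)
qqnorm(residuals(fit))
qqline(residuals(fit))

group_vars
bt
```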


=== PC3. Replicate the ANOVA analysis ===
Fit an ANOVA using <code>aov()</code> to test the global hypothesis of a difference between the groups.
Fit an ANOVA using <code>aov()</code> to see if there is a difference between the groups. Be ready to report, interpret, and discuss the results in substantive terms.
==== SQ4. Report and interpret your ANOVA results ====
(Note: Make sure to call <code>summary()</code> on the output of your <code>aov()</code> command.)
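
As a rough sketch of what the <code>aov()</code> and <code>summary()</code> calls might look like (the data frame <code>d</code> holds made-up values and assumed group labels, so the output here says nothing about the real study):

```r
# Made-up values for illustration only, NOT the real data.
d <- data.frame(
  group = factor(rep(c("control", "low", "medium", "high"), each = 4),
                 levels = c("control", "low", "medium", "high")),
  weeks_alive = c(70, 77, 83, 92, 49, 60, 63, 67,
                  30, 37, 56, 76, 34, 36, 48, 65)
)

# One-way ANOVA: does mean survival time differ across groups?
fit <- aov(weeks_alive ~ group, data = d)

# summary() prints the ANOVA table (Df, Sum Sq, Mean Sq, F value, Pr(>F)).
anova_table <- summary(fit)
anova_table
```

Printing <code>fit</code> on its own shows the sums of squares but not the F statistic or p-value, which is why the <code>summary()</code> call matters.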


=== PC4. Estimate differences in means ===
After performing an ANOVA, people sometimes do t-tests between specific groups to test/estimate differences in means. In this case, you should do a t-test comparing the average survival time of mice that received ''no'' RD40 with that of mice that received ''any'' (i.e., at least a small amount). Next, run a t-test between the high-dosage group and the control group.  
After performing an ANOVA, people sometimes do t-tests between the groups. Do a t-test between mice that received ''no'' RD40 and mice that received ''any'' (i.e., at least a small amount). Next, run a t-test between the high-dosage group and the control group. How would you go about doing it using formula notation? Be ready to report, interpret, and discuss the results in substantive terms. How should you interpret p-values if you do these tests after an ANOVA?
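
One way the two comparisons could be set up with formula notation is sketched here. The values are made up, the group labels are assumptions, and the helper column <code>any_dye</code> is a hypothetical name introduced just for this example.

```r
# Made-up values for illustration only, NOT the real data.
d <- data.frame(
  group = factor(rep(c("control", "low", "medium", "high"), each = 4),
                 levels = c("control", "low", "medium", "high")),
  weeks_alive = c(70, 77, 83, 92, 49, 60, 63, 67,
                  30, 37, 56, 76, 34, 36, 48, 65)
)

# Hypothetical two-level indicator: no dye vs. any dose of dye.
d$any_dye <- factor(ifelse(d$group == "control", "none", "any"),
                    levels = c("none", "any"))

# None vs. any. t.test() runs Welch's unequal-variance test by default.
tt_any <- t.test(weeks_alive ~ any_dye, data = d)

# High dose vs. control: keep only those two groups, drop unused levels.
d_hc <- droplevels(subset(d, group %in% c("control", "high")))
tt_high <- t.test(weeks_alive ~ group, data = d_hc)

tt_any
tt_high
```

The printed output includes the two group means, the test statistic, and the p-value; the estimated difference in means is the difference between the two reported means.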
 
==== SQ5. Report and interpret your t-test results ====
Make sure to include the estimated difference of means as well as the test-statistic and p-value.
 
==== SQ6. Multiple comparisons ====
Now, let's imagine that you wanted to test for differences in average survival time across all of the possible pairings of groups in the study. Should you adjust for multiple comparisons? Why (not)? If so, how would you go about it?
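
If you decide an adjustment is warranted, a sketch of two base R routes follows: <code>pairwise.t.test()</code> for all pairings at once, and <code>p.adjust()</code> on a vector of p-values you already have. Benjamini-Hochberg is used only as one example method; the data and the p-values in <code>p_raw</code> are made up.

```r
# Made-up values for illustration only, NOT the real data.
d <- data.frame(
  group = factor(rep(c("control", "low", "medium", "high"), each = 4),
                 levels = c("control", "low", "medium", "high")),
  weeks_alive = c(70, 77, 83, 92, 49, 60, 63, 67,
                  30, 37, 56, 76, 34, 36, 48, 65)
)

# All pairwise t-tests, first unadjusted and then with BH adjustment.
pw_raw <- pairwise.t.test(d$weeks_alive, d$group, p.adjust.method = "none")
pw_bh  <- pairwise.t.test(d$weeks_alive, d$group, p.adjust.method = "BH")

# p.adjust() works on any vector of p-values (these are hypothetical).
p_raw <- c(0.002, 0.01, 0.03, 0.04, 0.20, 0.50)
p_bh  <- p.adjust(p_raw, method = "BH")

pw_bh$p.value
p_bh
```

With four groups there are six pairings, so the p-value matrix has six non-missing entries.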


== Statistical questions ==
== Empirical paper questions ==


We'll continue our apparent focus on blogs with several questions about the following (very, very short) paper.
We'll continue our apparent focus on blogs with the following questions about the Sweetser and Metzgar paper.
 
::Sweetser, K. D., & Metzgar, E. (2007). Communicating during crisis: Use of blogs as a relationship management tool. ''Public Relations Review'', 33(3), 340–342. [[https://doi.org/10.1016/j.pubrev.2007.05.016 Available through NU Libraries]]


=== EQ1. Interpret the results re: RQ4 ===
Line 57: Line 37:
=== EQ3. Interpret the results re: RQ6 ===
Answer the same (a)-(d) questions as you did for RQs 4-5 above, but with RQ6.
== Notes ==
* Red dye study for ANOVA and t-tests
* Blogs paper for example of interpreting ANOVA
* Multiple comparisons? Implement Benjamini-Hochberg
* Some observational analysis t-test re: interpretation
* Example code: t.test(), aov(), summary(), p.adjust()