Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 6

From CommunityData



: Lagakos, S., & Mosteller, F. (1981). A case study of statistics in the regulatory process: the FD&C Red No. 40 experiments. ''Journal of the National Cancer Institute'', 66(1), 197–212. [[https://www.gwern.net/docs/statistics/1981-lagakos.pdf PDF]]


: '''PC0.''' Download the dataset by clicking through on the "Red Dye Number 40" link on [http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/owan/frames/frame.html this webpage]. You'll find that it's not in an ideal setup: it's an Excel file (XLS) with a series of columns labeled X1 through X4, and the format is not exactly tabular. If you look at the website with the data and/or Table 1 in the paper, you should be able to figure out what each column stands for.
: '''PC1.''' Load the data into R. Now get to work on reshaping the dataset. I think a good format would be a data frame with two columns: <code>group</code> and <code>weeks_alive</code>.  
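Here is a minimal sketch of one way to do the reshaping in base R. It assumes that the sheet has already been read into a data frame with one column per group (X1 to X4), padded with <code>NA</code> where a group has fewer observations. That layout is a hypothesis, not a description of the actual file. The numbers and the placeholder labels <code>group_a</code> to <code>group_d</code> are made up for illustration. They are not the Red No. 40 data, and working out what each column really means is part of PC0. To read the XLS file itself, <code>readxl::read_excel()</code> is one option.

```r
# Made-up values in a hypothetical wide layout: one column per group,
# padded with NA where a group has fewer observations.
# These are NOT the Red No. 40 data.
wide <- data.frame(
  X1 = c(70, 77, 83, NA),
  X2 = c(49, 60, 63, 67),
  X3 = c(30, 37, 56, NA),
  X4 = c(34, 36, 48, 54)
)

# Hypothetical placeholder labels; replace them with the real group names
# once you have worked out what X1..X4 stand for.
group_labels <- c(X1 = "group_a", X2 = "group_b", X3 = "group_c", X4 = "group_d")

# stack() puts every column into a single 'values' vector and records the
# source column name in the factor 'ind'.
stacked <- stack(wide)

long <- data.frame(
  group = factor(unname(group_labels[as.character(stacked$ind)]),
                 levels = unname(group_labels)),
  weeks_alive = stacked$values
)

# Drop the NA padding and reset the row names.
long <- long[!is.na(long$weeks_alive), ]
rownames(long) <- NULL

str(long)
```

<code>stack()</code> handles the wide-to-long step here. If you prefer the tidyverse, <code>tidyr::pivot_longer()</code> is a common alternative.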
: '''PC2.''' Create summary statistics and visualizations for each group. These visualizations should give you (a) a visual sense of the shape of the data and the relationships between groups and (b) a sense of the degree to which the assumptions for t-tests and ANOVA hold. What is the global mean of your dependent variable?
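Below is a minimal base-R sketch of the kind of summaries and plots PC2 asks for. It assumes a long data frame like the one described in PC1. So that it runs on its own, the data frame is filled with made-up values and placeholder labels (<code>control</code>, <code>low</code>, <code>medium</code>, <code>high</code>). Boxplots and per-group normal Q-Q plots are only one reasonable choice; histograms or density plots by group would also work.

```r
# Made-up long-format data with placeholder group labels (NOT the Red No. 40 data).
long <- data.frame(
  group = factor(rep(c("control", "low", "medium", "high"), each = 3),
                 levels = c("control", "low", "medium", "high")),
  weeks_alive = c(70, 77, 83, 49, 60, 63, 30, 37, 56, 34, 36, 48)
)

# Per-group sample size, mean, and standard deviation.
group_stats <- aggregate(weeks_alive ~ group, data = long,
                         FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x)))
print(group_stats)

# Global (grand) mean of the dependent variable across all groups.
grand_mean <- mean(long$weeks_alive)
print(grand_mean)

# Side-by-side boxplots: center, spread, and skew per group. Boxes of very
# different heights are an informal warning sign for the equal-variance assumption.
boxplot(weeks_alive ~ group, data = long, ylab = "Weeks alive")

# One normal Q-Q plot per group as an informal check of the normality assumption.
old_par <- par(mfrow = c(2, 2))
for (g in levels(long$group)) {
  x <- long$weeks_alive[long$group == g]
  qqnorm(x, main = g)
  qqline(x)
}
par(old_par)
```

With only a few observations per group, these plots are noisy and should be read as rough checks, not tests.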
: '''PC3.''' Estimate an ANOVA using <code>aov()</code> to see whether there is a difference between the groups. Be ready to report, interpret, and discuss the results in substantive terms.
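This is a minimal sketch of the <code>aov()</code> call. It uses the same kind of made-up long data frame as above, with placeholder labels, so the block runs on its own. One thing worth checking in your own code: <code>group</code> needs to be a factor. If the groups were coded as numbers, <code>aov()</code> would fit a single linear slope (1 degree of freedom) instead of comparing group means.

```r
# Made-up long-format data with placeholder group labels (NOT the Red No. 40 data).
long <- data.frame(
  group = factor(rep(c("control", "low", "medium", "high"), each = 3),
                 levels = c("control", "low", "medium", "high")),
  weeks_alive = c(70, 77, 83, 49, 60, 63, 30, 37, 56, 34, 36, 48)
)

# One-way ANOVA: does mean weeks_alive differ across the levels of group?
fit <- aov(weeks_alive ~ group, data = long)
summary(fit)  # table with Df, Sum Sq, Mean Sq, F value, Pr(>F)

# Cross-check: oneway.test() with equal variances gives the same F statistic.
oneway.test(weeks_alive ~ group, data = long, var.equal = TRUE)
```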
: '''PC4.''' After performing an ANOVA, people sometimes run t-tests between pairs of groups. Run a t-test between mice that received ''no'' RD40 and mice that received ''any'' (i.e., at least a small amount). Next, run a t-test between the high-dosage group and the control group. How would you do this using formula notation? Be ready to report, interpret, and discuss the results in substantive terms. How should you interpret p-values if you run these tests after an ANOVA?
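The sketch below runs both t-tests in formula notation (<code>outcome ~ grouping_variable</code>). It again uses made-up data with placeholder labels. The <code>any_dye</code> column is a hypothetical helper that collapses all dosed groups into a single level. Note that <code>t.test()</code> defaults to Welch's test (<code>var.equal = FALSE</code>). Its formula interface also needs a grouping variable with exactly two groups, which is why the second test subsets the data first. The <code>p.adjust()</code> line is included only so you can compare raw and corrected p-values.

```r
# Made-up long-format data with placeholder group labels (NOT the Red No. 40 data).
long <- data.frame(
  group = factor(rep(c("control", "low", "medium", "high"), each = 3),
                 levels = c("control", "low", "medium", "high")),
  weeks_alive = c(70, 77, 83, 49, 60, 63, 30, 37, 56, 34, 36, 48)
)

# Hypothetical helper: "none" for the control group, "any" for every dosed group.
long$any_dye <- factor(ifelse(long$group == "control", "none", "any"),
                       levels = c("none", "any"))

# (1) No dye vs. any dye.
tt_any <- t.test(weeks_alive ~ any_dye, data = long)
print(tt_any)

# (2) High dose vs. control: keep only those two groups and drop unused levels.
control_vs_high <- droplevels(subset(long, group %in% c("control", "high")))
tt_high <- t.test(weeks_alive ~ group, data = control_vs_high)
print(tt_high)

# Raw p-values next to Bonferroni-adjusted ones, for comparison.
p_raw <- c(any = tt_any$p.value, high = tt_high$p.value)
print(p.adjust(p_raw, method = "bonferroni"))
```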


== Statistical Questions from OpenIntro §6 ==


== Empirical Paper Questions ==
: '''EQ0.''' Any questions about the Buechley and Hill paper or the Reinhart reading?


These questions are for the Buechley and Hill paper on LilyPad Arduino:


: '''EQ1.''' For ''Study 1'', let's focus on the statistical test:
:: (a) What is the unit of analysis? What is the dependent variable? The independent variable? What are groups being compared in the test? Is it a one-way or two-way design?
:: (b) Why not just summarize the results, like we did in week 2? Why bother with the statistical test? How do you decide when to use each statistical procedure?
:: (c) What is the null hypothesis being tested? What is the alternative hypothesis?  
:: (d) Summarize or restate the results in statistical terms. What do these results mean in substantive terms? How convincing do you find them? What should we take away?
 


Now go back to the Shaw and Benkler paper from a few weeks ago:


: '''EQ2.''' Using the data from Table 3 and Figure 2:
:: (a) What statistical procedure produced the p-value in Table 3? What is the null hypothesis being tested?
:: (b) How convincing do you find these results? What are some reasons to be skeptical?
:: (c) Reproduce Figure 2 using the data in Table 3. One thing that's missing from that figure is error bars. Based on the reading from Reinhart §5, what type of error bars do you think you should use? Calculate the error bars and check whether they overlap. What does that tell us? (Bonus: figure out how to add the error bars to the bar plot.)
:: (d) Should we be concerned about the base rate fallacy described in Reinhart §4? Why or why not?
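For the bonus in EQ2(c), here is a minimal base-R sketch of a bar plot with error bars. It assumes that the plotted quantities are proportions with known group sizes. It uses normal-approximation 95% confidence intervals, p ± 1.96·√(p(1−p)/n). The numbers and the labels <code>group_1</code>/<code>group_2</code> are placeholders, not values from Table 3. A 95% CI is only one possible answer to the question of which error bars to use.

```r
# Placeholder proportions and sample sizes; substitute the values from Table 3.
p <- c(group_1 = 0.40, group_2 = 0.25)
n <- c(group_1 = 100, group_2 = 100)

# Standard error of a proportion and a normal-approximation 95% CI.
se <- sqrt(p * (1 - p) / n)
z <- qnorm(0.975)
lower <- p - z * se
upper <- p + z * se

# barplot() returns the x positions of the bar midpoints, which are reused
# as the x positions of the error bars.
mids <- barplot(p, ylim = c(0, 1.2 * max(upper)), ylab = "Proportion")
arrows(x0 = as.vector(mids), y0 = lower, x1 = as.vector(mids), y1 = upper,
       angle = 90, code = 3, length = 0.05)

# Two intervals overlap when each one's lower end is at or below the other's upper end.
overlap <- lower[1] <= upper[2] && lower[2] <= upper[1]
print(overlap)
```

If Table 3 reports something other than proportions (for example, means), the standard-error formula would need to change accordingly.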