Difference between revisions of "Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 6"

(Draft for Week 6)
 
Line 6: Line 6:
  
 
: '''PC0.''' Download the dataset by clicking through on the "Red Dye Number 40" link on [http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/owan/frames/frame.html this webpage]. You'll find that it's not in an ideal setup. It's an Excel file (XLS) with a series of columns labeled X1.. X4. The format is not exactly tabular. If you look at Table 1 in the paper you should be able to figure out what each column stands for.
 
: '''PC0.''' Download the dataset by clicking through on the "Red Dye Number 40" link on [http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/owan/frames/frame.html this webpage]. You'll find that it's not in an ideal setup. It's an Excel file (XLS) with a series of columns labeled X1.. X4. The format is not exactly tabular. If you look at Table 1 in the paper you should be able to figure out what each column stands for.
: '''PC1.''' Load the data into R. Now get to work on reshaping the dataset. I think a good format would be a data frame with two columns: group, time of death (i.e., lifespan).  
+
: '''PC1.''' Load the data into R. Now get to work on reshaping the dataset. I think a good format would be a data frame with two columns: group, weeks_alive.  
 
: '''PC2.''' Create summary statistics and visualizations for each group. Write code that allows you to generate a useful way to both (a) get a visual sense both for the shape of the data and its relationships and (b) the degree to which the assumptions for t-tests and ANOVA hold. What is the global mean of your dependent variable?
 
: '''PC2.''' Create summary statistics and visualizations for each group. Write code that allows you to generate a useful way to both (a) get a visual sense both for the shape of the data and its relationships and (b) the degree to which the assumptions for t-tests and ANOVA hold. What is the global mean of your dependent variable?
: '''PC3.''' Do a t-test between mice with ''none'' RD40 and mice with ''any'' (i.e., at least a small amount). Next, run a t-test between the group with a high dosage and control group. How would you go about doing it using formula notation? Be ready to report, interpret, and discuss the results in substantive terms.
+
: '''PC3.''' Estimate an ANOVA analysis using aov() to see if there is a difference between the groups. Be ready to report, interpret, and discuss the results in substantive terms.
: '''PC4.''' Estimate an ANOVA analysis using aov() to see if there is a difference between the groups. Be ready to report, interpret, and discuss the results in substantive terms.
+
: '''PC4.''' Often after performing an ANOVA analysis, people do t-tests between the groups. Do a t-test between mice with ''none'' RD40 and mice with ''any'' (i.e., at least a small amount). Next, run a t-test between the group with a high dosage and control group. How would you go about doing it using formula notation? Be ready to report, interpret, and discuss the results in substantive terms. How should you interpret p-values if you do these tests after an ANOVA analysis?
  
 
== Statistical Questions from OpenIntro §6 ==
 
== Statistical Questions from OpenIntro §6 ==
Line 19: Line 19:
 
: '''Q4.''' Exercise 6.50 another voter/public opinion question
 
: '''Q4.''' Exercise 6.50 another voter/public opinion question
  
<!-- == Questions on the Empirical Paper ==
+
== Questions on the Empirical Paper ==
  
Let's just go back to the Buechley and Hill paper on LilyPad Arduino:
+
These questions are for the Buechley and Hill paper on LilyPad Arduino:
  
 
: '''Q5.''' For ''Study 1'', let's focus on the statistical test:
 
: '''Q5.''' For ''Study 1'', let's focus on the statistical test:
 
:* (a) What is the unit of analysis? What is the dependent variable? The independent variable? What are groups being compared in the test? Is it a one-way or two-way design?
 
:* (a) What is the unit of analysis? What is the dependent variable? The independent variable? What are groups being compared in the test? Is it a one-way or two-way design?
 
:* (b) What is the null hypothesis being tested? What is the alternative hypothesis?  
 
:* (b) What is the null hypothesis being tested? What is the alternative hypothesis?  
:* (c) Summarize ore restate results in statistical terms. Explain what these results mean in substantive terms? How convincing do you find these results? What should we be taking away?
+
:* (c) Summarize or restate the results in statistical terms. Explain what these results mean in substantive terms? How convincing do you find these results? What should we be taking away?
:* (d) Why weren't we happy just leaving it where we did in week 2? Why bother with the statistical test?
+
:* (d) Why weren't we happy just leaving it where we did in week 2? Why bother with the statistical test? How do you decide when to use each statistical procedure?
: '''Q6.''' Do the same as above but for ''Study 2''.
 
 
 
== Questions on Gelman and Loken ==
 
 
 
: '''Q7.''' Be ready to summarize the main point of, and share some reflections on, the paper. There are no specific questions.
 
-->
 

Revision as of 16:21, 25 April 2019

Programming Challenges

We're going to evaluate and replicate the analysis done in this paper:

Lagakos, S., & Mosteller, F. (1981). A case study of statistics in the regulatory process: the FD&C Red No. 40 experiments. Journal of the National Cancer Institute, 66(1), 197–212. [PDF]
PC0. Download the dataset by clicking through on the "Red Dye Number 40" link on this webpage. You'll find that it's not in an ideal setup. It's an Excel file (XLS) with a series of columns labeled X1.. X4. The format is not exactly tabular. If you look at Table 1 in the paper you should be able to figure out what each column stands for.
PC1. Load the data into R. Now get to work on reshaping the dataset. I think a good format would be a data frame with two columns: group, weeks_alive.
PC2. Create summary statistics and visualizations for each group. Write code that gives you both (a) a visual sense of the shape of the data and its relationships and (b) a sense of the degree to which the assumptions for t-tests and ANOVA hold. What is the global mean of your dependent variable?
PC3. Estimate an ANOVA analysis using aov() to see if there is a difference between the groups. Be ready to report, interpret, and discuss the results in substantive terms.
PC4. Often after performing an ANOVA analysis, people do t-tests between the groups. Do a t-test between mice given no RD40 and mice given any (i.e., at least a small amount). Next, run a t-test between the high-dosage group and the control group. How would you go about doing it using formula notation? Be ready to report, interpret, and discuss the results in substantive terms. How should you interpret p-values if you do these tests after an ANOVA analysis?
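
A minimal sketch of how PC1 through PC4 might fit together in base R, assuming the spreadsheet has already been read into a data frame named wide with columns X1 through X4. The numbers below are invented stand-ins rather than the real data, and the mapping of columns to the labels control, low, medium, and high is a placeholder that you should check against Table 1 in the paper. The Bonferroni adjustment at the end is one common way, not the only one, to think about p-values from several follow-up tests.

```r
# Toy stand-in for the spreadsheet: four columns of unequal length padded
# with NA. These values are INVENTED for illustration, not the real data.
wide <- data.frame(
  X1 = c(70, 77, 83, 87, 92, 93, 100, 102),
  X2 = c(49, 60, 63, 67, 70, 74, 77, 80),
  X3 = c(30, 37, 56, 65, 76, 83, 87, 90),
  X4 = c(34, 36, 48, 48, 65, 91, 92, NA)
)

# ASSUMPTION: placeholder column-to-group mapping; verify against Table 1.
group_labels <- c(X1 = "control", X2 = "low", X3 = "medium", X4 = "high")

# PC1: stack() reshapes the wide columns into one row per mouse, with the
# measurements in `values` and the source column name in `ind`.
long <- stack(wide)
long <- long[!is.na(long$values), ]
d <- data.frame(
  group = factor(unname(group_labels[as.character(long$ind)]),
                 levels = unname(group_labels)),
  weeks_alive = long$values
)

# PC2: per-group n, mean, and sd, plus the global mean.
by_group <- aggregate(
  weeks_alive ~ group, data = d,
  FUN = function(x) c(n = length(x), mean = mean(x), sd = sd(x))
)
global_mean <- mean(d$weeks_alive)

# PC3: one-way ANOVA across all four groups.
fit <- aov(weeks_alive ~ group, data = d)
print(summary(fit))

# PC4: t-tests in formula notation.
# None vs. any: collapse the three dosed groups into a single level.
d$any_dye <- d$group != "control"
t_any <- t.test(weeks_alive ~ any_dye, data = d)

# High vs. control: keep only those two groups via `subset`.
t_high <- t.test(weeks_alive ~ group, data = d,
                 subset = group %in% c("control", "high"))

# One way to account for running several follow-up tests: adjust the
# p-values for multiple comparisons (Bonferroni shown here).
p_raw <- c(none_vs_any = t_any$p.value, high_vs_control = t_high$p.value)
p_adj <- p.adjust(p_raw, method = "bonferroni")
print(rbind(raw = p_raw, adjusted = p_adj))
```

On the real data, boxplot(weeks_alive ~ group, data = d) is a quick first look at the shape and spread of each group before checking the test assumptions.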

Statistical Questions from OpenIntro §6

Q0. Any questions or clarifications from the OpenIntro text or lecture notes?
Q1. Exercise 6.12 on public opinion about cannabis legalization
Q2. Exercise 6.20 a continuation of 6.12
Q3. Exercise 6.38 on translating a problem in English into statistical tests
Q4. Exercise 6.50 another voter/public opinion question

Questions on the Empirical Paper

These questions are for the Buechley and Hill paper on LilyPad Arduino:

Q5. For Study 1, let's focus on the statistical test:
  • (a) What is the unit of analysis? What is the dependent variable? The independent variable? What are groups being compared in the test? Is it a one-way or two-way design?
  • (b) What is the null hypothesis being tested? What is the alternative hypothesis?
  • (c) Summarize or restate the results in statistical terms. Explain what these results mean in substantive terms. How convincing do you find these results? What should we be taking away?
  • (d) Why weren't we happy just leaving it where we did in week 2? Why bother with the statistical test? How do you decide when to use each statistical procedure?