Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 6
From CommunityData
Revision as of 03:00, 3 February 2017
Programming Challenges
Let's re-evaluate some data from this paper:
- Lagakos, S., & Mosteller, F. (1981). A case study of statistics in the regulatory process: the FD&C Red No. 40 experiments. Journal of the National Cancer Institute, 66(1), 197–212. [PDF: https://www.gwern.net/docs/statistics/1981-lagakos.pdf]
I found a copy of the dataset at http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/owan/frames/frame.html.
- PC0. Download the dataset from the webpage linked above. You'll find that it's not in an ideal setup: it's an Excel file (XLS) with a series of columns labeled X1 through X4, and the format is not exactly tabular.
- PC1. Load the data. Now get to work on reshaping the dataset. I think a good format would be a data frame with two columns: group, time of death (i.e., lifespan).
- PC2. Create summary statistics and visualizations for each group. Write code that lets you (a) get a visual sense of both the shape of the data and the relationships in it and (b) assess the degree to which the assumptions for t-tests and ANOVA hold. What is the global mean of your dependent variable?
- PC3. Do a t-test between mice with no RD40 (the control group) and mice with at least a small amount. Then run a t-test between the high-dosage group and the control group.
- PC4. Run an ANOVA using aov() to see whether there is a difference between the groups.
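The challenges above can be sketched in base R as follows. This is only one possible shape for a solution, not a reference answer. The numbers are invented so that the code runs; they are not the Red No. 40 data. The column names X1 to X4 come from the description of the spreadsheet, but the mapping of columns to dosage groups (control, low, medium, high) and the NA padding of shorter columns are assumptions to check against the real file. PC3 is read here as control versus any dose, followed by control versus high dose. Reading the actual XLS file would need a package such as readxl, so that step is left out.

```r
# Toy stand-in for the spreadsheet: invented values, NOT the real data.
# Assumption: one column per dosage group, shorter columns padded with NA.
wide <- data.frame(
  X1 = c(70, 77, 83, 87, 92, 93, 100, 102),
  X2 = c(49, 60, 63, 67, 70, 74, 77, 80),
  X3 = c(30, 37, 56, 65, 76, 83, 87, 90),
  X4 = c(34, 36, 48, 48, 65, 91, 98, NA)
)

# PC1: reshape from wide to long with stack(), then drop the NA padding.
long <- stack(wide)
names(long) <- c("lifespan", "group")
long <- long[!is.na(long$lifespan), ]

# Assumed (hypothetical) mapping from column names to dosage groups.
long$group <- factor(long$group,
                     levels = c("X1", "X2", "X3", "X4"),
                     labels = c("control", "low", "medium", "high"))

# PC2: per-group summaries and the global mean of the dependent variable.
group.means <- tapply(long$lifespan, long$group, mean)
group.sds <- tapply(long$lifespan, long$group, sd)
global.mean <- mean(long$lifespan)

# Side-by-side boxplots give a rough look at shape and spread per group.
boxplot(lifespan ~ group, data = long, ylab = "lifespan")

# PC3: control vs. any dose, then control vs. high dose.
long$dosed <- factor(long$group != "control",
                     labels = c("control", "any dose"))
t.any <- t.test(lifespan ~ dosed, data = long)

high.vs.control <- droplevels(subset(long, group %in% c("control", "high")))
t.high <- t.test(lifespan ~ group, data = high.vs.control)

# PC4: one-way ANOVA across all four groups.
fit <- aov(lifespan ~ group, data = long)
summary(fit)
```

Note that t.test() defaults to Welch's unequal-variance test. Passing var.equal = TRUE gives the pooled-variance version, whose equal-variance assumption matches the one behind aov().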
Statistical Questions from OpenIntro §6
- Q0. Any questions or clarifications from the OpenIntro text or lecture notes?
- Q1. Exercise 6.12 on public opinion about cannabis legalization
- Q2. Exercise 6.20, a continuation of 6.12
- Q3. Exercise 6.38 on translating a problem in English into statistical tests
- Q4. Exercise 6.50, another voter/public opinion question
Questions on the Empirical Paper
Let's just go back to the Buechley and Hill paper on LilyPad Arduino:
- Q5. For Study 1, let's focus on the statistical test:
- (a) What is the unit of analysis? What is the dependent variable? The independent variable? What are the groups being compared in the test? Is it a one-way or two-way design?
- (b) What is the null hypothesis being tested? What is the alternative hypothesis?
- (c) Summarize or restate the results in statistical terms. Explain what these results mean in substantive terms. How convincing do you find these results? What should we be taking away?
- (d) Why weren't we happy just leaving it where we did in week 2? Why bother with the statistical test?
- Q6. Do the same as above but for Study 2.