Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 6

== Programming Challenges ==

: '''PC0.''' I've provided the full dataset from which I drew each of your samples as a TSV file in the <code>week_05</code> directory of the [https://github.com/makoshark/uwcom521-assignments/ class assignment git repository]. The file is ''tab delimited'', not comma delimited; TSV is closely related to CSV and is also a common format. Load it into R (''HINT: <code>read.delim()</code>''). Take the mean of the variable <code>x</code> in that dataset. That is the true population mean: the quantity we have been estimating in weeks 2 and 3.
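
A minimal sketch of the loading step, using a small made-up file rather than the class dataset. The path <code>toy_path</code>, the column <code>y</code>, and all of the values are invented for illustration; substitute the path to the TSV file in your own clone of the repository.

<syntaxhighlight lang="r">
# Toy stand-in for the class dataset: the real file name and contents are not
# reproduced here, so we write a tiny tab-delimited file to a temporary path.
toy_path <- tempfile(fileext = ".tsv")
writeLines(c("x\ty", "1\t10", "2\t20", "3\t30", "6\t40"), toy_path)

# read.delim() defaults to sep = "\t" and header = TRUE, which suits TSV files.
toy <- read.delim(toy_path)

# Mean of the column named x; na.rm = TRUE skips any missing values.
toy_mean <- mean(toy$x, na.rm = TRUE)
print(toy_mean)
</syntaxhighlight>

With the toy values above the mean is 3. Run against the full dataset, the same call gives the population mean that your sample means were estimating.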

== Statistical Questions from OpenIntro §6 ==

: '''Q0.''' Any questions or clarifications from the OpenIntro text or lecture notes?
: '''Q1.''' Exercise 6.12 on public opinion about cannabis legalization
: '''Q2.''' Exercise 6.20, a continuation of 6.12
: '''Q3.''' Exercise 6.38 on translating a problem in English into statistical tests
: '''Q4.''' Exercise 6.50, another voter/public opinion question

== Questions on the Empirical Paper ==

Let's just go back to the Buechley and Hill paper on LilyPad Arduino:

: '''Q5.''' For ''Study 1'', let's focus on the statistical test:
:* (a) What is the unit of analysis? What is the dependent variable? The independent variable? What are the groups being compared in the test? Is it a one-way or two-way design?
:* (b) What is the null hypothesis being tested? What is the alternative hypothesis?
:* (c) Summarize or restate the results in statistical terms. Explain what these results mean in substantive terms. How convincing do you find these results? What should we be taking away?
:* (d) Why weren't we happy just leaving it where we did in week 2? Why bother with the statistical test?
: '''Q6.''' Do the same as above but for ''Study 2''.