# Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 8

## Programming Challenges

The first set of programming challenges will once again use your dataset from the week 3 problem set:

PC0. Load up your dataset as you did in Week 3 PC1.
PC1. Refamiliarize yourself with the data and recode your variables as you did for Week 3 PC8. You may recall from that exercise that x and y looked like they might be related. We now have the tools and terminology to describe this relationship and to estimate just how related they are.
PC2. Run a t.test between x and y in the dataset and be prepared to interpret the results.
PC3. Estimate how correlated x and y are with each other.
PC4. Fit a linear model corresponding to the following formula and be ready to interpret its coefficients, standard errors, t-statistics, p-values, and $R^2$:
$\hat{y} = \beta_0 + \beta_1 x + \varepsilon$
PC5. Generate the following diagnostic plots and be prepared to explain (i) what issues and/or assumptions each one can help you evaluate and (ii) what conclusions you draw from them:
(a) A histogram of the residuals.
(b) A plot of the residuals against your values of x.
(c) A QQ plot.
PC6. Generate a nice-looking, publication-ready table with the fitted model formatted as raw text, HTML, or LaTeX.
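
As a rough illustration of the workflow in PC2 through PC6, here is a minimal sketch in base R. It runs on simulated data rather than your Week 3 dataset: the data frame `d`, the sample size, and the true slope and intercept are all assumptions made up for this example, so the numbers it produces say nothing about your own results.

```r
# Simulated stand-in data (an assumption for illustration; NOT the course dataset)
set.seed(42)
n <- 100
x <- rnorm(n, mean = 5, sd = 2)
y <- 1 + 0.5 * x + rnorm(n, sd = 1)
d <- data.frame(x = x, y = y)

# PC2: two-sample t-test comparing the means of x and y
# (t.test() defaults to the Welch test, which does not assume equal variances)
tt <- t.test(d$x, d$y)
print(tt)

# PC3: Pearson correlation between x and y
r <- cor(d$x, d$y)
print(r)

# PC4: simple linear regression of y on x
m <- lm(y ~ x, data = d)
coefs <- summary(m)$coefficients  # estimates, std. errors, t values, p-values
r2 <- summary(m)$r.squared
print(summary(m))

# PC5: diagnostic plots based on the residuals
res <- residuals(m)
hist(res, main = "Histogram of residuals", xlab = "Residual")
plot(d$x, res, xlab = "x", ylab = "Residual")
abline(h = 0, lty = 2)            # reference line at zero
qqnorm(res)
qqline(res)

# PC6: a bare-bones raw-text coefficient table
print(round(coefs, 3))
```

With a single predictor, the model's $R^2$ equals the squared Pearson correlation, which is a useful sanity check on PC3 and PC4. For HTML or LaTeX output in PC6, add-on packages such as `stargazer` or `texreg` can format a fitted `lm` object, assuming one of them is installed on your machine.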

Now, let's go back to the Michelle Obama dataset we used last week as part of the week 7 problem set.

PC7. Load up the full dataset and fit the following linear model. Be ready to interpret the results in the same way you did for PC4 above:
$\widehat{\mathrm{fruit}} = \beta_0 + \beta_1 \mathrm{obama} + \varepsilon$
PC8. Examine the residuals for your model in PC7 and try to interpret them as you did in PC5 above. What do you notice? (Note: treat the dichotomous measures as continuous for the moment. We'll discuss the implications of that in class.)
PC9. Run the model on three subsets of the dataset: just 2012, 2014, and 2015. Be prepared to talk through the results.
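
One way to approach PC7 and PC9 is to fit the model once on the full data and then once per year. The sketch below uses simulated data, not the real dataset: the data frame `sim`, the column name `year`, and the probabilities used to generate `fruit` are assumptions for illustration, and only the names `fruit` and `obama` come from the formula above.

```r
# Simulated stand-in data (an assumption for illustration; NOT the real dataset)
set.seed(7)
n <- 300
sim <- data.frame(
  year  = sample(c(2012, 2014, 2015), n, replace = TRUE),
  obama = rbinom(n, 1, 0.5)        # 1 = hypothetical treatment indicator
)
sim$fruit <- rbinom(n, 1, 0.25 + 0.05 * sim$obama)

# PC7: model fitted on the full dataset
full <- lm(fruit ~ obama, data = sim)
print(summary(full))

# PC9: the same model fitted separately within each year
by.year <- lapply(split(sim, sim$year), function(d) lm(fruit ~ obama, data = d))
year.coefs <- sapply(by.year, coef)  # one column of coefficients per year
print(round(year.coefs, 3))
```

Because both variables are coded 0/1, the intercept here is the proportion of `fruit` when `obama` is 0, and the slope is the difference in proportions between the two groups.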

## Statistics Questions

SQ0. Any questions or clarifications from the PSU material or the OpenIntro text?
SQ1. Exercise 8.14 on evaluating regression residuals (no sub-parts).
SQ2. Exercise 8.16 on the Challenger O-rings.
SQ3. Exercise 8.18, which has more on the Challenger O-rings.

## Empirical Paper Questions

These questions are about the Lampe and Resnick paper once again. For this week, we'll focus on the logistic regression reported in Table 4.

EQ0. Any questions or clarifications from the paper that we didn't cover last week?
EQ1. Be ready to explain what all of Table 5 means in both statistical and substantive terms. In particular, be ready to interpret all of the coefficients and to explain what the t-statistics, $R^2$, and p-values mean. (Note that this is not really different from EQ3 last week, except that you should now be able to interpret the values jointly more effectively.)
EQ2. Be ready to explain what Table 4 means in both statistical and substantive terms. In particular, be ready to interpret the coefficients in substantive terms and be ready to explain what the Z-statistics, Pseudo $R^2$, and p-values mean.
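
As a reminder for EQ2, here is the general form of a logistic regression and the usual way to read its coefficients. This is the standard textbook formulation, not something taken from the paper, and it assumes the table reports coefficients on the log-odds scale rather than odds ratios. With $p$ the probability of the outcome and predictors $x_1, \dots, x_k$:

$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$

Holding the other predictors fixed, a one-unit increase in $x_j$ multiplies the odds of the outcome by $e^{\beta_j}$:

$\frac{\mathrm{odds}(x_j + 1)}{\mathrm{odds}(x_j)} = e^{\beta_j}$

For a made-up coefficient of $\beta_j = 0.5$ (a hypothetical value, not one from Table 4), the odds ratio would be $e^{0.5} \approx 1.65$, or roughly 65% higher odds per one-unit increase in $x_j$.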

And these questions focus on issues raised by Reinhart in §8 and §9.

EQ3. What are unobserved (or at least unmeasured) confounding variables that might threaten the validity of the estimates in Lampe and Resnick's models reported in Tables 4 and 5?
EQ4. For either of the models reported by Lampe and Resnick, be prepared to explain what a causal interpretation of the results might look like. Be prepared to explain why such an interpretation is unjustified.
EQ5. Identify decisions made by Lampe and Resnick that indicate "researcher degrees of freedom" that may have shaped the results observed in the study. How do these issues affect your interpretation of, or confidence in, the results of the study? What strategies might the authors have employed to overcome these concerns?