Editing Statistics and Statistical Programming (Fall 2020)/pset7




== Programming challenges (Part I) ==
=== Import and update the data ===
=== Summarize and visualize ===
=== Fit and summarize a linear model ===
=== Assess the model fit === 
=== Interpret the results === 
=== Calculate an out-of-sample prediction interval ===
@@ Line 1: / Line 1: @@
-This problem set asks you to apply, extend, and interpret the widely influential "bread and peace" model of U.S. electoral behavior from the work of [https://douglas-hibbs.com/ Douglas Hibbs]. In brief, Hibbs argues that two variables almost perfectly predict U.S. presidential election vote share for incumbent party candidates since 1950: economic growth and U.S. military fatalities (both calculated over the duration of the previous president's term). Since we're doing univariate (one predictor variable) regression this week, I ask you to work with the income measure (predictor) and the incumbent party vote share (outcome).
-== Programming challenges ==
-=== PC1 Import and update data ===
-Data for all U.S. presidential elections 1952-2012 are [https://github.com/avehtari/ROS-Examples/raw/master/ElectionsEconomy/data/hibbs.dat available here]. Note that this points to a ".dat" file, which in this case is just a raw text file that you can import using the following command (inserting the URL for the dataset in the appropriate spot): <code>read.table(url(<insert.url.here>), header=TRUE)</code>.
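A minimal sketch of the import step. To keep it runnable offline, it parses a short inline string laid out like the ".dat" file (whitespace-separated columns with a header row). The two data rows are made-up placeholders, not real election results, and the object name <code>hibbs</code> is just an illustrative choice. The commented-out lines show the same call pointed at the URL above.

```r
# Inline stand-in for the .dat file: whitespace-separated with a header row.
# The two data rows are made-up placeholders, NOT real election results.
dat_text <- "year growth vote inc_party_candidate other_candidate
2000 1.0 50.0 CandidateA CandidateB
2004 2.0 51.0 CandidateC CandidateD"

hibbs <- read.table(text = dat_text, header = TRUE, stringsAsFactors = FALSE)

# The real import would look like this (requires network access):
# hibbs <- read.table(
#   url("https://github.com/avehtari/ROS-Examples/raw/master/ElectionsEconomy/data/hibbs.dat"),
#   header = TRUE
# )

# Check that the columns came in with the expected names and types.
str(hibbs)
```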
-Each row corresponds to one presidential election since 1952. The variables provided are:
-* <code>year</code> The year of the presidential election.
-* <code>growth</code> Economic growth during the preceding four years (increase in per-capita income).
-* <code>vote</code> Percentage of the two-party popular vote won by the incumbent party candidate.
-* <code>inc_party_candidate</code> Incumbent party candidate.
-* <code>other_candidate</code> Out-party candidate.
-The dataset does not include 2016, so we can add that by hand. You might recall that Hillary Clinton was the incumbent party candidate and Donald Trump was the out-party candidate that year. Clinton won approximately 51.1% of the two-party popular vote, and a reasonable estimate for per-capita income growth 2012-2016 is 2.2%. You can append this information to the imported dataset in a bunch of different ways. (I would personally do so using a call to <code>list()</code> nested inside a call to <code>rbind()</code>, e.g., <code>rbind(<hibbs_data>, list(<2016 row>))</code>. You could also explore the <code>add_row()</code> function in the tidyverse. As usual, your mileage may vary.)
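Here is one way the <code>rbind()</code>/<code>list()</code> approach might look. This is a sketch rather than the required solution: the starting data frame is a toy stand-in with made-up placeholder rows (use your imported data instead), and the object names are illustrative. Only the 2016 values come from the text above.

```r
# Toy stand-in for the imported data. These two rows are made-up placeholders,
# NOT real election results; in practice use the data frame from read.table().
hibbs <- data.frame(
  year = c(2000, 2004),
  growth = c(1.0, 2.0),
  vote = c(50.0, 51.0),
  inc_party_candidate = c("CandidateA", "CandidateC"),
  other_candidate = c("CandidateB", "CandidateD"),
  stringsAsFactors = FALSE
)

# The 2016 row, using the values given in the problem set text.
# Naming the list elements lets rbind() match them to columns by name.
row_2016 <- list(
  year = 2016,
  growth = 2.2,
  vote = 51.1,
  inc_party_candidate = "Clinton",
  other_candidate = "Trump"
)

hibbs_updated <- rbind(hibbs, row_2016)
print(hibbs_updated)
```

If the candidate columns were imported as factors rather than character strings, appending a new name this way produces an <code>NA</code> with a warning, so it is worth checking the column types first.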
+== Programming challenges (Part I) ==
+=== Import and update the data ===
-=== PC2 Summarize and visualize data ===
+=== Summarize and visualize ===
+=== Fit and summarize a linear model ===
-You should be familiar with how to do this by now. Make sure to include a scatterplot of <code>growth</code> against <code>vote</code>.
+=== Assess the model fit ===
+=== Interpret the results ===
-=== PC3 Calculate covariance and correlation ===
+=== Calculate an out-of-sample prediction interval ===
-Calculate the covariance and correlation of <code>growth</code> and <code>vote</code>.
-See this week's R tutorial for example commands here and the Wikipedia articles on correlation and covariance for details about the underlying calculations.
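As a quick illustration of the two commands, the sketch below computes both quantities on made-up numbers (not the election data) and checks that the correlation is just the covariance rescaled by the two standard deviations. The vector names <code>x</code> and <code>y</code> are placeholders.

```r
# Made-up illustrative values, NOT the election data.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

cov_xy <- cov(x, y)  # sample covariance (in units of x times units of y)
cor_xy <- cor(x, y)  # Pearson correlation (unitless, between -1 and 1)

# Correlation equals covariance divided by the product of the standard deviations.
cor_from_cov <- cov_xy / (sd(x) * sd(y))

print(c(covariance = cov_xy, correlation = cor_xy, rescaled_cov = cor_from_cov))
```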
-=== PC4 Fit and summarize a linear model ===
-Use the <code>lm()</code> function to fit a least squares regression of incumbent party vote share (the outcome) on economic growth (the predictor). Use the <code>summary()</code> function to present a summary of the model results.
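In R's formula syntax the outcome goes on the left of the <code>~</code> and the predictor on the right. The sketch below shows the shape of the call on a toy data frame; the values are made up for illustration (not the election data) and the names <code>toy</code> and <code>fit</code> are arbitrary.

```r
# Made-up illustrative values, NOT the election data.
toy <- data.frame(
  growth = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0),
  vote   = c(46.2, 49.1, 48.7, 51.9, 52.4, 55.8, 55.1, 58.6)
)

# Outcome (vote) on the left of ~, predictor (growth) on the right.
fit <- lm(vote ~ growth, data = toy)

# Coefficients, standard errors, t-tests, and R-squared.
print(summary(fit))
```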
-=== PC5 Assess the model fit ===
-Evaluate the conditions for least squares regression (linearity, normal residuals, constant variability, independent observations). Wherever possible, present plots and/or calculations to support your evaluations. In particular, you probably want to produce the following (examples provided in this week's R tutorial):
-(a) a histogram of the residuals
-(b) a plot of the residuals against the (sequential) values of X
-(c) a quantile-quantile plot
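The three plots listed above could be produced with base R roughly as follows. This is only a sketch on made-up values (not the election data); the tutorial's examples may use different functions or styling, and the object names are illustrative.

```r
# Made-up illustrative values, NOT the election data.
toy <- data.frame(
  growth = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0),
  vote   = c(46.2, 49.1, 48.7, 51.9, 52.4, 55.8, 55.1, 58.6)
)
fit <- lm(vote ~ growth, data = toy)
res <- residuals(fit)

# (a) Histogram of the residuals: look for rough symmetry around zero.
hist(res, main = "Histogram of residuals", xlab = "Residual")

# (b) Residuals against the predictor: look for curvature or fanning.
plot(toy$growth, res, xlab = "growth", ylab = "Residual")
abline(h = 0, lty = 2)

# (c) Normal quantile-quantile plot: points near the line suggest normality.
qqnorm(res)
qqline(res)
```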
-=== PC6 Calculate confidence interval for a coefficient ===
-The very last part of ''OpenIntro'' §8 provides detailed instructions for estimating a confidence interval around a regression coefficient. Please calculate the confidence interval for the coefficient on <code>growth</code> from the results of your regression model.
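The usual recipe is estimate ± t* × SE, where t* comes from a t distribution with the model's residual degrees of freedom. The sketch below works through that by hand on made-up values (not the election data) and compares the result against R's built-in <code>confint()</code>. The 95% level and the object names are assumptions made for illustration.

```r
# Made-up illustrative values, NOT the election data.
toy <- data.frame(
  growth = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0),
  vote   = c(46.2, 49.1, 48.7, 51.9, 52.4, 55.8, 55.1, 58.6)
)
fit <- lm(vote ~ growth, data = toy)

# Pull the point estimate and standard error for the slope.
coefs <- summary(fit)$coefficients
est <- coefs["growth", "Estimate"]
se  <- coefs["growth", "Std. Error"]

# Critical value for a 95% interval with n - 2 residual degrees of freedom.
t_star <- qt(0.975, df = df.residual(fit))

ci_manual  <- c(lower = est - t_star * se, upper = est + t_star * se)
ci_builtin <- confint(fit, "growth", level = 0.95)

print(ci_manual)
print(ci_builtin)
```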
-=== PC7 Calculate an out-of-sample prediction and 95% prediction interval ===
-What is the predicted vote share for Donald Trump in 2020 based on this model? The online supplement to ''OpenIntro'' §8 assigned this week provides detailed examples of how to produce an out-of-sample prediction from a regression model. Please calculate the point estimate and 95% prediction interval for the incumbent party candidate's share of the vote in 2020 given that (a [https://osf.io/preprints/socarxiv/xrf3t/ reasonable estimate] of) the per-capita income growth 2016-2020 is 2.5%.
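One way to get both the point estimate and the interval in R is <code>predict()</code> with <code>interval = "prediction"</code>. The sketch below fits a model to made-up values (not the election data), so its numbers mean nothing for 2020; only the growth value of 2.5 comes from the text above, and the object names are illustrative. It also computes the narrower confidence interval for the mean response to show how the two differ.

```r
# Made-up illustrative values, NOT the election data.
toy <- data.frame(
  growth = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0),
  vote   = c(46.2, 49.1, 48.7, 51.9, 52.4, 55.8, 55.1, 58.6)
)
fit <- lm(vote ~ growth, data = toy)

# New observation: the column name must match the predictor in the formula.
new_obs <- data.frame(growth = 2.5)

# Prediction interval: uncertainty about a single new outcome.
pred <- predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)

# Confidence interval: uncertainty about the mean outcome at this growth value.
conf <- predict(fit, newdata = new_obs, interval = "confidence", level = 0.95)

print(pred)  # columns: fit (point estimate), lwr, upr
print(conf)
```

The prediction interval is always wider than the confidence interval at the same point because it adds the residual variability of an individual observation to the uncertainty in the fitted line.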
-== Statistical questions ==
-The questions below refer to the univariate regression analysis you completed in the programming challenges above.
-=== SQ1 Describe and interpret the results ===
-Do this for any/all of the analysis you conducted in the programming challenges. In particular, be sure to:
-* address any noteworthy observations from the descriptive summaries and plots
-* summarize the regression results effectively (including the coefficients and <math>R^2</math> value).
-* summarize the confidence interval around the estimate for <code>growth</code> that you calculated.
-* provide a substantive interpretation of the results in terms of the variables/concepts included in the analysis.
-=== SQ2 Discuss regression diagnostics ===
-Describe the regression diagnostics and whether the conditions necessary to identify a least-squares fit seem to apply. If there are violations of these assumptions/conditions, consider how that might bias the results.
-=== SQ3 Disambiguate: correlation vs. covariance vs. OLS estimate ===
-You characterized the relationship between <code>growth</code> and <code>vote</code> in three different ways. What do you make of each of these? What are the similarities and differences between them?
-=== SQ4 Interpret out-of-sample prediction ===
-Discuss and interpret the out-of-sample prediction you calculated for Trump's vote share in 2020. As of the writing of the problem set, Trump seems to have received about [https://en.wikipedia.org/w/index.php?title=2020_United_States_presidential_election&oldid=988030609 47.6% of the popular vote]. How does this (not-yet-final) observed value relate to your prediction? How do you interpret this relationship?
-=== SQ5 Revisit (vaguely stated) theory ===
-Insofar as we've only considered one part of the "bread and peace" theory here, how would you interpret your results in light of the prior theory/findings as described at the beginning of the problem set? Any confounding factors not present in the original theory/models that you think might be important to include? Why would you argue to include them (or not)?