Statistics and Statistical Programming (Fall 2020)/pset7

From CommunityData

This problem set asks you to apply, extend, and interpret the widely influential "bread and peace" model of U.S. electoral behavior developed by Douglas Hibbs. In brief, Hibbs argues that two variables almost perfectly predict the vote share of incumbent party candidates in U.S. presidential elections since 1950: economic growth and U.S. military fatalities (both calculated over the preceding presidential term). Since we are covering univariate (one predictor variable) regression this week, I ask you to work only with the income growth measure (predictor) and the incumbent party vote share (outcome).

Programming challenges

Import and update data

Data for all U.S. presidential elections from 1952 to 2012 are available here. Note that this points to a ".dat" file, which in this case is just a plain-text table that you can import using the following command: read.table(url(<insert.url.here>), header=TRUE).
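A minimal sketch of the import step, assuming the file has a header row containing the five columns described below. The helper name read_hibbs() is hypothetical (it is not part of the assignment), and the URL placeholder is deliberately left unfilled. Setting stringsAsFactors = FALSE keeps the candidate names as character strings, which makes appending rows later a little simpler.

```r
# Hypothetical helper: import the Hibbs data and confirm the expected columns exist.
read_hibbs <- function(source) {
  # read.table() accepts a file path, a URL string, or a connection such as url(...)
  hibbs <- read.table(source, header = TRUE, stringsAsFactors = FALSE)
  expected <- c("year", "growth", "vote", "inc_party_candidate", "other_candidate")
  missing_cols <- setdiff(expected, names(hibbs))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  hibbs
}

# Usage (fill in the course's data URL):
# hibbs <- read_hibbs(url("<insert.url.here>"))
# str(hibbs)
```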

Each row corresponds to one presidential election since 1952. The variables provided are:

  • year: The year of the presidential election.
  • growth: Economic growth during the preceding four years (increase in per-capita income).
  • vote: Share of the two-party popular vote won by the incumbent party candidate.
  • inc_party_candidate: Incumbent party candidate.
  • other_candidate: Out-party candidate.

The dataset does not include 2016, so you can add that year by hand. You might recall that Hillary Clinton was the incumbent party candidate and Donald Trump was the out-party candidate that year. Clinton won approximately 51.1% of the two-party popular vote, and a reasonable estimate for per-capita income growth over 2012-2016 is 2.2%. You can append this information to the imported dataset in many different ways. I would personally nest a call to list() inside a call to rbind(), e.g., rbind(<hibbs_data>, list(<2016 row>)). You could also explore the add_row() function in the tidyverse. As usual, your mileage may vary.
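One way to follow the rbind() approach, sketched as a hypothetical helper add_2016(). It assumes the imported columns are year, growth, vote, inc_party_candidate, and other_candidate, and that vote and growth are recorded in percentage points (e.g., 51.1 rather than 0.511). Check the imported values with head() and rescale the 2016 numbers if the file uses proportions instead. Naming the list elements lets rbind() match them to columns by name rather than relying on position alone.

```r
# Hypothetical helper: append the 2016 election to the imported Hibbs data.
# Assumption: vote and growth are in percentage points; pass 0.511 / 0.022
# instead if the imported file stores proportions.
add_2016 <- function(hibbs, growth_2016 = 2.2, vote_2016 = 51.1) {
  rbind(hibbs, list(year = 2016,
                    growth = growth_2016,
                    vote = vote_2016,
                    inc_party_candidate = "Clinton",
                    other_candidate = "Trump"))
}

# Usage:
# hibbs <- add_2016(hibbs)
# tidyverse alternative:
# hibbs <- tibble::add_row(hibbs, year = 2016, growth = 2.2, vote = 51.1,
#                          inc_party_candidate = "Clinton", other_candidate = "Trump")
```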

Summarize and visualize data
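A minimal base-R sketch, assuming the data frame is called hibbs and has the columns listed above. Both helper names are hypothetical; ggplot2 from the tidyverse would work equally well for the plot. Putting growth on the x-axis and vote on the y-axis matches their roles as predictor and outcome.

```r
# Hypothetical helper: numeric summaries of the predictor (growth) and outcome (vote)
summarize_hibbs <- function(hibbs) {
  vars <- hibbs[, c("growth", "vote")]
  rbind(mean   = sapply(vars, mean),
        median = sapply(vars, median),
        sd     = sapply(vars, sd),
        min    = sapply(vars, min),
        max    = sapply(vars, max))
}

# Hypothetical helper: scatterplot of vote share against income growth,
# with each point labeled by its election year
plot_hibbs <- function(hibbs) {
  plot(vote ~ growth, data = hibbs,
       xlab = "Per-capita income growth (previous term)",
       ylab = "Incumbent party vote share")
  text(hibbs$growth, hibbs$vote, labels = hibbs$year, pos = 3, cex = 0.7)
}

# Usage:
# summarize_hibbs(hibbs)
# plot_hibbs(hibbs)
```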

Calculate covariance and correlation
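A short sketch comparing R's built-in cov() and cor() with the textbook sample formulas, which can help when interpreting the two numbers. The helper name cov_cor() is hypothetical. The manual and built-in values should agree.

```r
# Hypothetical helper: covariance and Pearson correlation, built-in and by hand
cov_cor <- function(x, y) {
  n <- length(x)
  # sample covariance: sum of cross-products of deviations, divided by n - 1
  manual_cov <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
  # correlation rescales the covariance by both sample standard deviations
  manual_cor <- manual_cov / (sd(x) * sd(y))
  c(covariance = cov(x, y), manual_covariance = manual_cov,
    correlation = cor(x, y), manual_correlation = manual_cor)
}

# Usage:
# cov_cor(hibbs$growth, hibbs$vote)
```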

Fit and summarize a linear model
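The model here is a simple linear regression of vote on growth, fit by ordinary least squares with lm(). This sketch assumes the data frame is named hibbs; fit_hibbs() is a hypothetical wrapper that exists only to make the formula explicit.

```r
# Fit vote = b0 + b1 * growth + error by ordinary least squares
fit_hibbs <- function(hibbs) {
  lm(vote ~ growth, data = hibbs)
}

# Usage:
# m <- fit_hibbs(hibbs)
# summary(m)  # estimates, standard errors, t statistics, p-values, R-squared
# coef(m)     # just the intercept and slope
```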

Assess the model fit
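A sketch of a few common fit measures for a fitted lm object. The helper assess_fit() is hypothetical. The residual standard error divides the sum of squared residuals by the residual degrees of freedom (n - 2 here), while the RMSE shown divides by n, so the two differ slightly in small samples like this one. With a single predictor, R-squared equals the squared correlation between growth and vote. Separately, plot(model) draws base R's standard diagnostic plots (residuals vs. fitted, normal Q-Q, scale-location, residuals vs. leverage).

```r
# Hypothetical helper: summary measures of how well a fitted lm describes the data
assess_fit <- function(model) {
  s <- summary(model)
  res <- residuals(model)
  c(r_squared     = s$r.squared,
    adj_r_squared = s$adj.r.squared,
    residual_se   = s$sigma,           # sqrt(SSR / residual df)
    rmse          = sqrt(mean(res^2))) # sqrt(SSR / n)
}

# Usage:
# assess_fit(m)
# plot(m)  # diagnostic plots
```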

Calculate confidence interval for a coefficient
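A sketch that computes the confidence interval for the growth slope two ways: with confint(), and by hand as estimate ± t critical value × standard error, using the t distribution with the model's residual degrees of freedom. The helper slope_ci() is hypothetical and assumes the predictor is named growth. The two results should match.

```r
# Hypothetical helper: confidence interval for the growth slope, built-in and by hand
slope_ci <- function(model, level = 0.95) {
  builtin <- confint(model, "growth", level = level)
  coefs <- coef(summary(model))
  est <- coefs["growth", "Estimate"]
  se  <- coefs["growth", "Std. Error"]
  # two-sided critical value from the t distribution
  t_crit <- qt(1 - (1 - level) / 2, df = df.residual(model))
  manual <- c(est - t_crit * se, est + t_crit * se)
  list(builtin = builtin, manual = manual)
}

# Usage:
# slope_ci(m)
```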

Calculate an out-of-sample prediction interval
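A sketch using predict() with interval = "prediction", which returns a point forecast plus bounds that reflect both uncertainty in the fitted line and the scatter of individual elections around it. For contrast it also returns the narrower confidence interval for the mean vote share at the same growth value. The helper predict_vote() is hypothetical, handles a single growth value, and leaves that value as a parameter because the growth figure to predict for is not specified here.

```r
# Hypothetical helper: point prediction with prediction and confidence intervals
# for one new growth value
predict_vote <- function(model, new_growth, level = 0.95) {
  new_data <- data.frame(growth = new_growth)
  rbind(
    # interval for the vote share in a single new election
    prediction = predict(model, newdata = new_data,
                         interval = "prediction", level = level)[1, ],
    # interval for the average vote share at this growth value
    confidence = predict(model, newdata = new_data,
                         interval = "confidence", level = level)[1, ]
  )
}

# Usage (substitute the growth value of interest):
# predict_vote(m, new_growth = <growth value>)
```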

Statistical questions

Interpret the results

Disambiguate: correlation vs. covariance vs. OLS estimate
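For a single predictor x (growth) and outcome y (vote), the three quantities are linked by standard identities, where s_x and s_y are the sample standard deviations. Covariance carries the units of both variables, correlation is unit-free and bounded between -1 and 1, and the OLS slope is measured in units of y per unit of x.

```latex
\operatorname{cov}(x, y) = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad
r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x s_y}, \qquad
\hat{\beta}_1 = \frac{\operatorname{cov}(x, y)}{s_x^2} = r_{xy}\,\frac{s_y}{s_x}
```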

Identify threats to validity of estimates

Interpret prediction

Brainstorm confounds & alternative explanations

Revisit (vaguely stated) theory