Statistics and Statistical Programming (Fall 2020)/pset7

From CommunityData
Revision as of 19:33, 10 November 2020

This problem set asks you to apply, extend, and interpret the widely influential "bread and peace" model of U.S. electoral behavior from the work of Douglas Hibbs (https://douglas-hibbs.com/). In brief, Hibbs argues that two variables almost perfectly predict U.S. presidential election vote-share for incumbent party candidates since 1950: economic growth and U.S. military fatalities (both calculated over the duration of the previous president's term). Since we're doing univariate (one predictor variable) regression this week, I ask you to work with the income measure (predictor) and the incumbent party vote share (outcome).

Programming challenges

Import and update data

Data for all U.S. presidential elections 1952-2012 are available at https://github.com/avehtari/ROS-Examples/raw/master/ElectionsEconomy/data/hibbs.dat. Note that this points to a ".dat" file, which in this case is just a raw text file format that you can import using the following command (inserting the URL for the dataset in the appropriate spot): read.table(url(<insert.url.here>), header=TRUE).
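As a minimal sketch of the import step, the snippet below wraps the read.table() call in a small helper. The names hibbs_url and read_hibbs are hypothetical (they are not part of the assignment); the URL is the dataset link from this page.

```r
# Dataset URL linked from this page.
hibbs_url <- "https://github.com/avehtari/ROS-Examples/raw/master/ElectionsEconomy/data/hibbs.dat"

# Hypothetical helper: read a whitespace-delimited text file whose first
# line holds the column names. `input` can be a file path or a connection
# such as url(hibbs_url).
read_hibbs <- function(input) {
  read.table(input, header = TRUE)
}
```

With a network connection, hibbs <- read_hibbs(url(hibbs_url)) followed by str(hibbs) should let you confirm that the columns match the variable list on this page.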

Each row corresponds to one presidential election since 1952. The variables provided are:

  • year The year of the presidential election.
  • growth Economic growth during the preceding four years (increase in per-capita income).
  • vote Share of the popular vote won by the incumbent party candidate (in percent).
  • inc_party_candidate Incumbent party candidate.
  • other_candidate Out-party candidate.

The dataset does not include 2016, so we can add that by hand. You might recall that Hillary Clinton was the incumbent party candidate and Donald Trump was the out-party candidate that year. Clinton won approximately 51.1% of the popular vote, and a reasonable estimate for per-capita income growth 2012-2016 is 2.2%. You can append this information to the imported dataset in a bunch of different ways. I would personally do so using a call to list() nested inside a call to rbind(), e.g., rbind(<hibbs_data>, list(<2016 row>)). You could also explore the add_row() function in the tidyverse. As usual, your mileage may vary.
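One way the rbind() plus list() approach could look is sketched below. The helper name add_2016 is hypothetical, and the candidate labels "Clinton" and "Trump" are my own choice of strings; the numbers are the ones given above. The list is named so that rbind() matches each value to the right column.

```r
# Hypothetical helper: append the 2016 election to the Hibbs data frame.
# Assumes `hibbs` has the five columns described in the variable list.
add_2016 <- function(hibbs) {
  row_2016 <- list(
    year = 2016,
    growth = 2.2,   # estimated per-capita income growth, 2012-2016 (percent)
    vote = 51.1,    # Clinton's approximate share of the popular vote (percent)
    inc_party_candidate = "Clinton",  # label chosen for illustration
    other_candidate = "Trump"         # label chosen for illustration
  )
  rbind(hibbs, row_2016)
}
```

After hibbs <- add_2016(hibbs), a quick tail(hibbs) is a reasonable check that the new row landed where you expect and that the column types did not change in a surprising way.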

Summarize and visualize data

You should be familiar with how to do this by now. Make sure to include a scatterplot of vote (y-axis) against growth (x-axis).
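If a reminder helps, here is a rough sketch using base R only. The helper name describe_hibbs and the axis labels are my own; `hibbs` is assumed to be the imported data frame with growth and vote columns.

```r
# Hypothetical helper: numeric summaries plus a scatterplot with the
# predictor (growth) on the x-axis and the outcome (vote) on the y-axis.
describe_hibbs <- function(hibbs) {
  print(summary(hibbs[, c("growth", "vote")]))
  plot(hibbs$growth, hibbs$vote,
       xlab = "Per-capita income growth (%)",
       ylab = "Incumbent party vote share (%)",
       main = "Economic growth and incumbent vote share")
  invisible(hibbs)  # return the data unchanged so calls can be chained
}
```

The same plot could be made with ggplot2 if you prefer; nothing here depends on base graphics in particular.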

Calculate covariance and correlation

Calculate the covariance and correlation of growth and vote.

See this week's R tutorial for example commands and the Wikipedia articles on correlation and covariance for details about the underlying calculations.
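The sketch below shows the two built-in functions side by side and also recomputes the correlation from the covariance, which may help when you get to the "disambiguate" question later. The helper name cov_and_cor is hypothetical.

```r
# Hypothetical helper: covariance and correlation of two numeric vectors.
cov_and_cor <- function(x, y) {
  covariance <- cov(x, y)
  correlation <- cor(x, y)
  # Correlation is the covariance rescaled by both standard deviations.
  by_hand <- covariance / (sd(x) * sd(y))
  c(covariance = covariance,
    correlation = correlation,
    correlation_by_hand = by_hand)
}
```

Calling cov_and_cor(hibbs$growth, hibbs$vote) should return two matching correlation values; if they differ, something has gone wrong upstream (missing values, for example).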

Fit and summarize a linear model

Use the lm() function to fit a least squares regression of incumbent party vote share (the outcome) on economic growth (the predictor). Use the summary() function to present a summary of the model results.
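A minimal sketch of the model call follows, with vote as the outcome and growth as the predictor. The helper names fit_hibbs and slope_by_hand are hypothetical; the second one is included only to show that, in the one-predictor case, the least squares slope is the covariance divided by the variance of the predictor.

```r
# Hypothetical helper: least squares regression of vote on growth.
# Assumes `hibbs` has numeric columns named growth and vote.
fit_hibbs <- function(hibbs) {
  lm(vote ~ growth, data = hibbs)
}

# For a single predictor, the fitted slope equals cov(x, y) / var(x).
slope_by_hand <- function(x, y) {
  cov(x, y) / var(x)
}
```

Typical use would be model <- fit_hibbs(hibbs) followed by summary(model); the growth coefficient in that summary should agree with slope_by_hand(hibbs$growth, hibbs$vote).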

Assess the model fit

Evaluate the conditions for least squares regression (linearity, normal residuals, constant variability, independent observations). Wherever possible, present plots and/or calculations to support your evaluations. In particular, you probably want to produce the following (examples provided in this week's R tutorial):
(a) a histogram of the residuals
(b) a plot of the residuals against the (sequential) values of X
(c) a quantile-quantile plot
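The three plots listed above can be sketched in base R roughly as follows. The helper name plot_diagnostics is hypothetical, `model` is assumed to be a fitted lm object, and `x` is assumed to be the predictor used to fit it (in the same row order).

```r
# Hypothetical helper: three residual diagnostics in one row of panels.
plot_diagnostics <- function(model, x) {
  res <- residuals(model)
  old_par <- par(mfrow = c(1, 3))
  on.exit(par(old_par))  # restore the previous plot layout afterwards

  # (a) histogram of the residuals
  hist(res, main = "Histogram of residuals", xlab = "Residual")

  # (b) residuals against the values of X, with a reference line at zero
  plot(x, res, main = "Residuals vs. X", xlab = "X", ylab = "Residual")
  abline(h = 0, lty = 2)

  # (c) normal quantile-quantile plot of the residuals
  qqnorm(res)
  qqline(res)

  invisible(res)
}
```

For this assignment the call would look something like plot_diagnostics(model, hibbs$growth). With so few elections in the data, expect the plots to be noisy and interpret them cautiously.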

Calculate confidence interval for a coefficient

The very last part of OpenIntro §8 provides detailed instructions for estimating a confidence interval around a regression coefficient. Please calculate the confidence interval for the coefficient on growth from the results of your regression model.
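The interval has the usual form of estimate plus or minus a critical t value times the standard error, with the residual degrees of freedom of the model (n - 2 for one predictor). The sketch below does that calculation by hand. The helper name slope_ci is hypothetical, and the 95% level is my assumption since the prompt above does not name one.

```r
# Hypothetical helper: confidence interval for one coefficient of an lm fit.
slope_ci <- function(model, term = "growth", level = 0.95) {
  coefs <- coef(summary(model))           # matrix of estimates and std. errors
  estimate <- coefs[term, "Estimate"]
  std_error <- coefs[term, "Std. Error"]
  # Critical value from the t distribution with residual degrees of freedom.
  t_star <- qt(1 - (1 - level) / 2, df = df.residual(model))
  c(lower = estimate - t_star * std_error,
    upper = estimate + t_star * std_error)
}
```

The result of slope_ci(model) can be checked against confint(model)["growth", ], which computes the same interval directly.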

Calculate an out-of-sample prediction and 95% prediction interval

What was/is the predicted vote share for Donald Trump in 2020 based on this model? The online supplement to OpenIntro §8 assigned this week provides detailed examples for how to produce an out-of-sample prediction from a regression model. Please calculate the point estimate and 95% prediction interval for the incumbent party candidate's share of the vote in 2020, given that a reasonable estimate of per-capita income growth 2016-2020 (https://osf.io/preprints/socarxiv/xrf3t/) is 2.5%.
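One way to get both the point estimate and the interval is predict() with interval = "prediction", sketched below. The helper name predict_vote is hypothetical; `model` is assumed to be the lm fit of vote on growth, and the default of 2.5 is the growth estimate quoted above.

```r
# Hypothetical helper: point prediction and prediction interval for a
# new election with the given per-capita income growth (percent).
predict_vote <- function(model, growth = 2.5, level = 0.95) {
  # The column name must match the predictor name used in the model formula.
  new_obs <- data.frame(growth = growth)
  predict(model, newdata = new_obs, interval = "prediction", level = level)
}
```

The returned matrix has columns fit, lwr, and upr. Note that interval = "confidence" would instead give the (narrower) interval for the mean response, which is not what this question asks for.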

Statistical questions

The questions below refer to the univariate regression analysis you completed in the programming challenges above.

Interpret the results

Disambiguate: correlation vs. covariance vs. OLS estimate

Identify threats to validity of estimates

Interpret prediction

Brainstorm confounds & alternative explanations

Revisit (vaguely stated) theory