Statistics and Statistical Programming (Fall 2020)/pset7
From CommunityData
== Programming challenges ==
=== Import and update data ===
Data for all U.S. presidential elections 1952-2012 are [https://github.com/avehtari/ROS-Examples/raw/master/ElectionsEconomy/data/hibbs.dat available here]. Note that this points to a ".dat" file, which in this case is just a raw text file that you can import using the following command (inserting the URL for the dataset in the appropriate spot): <code>read.table(url(<insert.url.here>), header=TRUE)</code>.
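If it helps to see the shape of the import step, here is a minimal sketch. The helper name <code>import_dat()</code> is hypothetical, and the rows written to the temporary file are made-up placeholders (not the real election data); only the column names <code>growth</code> and <code>vote</code> come from this problem set, and <code>year</code> is an assumption.

```r
# Hypothetical helper wrapping the read.table() call from the text.
# `src` may be a local file path or a complete URL.
import_dat <- function(src) {
  # header = TRUE treats the first line as column names
  read.table(src, header = TRUE)
}

# Offline demonstration on a temporary file with MADE-UP placeholder rows,
# just to show the whitespace-delimited layout a ".dat" text file can have.
tmp <- tempfile(fileext = ".dat")
writeLines(c("year growth vote",
             "1900 1.0 50.0",
             "1904 2.0 52.5"), tmp)

demo <- import_dat(tmp)
str(demo)
```

For the assignment itself you would pass the dataset URL linked above as <code>src</code> and then check <code>names()</code> and <code>head()</code> on the result before going further.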
The dataset does not include 2016, so we can add that by hand. You might recall that Hillary Clinton was the incumbent party candidate and Donald Trump was the out-party candidate that year. Clinton won approximately 51.1% of the two-party popular vote and a reasonable estimate for per-capita income growth 2012-2016 is 2.2%. You can append this information to the imported dataset in a bunch of different ways. I would personally do so using a call to <code>list()</code> nested inside a call to <code>rbind()</code> (e.g., <code>rbind(<hibbs_data>, list(<2016 row>))</code>). You could also explore the <code>add_row()</code> function in the tidyverse. As usual, your mileage may vary.
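Here is one way the <code>rbind()</code>/<code>list()</code> approach might look. This is a sketch under assumptions: the existing rows in <code>toy</code> are made-up placeholders, and the column names (especially <code>inc_party_candidate</code> and <code>other_candidate</code>) are assumed, so check them against <code>names()</code> of your imported data. Only the 2016 values come from the paragraph above.

```r
# Placeholder rows with MADE-UP values; column names are assumptions and
# should be checked against names() of the real imported data.
toy <- data.frame(
  year = c(1900, 1904),
  growth = c(1.0, 2.0),
  vote = c(50.0, 52.5),
  inc_party_candidate = c("A", "B"),
  other_candidate = c("C", "D"),
  stringsAsFactors = FALSE
)

# The 2016 values are taken from the problem set text.
row_2016 <- list(
  year = 2016,
  growth = 2.2,
  vote = 51.1,
  inc_party_candidate = "Clinton",
  other_candidate = "Trump"
)

# rbind() matches the list's names to the data frame's columns
toy <- rbind(toy, row_2016)
tail(toy, 1)
```

Naming every element of the list means the new row lines up with the right columns even if you type them in a different order.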
=== Summarize and visualize data ===
You should be familiar with how to do this by now. Make sure to include a scatterplot of <code>growth</code> against <code>vote</code>.
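As a quick reminder of the pattern, here is a sketch on a tiny data frame of made-up numbers (not the election data). Putting <code>growth</code> on the x-axis is a choice made here because it will be the predictor later; swap the axes if you prefer.

```r
# MADE-UP toy values for illustration only
toy <- data.frame(
  growth = c(0.5, 1.0, 2.0, 3.0, 4.0),
  vote   = c(46, 49, 52, 54, 59)
)

# Five-number summaries plus means for each column
summary(toy)

# Standard deviations for each column
spread <- sapply(toy, sd)
print(spread)

# Scatterplot with growth on the x-axis and vote on the y-axis
plot(toy$growth, toy$vote,
     xlab = "Per-capita income growth (%)",
     ylab = "Incumbent party vote share (%)",
     main = "Toy example: vote share and growth")
```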
=== Calculate covariance and correlation ===
Calculate the covariance and correlation of <code>growth</code> and <code>vote</code>.
See this week's R tutorial for example commands and the Wikipedia articles on correlation and covariance for details about the underlying calculations.
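A minimal sketch of the two calls, again on made-up toy numbers rather than the election data. The last step recomputes the correlation by hand to show how it relates to the covariance.

```r
# MADE-UP toy values for illustration only
toy <- data.frame(
  growth = c(0.5, 1.0, 2.0, 3.0, 4.0),
  vote   = c(46, 49, 52, 54, 59)
)

# Sample covariance (in the units of growth times the units of vote)
cov_gv <- cov(toy$growth, toy$vote)

# Pearson correlation (unitless, between -1 and 1)
cor_gv <- cor(toy$growth, toy$vote)

# Correlation is the covariance rescaled by both standard deviations
cor_by_hand <- cov_gv / (sd(toy$growth) * sd(toy$vote))

c(covariance = cov_gv, correlation = cor_gv, by_hand = cor_by_hand)
```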
=== Fit and summarize a linear model ===
Use the <code>lm()</code> function to fit a least squares regression of incumbent party vote share on economic growth. Use the <code>summary()</code> function to present a summary of the model results.
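In case the formula syntax is the sticking point, here is a sketch on made-up toy numbers. It assumes <code>vote</code> is the outcome and <code>growth</code> is the predictor, which is what the later questions about the coefficient on <code>growth</code> imply.

```r
# MADE-UP toy values for illustration only
toy <- data.frame(
  growth = c(0.5, 1.0, 2.0, 3.0, 4.0),
  vote   = c(46, 49, 52, 54, 59)
)

# Formula reads "outcome ~ predictor": vote modeled as a function of growth
fit <- lm(vote ~ growth, data = toy)

# Coefficients, standard errors, t statistics, R-squared, and so on
summary(fit)

# Just the estimated intercept and slope
coef(fit)
```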
=== Assess the model fit ===
Evaluate the conditions for least squares regression (linearity, normal residuals, constant variability, independent observations). Wherever possible, present plots and/or calculations to support your evaluations. In particular, you probably want to produce the following (examples provided in this week's R tutorial):
(c) a quantile-quantile plot
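Here is a sketch of some common residual diagnostics in base R, including the quantile-quantile plot from (c). The data are made-up toy numbers, and the choice of plots is an assumption about what is useful here; the R tutorial remains the reference for what is expected.

```r
# MADE-UP toy values for illustration only
toy <- data.frame(
  growth = c(0.5, 1.0, 2.0, 3.0, 4.0),
  vote   = c(46, 49, 52, 54, 59)
)
fit <- lm(vote ~ growth, data = toy)
res <- resid(fit)

# Residuals against fitted values: look for curvature (linearity)
# or a funnel shape (non-constant variability)
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Histogram of residuals: a rough look at their distribution
hist(res, main = "Residuals", xlab = "Residual")

# Quantile-quantile plot: points near the line suggest roughly normal residuals
qqnorm(res)
qqline(res)
```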
=== Calculate confidence interval for a coefficient ===
The very last part of `OpenIntro` §8 provides detailed instructions for estimating a confidence interval around a regression coefficient. Please calculate the confidence interval for the coefficient on <code>growth</code> from the results of your regression model.
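The by-hand recipe is the point estimate plus or minus a critical t value times the standard error. Here is a sketch on made-up toy numbers, assuming a 95% level and residual degrees of freedom of n - 2 for a single predictor, with <code>confint()</code> as a cross-check.

```r
# MADE-UP toy values for illustration only
toy <- data.frame(
  growth = c(0.5, 1.0, 2.0, 3.0, 4.0),
  vote   = c(46, 49, 52, 54, 59)
)
fit <- lm(vote ~ growth, data = toy)

# Pull the slope estimate and its standard error from the coefficient table
coefs <- coef(summary(fit))
est <- coefs["growth", "Estimate"]
se  <- coefs["growth", "Std. Error"]

# Critical value for a 95% interval (assumed level) with n - 2 df
t_star <- qt(0.975, df = df.residual(fit))

ci_by_hand <- c(lower = est - t_star * se, upper = est + t_star * se)

# Built-in equivalent for comparison
ci_builtin <- confint(fit, "growth", level = 0.95)

ci_by_hand
ci_builtin
```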
=== Calculate an out-of-sample prediction and 95% prediction interval ===
What was/is the predicted vote share for Donald Trump in 2020 based on this model? The online supplement to `OpenIntro` §8 assigned this week provides detailed examples for how to produce an out-of-sample prediction from a regression model. Please calculate the point estimate and 95% prediction interval for the incumbent party candidate's share of the vote in 2020 given that (a [https://osf.io/preprints/socarxiv/xrf3t/ reasonable estimate] of) the per-capita income growth 2016-2020 is 2.5%.
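One way to get both the point estimate and the interval is <code>predict()</code> with <code>interval = "prediction"</code>. The sketch below fits the model to made-up toy numbers, so its output says nothing about the real election; only the growth value of 2.5 comes from the text above.

```r
# MADE-UP toy values for illustration only
toy <- data.frame(
  growth = c(0.5, 1.0, 2.0, 3.0, 4.0),
  vote   = c(46, 49, 52, 54, 59)
)
fit <- lm(vote ~ growth, data = toy)

# New observation: the column name must match the predictor in the formula
new_obs <- data.frame(growth = 2.5)

# Returns a one-row matrix with columns fit (point estimate), lwr, and upr
pred <- predict(fit, newdata = new_obs,
                interval = "prediction", level = 0.95)
pred
```

A prediction interval is wider than a confidence interval for the mean response at the same growth value because it also accounts for the scatter of individual observations around the line.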
The questions below refer to the univariate regression analysis you completed in the programming challenges above.
=== Describe and interpret the results ===
Do this for any/all of the analysis you conducted in the programming challenges. In particular, be sure to:
* address any noteworthy observations from the descriptive summaries and plots
* provide a substantive interpretation of the results in terms of the variables/concepts included in the analysis.
=== Discuss regression diagnostics ===
Describe the regression diagnostics and whether the conditions necessary to identify a least-squares fit seem to apply. If there are violations of these assumptions/conditions, consider how that might bias the results.
=== Disambiguate: correlation vs. covariance vs. OLS estimate ===
You characterized the relationship between <code>growth</code> and <code>vote</code> in three different ways. What do you make of each of these? What are the similarities and differences between them?
=== Interpret out-of-sample prediction ===
Discuss and interpret the out-of-sample prediction you calculated for Trump's vote share in 2020. As of the writing of the problem set, Trump seems to have received about [https://en.wikipedia.org/w/index.php?title=2020_United_States_presidential_election&oldid=988030609 47.6% of the popular vote]. How does this (not-yet-final) observed value relate to your prediction? How do you interpret this relationship?
=== Revisit (vaguely stated) theory ===
Insofar as we've only considered one part of the "bread and peace" theory here, how would you interpret your results in light of the prior theory/findings as described at the beginning of the problem set? Are there any confounding factors not present in the original theory/models that you think might be important to include?