Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 8
From CommunityData
The first set of programming challenges will use the individual dataset you used in the Week 3 problem set's programming challenges:
- PC0. Load up your dataset as you did in Week 3 PC2.
- PC1. If you recall from PC6 in the earlier problem set, x and y seemed to be linearly related. We now have the tools and terminology to describe this relationship and to estimate just how related they are. Run a t.test between x and y in the dataset and be ready to interpret the results for the class.
- PC2. Estimate how correlated x and y are with each other.
- PC3. Recode your data in the way that I laid out in Week 3 PC7.
- PC4. Generate a set of three linear models and be ready to interpret the coefficients, standard errors, t-statistics, p-values, and for each:
- (a)
- (b)
- (c)
- PC5. Generate a set of residual plots for the final model (c) and be ready to interpret your model in terms of each of these:
- (a) A histogram of the residuals.
- (b) Plot the residuals by your values of x, i, j, and k (four different plots).
- (c) A QQ plot to evaluate the normality of residuals assumption.
- PC6. Generate a nice-looking, publication-ready table that presents the series of fitted models.
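As a rough illustration of the mechanics behind PC1, PC2, and PC5, here is a minimal sketch in R. It runs on simulated data, not on your individual dataset: the data frame d, the coefficients used to simulate y, and the single-predictor model are all assumptions made for this example, and the model is not one of the models asked for in PC4. Note that t.test() defaults to a two-sample Welch test; whether a paired test (paired = TRUE) fits your data better is something to think through yourself.

```r
# Simulated stand-in data (an assumption for illustration only).
set.seed(42)
n <- 100
d <- data.frame(x = rnorm(n, mean = 5, sd = 2))
d$y <- 1.5 * d$x + rnorm(n, sd = 1)

# PC1-style: t-test comparing the means of x and y (Welch two-sample by default).
tt <- t.test(d$x, d$y)
print(tt)

# PC2-style: Pearson correlation, plus a test of whether it differs from zero.
r <- cor(d$x, d$y)
print(r)
print(cor.test(d$x, d$y))

# A placeholder single-predictor model so there are residuals to inspect.
m <- lm(y ~ x, data = d)
print(summary(m))

# PC5-style diagnostics: histogram, residuals against a predictor, and a QQ plot.
res <- residuals(m)
hist(res, main = "Histogram of residuals", xlab = "Residual")
plot(d$x, res, xlab = "x", ylab = "Residual")
abline(h = 0, lty = 2)  # reference line at zero
qqnorm(res)
qqline(res)
```

With more predictors in the model, the residuals-versus-predictor plot would simply be repeated once per variable.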
Now, let's go back to the Michelle Obama dataset we used last week in the Week 7 problem set's programming challenges.
- PC7. Load up the dataset once again, fit the following linear models, and be ready to interpret them in the same way you did in PC4 above:
- (a)
- (b) Add a control for age and a categorical version of a control for year to the model in (a).
- (c) Take a look at the residuals and try to interpret them as you did in PC5 above. What do you notice?
- (d) Run the simple model in (a) three times on three subsets of the dataset: just 2012, 2014, and 2015. Be ready to talk through the results.
- PC8.
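The two mechanical steps in PC7 (b) and (d), adding a categorical control and refitting one model on several subsets, might be sketched as follows. Everything here is hypothetical: the data frame sim is simulated, the column names outcome, treatment, age, and year are invented for the example, and the formula outcome ~ treatment is only a placeholder, not the model from (a). Substitute the real dataset and formula.

```r
# Simulated stand-in data with hypothetical column names.
set.seed(1)
n <- 600
sim <- data.frame(
  treatment = rbinom(n, 1, 0.5),
  age = sample(3:15, n, replace = TRUE),
  year = sample(c(2012, 2014, 2015), n, replace = TRUE)
)
sim$outcome <- 0.3 * sim$treatment + 0.05 * sim$age + rnorm(n)

# Placeholder base model; replace the formula with the one from (a).
m.base <- lm(outcome ~ treatment, data = sim)

# (b)-style: factor() tells lm() to treat year as categorical, so it adds
# one dummy variable for each year other than the reference level.
m.controls <- lm(outcome ~ treatment + age + factor(year), data = sim)
print(summary(m.controls))

# (d)-style: refit the base model separately on each year's rows.
years <- c(2012, 2014, 2015)
by.year <- lapply(years, function(yr) {
  lm(outcome ~ treatment, data = sim[sim$year == yr, ])
})
names(by.year) <- years

# Coefficient tables (estimate, standard error, t value, p-value) per year.
print(lapply(by.year, function(m) coef(summary(m))))
```

Each subset model has fewer observations than the full model, so expect wider standard errors when comparing the per-year estimates.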