Editing Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 5

Latest revision Your text
Line 5: Line 5:
: '''PC0.''' The dataset is available as a TSV file in the directory <code>week_05</code> in the [https://communitydata.cc/~ads/teaching/2019/stats/data data repository for the course]. Note that a TSV file is ''tab delimited'', not comma delimited (it is otherwise similar to a CSV file). Go ahead and inspect the data and load it into R (''Hint:'' You'll want to use the <code>read.delim()</code> function).  
: '''PC0.''' The dataset is available as a TSV file in the directory <code>week_05</code> in the [https://communitydata.cc/~ads/teaching/2019/stats/data data repository for the course]. Note that a TSV file is ''tab delimited'', not comma delimited (it is otherwise similar to a CSV file). Go ahead and inspect the data and load it into R (''Hint:'' You'll want to use the <code>read.delim()</code> function).  
: '''PC1.''' Calculate the mean of the variable <code>x</code> in the full dataset. Go back to your Week 3 problem set and revisit the mean you calculated for <code>x</code>. Be prepared to discuss the ''conceptual'' relationship of these two means to each other.  
: '''PC1.''' Calculate the mean of the variable <code>x</code> in the full dataset. Go back to your Week 3 problem set and revisit the mean you calculated for <code>x</code>. Be prepared to discuss the ''conceptual'' relationship of these two means to each other.  
: '''PC2.''' Again, using the variable <code>x</code> from your Week 3 data, compute the 95% confidence interval for the mean of this vector "by hand" (in R) using the normal formula for standard error <math>(\frac{\sigma}{\sqrt{n}})</math>. (''Bonus:'' Do this by writing a function.)
: '''PC2.''' Again, using the variable <code>x</code> from your Week 3 data, compute the 95% confidence interval for the mean of this vector in two ways:
<!---
:* (a) By "hand" (in R is fine) using the normal formula for standard error <math>(\frac{\sigma}{\sqrt{n}})</math>. (''Bonus challenge:'' Complete this by writing a function that calculates a confidence interval for the mean of any numeric vector.)
:* (b) Using an appropriate built-in R function (see this week's R lecture materials for a relevant example).
:* (b) Using an appropriate built-in R function (see this week's R lecture materials for a relevant example).
:* (c) Bonus: The results from (a) and (b) should be the same or very close. After reading ''OpenIntro'' §5, can you explain why they might not be exactly the same?
:* (c) Bonus: The results from (a) and (b) should be the same or very close. After reading ''OpenIntro'' §5, can you explain why they might not be exactly the same?
--->
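One way the two approaches in PC2 might be sketched in R is shown below. This is only an illustration under stated assumptions: the function name <code>ci_mean</code> and the vector <code>example_x</code> are hypothetical (substitute your own Week 3 <code>x</code>), and the sample standard deviation from <code>sd()</code> stands in for <math>\sigma</math>.

```r
# Hypothetical helper: normal-formula confidence interval for the mean
# of any numeric vector. The name ci_mean is not from the course materials.
ci_mean <- function(v, level = 0.95) {
  v <- v[!is.na(v)]                  # drop missing values
  se <- sd(v) / sqrt(length(v))      # standard error: s / sqrt(n)
  z <- qnorm(1 - (1 - level) / 2)    # normal critical value for the level
  c(lower = mean(v) - z * se, upper = mean(v) + z * se)
}

# Made-up example data; replace with your own Week 3 vector x.
set.seed(1)
example_x <- rnorm(100, mean = 5, sd = 2)

# (a) "By hand" interval using the helper above.
ci_mean(example_x)

# (b) Interval reported by a built-in function.
t.test(example_x)$conf.int
```

Comparing the two printed intervals is a reasonable starting point for part (c).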
: '''PC3.''' Compare the mean of <code>x</code> from your Week 3 sample — and your confidence interval — to the population mean (the version of <code>x</code> in the Week 5 dataset). Is the true mean inside your confidence interval? Should you find this surprising? Why or why not? Be prepared to discuss the relationship of these values to each other.
: '''PC3.''' Compare the mean of <code>x</code> from your Week 3 sample — and your confidence interval — to the population mean (the version of <code>x</code> in the Week 5 dataset). Is the true mean inside your confidence interval? Should you find this surprising? Why or why not? Be prepared to discuss the relationship of these values to each other.
: '''PC4.''' Let's look beyond the mean. Compare the distribution from your sample of <code>x</code> to the true population of <code>x</code>. Draw histograms and compute other descriptive and summary statistics. What do you notice? Be prepared to discuss and explain any differences.
: '''PC4.''' Let's look beyond the mean. Compare the distribution from your sample of <code>x</code> to the true population of <code>x</code>. Draw histograms and compute other descriptive and summary statistics. What do you notice? Be prepared to discuss and explain any differences.
: '''PC5.''' Calculate the conditional mean of <code>x</code> for each of the groups in the population and the standard deviation of this distribution of conditional means. Compare this standard deviation to the standard error of the mean you calculated in PC2 above. Explain the relationship between these values.
: '''PC5.''' Calculate the conditional mean of <code>x</code> for each of the groups in the population and the standard deviation of this distribution of conditional means. Compare this standard deviation to the answers you calculated in PC2 part (a) above. Explain the relationship between these values.
: '''PC6.''' I want you to run a simple simulation that demonstrates a fundamental insight of statistics. Please see the R lecture materials from Week 4 for ideas about how to do this (but note that there are some differences between that example and this programming challenge).
: '''PC6.''' I want you to run a simple simulation that demonstrates a fundamental insight of statistics. Please see the R lecture materials from last week for ideas about how to do this (but note that there are some differences between that example and this programming challenge).
:* (a) Create a vector of 10,000 randomly generated numbers that are uniformly distributed between 0 and 9.
:* (a) Create a vector of 10,000 randomly generated numbers that are uniformly distributed between 0 and 9.
:* (b) Calculate the mean of that vector. Draw a histogram of the distribution.
:* (b) Calculate the mean of that vector. Draw a histogram of the distribution.
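A minimal sketch of steps (a) and (b), assuming that "uniformly distributed between 0 and 9" means continuous values; if integers are intended, <code>sample(0:9, 10000, replace = TRUE)</code> would be the analogous call. The variable name <code>draws</code> and the seed are arbitrary choices made for this illustration.

```r
set.seed(42)                              # arbitrary seed, only for reproducibility

# (a) 10,000 continuous uniform draws between 0 and 9 (an assumption; see above).
draws <- runif(10000, min = 0, max = 9)

# (b) Mean of the vector; the theoretical mean of this distribution is 4.5.
mean(draws)

# (b) Histogram of the draws.
hist(draws, main = "Uniform draws between 0 and 9", xlab = "Value")
```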
Line 24: Line 23:


: '''SQ0.''' Any questions or clarifications from the OpenIntro text or lecture notes?
: '''SQ0.''' Any questions or clarifications from the OpenIntro text or lecture notes?
: '''SQ1.''' Exercise 5.22, which is about student test scores.
: '''SQ1.''' Exercise 5.16, which is a set of True/False questions.
: '''SQ2.''' Exercise 5.28, which is about diamonds.
: '''SQ2.''' Exercise 5.28, which is about diamonds.
: '''SQ3.''' Exercise 5.30, which is also about diamonds.
: '''SQ3.''' Exercise 5.30, which is also about diamonds.
: '''SQ4.''' Exercise 5.48, which is about work hours and education.
: '''SQ4.''' Exercise 5.48, which is about work hours and education.
: '''SQ5.''' Exercise 5.52, which is a set of True/False questions about ANOVA.
: '''SQ5.''' Exercise 5.52, which is another set of True/False questions about ANOVA.


'''Reinhart §1'''
'''Reinhart §1'''