Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 5
== Programming Challenges ==

: '''PC0.''' I've provided the full dataset from which I drew each of your samples in a TSV file in the directory <code>week_05</code> in the [https://github.com/makoshark/uwcom521-assignments/ class assignment git repository]. The values are ''tab delimited'', not comma delimited. TSV is closely related to CSV and is also a common format. Go ahead and load it into R (''HINT: <code>read.delim()</code>''). Take the mean of the variable <code>x</code> in that dataset. That is the true population mean, the thing we have been creating estimates of in week 2 and week 3.
: '''PC1.''' Go back to the dataset I distributed for [[Statistics and Statistical Programming (Winter 2017)/Problem Set: Week 3|the week 3 problem set]]. You've already computed the mean for this in week 2. You should compute the 95% confidence interval for the variable <code>x</code> in two ways:
:* (a) By hand, using the normal formula for standard error <math>(\frac{\sigma}{\sqrt{n}})</math>.
:* (b) Using the appropriate built-in R function. These numbers should be the same or very close. After reading the OpenIntro, can you explain why they might not be exactly the same?
:* (c) Compare the mean from your sample, and your confidence interval, to the true population mean. Is the true mean inside your confidence interval?
: '''PC2.''' Let's look beyond the mean. Compare the distribution of <code>x</code> in your sample to the distribution in the true population. Draw histograms and compute other descriptive and summary statistics. What do you notice? Be ready to talk for a minute or two about the differences.
: '''PC3.''' Compute the mean of <code>y</code> from the true population and then compute the mean and confidence interval for <code>y</code> in your sample. Is the true mean inside or outside your confidence interval?
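The two ways of building a confidence interval in PC1 can be sketched as follows. This is only an illustration under stated assumptions, not the worked answer: the population is simulated and written to a temporary TSV file rather than read from the class repository, and the names <code>pop</code>, <code>my.sample</code>, <code>ci.normal</code>, and <code>ci.t</code> are hypothetical. It also assumes that <code>t.test()</code> is the built-in function meant in (b), and it uses the sample standard deviation as a stand-in for <math>\sigma</math>.

```r
set.seed(521)  # arbitrary seed so the sketch is reproducible

# Stand-in "population" written to a temporary TSV file (simulated data,
# not the real class dataset).
population <- data.frame(x = rnorm(10000, mean = 50, sd = 10))
tsv.path <- tempfile(fileext = ".tsv")
write.table(population, tsv.path, sep = "\t", row.names = FALSE, quote = FALSE)

# read.delim() expects tab-separated columns and a header row by default.
pop <- read.delim(tsv.path)
true.mean <- mean(pop$x)

# Stand-in for the sample you were given.
my.sample <- sample(pop$x, 100)
n <- length(my.sample)

# Standard error, with the sample standard deviation estimating sigma.
se <- sd(my.sample) / sqrt(n)

# (a) By hand, using the normal distribution (qnorm(0.975) is about 1.96).
ci.normal <- mean(my.sample) + c(-1, 1) * qnorm(0.975) * se

# (b) Built-in: t.test() uses the t distribution with n - 1 degrees of freedom.
ci.t <- as.numeric(t.test(my.sample)$conf.int)

# (c) Is the true population mean inside the interval?
covered <- ci.t[1] <= true.mean && true.mean <= ci.t[2]

print(ci.normal)
print(ci.t)
print(covered)
```

In this sketch the interval from <code>t.test()</code> comes out slightly wider than the one computed by hand, because the t distribution has heavier tails than the normal distribution. The gap shrinks as the sample size grows.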
: '''PC4.''' I want you to run a simple simulation that demonstrates one of the most fundamental insights of statistics:
:* (a) Create a vector of 10,000 randomly generated numbers that are uniformly distributed between 0 and 9.
:* (b) Take the mean of that vector. Draw a histogram.
:* (c) Create 100 random samples of 2 items each from your randomly generated data and take the mean of each sample. Create a new vector that contains those means. Describe/display the distribution of those means.
:* (d) Repeat (c), but with 10 items in each sample instead of 2. Then repeat (c) again with 100 items in each sample. Be ready to describe how the histogram changes as the sample size increases. (''HINT: You'll make me very happy if you write a function to do this.'')
: '''PC5.''' Do PC4 again, but with random data drawn from a normal distribution (<math>N(\mu=42, \sigma=42)</math>) instead of a uniform distribution. How are your results different from those in PC4?
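One way the simulation in PC4 and PC5 might be organized around a function is sketched below. It is a minimal example rather than the expected solution: the helper <code>sample.means()</code> and the variable names are hypothetical, and it assumes that "uniformly distributed between 0 and 9" means continuous draws from <code>runif()</code> (drawing integers with <code>sample(0:9, ...)</code> would be another reasonable reading).

```r
set.seed(521)  # arbitrary seed so the sketch is reproducible

# Hypothetical helper: draw n.samples random samples of sample.size items
# from `population` and return a vector holding the mean of each sample.
sample.means <- function(population, sample.size, n.samples = 100) {
  sapply(seq_len(n.samples), function(i) {
    mean(sample(population, sample.size))
  })
}

# PC4 (a): 10,000 uniform draws between 0 and 9 (assumed continuous).
uniform.pop <- runif(10000, min = 0, max = 9)

# PC4 (b): mean and histogram of the full vector.
print(mean(uniform.pop))
hist(uniform.pop, main = "Uniform population")

# PC4 (c) and (d): means of 100 samples of size 2, 10, and 100.
sizes <- c(2, 10, 100)
for (size in sizes) {
  hist(sample.means(uniform.pop, size), xlim = c(0, 9),
       main = paste("Means of samples of size", size))
}

# PC5: the same steps for a normal population with mean 42 and sd 42.
normal.pop <- rnorm(10000, mean = 42, sd = 42)

# Spread of the sample means at each sample size, for both populations.
uniform.spread <- sapply(sizes, function(s) sd(sample.means(uniform.pop, s)))
normal.spread <- sapply(sizes, function(s) sd(sample.means(normal.pop, s)))
print(uniform.spread)
print(normal.spread)
```

Comparing the printed standard deviations alongside the histograms gives a numeric handle on how the distribution of sample means narrows as the sample size increases.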