Editing Statistics and Statistical Programming (Fall 2020)/pset2

From CommunityData

Latest revision Your text
Line 1: Line 1:
<small>[[Statistics_and_Statistical_Programming_(Fall_2020)#Week_4_.2810.2F6.2C_10.2F8.29|← Back to Week 4]]</small>
<div class="noautonum">__TOC__</div>
<div class="noautonum">__TOC__</div>
<small>[[Statistics_and_Statistical_Programming_(Fall_2020)#Week_4_.2810.2F6.2C_10.2F8.29|← Back to Week 4]]</small>


For this problem set, the programming challenges focus on some of the more advanced fundamentals of R, including some of the new types of data import, transformation, tidying, and visualization introduced in the most recent R tutorial. These are followed by some questions about an empirical paper that focus on applying some of the concepts from the first few chapters of ''OpenIntro'' to a research context that may be familiar.
For this problem set, the programming challenges focus on some of the more advanced fundamentals of R, including some of the new types of data import, transformation, tidying, and visualization introduced in the most recent R tutorial. These are followed by some questions about an empirical paper that focus on applying some of the concepts from the first few chapters of ''OpenIntro'' to a research context that may be familiar.
Line 33: Line 33:


===PC5. Cleanup/tidy your data===
===PC5. Cleanup/tidy your data===
Once again, some cleanup and recoding is needed for this week's data. It turns out that the variables <code>i</code> and <code>j</code> are really dichotomous "true/false" variables that have been coded as 0 and 1 in this dataset. Recode these columns as <code>logical</code> (i.e., <code>TRUE</code> or <code>FALSE</code> values). The variable <code>k</code> is really a categorical variable. Recode <code>k</code> as a factor and replace the numbers with the following levels: 0="none", 1="some", 2="lots", 3="all". (Note: your data file may contain only the values 1, 2, and 3.) The goal is to end up with a factor (so the command <code>class(k)</code> should return the value <code>"factor"</code>) where those text strings are the levels of the factor.
Once again, some cleanup and recoding is needed for this week's data. It turns out that the variables <code>i</code> and <code>j</code> are really dichotomous "true/false" variables that have been coded as 0 and 1 in this dataset. Recode these columns as <code>logical</code> (i.e., <code>TRUE</code> or <code>FALSE</code> values). The variable <code>k</code> is really a categorical variable. Recode <code>k</code> as a factor and replace the numbers with the following levels: 0="none", 1="some", 2="lots", 3="all". The goal is to end up with a factor (so the command <code>class(k)</code> should return the value <code>"factor"</code>) where those text strings are the levels of the factor.
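
The recoding described in PC5 could look roughly like the sketch below. It is only an illustration under stated assumptions: the data frame <code>d</code> and its values are made up for this example (they are not the course dataset), and it assumes that 0 means "false" and 1 means "true".

```r
# Hypothetical toy data standing in for the course dataset
d <- data.frame(
  i = c(0, 1, 1, 0),
  j = c(1, 1, 0, 0),
  k = c(1, 2, 3, 1)
)

# as.logical() maps 0 to FALSE and any nonzero number to TRUE
# (assumption: 0 was meant as "false" and 1 as "true")
d$i <- as.logical(d$i)
d$j <- as.logical(d$j)

# Listing all four codes in `levels` keeps "none" as a level
# even when the value 0 never appears in the data
d$k <- factor(d$k,
              levels = c(0, 1, 2, 3),
              labels = c("none", "some", "lots", "all"))

class(d$k)   # "factor"
levels(d$k)  # "none" "some" "lots" "all"
table(d$k)   # counts per level, including empty levels
```

Passing the full set of codes to <code>levels</code> is one design choice among several; if it is left out, R builds the levels only from the values that actually occur, so a file with only 1, 2, and 3 would end up with three levels instead of four.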


===PC6. Calculate conditional summary statistics===
===PC6. Calculate conditional summary statistics===
Line 46: Line 46:
== Statistical Questions ==
== Statistical Questions ==


===SQ1. Interpret bivariate analyses===
===SQ1===


Return to the dataset you imported and worked with in the programming challenges above. Imagine that it comes from a year-long study of bicyclists, conducted a few years ago (let's say 2018, just to pick a year), that combined survey and ride-tracking data from Divvy bikeshare members in the Chicagoland area. Each row in the data corresponds to a single Divvy cyclist/member and the variables correspond to the following measures:
Return to the dataset you imported and worked with in the programming challenges above. Imagine that it comes from a year-long study of bicyclists, conducted a few years ago (let's say 2018, just to pick a year), that combined survey and ride-tracking data from Divvy bikeshare members in the Chicagoland area. Each row in the data corresponds to a single Divvy cyclist/member and the variables correspond to the following measures:
Line 59: Line 59:
# Return to the scatterplot you created in PC8 above. Given the information you now have about the study, how would you interpret it? Does there seem to be any sort of relationship between the two variables?
# Return to the scatterplot you created in PC8 above. Given the information you now have about the study, how would you interpret it? Does there seem to be any sort of relationship between the two variables?


===SQ2. Birthdays revisited (Optional bonus!)===
===Optional bonus SQ3===
 
'''Optional bonus statistical question'''


''We talked about birthdays in the context of one of the textbook exercises for ''OpenIntro'' Chapter 3. Here's an opportunity to apply your knowledge and extend that exercise. Note that you can absolutely use R to help calculate the solutions to both parts of this problem. That said, it's a super famous problem and answers/examples are all over the internet, so if you want to challenge yourself, don't look at them while you're working on it! The only hint I'll give you is that you may find [https://en.wikipedia.org/wiki/Binomial_coefficient binomial coefficients] useful and the <code>choose()</code> function can calculate them for you in R.''
''We talked about birthdays in the context of one of the textbook exercises for ''OpenIntro'' Chapter 3. Here's an opportunity to apply your knowledge and extend that exercise. Note that you can absolutely use R to help calculate the solutions to both parts of this problem. That said, it's a super famous problem and answers/examples are all over the internet, so if you want to challenge yourself, don't look at them while you're working on it! The only hint I'll give you is that you may find [https://en.wikipedia.org/wiki/Binomial_coefficient binomial coefficients] useful and the <code>choose()</code> function can calculate them for you in R.''
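
For anyone who has not used <code>choose()</code> before, here is a minimal illustration of what it computes. It deliberately does not solve the birthday problem; the numbers and the variable names <code>n_people</code> and <code>n_pairs</code> are arbitrary and chosen only for this example.

```r
# Number of ways to pick 2 items out of 5, ignoring order
choose(5, 2)                                   # 10

# Same value from the factorial definition n! / (k! * (n - k)!)
factorial(5) / (factorial(2) * factorial(3))   # 10

# Number of distinct pairs in a hypothetical group of 10 people
n_people <- 10
n_pairs <- choose(n_people, 2)
n_pairs                                        # 45
```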