https://wiki.communitydata.science/index.php?title=Statistics_and_Statistical_Programming_(Spring_2019)/Problem_Set:_Week_3&feed=atom&action=history
Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 3 - Revision history
2024-03-29T15:30:31Z
Revision history for this page on the wiki
MediaWiki 1.38.4
https://wiki.communitydata.science/index.php?title=Statistics_and_Statistical_Programming_(Spring_2019)/Problem_Set:_Week_3&diff=135330&oldid=prev
Jdfoote at 17:20, 15 April 2019
2019-04-15T17:20:01Z
<p></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 18:20, 15 April 2019</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l5">Line 5:</td>
<td colspan="2" class="diff-lineno">Line 5:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC0.''' Create a new project and RMarkdown script for this week's problem set (as usual).</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC0.''' Create a new project and RMarkdown script for this week's problem set (as usual).</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC1.''' Revisit your code from last week and recall what group number you were in (it should be an integer from 1 to 20). Navigate to the [https://communitydata.cc/~ads/teaching/2019/stats/data data repository for the course] and download the .csv file in the <code>week_03</code> subdirectory that matches your group number from PC1 last week (e.g., <code>group_<output>.csv</code>). Note that it is a .csv file and not an .RData file. </div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC1.''' Revisit your code from last week and recall what group number you were in (it should be an integer from 1 to 20). Navigate to the [https://communitydata.cc/~ads/teaching/2019/stats/data data repository for the course] and download the .csv file in the <code>week_03</code> subdirectory that matches your group number from PC1 last week (e.g., <code>group_<output>.csv</code>). Note that it is a .csv file and not an .RData file. </div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>::'''PC1.5''' Open the dataset and take a look at it! You might use spreadsheet software (e.g., Google Docs, LibreOffice, or Excel) to do this, but it is also a good idea to open it in a text editor (e.g., Notepad) so you can inspect the structure of the "raw data." Manually inspecting the raw data is common and useful since it can help you figure out how best to read it into R. I won't ask about this <del style="font-weight: bold; text-decoration: none;">is </del>class, but I do recommend it.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>::'''PC1.5''' Open the dataset and take a look at it! You might use spreadsheet software (e.g., Google Docs, LibreOffice, or Excel) to do this, but it is also a good idea to open it in a text editor (e.g., Notepad) so you can inspect the structure of the "raw data." Manually inspecting the raw data is common and useful since it can help you figure out how best to read it into R. I won't ask about this <ins style="font-weight: bold; text-decoration: none;">in </ins>class, but I do recommend it.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC2.''' Read the CSV file into R using the <code>read.csv()</code> command. </div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC2.''' Read the CSV file into R using the <code>read.csv()</code> command. </div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC3.''' Get to know your data! Do whatever is necessary to summarize the new dataset. How many columns and rows are there? Report appropriate summary statistics for each variable (e.g., what are the ranges, minimums, maximums, means, medians, and standard deviations of the continuous variables?). Plot histograms for each of the variables to get a sense of what they look like.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC3.''' Get to know your data! Do whatever is necessary to summarize the new dataset. How many columns and rows are there? Report appropriate summary statistics for each variable (e.g., what are the ranges, minimums, maximums, means, medians, and standard deviations of the continuous variables?). Plot histograms for each of the variables to get a sense of what they look like.</div></td></tr>
</table>
Jdfoote
https://wiki.communitydata.science/index.php?title=Statistics_and_Statistical_Programming_(Spring_2019)/Problem_Set:_Week_3&diff=134258&oldid=prev
Aaronshaw: /* Programming Challenges */
2019-04-04T21:32:34Z
<p><span dir="auto"><span class="autocomment">Programming Challenges</span></span></p>
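<p>For PC2 and PC3, the following is a minimal sketch in base R rather than a definitive solution. It assumes the file has the columns <code>x</code>, <code>y</code>, <code>i</code>, <code>j</code>, and <code>k</code> named in the challenges; the toy data frame and the temporary file path are hypothetical stand-ins for the downloaded group file, not the course data.</p>

```r
# Hypothetical stand-in for the downloaded group file, so the sketch runs on its own.
set.seed(1)
toy <- data.frame(
  x = rnorm(20),
  y = rnorm(20),
  i = rbinom(20, 1, 0.5),
  j = rbinom(20, 1, 0.5),
  k = sample(0:3, 20, replace = TRUE)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# PC2: read the CSV file into a data frame.
d <- read.csv(csv_path)

# PC3: number of rows and columns, then per-variable summaries.
dim(d)
summary(d)  # min, quartiles, median, mean, and max for each column

# Standard deviations of the continuous variables (summary() does not report them).
sapply(d[c("x", "y")], sd, na.rm = TRUE)

# One histogram per variable.
for (v in names(d)) {
  hist(d[[v]], main = v, xlab = v)
}
```

<p>With the real data, replace <code>csv_path</code> with the path to your own group file and skip the lines that build the toy data.</p>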
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 22:32, 4 April 2019</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l8">Line 8:</td>
<td colspan="2" class="diff-lineno">Line 8:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC2.''' Read the CSV file into R using the <code>read.csv()</code> command. </div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC2.''' Read the CSV file into R using the <code>read.csv()</code> command. </div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC3.''' Get to know your data! Do whatever is necessary to summarize the new dataset. How many columns and rows are there? Report appropriate summary statistics for each variable (e.g., what are the ranges, minimums, maximums, means, medians, and standard deviations of the continuous variables?). Plot histograms for each of the variables to get a sense of what they look like.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC3.''' Get to know your data! Do whatever is necessary to summarize the new dataset. How many columns and rows are there? Report appropriate summary statistics for each variable (e.g., what are the ranges, minimums, maximums, means, medians, and standard deviations of the continuous variables?). Plot histograms for each of the variables to get a sense of what they look like.</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>:'''PC4.''' Use the <code>my.mean()</code> function distributed in this week's R lecture materials to recalculate the mean of the variable (column) named <del style="font-weight: bold; text-decoration: none;">"</del>x<del style="font-weight: bold; text-decoration: none;">" </del>in your dataset. Write your own function to recalculate the median of <del style="font-weight: bold; text-decoration: none;">"</del>x<del style="font-weight: bold; text-decoration: none;">"</del>. Be ready to walk us through how your function works!</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>:'''PC4.''' Use the <code>my.mean()</code> function distributed in this week's R lecture materials to recalculate the mean of the variable (column) named <ins style="font-weight: bold; text-decoration: none;"><code></ins>x<ins style="font-weight: bold; text-decoration: none;"></code> </ins>in your dataset. Write your own function to recalculate the median of <ins style="font-weight: bold; text-decoration: none;"><code></ins>x<ins style="font-weight: bold; text-decoration: none;"></code></ins>. Be ready to walk us through how your function works!</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC5.''' Load your vector from Week 2 again and perform the same cleanup steps you did in PC6 and PC7 last week (recode negative values as missing and log-transform the data).</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC5.''' Load your vector from Week 2 again and perform the same cleanup steps you did in PC6 and PC7 last week (recode negative values as missing and log-transform the data).</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC6.''' Compare the vector from Week 2 with the first column (<code>x</code>) of the Week 3 data frame. They should be similar, but how similar? Write R code to demonstrate or support your answer.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC6.''' Compare the vector from Week 2 with the first column (<code>x</code>) of the Week 3 data frame. They should be similar, but how similar? Write R code to demonstrate or support your answer.</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>:'''PC7.''' Visualize the Week 3 data using <code>ggplot2</code> and the <code>geom_point()</code> function to produce a scatterplot. First, plot <del style="font-weight: bold; text-decoration: none;">the </del><code>x</code> on the x-axis and <code>y</code> on the y-axis. Second, visualize <del style="font-weight: bold; text-decoration: none;">i, j, and k </del>on other dimensions (e.g., color, shape, and size seem reasonable). If you run into any issues plotting these dimensions, consider that <code>ggplot2</code> can be very picky about the classes of objects...</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>:'''PC7.''' Visualize the Week 3 data using <code>ggplot2</code> and the <code>geom_point()</code> function to produce a scatterplot. First, plot <code>x</code> on the x-axis and <code>y</code> on the y-axis. Second, visualize <ins style="font-weight: bold; text-decoration: none;">the other variables </ins>on other dimensions (e.g., color, shape, and size seem reasonable). If you run into any issues plotting these dimensions, consider that <code>ggplot2</code> can be very picky about the classes of objects...</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC8.''' Cleaning and recoding data is a very common step when you import and prepare data for analysis. Some of that is needed here. It turns out that the variables <code>i</code> and <code>j</code> are really dichotomous "true/false" variables that have been coded as 0 and 1 in this dataset. Recode these columns as <code>logical</code> (i.e., "TRUE" or "FALSE" values). The variable <code>k</code> is really a categorical variable. Recode this as a factor and change the numbers into the following levels: 0="none", 1="some", 2="lots", 3="all". The goal is to end up with a factor where those text strings are the levels of the factor.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC8.''' Cleaning and recoding data is a very common step when you import and prepare data for analysis. Some of that is needed here. It turns out that the variables <code>i</code> and <code>j</code> are really dichotomous "true/false" variables that have been coded as 0 and 1 in this dataset. Recode these columns as <code>logical</code> (i.e., "TRUE" or "FALSE" values). The variable <code>k</code> is really a categorical variable. Recode this as a factor and change the numbers into the following levels: 0="none", 1="some", 2="lots", 3="all". The goal is to end up with a factor where those text strings are the levels of the factor.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC9.''' Now that you have cleaned and recoded your data, summarize those three variables again. Also, go back and regenerate the visualizations from PC7. How have the plots changed (if at all)?</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC9.''' Now that you have cleaned and recoded your data, summarize those three variables again. Also, go back and regenerate the visualizations from PC7. How have the plots changed (if at all)?</div></td></tr>
</table>
Aaronshaw
https://wiki.communitydata.science/index.php?title=Statistics_and_Statistical_Programming_(Spring_2019)/Problem_Set:_Week_3&diff=134255&oldid=prev
Aaronshaw: /* Statistical Questions (from OpenIntro) */
2019-04-04T21:25:55Z
<p><span dir="auto"><span class="autocomment">Statistical Questions (from OpenIntro)</span></span></p>
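<p>The median function asked for in PC4 and the recoding described in PC8 could be sketched roughly as follows. This is one possible approach, not the course's reference solution: the name <code>my.median()</code> and the three-row data frame are hypothetical, and dropping missing values silently is a design choice that differs from the default behavior of R's built-in <code>median()</code>.</p>

```r
# PC4 sketch: a hand-written median (hypothetical name, chosen to mirror my.mean()).
my.median <- function(v) {
  v <- sort(v)  # sort() drops NA values by default
  n <- length(v)
  if (n == 0) {
    return(NA_real_)
  }
  mid <- (n + 1) / 2
  # For odd n both indices are the same element;
  # for even n they are the two middle elements, which get averaged.
  mean(v[c(floor(mid), ceiling(mid))])
}

# PC8 sketch on a tiny hypothetical data frame with the same column names.
d <- data.frame(i = c(0, 1, 1), j = c(1, 0, 0), k = c(0, 3, 2))

# 0/1 codes become FALSE/TRUE.
d$i <- as.logical(d$i)
d$j <- as.logical(d$j)

# Numeric codes become a factor whose levels are the text labels.
d$k <- factor(d$k, levels = 0:3, labels = c("none", "some", "lots", "all"))

my.median(c(3, 1, 2))
summary(d)
```

<p>Passing <code>levels</code> and <code>labels</code> together keeps the intended order of the categories even when some codes do not appear in the data.</p>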
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 22:25, 4 April 2019</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l23">Line 23:</td>
<td colspan="2" class="diff-lineno">Line 23:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>: '''Q2.''' Exercise 3.6, which is basically a continuation of 3.4</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>: '''Q2.''' Exercise 3.6, which is basically a continuation of 3.4</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>: '''Q3.''' Exercise 3.18 on evaluating normal approximation</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>: '''Q3.''' Exercise 3.18 on evaluating normal approximation</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>: '''Q4.''' Exercise 3.32 on arachnophobia (spiders are frequent concern in statistical programming)</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>: '''Q4.''' Exercise 3.32 on arachnophobia (spiders are <ins style="font-weight: bold; text-decoration: none;">a </ins>frequent concern in statistical programming)</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Empirical Paper Questions ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Empirical Paper Questions ==</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>There is no empirical paper this week.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>There is no empirical paper this week.</div></td></tr>
</table>
Aaronshaw
https://wiki.communitydata.science/index.php?title=Statistics_and_Statistical_Programming_(Spring_2019)/Problem_Set:_Week_3&diff=134093&oldid=prev
Aaronshaw: /* Empirical Paper Questions */
2019-04-03T19:05:54Z
<p><span dir="auto"><span class="autocomment">Empirical Paper Questions</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 20:05, 3 April 2019</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l27">Line 27:</td>
<td colspan="2" class="diff-lineno">Line 27:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Empirical Paper Questions ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Empirical Paper Questions ==</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>There <del style="font-weight: bold; text-decoration: none;">will be </del>no empirical paper this week.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>There <ins style="font-weight: bold; text-decoration: none;">is </ins>no empirical paper this week.</div></td></tr>
</table>
Aaronshaw
https://wiki.communitydata.science/index.php?title=Statistics_and_Statistical_Programming_(Spring_2019)/Problem_Set:_Week_3&diff=134092&oldid=prev
Aaronshaw: /* Statistical Questions (from OpenIntro) */
2019-04-03T19:05:38Z
<p><span dir="auto"><span class="autocomment">Statistical Questions (from OpenIntro)</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 20:05, 3 April 2019</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l17">Line 17:</td>
<td colspan="2" class="diff-lineno">Line 17:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Statistical Questions (from OpenIntro) ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Statistical Questions (from OpenIntro) ==</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>'''Exercises from OpenIntro §2''</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>'''Exercises from OpenIntro §2<ins style="font-weight: bold; text-decoration: none;">'</ins>''</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>: '''Q0.''' Any questions or clarifications from the OpenIntro text or lecture notes?</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>: '''Q0.''' Any questions or clarifications from the OpenIntro text or lecture notes?</div></td></tr>
</table>
Aaronshaw
https://wiki.communitydata.science/index.php?title=Statistics_and_Statistical_Programming_(Spring_2019)/Problem_Set:_Week_3&diff=134091&oldid=prev
Aaronshaw: /* Programming Challenges */
2019-04-03T19:03:44Z
<p><span dir="auto"><span class="autocomment">Programming Challenges</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 20:03, 3 April 2019</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l8">Line 8:</td>
<td colspan="2" class="diff-lineno">Line 8:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC2.''' Read the CSV file into R using the <code>read.csv()</code> command. </div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC2.''' Read the CSV file into R using the <code>read.csv()</code> command. </div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC3.''' Get to know your data! Do whatever is necessary to summarize the new dataset. How many columns and rows are there? Report appropriate summary statistics for each variable (e.g., what are the ranges, minimums, maximums, means, medians, and standard deviations of the continuous variables?). Plot histograms for each of the variables to get a sense of what they look like.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC3.''' Get to know your data! Do whatever is necessary to summarize the new dataset. How many columns and rows are there? Report appropriate summary statistics for each variable (e.g., what are the ranges, minimums, maximums, means, medians, and standard deviations of the continuous variables?). Plot histograms for each of the variables to get a sense of what they look like.</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>:'''PC4.''' Use the <code>my.mean()</code> function distributed in this week's R lecture materials to recalculate the mean of the variable (column) named "x" in your dataset. Write your own function to <del style="font-weight: bold; text-decoration: none;">re-compute </del>the median of "x". Be ready to walk us through how your function works!</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>:'''PC4.''' Use the <code>my.mean()</code> function distributed in this week's R lecture materials to recalculate the mean of the variable (column) named "x" in your dataset. Write your own function to <ins style="font-weight: bold; text-decoration: none;">recalculate </ins>the median of "x". Be ready to walk us through how your function works!</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC5.''' Load your vector from Week 2 again and perform the same cleanup steps you did in PC6 and PC7 last week (recode negative values as missing and log-transform the data).</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC5.''' Load your vector from Week 2 again and perform the same cleanup steps you did in PC6 and PC7 last week (recode negative values as missing and log-transform the data).</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC6.''' Compare the vector from Week 2 with the first column (<code>x</code>) of the Week 3 data frame. They should be similar, but how similar? Write R code to demonstrate or support your answer.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC6.''' Compare the vector from Week 2 with the first column (<code>x</code>) of the Week 3 data frame. They should be similar, but how similar? Write R code to demonstrate or support your answer.</div></td></tr>
</table>Aaronshawhttps://wiki.communitydata.science/index.php?title=Statistics_and_Statistical_Programming_(Spring_2019)/Problem_Set:_Week_3&diff=134090&oldid=prevAaronshaw: /* Programming Challenges */2019-04-03T19:03:01Z<p><span dir="auto"><span class="autocomment">Programming Challenges</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 20:03, 3 April 2019</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l7">Line 7:</td>
<td colspan="2" class="diff-lineno">Line 7:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>::'''PC1.5''' Open the dataset and take a look at it! You might use spreadsheet software (e.g., Google docs, LibreOffice, Excel, etc.) to do this, or it is a good idea to open it in a text editor (e.g., NotePad) so you can inspect the structure of the "raw data." Manually inspecting the raw data is common and useful since it can help you figure out how best to read it into R. I won't ask about this in class, but I do recommend it.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>::'''PC1.5''' Open the dataset and take a look at it! You might use spreadsheet software (e.g., Google docs, LibreOffice, Excel, etc.) to do this, or it is a good idea to open it in a text editor (e.g., NotePad) so you can inspect the structure of the "raw data." Manually inspecting the raw data is common and useful since it can help you figure out how best to read it into R. I won't ask about this in class, but I do recommend it.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC2.''' Read the CSV file into R using the <code>read.csv()</code> command. </div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC2.''' Read the CSV file into R using the <code>read.csv()</code> command. </div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>:'''PC3.''' Get to know your data! Do whatever is necessary to summarize the new dataset. How many columns and rows are there? Report appropriate summary statistics for each variable (e.g., what are the ranges, minimums, maximums, means, medians, and standard deviations of the continuous variables?). Plot histograms for <del style="font-weight: bold; text-decoration: none;">all </del>of the variables to get a sense of what they look like.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>:'''PC3.''' Get to know your data! Do whatever is necessary to summarize the new dataset. How many columns and rows are there? Report appropriate summary statistics for each variable (e.g., what are the ranges, minimums, maximums, means, medians, and standard deviations of the continuous variables?). Plot histograms for <ins style="font-weight: bold; text-decoration: none;">each </ins>of the variables to get a sense of what they look like.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC4.''' Use the <code>my.mean()</code> function distributed in this week's R lecture materials to recalculate the mean of the variable (column) named "x" in your dataset. Write your own function to re-compute the median of "x". Be ready to walk us through how your function works!</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC4.''' Use the <code>my.mean()</code> function distributed in this week's R lecture materials to recalculate the mean of the variable (column) named "x" in your dataset. Write your own function to re-compute the median of "x". Be ready to walk us through how your function works!</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC5.''' Load your vector from Week 2 again and perform the same cleanup steps you did in PC6 and PC7 last week (recode negative values as missing and log-transform the data).</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC5.''' Load your vector from Week 2 again and perform the same cleanup steps you did in PC6 and PC7 last week (recode negative values as missing and log-transform the data).</div></td></tr>
</table>Aaronshawhttps://wiki.communitydata.science/index.php?title=Statistics_and_Statistical_Programming_(Spring_2019)/Problem_Set:_Week_3&diff=134089&oldid=prevAaronshaw: /* Programming Challenges */2019-04-03T19:01:44Z<p><span dir="auto"><span class="autocomment">Programming Challenges</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 20:01, 3 April 2019</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l4">Line 4:</td>
<td colspan="2" class="diff-lineno">Line 4:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC0.''' Create a new project and RMarkdown script for this week's problem set (as usual).</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC0.''' Create a new project and RMarkdown script for this week's problem set (as usual).</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>:'''PC1.''' Revisit your code from last week and recall what group number you were in (should be an integer between 1-20). Navigate to the [https://communitydata.cc/~ads/teaching/2019/stats/data data repository for the course] and download the .csv file in the <code>week_03</code> subdirectory with your group number from PC1 last week associated with it (e.g., <code>group_&lt;output&gt;.<del style="font-weight: bold; text-decoration: none;">Rdata</del></code>). </div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>:'''PC1.''' Revisit your code from last week and recall what group number you were in (should be an integer between 1-20). Navigate to the [https://communitydata.cc/~ads/teaching/2019/stats/data data repository for the course] and download the .csv file in the <code>week_03</code> subdirectory with your group number from PC1 last week associated with it (e.g., <code>group_&lt;output&gt;.<ins style="font-weight: bold; text-decoration: none;">csv</ins></code>)<ins style="font-weight: bold; text-decoration: none;">. Note that it is a .csv file and not an .RData file</ins>. </div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>::'''PC1.5''' Open the dataset and take a look at it! You might use spreadsheet software (e.g., Google docs, LibreOffice, Excel, etc.) to do this, or it is a good idea to open it in a text editor (e.g., NotePad) so you can inspect the structure of the "raw data." Manually inspecting the raw data is common and useful since it can help you figure out how best to read it into R. I won't ask about this in class, but I do recommend it.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>::'''PC1.5''' Open the dataset and take a look at it! You might use spreadsheet software (e.g., Google docs, LibreOffice, Excel, etc.) to do this, or it is a good idea to open it in a text editor (e.g., NotePad) so you can inspect the structure of the "raw data." Manually inspecting the raw data is common and useful since it can help you figure out how best to read it into R. I won't ask about this in class, but I do recommend it.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC2.''' Read the CSV file into R using the <code>read.csv()</code> command. </div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC2.''' Read the CSV file into R using the <code>read.csv()</code> command. </div></td></tr>
</table>Aaronshawhttps://wiki.communitydata.science/index.php?title=Statistics_and_Statistical_Programming_(Spring_2019)/Problem_Set:_Week_3&diff=134088&oldid=prevAaronshaw: /* Programming Challenges */2019-04-03T19:00:56Z<p><span dir="auto"><span class="autocomment">Programming Challenges</span></span></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 20:00, 3 April 2019</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l5">Line 5:</td>
<td colspan="2" class="diff-lineno">Line 5:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC0.''' Create a new project and RMarkdown script for this week's problem set (as usual).</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC0.''' Create a new project and RMarkdown script for this week's problem set (as usual).</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC1.''' Revisit your code from last week and recall what group number you were in (should be an integer between 1-20). Navigate to the [https://communitydata.cc/~ads/teaching/2019/stats/data data repository for the course] and download the .csv file in the <code>week_03</code> subdirectory with your group number from PC1 last week associated with it (e.g., <code>group_&lt;output&gt;.Rdata</code>). </div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC1.''' Revisit your code from last week and recall what group number you were in (should be an integer between 1-20). Navigate to the [https://communitydata.cc/~ads/teaching/2019/stats/data data repository for the course] and download the .csv file in the <code>week_03</code> subdirectory with your group number from PC1 last week associated with it (e.g., <code>group_&lt;output&gt;.Rdata</code>). </div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>:'''PC1.5''' Open the dataset and take a look at it! You might use spreadsheet software (e.g., Google docs, LibreOffice, Excel, etc.) to do this, or it is a good idea to open it in a text editor (e.g., NotePad) so you can inspect the structure of the "raw data." Manually inspecting the raw data is common and useful since it can help you figure out how best to read it into R. I won't ask about this in class, but I do recommend it.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">:</ins>:'''PC1.5''' Open the dataset and take a look at it! You might use spreadsheet software (e.g., Google docs, LibreOffice, Excel, etc.) to do this, or it is a good idea to open it in a text editor (e.g., NotePad) so you can inspect the structure of the "raw data." Manually inspecting the raw data is common and useful since it can help you figure out how best to read it into R. I won't ask about this in class, but I do recommend it.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC2.''' Read the CSV file into R using the <code>read.csv()</code> command. </div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC2.''' Read the CSV file into R using the <code>read.csv()</code> command. </div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC3.''' Get to know your data! Do whatever is necessary to summarize the new dataset. How many columns and rows are there? Report appropriate summary statistics for each variable (e.g., what are the ranges, minimums, maximums, means, medians, and standard deviations of the continuous variables?). Plot histograms for all of the variables to get a sense of what they look like.</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC3.''' Get to know your data! Do whatever is necessary to summarize the new dataset. How many columns and rows are there? Report appropriate summary statistics for each variable (e.g., what are the ranges, minimums, maximums, means, medians, and standard deviations of the continuous variables?). Plot histograms for all of the variables to get a sense of what they look like.</div></td></tr>
</table>Aaronshawhttps://wiki.communitydata.science/index.php?title=Statistics_and_Statistical_Programming_(Spring_2019)/Problem_Set:_Week_3&diff=134087&oldid=prevAaronshaw at 19:00, 3 April 20192019-04-03T19:00:46Z<p></p>
<table style="background-color: #fff; color: #202122;" data-mw="interface">
<col class="diff-marker" />
<col class="diff-content" />
<col class="diff-marker" />
<col class="diff-content" />
<tr class="diff-title" lang="en">
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">← Older revision</td>
<td colspan="2" style="background-color: #fff; color: #202122; text-align: center;">Revision as of 20:00, 3 April 2019</td>
</tr><tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l1">Line 1:</td>
<td colspan="2" class="diff-lineno">Line 1:</td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>Please note: if you have trouble loading up your dataset ('''PC2''' below) contact Jeremy or me <del style="font-weight: bold; text-decoration: none;">in the next day or so </del>as you will only be able to do the other challenges once you've done that one.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>Please note: if you have trouble loading up your dataset ('''PC2''' below) contact Jeremy or me <ins style="font-weight: bold; text-decoration: none;">ASAP </ins>as you will only be able to do the other challenges once you've done that one.</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Programming Challenges ==</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>== Programming Challenges ==</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l16">Line 16:</td>
<td colspan="2" class="diff-lineno">Line 16:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC10.''' As always, save your work and archive the project (i.e., in a .zip file) and [https://canvas.northwestern.edu/courses/90927/assignments/578012 upload it to canvas].</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>:'''PC10.''' As always, save your work and archive the project (i.e., in a .zip file) and [https://canvas.northwestern.edu/courses/90927/assignments/578012 upload it to canvas].</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>== Statistical Questions ==</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>== Statistical Questions <ins style="font-weight: bold; text-decoration: none;">(from OpenIntro) </ins>==</div></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">=== </del>Exercises from OpenIntro §2 <del style="font-weight: bold; text-decoration: none;">===</del></div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div><ins style="font-weight: bold; text-decoration: none;">'''</ins>Exercises from OpenIntro §2<ins style="font-weight: bold; text-decoration: none;">'''</ins></div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>: '''Q0.''' Any questions or clarifications from the OpenIntro text or lecture notes?</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>: '''Q0.''' Any questions or clarifications from the OpenIntro text or lecture notes?</div></td></tr>
<tr><td colspan="2" class="diff-lineno" id="mw-diff-left-l25">Line 25:</td>
<td colspan="2" class="diff-lineno">Line 25:</td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>: '''Q4.''' Exercise 3.32 on arachnophobia (spiders are a frequent concern in statistical programming)</div></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><div>: '''Q4.''' Exercise 3.32 on arachnophobia (spiders are a frequent concern in statistical programming)</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div><del style="font-weight: bold; text-decoration: none;">=</del>== Empirical Paper <del style="font-weight: bold; text-decoration: none;">=</del>==</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>== Empirical Paper <ins style="font-weight: bold; text-decoration: none;">Questions </ins>==</div></td></tr>
<tr><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td><td class="diff-marker"></td><td style="background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;"><br/></td></tr>
<tr><td class="diff-marker" data-marker="−"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;"><div>There will be no empirical paper this week<del style="font-weight: bold; text-decoration: none;">. Understanding probability distributions is fundamental to statistics but few people really end there so it's hard to find a paper that is just about this</del>.</div></td><td class="diff-marker" data-marker="+"></td><td style="color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;"><div>There will be no empirical paper this week.</div></td></tr>
</table>Aaronshaw
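<p>A minimal sketch of how PC2 and PC3 might look in R, assuming a CSV with a numeric column named <code>x</code>. The data here are made up and written to a temporary file so the example runs on its own; the second column <code>k</code> is hypothetical, and the real <code>group_&lt;output&gt;.csv</code> file may have different columns. In practice you would pass the path of your downloaded file to <code>read.csv()</code> instead.</p>

```r
# Hypothetical stand-in for the downloaded group file (made-up values).
csv_path <- tempfile(fileext = ".csv")
example <- data.frame(x = c(1.5, 2.0, NA, 4.25, 3.0),
                      k = c(0, 1, 1, 0, 1))
write.csv(example, csv_path, row.names = FALSE)

# PC2: read the CSV file into a data frame.
d <- read.csv(csv_path)

# PC3: how many rows and columns are there?
dim(d)

# Per-column minimum, maximum, mean, median, quartiles, and NA counts.
summary(d)

# Standard deviation and range of x, ignoring missing values.
sd(d$x, na.rm = TRUE)
range(d$x, na.rm = TRUE)

# One histogram per numeric column.
for (name in names(d)) {
  if (is.numeric(d[[name]])) {
    hist(d[[name]], main = name, xlab = name)
  }
}
```

<p>Passing <code>na.rm = TRUE</code> matters here: without it, <code>sd()</code> and <code>range()</code> return <code>NA</code> whenever the column contains a missing value.</p>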