# Statistics and Statistical Programming (Spring 2019)/Problem Set: Week 3

:'''PC2.''' Read the CSV file into R using the <code>read.csv()</code> command.
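
A minimal sketch of PC2. The file name <code>week3_dataset.csv</code> is hypothetical, so substitute the path of the file you actually downloaded. So that the example runs on its own, it first writes a small made-up data frame to a temporary file and then reads that file back in.

```r
# Made-up toy data standing in for the real Week 3 file (hypothetical values)
toy <- data.frame(x = c(1.5, 2.0, -3.1, 4.2),
                  y = c(10, 12, 9, 15),
                  i = c(0, 1, 1, 0),
                  j = c(1, 1, 0, 0),
                  k = c(0, 1, 2, 3))

# In practice, use the real path instead, e.g. "week3_dataset.csv" (hypothetical name)
csv.path <- tempfile(fileext = ".csv")
write.csv(toy, csv.path, row.names = FALSE)

# read.csv() returns a data frame; header = TRUE is its default
d <- read.csv(csv.path)
str(d)
```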

:'''PC3.''' Get to know your data! Do whatever is necessary to summarize the new dataset. How many columns and rows are there? Report appropriate summary statistics for each variable (e.g., what are the ranges, minimums, maximums, means, medians, and standard deviations of the continuous variables?). Plot histograms for each of the variables to get a sense of what they look like.
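
One way to approach PC3, sketched on a small made-up data frame rather than the real Week 3 data. It uses base R only: <code>dim()</code> for the size, <code>summary()</code> for a quick overview, a <code>sapply()</code> loop for the specific statistics listed above, and <code>hist()</code> for one histogram per column. Here "range" is reported as max minus min; R's own <code>range()</code> returns the (min, max) pair instead.

```r
# Toy data frame standing in for the Week 3 data (made-up values)
d <- data.frame(x = c(1.5, 2.0, -3.1, 4.2, 0.7),
                y = c(10, 12, 9, 15, 11))

# Number of rows and columns
dim(d)

# Quick overview: min, quartiles, median, mean, and max of each column
summary(d)

# The specific statistics asked for, one column of results per variable
stats <- sapply(d, function(col) {
  c(min    = min(col, na.rm = TRUE),
    max    = max(col, na.rm = TRUE),
    range  = diff(range(col, na.rm = TRUE)),  # max minus min
    mean   = mean(col, na.rm = TRUE),
    median = median(col, na.rm = TRUE),
    sd     = sd(col, na.rm = TRUE))
})
stats

# One histogram per column (base graphics)
for (name in names(d)) {
  hist(d[[name]], main = paste("Histogram of", name), xlab = name)
}
```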

:'''PC4.''' Use the <code>my.mean()</code> function distributed in this week's R lecture materials to recalculate the mean of the variable (column) named <code>x</code> in your dataset. Write your own function to recalculate the median of <code>x</code>. Be ready to walk us through how your function works!
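
A sketch of one possible hand-written median function. The name <code>my.median()</code> is hypothetical, chosen to match <code>my.mean()</code>, which is not reproduced here because it lives in the lecture materials. The function sorts the values and takes the middle one, or averages the two middle values when the count is even. Note that <code>sort()</code> drops <code>NA</code> values, whereas the built-in <code>median()</code> returns <code>NA</code> unless you pass <code>na.rm = TRUE</code>.

```r
# Hand-rolled median: sort, then take the middle value
# (or the mean of the two middle values when the count is even)
my.median <- function(v) {
  v <- sort(v)            # sort() drops NA values by default
  n <- length(v)
  if (n == 0) return(NA)
  if (n %% 2 == 1) {
    v[(n + 1) / 2]        # odd count: the single middle element
  } else {
    mean(v[c(n / 2, n / 2 + 1)])  # even count: average the two middle elements
  }
}

my.median(c(3, 1, 2))     # 2
my.median(c(4, 1, 3, 2))  # 2.5
my.median(c(5, NA, 1))    # 3 (the NA is dropped by sort())
```

To check your own version, compare its output with <code>median(x, na.rm = TRUE)</code> on the real column.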

:'''PC5.''' Load your vector from Week 2 again and perform the same cleanup steps you did in PC6 and PC7 last week (recode negative values as missing and log-transform the data).
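
A minimal sketch of the two cleanup steps, applied to a made-up vector because the real Week 2 vector is not reproduced here. It assumes the natural logarithm (R's <code>log()</code> default). If you used a different base last week, such as <code>log10()</code>, use the same one here so the results are comparable.

```r
# Made-up stand-in for the Week 2 vector
w2 <- c(3.2, -1.0, 7.5, 0.4, -2.2, 12.0)

# Recode negative values as missing; which() skips any NAs already present
w2[which(w2 < 0)] <- NA

# Natural-log transform (assumed base); NA values stay NA
w2.log <- log(w2)
w2.log
```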

:'''PC6.''' Compare the vector from Week 2 with the first column (<code>x</code>) of the Week 3 data frame. They should be similar, but how similar? Write R code to demonstrate or support your answer.
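
A few base-R checks that can support an answer to PC6, shown on two made-up vectors. Here <code>a</code> plays the cleaned Week 2 vector and <code>b</code> plays <code>d$x</code>. <code>identical()</code> tests exact equality, <code>all.equal()</code> allows floating-point tolerance, and the remaining lines show where and by how much the two differ, including in their missing values.

```r
# Made-up stand-ins: 'a' for the cleaned Week 2 vector, 'b' for the Week 3 column x
a <- c(1.16, NA, 2.01, -0.92, NA, 2.48)
b <- c(1.16, NA, 2.01, -0.92, 0.35, 2.48)

length(a) == length(b)     # same number of observations?
identical(a, b)            # exactly the same, including NAs and type?
isTRUE(all.equal(a, b))    # equal up to floating-point tolerance?

# Element-by-element comparison; NA wherever either side is missing
same <- a == b
table(same, useNA = "ifany")

# Positions where one vector is missing and the other is not
which(is.na(a) != is.na(b))

# Size of the differences among values present in both
summary(a - b)
cor(a, b, use = "complete.obs")
```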

:'''PC7.''' Visualize the Week 3 data using <code>ggplot2</code> and the <code>geom_point()</code> function to produce a scatterplot. First, plot <code>x</code> on the x-axis and <code>y</code> on the y-axis. Second, visualize <code>i</code>, <code>j</code>, and <code>k</code> on other dimensions (color, shape, and size are reasonable choices). If you run into any issues plotting these dimensions, consider that <code>ggplot2</code> can be very picky about the classes of objects...
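
A sketch of both plotting steps on a made-up data frame in which <code>i</code>, <code>j</code>, and <code>k</code> are still stored as numbers. It illustrates the class issue mentioned above. <code>ggplot2</code> refuses to map a continuous (numeric) variable to <code>shape</code>, so the example wraps that column in <code>factor()</code>. <code>color</code> and <code>size</code> accept numbers but treat them as continuous scales. Wrapping with <code>factor()</code> is only one workaround, and the choice of which variable goes on which aesthetic is arbitrary here.

```r
library(ggplot2)

# Made-up stand-in for the Week 3 data, with i, j, k still numeric
d <- data.frame(x = c(1.5, 2.0, 3.1, 4.2, 0.7, 2.8),
                y = c(10, 12, 9, 15, 11, 13),
                i = c(0, 1, 1, 0, 1, 0),
                j = c(1, 1, 0, 0, 1, 0),
                k = c(0, 1, 2, 3, 1, 2))

# Step 1: basic scatterplot of y against x
p1 <- ggplot(d, aes(x = x, y = y)) +
  geom_point()

# Step 2: map i, j, k onto other aesthetics.
# shape needs a discrete variable, so j is converted with factor();
# mapping the raw numeric j to shape would raise an error when the plot is drawn.
p2 <- ggplot(d, aes(x = x, y = y,
                    color = factor(i),
                    shape = factor(j),
                    size = k)) +
  geom_point()

print(p1)
print(p2)
```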

:'''PC8.''' A very common step when you import data and prepare it for analysis is cleaning and recoding it, and some of that is needed here. It turns out that the variables <code>i</code> and <code>j</code> are really dichotomous true/false variables that have been coded as 0 and 1 in this dataset. Recode these columns as <code>logical</code> (i.e., <code>TRUE</code> or <code>FALSE</code> values). The variable <code>k</code> is really a categorical variable. Recode it as a factor and change the numbers into the following levels: 0="none", 1="some", 2="lots", 3="all". The goal is to end up with a factor whose levels are those text strings.
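
A minimal recoding sketch on a made-up data frame. <code>as.logical()</code> turns 0 into <code>FALSE</code> and any nonzero number into <code>TRUE</code>, and comparing with <code>== 1</code> is an equivalent alternative for 0/1 data. In <code>factor()</code>, the <code>levels</code> argument lists the original codes and <code>labels</code> gives the text for each code, in the same order.

```r
# Made-up stand-in for the Week 3 data before recoding
d <- data.frame(i = c(0, 1, 1, 0),
                j = c(1, 1, 0, 0),
                k = c(0, 1, 2, 3))

# 0/1 columns to logical: 1 becomes TRUE, 0 becomes FALSE
d$i <- as.logical(d$i)
d$j <- d$j == 1            # equivalent alternative for 0/1 codes

# k to a factor: 'levels' are the original codes, 'labels' their text, same order
d$k <- factor(d$k, levels = c(0, 1, 2, 3),
              labels = c("none", "some", "lots", "all"))

str(d)
levels(d$k)
```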

:'''PC9.''' Now that you have cleaned and recoded your data, summarize those three variables again. Also, go back and regenerate the visualizations from PC7. How have the plots changed (if at all)?
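
After recoding, means and standard deviations no longer make sense for these columns. Counts and proportions are the natural summaries. The sketch below uses a made-up data frame that already has the PC8 classes. For a logical column, <code>summary()</code> reports counts of <code>FALSE</code> and <code>TRUE</code>, and <code>table()</code> on a factor lists its levels in their defined order.

```r
# Made-up stand-in for the data after the PC8 recoding
d <- data.frame(i = c(FALSE, TRUE, TRUE, FALSE, TRUE),
                j = c(TRUE, TRUE, FALSE, FALSE, TRUE),
                k = factor(c("none", "some", "lots", "all", "some"),
                           levels = c("none", "some", "lots", "all")))

# summary() now reports counts rather than means and quartiles
summary(d)

# Frequency tables suit logical and factor columns
table(d$i)
table(d$k)
prop.table(table(d$k))   # share of observations in each category
```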