Statistics and Statistical Programming (Winter 2021)/Problem set 4
From CommunityData
Line 16:
# Find the names of all of the variables (columns) as well as the class of each of the variables.
# Summarize at least one continuous or discrete numeric variable in the dataset. Calculate the length, range (minimum and maximum), mean, and standard deviation.
# Plot a visual summary (maybe a boxplot or a histogram?) for the same numeric variable you used in PC1.4 above.
# Summarize at least one categorical variable in the dataset (e.g., if the variable takes values of TRUE/FALSE or NA, how many of each value are there?).
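
The numeric and categorical summaries asked for above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the course's own solution; the sample lists <code>values</code> and <code>flags</code> and both helper functions are hypothetical, and <code>None</code> stands in for <code>NA</code>.

```python
import statistics
from collections import Counter

# Hypothetical sample data; substitute the columns from your own dataset.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
flags = [True, False, None, True, True, None, False, True]


def summarize_numeric(xs):
    """Return length, minimum, maximum, mean, and sample standard deviation."""
    return {
        "length": len(xs),
        "min": min(xs),
        "max": max(xs),
        "mean": statistics.mean(xs),
        "sd": statistics.stdev(xs),  # sample (n - 1) standard deviation
    }


def summarize_categorical(xs):
    """Count how often each distinct value occurs, including None (missing)."""
    return Counter(xs)


numeric_summary = summarize_numeric(values)
category_counts = summarize_categorical(flags)

print(numeric_summary)
print(category_counts)
```

Counting missing values as their own category, as done here, makes it easy to see how much of the variable is actually observed.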
Line 29:
# Calculate summary statistics for your variable. Be sure to include the length, minimum, maximum, mean, and standard deviation.
# Create a visualization of your variable: at the very least, create a boxplot or a histogram.
# Some of you may have negative numbers. Whoops! This was due to a coding error (in this case, not really, but let's pretend it is, since these types of things are pretty typical). Write code to recode all negative numbers as missing (i.e., <code>NA</code>) in your dataset. Now compute the mean and standard deviation again and note any changes. {{tbd}}
# Log transform your dataset (i.e., take the natural logarithm for each value). If you have very small values (close to zero) it may be helpful to add 1 to each value before you take the natural logarithm (this avoids nonsense output in the results). Calculate the new mean and standard deviation of the transformed variable. Also create a new histogram or boxplot.
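
The recoding and log-transformation steps above can be sketched as follows. This is an illustration in Python with the standard library only, under stated assumptions: the sample list <code>raw</code> is made up, the helper names are hypothetical, and <code>None</code> stands in for <code>NA</code>. <code>math.log1p(x)</code> computes the natural logarithm of <code>1 + x</code>, which matches the "add 1 before taking the log" suggestion.

```python
import math
import statistics

# Hypothetical sample; the two negative values stand in for coding errors.
raw = [3.0, -1.0, 0.0, 8.0, -5.0, 1.0]


def recode_negatives(xs):
    """Replace negative values with None (a stand-in for NA)."""
    return [None if x is not None and x < 0 else x for x in xs]


def mean_sd(xs):
    """Mean and sample standard deviation, ignoring missing values."""
    present = [x for x in xs if x is not None]
    return statistics.mean(present), statistics.stdev(present)


def log1p_transform(xs):
    """Natural log of (1 + x) for each value, keeping missing values missing."""
    return [None if x is None else math.log1p(x) for x in xs]


cleaned = recode_negatives(raw)
logged = log1p_transform(cleaned)

print("before recoding:", mean_sd(raw))
print("after recoding: ", mean_sd(cleaned))
print("after log(1+x): ", mean_sd(logged))
```

Recoding before transforming matters here: the logarithm is undefined for negative inputs, so the negative values need to be set to missing first.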