Data Into Insights (Spring 2021)/Visualization Project

From CommunityData

The Goal

For this project, you will work on creating a visualization of a dataset that is both beautiful and surfaces an insight about the data.

Instructions

Get the data

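I have identified two fairly straightforward datasets that I'd like you to choose from:

  • Palmer Penguins: body measurements of penguins, from the palmerpenguins package
  • Star Wars: characters from the Star Wars films, from the dplyr package (part of the tidyverse)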

Both datasets are designed for easy import into R. To install the penguin data, you just need to run:

install.packages("palmerpenguins")
library(palmerpenguins)

The data will be loaded as a tibble and saved to the variable penguins.

The Star Wars data is even easier to get. Running library(tidyverse) loads dplyr, which makes the data available as a tibble named starwars.
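
A quick way to confirm that both datasets loaded is to print a compact overview of each. This is only a sketch, assuming both packages are already installed; glimpse() comes from dplyr and is one of several reasonable choices here.

```r
library(tidyverse)       # attaches dplyr (which provides starwars) and ggplot2
library(palmerpenguins)  # provides penguins

# glimpse() prints one line per column with its type and first few values
glimpse(penguins)
glimpse(starwars)
```

If either call fails with an "object not found" error, the corresponding library() call probably has not been run yet.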

Explore the data and identify a story

Look through the data. Both of these datasets have descriptions of each of their columns (in ?penguins and ?starwars, respectively). Once you understand the dataset you want to work with, try out different ways of displaying or summarizing the data. Consider what story each of those views could tell, and think carefully about the story that you want to tell.
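
As a rough illustration of what summarizing the data might look like, the sketch below counts and averages the penguin data by group. The grouping columns (species, island, sex) are just one possible choice, and mass_by_group is a name made up for this example.

```r
library(tidyverse)
library(palmerpenguins)

# How many penguins of each species were measured on each island?
penguins %>%
  count(species, island)

# Average body mass per species and sex, dropping rows where sex is missing
mass_by_group <- penguins %>%
  filter(!is.na(sex)) %>%
  group_by(species, sex) %>%
  summarize(mean_body_mass = mean(body_mass_g, na.rm = TRUE), .groups = "drop")

mass_by_group
```

Tables like these can suggest a story (for example, differences between groups) before you draw anything.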

Then, make a plan for how you want to tell that story. At this point, it may be helpful to make a sketch of what you want the visualization to look like.

Create a visualization

Using ggplot, create a visualization. Then, consider the principles of CRAP (contrast, repetition, alignment, and proximity) and Kieran Healy's principles for visualization, and identify one way that the visualization could be improved (e.g., by changing the colors, the labels, or the axes). Make that improvement.

Then, take this new version and see if you can see ways that it can be improved. Keep iterating until you have built something that you are proud of.
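
One round of this iteration might look something like the sketch below. The choice of columns, the title, and the names draft and revised are illustrative assumptions, not a required approach; your own plot and improvements will differ.

```r
library(tidyverse)
library(palmerpenguins)

# First draft: default styling, raw column names as axis labels
draft <- penguins %>%
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()

# One improvement: color by species for contrast and add readable labels
revised <- penguins %>%
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g)) %>%
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point(alpha = 0.7) +
  labs(
    title = "Penguins with longer flippers tend to be heavier",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Species"
  ) +
  theme_minimal()

revised
```

Saving each version to its own variable makes it easy to show both in your memo and explain what changed between them.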

Write a memo

I want you to do this whole process in an R Markdown file. At each step, explain the decisions that you are making and why you are making them. Ideally, you will have a document that captures your process of creating and refining a visualization, along with how you decided to take each step. Then, at the end I want you to summarize what you did in a few paragraphs, focused on the following questions:

  • What is the story that your visualization is telling?
  • How did you apply the principles from the class to improve your visualization? Cite specific examples.

Submit

Knit your file into an HTML or Word document and submit it on Brightspace.


Sample Code

Here are very simple examples of loading each dataframe and creating a simple, ugly visualization. I expect you to build much more beautiful and insightful visualizations, but hopefully these get you pointed in the right direction.

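# install.packages('palmerpenguins') ## Only need to run this once
library(tidyverse) # Provides %>%, group_by(), summarize(), and ggplot()
library(palmerpenguins)

# Mean body mass per study year, drawn as a line
penguins %>%
  group_by(year) %>%
  summarize(mean_body_mass = mean(body_mass_g, na.rm = TRUE)) %>%
  ggplot() +
  geom_line(aes(x = year, y = mean_body_mass))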
 

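library(tidyverse) # Loads ggplot2 and dplyr; dplyr provides starwars

# Boxplots of character height, one box per gender
starwars %>%
  ggplot(aes(color = gender, y = height)) +
  geom_boxplot()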


Getting Help

I hope that you help each other to troubleshoot and find bugs. I want each person to come up with their own visualization and to make their own decisions about how to improve it, but feel free to reach out to each other (including on the #homework-help channel on Discord) as you run into roadblocks.