Data Into Insights (Spring 2021)/Visualization Project

From CommunityData

The Goal

For this project, you will create a visualization of a dataset that is beautiful and that surfaces an insight about the data.


Get the data

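I have identified two fairly straightforward datasets for you to choose from:

  • the Palmer Penguins data (the palmerpenguins package)
  • the Star Wars character data (the starwars tibble from dplyr)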

Both datasets are designed for easy import into R. To install and load the penguin data, you just need to run:

install.packages('palmerpenguins')
library(palmerpenguins)

The data will be loaded as a tibble and saved to the variable penguins.

The Star Wars data is even easier to get: running library(tidyverse) makes it available as a tibble named starwars, because the dataset ships with dplyr, one of the packages that tidyverse loads.
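As a minimal sketch, assuming the tidyverse and palmerpenguins packages are already installed, you can load both datasets and check their structure like this. glimpse() (from dplyr) lists every column with its type and first few values, which complements the column descriptions in the help pages.

```r
# Assumes tidyverse and palmerpenguins are already installed
library(tidyverse)       # loads dplyr, which provides the starwars tibble
library(palmerpenguins)  # provides the penguins tibble

# glimpse() lists every column with its type and first few values
glimpse(penguins)
glimpse(starwars)

# Quick size check: number of rows and columns in each dataset
dim(penguins)
dim(starwars)
```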

Explore the data and identify a story

Look through the data. Both datasets document each of their columns in their help pages (?penguins and ?starwars, respectively). Once you understand the dataset you want to work with, consider different ways of displaying or summarizing it. Think about what "story" each way of visualizing the data could tell, and choose carefully the story that you want to tell.
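A few grouped summaries can suggest candidate stories before you plot anything. The sketch below uses the penguin data and assumes the same packages as above; the columns chosen here are only illustrative, and the same count() and group_by() + summarize() pattern works on starwars.

```r
library(tidyverse)
library(palmerpenguins)

# How many penguins of each species were observed on each island?
species_by_island <- penguins %>%
  count(species, island)
species_by_island

# Average body measurements per species, ignoring missing values
species_summary <- penguins %>%
  group_by(species) %>%
  summarize(
    n = n(),
    mean_flipper_mm = mean(flipper_length_mm, na.rm = TRUE),
    mean_body_mass_g = mean(body_mass_g, na.rm = TRUE)
  )
species_summary
```

Differences that stand out in tables like these (for example, which species appear on which islands) are good starting points for a story.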

Then, make a plan for how you want to tell that story through a visualization. At this point, it may be helpful to make a sketch of what you want the visualization to look like.

Create a visualization

Using ggplot2, create a visualization. Then, apply the CRAP principles (contrast, repetition, alignment, proximity) and Kieran Healy's principles for visualization to identify one way the visualization could be improved (e.g., by changing the colors, the labels, or the axes). Make that improvement.
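Here is a minimal sketch of one such step, assuming the penguin data: a default scatterplot, then a single improvement to the labels. The variables and the title text are illustrative placeholders; your own title should state whatever your exploration actually shows. Because ggplot objects can be extended with +, each improvement can build on the previous version.

```r
library(tidyverse)
library(palmerpenguins)

# Version 1: a default scatterplot of flipper length vs. body mass
p_v1 <- penguins %>%
  drop_na(flipper_length_mm, body_mass_g) %>%  # drop rows ggplot would warn about
  ggplot(aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point()

# Version 2: one improvement. Replace raw column names with readable
# labels that include units, and add a title that states the takeaway
p_v2 <- p_v1 +
  labs(
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Species",
    title = "Penguins with longer flippers tend to be heavier"
  )

p_v2
```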

Then, look for ways to improve this new version. Keep iterating until you have built something that you are proud of.
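One way to keep track of your iterations is to store each version as its own object and save it to a file, so you can compare versions side by side and include them in your memo. This sketch uses the Star Wars data; the output folder (a temporary directory here), file names, and plot sizes are assumptions to replace with your own.

```r
library(tidyverse)

# Keep characters with a recorded gender and height
sw <- starwars %>%
  filter(!is.na(gender), !is.na(height))

# Version 1: default boxplot of height by gender
p_v1 <- ggplot(sw, aes(x = gender, y = height)) +
  geom_boxplot()

# Version 2: readable labels with units
p_v2 <- p_v1 +
  labs(x = NULL, y = "Height (cm)",
       title = "Height of Star Wars characters by gender")

# Version 3: lighter theme; vertical gridlines add nothing for a categorical axis
p_v3 <- p_v2 +
  theme_minimal() +
  theme(panel.grid.major.x = element_blank())

# Save each iteration so the versions can be compared side by side
versions <- list(v1 = p_v1, v2 = p_v2, v3 = p_v3)
out_dir <- tempdir()  # assumption: swap in a folder in your project
for (name in names(versions)) {
  ggsave(file.path(out_dir, paste0("starwars_", name, ".png")),
         plot = versions[[name]], width = 6, height = 4)
}
```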

Write a memo

I want you to do this whole process in an R Markdown file. At each step, explain the decisions that you are making and why you are making them. Ideally, you will end up with a document that captures your process of creating and refining a visualization, including how you decided on each step. Then, at the end, summarize what you did in a few paragraphs, focusing on the following questions:

  • What is the story that your visualization is telling?
  • How did you apply the principles from the class to improve your visualization? Cite specific examples.


In sum, what I want you to produce is a number of iterations on a single visualization from one of these datasets, together with an explanation of why you chose that visualization, what it tells us, and how you used the principles from the class to help guide your decisions.

Knit your file to a Word document (.docx) and submit it on Brightspace.
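In RStudio, you can knit to Word from the Knit menu. The same can be done from code with rmarkdown::render(), sketched below under the assumption that rmarkdown and pandoc are installed. To keep the example self-contained it writes a tiny placeholder .Rmd file to a temporary folder; in practice you would pass the path to your own project file.

```r
library(rmarkdown)

# Write a tiny placeholder R Markdown file so this example is self-contained;
# in practice you would render your own project .Rmd instead
rmd_path <- file.path(tempdir(), "visualization_project.Rmd")
writeLines(c(
  "---",
  "title: \"Visualization Project\"",
  "output: word_document",
  "---",
  "",
  "This is a placeholder memo."
), rmd_path)

# Knit to a Word document; render() returns the path of the output file
docx_path <- render(rmd_path, output_format = "word_document", quiet = TRUE)
docx_path
```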

Sample Code

Here are very simple examples of loading each dataset and creating a basic, ugly visualization. I expect you to build much more beautiful and insightful visualizations, but hopefully these point you in the right direction.

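# install.packages('palmerpenguins') ## Only need to run this once
library(tidyverse)       # ggplot2 and dplyr; dplyr also provides starwars
library(palmerpenguins)  # provides penguins

# Mean body mass in each study year, drawn as a line chart
penguins %>%
  group_by(year) %>%
  summarize(mean_body_mass = mean(body_mass_g, na.rm = TRUE)) %>%
  ggplot() +
  geom_line(aes(x = year, y = mean_body_mass))

# Height by gender; a boxplot is one simple way to finish this example
starwars %>%
  ggplot(aes(color = gender, y = height)) +
  geom_boxplot()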

Getting Help

I hope that you help each other to troubleshoot and find bugs. I want each person to come up with their own visualization and to make their own decisions about how to improve it, but feel free to reach out to each other (including on the #homework-help channel on Discord) as you run into roadblocks.