Data Into Insights (Spring 2021)/Visualization Project
For this project, you will work on creating a visualization of a dataset that is both beautiful and surfaces an insight about the data.
Get the data
I have identified a few fairly straightforward datasets that I'd like you to choose from:
- Palmer Penguin data - this is a dataset of measurements of 344 penguins living in Antarctica
- Star Wars Characters - this is a dataset of attributes of 87 characters from the Star Wars universe
Both datasets are designed for easy import into R. To install the penguin data, you just need to run install.packages('palmerpenguins') once and then load it with library(palmerpenguins). The data will be loaded as a tibble and saved to the variable penguins.
The Star Wars data is even easier to get. When you run library(tidyverse) it will make the data available in a tibble saved to starwars.
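As a quick check that both datasets loaded, something like the following sketch should work. It assumes that the tidyverse and palmerpenguins packages are already installed; glimpse() is just one convenient way to peek at the columns.

```r
# install.packages("palmerpenguins")  # only needed once
library(tidyverse)        # also makes the starwars tibble available (via dplyr)
library(palmerpenguins)   # makes the penguins tibble available

# Confirm that both datasets loaded and peek at their columns
glimpse(penguins)
glimpse(starwars)
```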
Explore the data and identify a story
Look through the data. Both of these datasets have descriptions of each of their columns (in ?penguins and ?starwars, respectively). Once you understand the dataset you want to work with, consider different ways of displaying or summarizing the data. Consider what "story" could be told through different ways of visualizing the data and think carefully about a story that you want to tell.
Then, make a plan for how you want to tell that story through a visualization. At this point, it may be helpful to make a sketch of what you want the visualization to look like.
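One possible way to start exploring is to count and summarize by group before plotting anything. The sketch below uses the penguin data; the choice of columns and the name species_summary are only illustrations, not a required approach.

```r
library(tidyverse)
library(palmerpenguins)

# How many penguins were measured for each species on each island?
penguins %>%
  count(species, island)

# Compare average measurements across species, ignoring missing values
species_summary <- penguins %>%
  group_by(species) %>%
  summarize(
    mean_bill_length = mean(bill_length_mm, na.rm = TRUE),
    mean_body_mass = mean(body_mass_g, na.rm = TRUE)
  )
species_summary
```

Differences that show up in a summary like this are often good candidates for the story you tell in your visualization.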
Create a visualization
Using ggplot, create a visualization. Then, consider the CRAP principles (contrast, repetition, alignment, and proximity) and Kieran Healy's principles for visualization and identify one way that the visualization could be improved (e.g., through changing the colors, the labels, or the axes). Make that improvement.
Then, take this new version and see if you can see ways that it can be improved. Keep iterating until you have built something that you are proud of.
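To give a sense of what one round of iteration might look like, here is a rough sketch using the penguin data. The variables, palette, theme, and title are arbitrary choices made for illustration; your own plot and improvements will likely be different.

```r
library(tidyverse)
library(palmerpenguins)

# Drop rows with missing measurements so ggplot does not warn about them
penguins_complete <- penguins %>%
  filter(!is.na(flipper_length_mm), !is.na(body_mass_g))

# Iteration 1: a bare scatterplot with default labels
p1 <- ggplot(penguins_complete, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point()

# Iteration 2: add contrast between species with color, plus readable labels
p2 <- p1 +
  aes(color = species) +
  scale_color_brewer(palette = "Dark2") +
  labs(
    title = "Heavier penguins tend to have longer flippers",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Species"
  ) +
  theme_minimal()

p2
```

Saving each version to its own variable (here p1 and p2) makes it easy to show every step in your R Markdown file and explain what changed.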
Write a memo
I want you to do this whole process in an R Markdown file. At each step, explain the decisions that you are making and why you are making them. Ideally, you will have a document that captures your process of creating and refining a visualization, along with how you decided to take each step. Then, at the end I want you to summarize what you did in a few paragraphs, focused on the following questions:
- What is the story that your visualization is telling?
- How did you apply the principles from the class to improve your visualization? Cite specific examples.
In sum, what I want you to produce is a number of iterations on a single visualization from one of these datasets, together with an explanation of why you chose that visualization, what it tells us, and how you used the principles from the class to help guide your decisions.
Knit your file into a Word document (.docx) and submit it on Brightspace.
Here are very simple examples of loading each dataframe and creating a simple, ugly visualization. I expect you to build much more beautiful and insightful visualizations but hopefully these get you pointed in the right direction.
library(tidyverse)
# install.packages('palmerpenguins') ## Only need to run this once
library(palmerpenguins)

# Penguins: mean body mass for each study year, drawn as a line
penguins %>%
  group_by(year) %>%
  summarize(mean_body_mass = mean(body_mass_g, na.rm = TRUE)) %>%
  ggplot() +
  geom_line(aes(x = year, y = mean_body_mass))

# Star Wars: distribution of character height by gender
starwars %>%
  ggplot(aes(color = gender, y = height)) +
  geom_boxplot()
I hope that you help each other to troubleshoot and find bugs. I want each person to come up with their own visualization and to make their own decisions about how to improve it, but feel free to reach out to each other (including on the #homework-help channel on Discord) as you run into roadblocks.