Communication and Social Networks (Spring 2021)/Dutch School Data Visualization challenge: Difference between revisions

 
(4 intermediate revisions by the same user not shown)
Line 3:
In 2003 and 2004, researchers repeatedly surveyed a number of Dutch school students about their friendships and their behavior. They were particularly interested in the relationship between friendships and drinking behavior. They recorded information about alcohol use, gender, age, ethnicity (whether Dutch or not), and religion.


For this homework, you are supposed to think of a question that you could ask about this data. I don't remember exactly the questions that we came up with in class, but you could ask things like:
For this project, you will think of a question that you would like to visualize with this network. For example:
* Are people who drink more more popular?
* Are males or females more likely to have the same drinking behavior as their friends?
* Are people of the dominant religion more likely to be popular? More likely to be friends with each other?


I created two files to help you to get started:
First, I want you to think about a question and draw on a piece of paper what you want the outcome to look like. For example, if you want to visualize whether people who drink are more popular, you may decide to color nodes by in-degree and change their size or shape based on drinking behavior.


# '''[https://github.com/jdfoote/Communication-and-Social-Networks/raw/master/activities/network_visualization_examples_and_assignment.Rmd This link]''' is an R Markdown file that gives general examples of how to create network visualizations, and gives information about the data. Right-click the file, save it to your computer, and open it in RStudio. At the top of RStudio click "knit", and it should open up something that looks kind of like a web page, which was created from this file ([https://youtu.be/tKUufzpoHDE video explaining R Markdown]).
Second, you will do your best to recreate your idea using tidygraph and ggraph. I would like you to turn in both your drawing and your visualization.
# [https://github.com/jdfoote/Communication-and-Social-Networks/raw/master/activities/school_data_example.Rmd This R file] shows an A+ example of this assignment. It shows how to load the code, gives you visualization ideas, and some code that you might want to alter for your assignment. As with the other one, you should be able to right-click it, save it, and open it in RStudio. I explain the code in [https://youtu.be/prCmVEUTxQE this video].


There are lots of different questions that you can ask about this data, and lots of different ways to visualize relationships between them. Your goal is to identify a question that you think would be interesting and to use R to visualize the network in a way that sheds light on that question. In my example, I decided to look at whether friendships which were mutual were more likely to have the same drinking behavior. I ended up coloring the nodes based on drinking behavior and coloring the edges based on whether they had the same drinking behavior.
You are welcome to work with a partner if you would like. Just make it clear who you worked with and I would encourage you to be a bit more ambitious in what you try to do.


If you wanted to visualize whether drinkers were more popular, you might color nodes by drinking behavior, and change their size based on their indegree centrality or eigenvector centrality.


== The data ==
=== Resources ===


The R Markdown file linked above explains that I created 2 igraph objects for you:
The [https://jeremydfoote.com/Communication-and-Social-Networks/week_6/ggraph_walkthrough.html Introduction to ggraph and tidygraph reading] actually uses this dataset. You can look to that for examples to build on (Here is the [https://jeremydfoote.com/Communication-and-Social-Networks/week_6/ggraph_walkthrough.Rmd R Markdown file] that I used to create the web page).


* <code>G</code> is a multiplex network, which includes both friendships and edges which represent whether two people went to grade school together
Data:
* <code>friend_net</code> is just a simplified version of <code>G</code>, where I removed the grade school edges.
* [https://raw.githubusercontent.com/jdfoote/Communication-and-Social-Networks/spring-2021/resources/school_graph_nodes.csv Node data]
 
* [https://raw.githubusercontent.com/jdfoote/Communication-and-Social-Networks/spring-2021/resources/school_graph_edges.csv Edge data]
In order to load these igraph objects into R you will need to run
 
<code>load(url('https://github.com/jdfoote/Communication-and-Social-Networks/raw/master/activities/school_graph.Rdata'))</code>.
 
This should grab the igraph objects <code>G</code> and <code>friend_net</code>, and load them into your environment. Descriptions of both networks are in the R Markdown file.


Descriptions of what each measure means are at [http://www.stats.ox.ac.uk/~snijders/siena/tutorial2010_data.htm this site], maintained by the people who collected the data.


== Troubleshooting ==

Note that you may need to install the following packages to get my scripts to work:

<code>
install.packages('igraph')
install.packages('tidygraph') # Only for the second script
</code>

This will install these libraries on your computer, so that you can use them

To import the data you can right-click on and save the edge and node data files above to your computer and then import them into R.

Alternatively, the following code will download the files and create a graph object. You are welcome to reuse it.
<syntaxhighlight lang="R">
nodes = read_csv('https://raw.githubusercontent.com/jdfoote/Communication-and-Social-Networks/spring-2021/resources/school_graph_nodes.csv')
edges = read_csv('https://raw.githubusercontent.com/jdfoote/Communication-and-Social-Networks/spring-2021/resources/school_graph_edges.csv')

G = graph_from_data_frame(d=edges, v=nodes) %>% as_tbl_graph()
</syntaxhighlight>

Latest revision as of 20:30, 11 March 2021

The goal

In 2003 and 2004, researchers repeatedly surveyed a number of Dutch school students about their friendships and their behavior. They were particularly interested in the relationship between friendships and drinking behavior. They recorded information about alcohol use, gender, age, ethnicity (whether Dutch or not), and religion.

For this project, you will think of a question that you would like to visualize with this network. For example:

  • Are people who drink more also more popular?
  • Are males or females more likely to have the same drinking behavior as their friends?
  • Are people of the dominant religion more likely to be popular? More likely to be friends with each other?

First, I want you to think about a question and draw on a piece of paper what you want the outcome to look like. For example, if you want to visualize whether people who drink are more popular, you may decide to color nodes by in-degree and change their size or shape based on drinking behavior.

Second, you will do your best to recreate your idea using tidygraph and ggraph. I would like you to turn in both your drawing and your visualization.
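
As a rough illustration of the popularity example above, here is a minimal sketch that colors nodes by in-degree and sizes them by drinking. It uses a tiny made-up network rather than the school data, and the column name alcohol is an assumption, so check the real column names before adapting it.

```r
library(dplyr)
library(tidygraph)
library(ggraph)

# Made-up stand-in for the school network; 'alcohol' is a hypothetical column name
toy_nodes = tibble(name = c('a', 'b', 'c', 'd'),
                   alcohol = c(1, 3, 2, 5))
toy_edges = tibble(from = c('a', 'c', 'd', 'b', 'a'),
                   to   = c('b', 'b', 'b', 'c', 'c'))

# In-degree counts how many people named each student as a friend
toy_graph = tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE) %>%
  mutate(in_degree = centrality_degree(mode = 'in'))

# Color encodes popularity (in-degree); size encodes drinking
toy_plot = ggraph(toy_graph, layout = 'fr') +
  geom_edge_link(arrow = arrow(length = unit(2, 'mm')), end_cap = circle(3, 'mm')) +
  geom_node_point(aes(color = in_degree, size = alcohol)) +
  theme_void()
toy_plot
```

Swapping which variable goes to color and which goes to size is a one-line change inside aes(), so it is easy to try both and compare them with your drawing.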

You are welcome to work with a partner if you would like. Just make it clear who you worked with. If you do work with a partner, I would encourage you to be a bit more ambitious in what you try to do.


Resources

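The Introduction to ggraph and tidygraph reading (https://jeremydfoote.com/Communication-and-Social-Networks/week_6/ggraph_walkthrough.html) uses this dataset, so you can look to it for examples to build on. The R Markdown file that I used to create the web page is at https://jeremydfoote.com/Communication-and-Social-Networks/week_6/ggraph_walkthrough.Rmd.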

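Data:

  • Node data: https://raw.githubusercontent.com/jdfoote/Communication-and-Social-Networks/spring-2021/resources/school_graph_nodes.csv
  • Edge data: https://raw.githubusercontent.com/jdfoote/Communication-and-Social-Networks/spring-2021/resources/school_graph_edges.csv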

Descriptions of what each measure means are at http://www.stats.ox.ac.uk/~snijders/siena/tutorial2010_data.htm, a site maintained by the people who collected the data.

To import the data you can right-click on and save the edge and node data files above to your computer and then import them into R.
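
If you go the save-and-import route, the import step might look something like the sketch below. So that it runs anywhere, it first writes two small stand-in CSV files to a temporary folder; the file contents and the names node_path, edge_path, and G_local are all made up for illustration. With the real files, you would point the two paths at wherever you saved them.

```r
library(readr)
library(igraph)
library(tidygraph)

# Stand-in files so the sketch runs anywhere; replace these two paths with
# the locations where you saved the real node and edge CSVs
node_path = tempfile(fileext = '.csv')
edge_path = tempfile(fileext = '.csv')
write_csv(data.frame(name = c('s1', 's2', 's3'), alcohol = c(2, 4, 1)), node_path)
write_csv(data.frame(from = c('s1', 's2'), to = c('s2', 's3')), edge_path)

# Read the saved files back in
nodes = read_csv(node_path)
edges = read_csv(edge_path)

# The first two edge columns are treated as sender and receiver, and the
# first node column as the ID that the edges refer to
G_local = graph_from_data_frame(d = edges, vertices = nodes) %>% as_tbl_graph()
G_local
```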

Alternatively, the following code will download the files and create a graph object. You are welcome to reuse it.

library(readr)      # read_csv()
library(igraph)     # graph_from_data_frame()
library(tidygraph)  # as_tbl_graph() and the %>% pipe

nodes = read_csv('https://raw.githubusercontent.com/jdfoote/Communication-and-Social-Networks/spring-2021/resources/school_graph_nodes.csv')
edges = read_csv('https://raw.githubusercontent.com/jdfoote/Communication-and-Social-Networks/spring-2021/resources/school_graph_edges.csv')

G = graph_from_data_frame(d=edges, vertices=nodes) %>% as_tbl_graph()
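
Once you have a graph object, questions about friends having the same drinking behavior need an edge-level variable. One possible approach, sketched here on a tiny made-up network, is to activate the edges and look up each endpoint's node attribute with tidygraph's .N() helper. The column name alcohol and the toy values are assumptions for illustration, not the real data.

```r
library(dplyr)
library(tidygraph)

# Made-up stand-in for the school network; 'alcohol' is a hypothetical column name
toy_nodes = tibble(name = c('a', 'b', 'c', 'd'),
                   alcohol = c(1, 3, 3, 5))
toy_edges = tibble(from = c('a', 'b', 'c', 'd'),
                   to   = c('b', 'c', 'b', 'b'))

# With edges active, 'from' and 'to' are row numbers in the node table,
# and .N() gives access to that node table
toy_graph = tbl_graph(nodes = toy_nodes, edges = toy_edges, directed = TRUE) %>%
  activate(edges) %>%
  mutate(same_drinking = .N()$alcohol[from] == .N()$alcohol[to])

# Pull out the edge table to check the new column
edge_table = toy_graph %>% as_tibble()
edge_table
```

A column like same_drinking can then be mapped to edge color in ggraph, for example with geom_edge_link(aes(color = same_drinking)).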