CommunityData:Exposure and Participation Processes
== Past Next steps ==

=== Visualization and testing ===

I have struggled to visualize the simulations effectively and to make a convincing argument about the data. A few fundamental problems have made this difficult.

# The scales are very different. In the simulation we are looking at ~200 communities and 9K people, compared to ~78K communities and ~3.5M people on Reddit in a given month. I've dealt with this by sampling and then working with ratios rather than raw values: for example, sample 200 subreddits and then visualize posts/sum(posts) for each one. Another approach I've considered is sampling 200 subreddits and then selecting only the users who have posted in those subreddits. I haven't done this yet, but it would lead to a much sparser distribution of subreddits per person and might not represent the true distribution well.
# This leads to a related problem. Data from Reddit only captures people who decided to participate in at least one community. The full population of people who could participate is much larger. A simulation that represents this reality should probably have many more people than our current simulation, and most of them should be non-participants. This also makes visualization difficult. So far, I have been removing the non-participants so that the comparison starts at the same place for both subreddits and simulated communities, but should we?

Currently, we do no statistical tests and simply show faceted histograms for each of the simulation conditions, like the following:

[[File:exposure_abm_hist_example.png|500px]]

'''Possible improvements:'''

* Summary statistic
** Many of these problems go away if we can come up with a decent summary statistic for a distribution. We could, e.g., choose something like the Gini coefficient. It then becomes much easier to summarize multiple simulations across parameter levels and to compare them to the Gini coefficient seen on Reddit.
*** Like any summary statistic, this can be misleading: very different distributions can have the same Gini coefficient.
* Distribution comparison
** Something like a K-S test would be even better, but again, the difference in the sizes of the distributions seems to make this tricky.
* Quantile measures
** Nate and I did some work on visualizing not just something like the Gini coefficient, but the values at a few quantiles as a summary of the shape of a highly skewed distribution.

=== Post hoc additions ===

Another weakness of the project is that it has an unsatisfying conclusion. The simulated distributions resemble what we see empirically in some respects but not in others. One possible way forward is to argue that these theories partially explain the higher-level dynamics, but that we need to add further complications to get satisfying results. Some additions that seem reasonable:

* Modeling communities as having topics and modeling heterogeneity of interest in the topic
* Heterogeneity in costs to participate (e.g., representing differences in free time, skills, etc.)
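
As a rough illustration of the ideas listed under "Possible improvements" above (ratios instead of raw counts, a Gini summary, a few quantiles, and a K-S style comparison), here is a minimal Python sketch. It is not the project's code: the function names, the choice of quantiles, and the toy Pareto-distributed data are all assumptions made for illustration. It uses only the standard library, and the K-S function returns only the test statistic, not a p-value.

```python
"""Sketch: scale-free summaries for comparing simulated and observed post counts.

All names and the toy data below are hypothetical, not taken from the project.
"""
import bisect
import random


def to_shares(counts):
    """Convert raw counts to shares of the total, i.e. posts / sum(posts)."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c / total for c in counts]


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending-sorted values.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def quantile(values, q):
    """Quantile q in [0, 1], linearly interpolated between order statistics."""
    xs = sorted(values)
    if not xs:
        raise ValueError("quantile of empty data")
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac


def ks_statistic(a, b):
    """Two-sample K-S statistic: the largest gap between two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def summarize(counts, qs=(0.5, 0.9, 0.99)):
    """Gini plus a few quantiles of the share distribution (qs is an assumption)."""
    shares = to_shares(counts)
    return {"gini": gini(shares), "quantiles": {q: quantile(shares, q) for q in qs}}


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins: heavy-tailed post counts per community (made-up data).
    observed_all = [int(rng.paretovariate(1.2)) for _ in range(5000)]
    observed = rng.sample(observed_all, 200)  # sample 200 "subreddits"
    simulated = [int(rng.paretovariate(1.5)) for _ in range(200)]
    print(summarize(observed))
    print(summarize(simulated))
    print(ks_statistic(to_shares(observed), to_shares(simulated)))
```

Two caveats on the design. The Gini coefficient does not change when all values are multiplied by a constant, so it can be computed on counts or on shares; the shares themselves, however, depend on how many communities are in the sample (the mean share is 1/n), which is why the sketch compares 200 sampled communities against 200 simulated ones. The K-S statistic is defined for samples of unequal size, but with very large samples even small differences between distributions tend to come out as statistically significant, so the statistic may be more useful here as a distance than as a test.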