CommunityData:Exposure and Participation Processes



# The scales are very different. We are looking at ~200 communities and 9K people, compared to ~78K communities and ~3.5M people on Reddit in a given month. I've dealt with this by sampling and then working with ratios rather than raw values: for example, sample 200 subreddits, then visualize each subreddit's share of posts, posts/sum(posts). Another approach I've considered is sampling 200 subreddits and then keeping only the users who have posted in those subreddits. I haven't done this yet, but it would lead to a much sparser distribution of subreddits per person and might not represent the true distribution well.
# This leads to a related problem. Data from Reddit only captures people who decided to participate in at least one community. The full population of people who could participate is much larger. A simulation that represents this reality should probably have many more people than our current simulation, and most of them should be non-participants. This also makes visualization difficult. So far, I have been removing the non-participants so that the comparison starts at the same place for both subreddits and simulated communities, but should we?
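The two sampling approaches in the list above can be sketched roughly as follows. This is only an illustration under assumed data shapes, not the code used for the analysis: `post_counts` (community name to number of posts) and `user_posts` (user to the set of communities they posted in) are hypothetical structures, as are all function names.

```python
import random


def sample_communities(post_counts, k, seed=0):
    """Draw k communities uniformly at random (hypothetical helper).

    post_counts: dict mapping community name -> number of posts.
    """
    rng = random.Random(seed)
    names = rng.sample(sorted(post_counts), k)
    return {name: post_counts[name] for name in names}


def post_shares(post_counts):
    """Convert raw counts to shares, i.e. posts / sum(posts)."""
    total = sum(post_counts.values())
    if total == 0:
        raise ValueError("no posts in the sampled communities")
    return {name: count / total for name, count in post_counts.items()}


def communities_per_user(user_posts, sampled):
    """Second approach: keep only users who posted in a sampled community.

    user_posts: dict mapping user -> set of communities they posted in.
    sampled: set of sampled community names.
    Returns user -> number of sampled communities they posted in.
    Users with no posts in the sample are dropped, which is what makes
    the resulting distribution sparser than the full one.
    """
    counts = {}
    for user, communities in user_posts.items():
        n = len(communities & sampled)
        if n > 0:
            counts[user] = n
    return counts
```

Because the shares sum to 1 in both the simulated and the sampled Reddit data, they can be plotted on the same axis even though the raw counts differ by orders of magnitude.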


** Something like a K-S test would be even better, but again, the difference in the sizes of the two samples seems to make this tricky?
* Quantile measures
** Nate and I did some work on visualizing not just a single statistic like the Gini coefficient, but the values at a few quantiles as a summary of the shape of a highly skewed distribution.
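As a rough sketch of the measures in this list, the following computes a two-sample K-S statistic, a Gini coefficient, and a small quantile summary using only the Python standard library. The function names and the choice of percentiles (50, 90, 99) are assumptions for illustration, not what Nate and I actually used. The K-S statistic itself is well defined for samples of different sizes; what sample size affects is the significance test, since very large samples make even tiny gaps significant. `scipy.stats.ks_2samp` could be used instead if a p-value is wanted.

```python
from bisect import bisect_right
from statistics import quantiles


def ks_statistic(a, b):
    """Largest vertical gap between the empirical CDFs of a and b.

    The two samples may have different sizes.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # The maximum gap between two step functions occurs at a data point.
    for x in set(a_sorted) | set(b_sorted):
        fa = bisect_right(a_sorted, x) / n
        fb = bisect_right(b_sorted, x) / m
        d = max(d, abs(fa - fb))
    return d


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


def quantile_summary(values, percentiles=(50, 90, 99)):
    """Values at a few percentiles (needs at least two data points)."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points
    return {p: cuts[p - 1] for p in percentiles}
```

Running all three on shares (or on counts with non-participants removed) for both the simulated and the sampled Reddit data gives a handful of comparable numbers per distribution, rather than a single summary statistic.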
 


=== Post hoc additions ===