CommunityData:Message Walls

= Useful Resources =


 * Notes on Wikia dumps: CommunityData:Wikia Dumps
 * Notes on the code: CommunityData:Message Walls Code

= Task Management =

Next Steps (Sept 20)

 * (Nate) Convert all dates to lubridate

Next Steps (Aug 24)

 * (Sneha) Add list of variables to build to the wiki
 * (Salt) Verify whether the wiki dumps are solid
 * (Nate) Generate 'contributor experience' variables for every edit in the dataset
 * (Nate) Generate bot and admin data for the larger wiki dataset

Next Steps (Aug 15)

 * (Salt) Edit the build wikilist code to map filenames to message wall transition dates
 * (Sneha) Continue preliminary analysis with 25 wikis
 * (Nate) Continue investigating what dumps we can get from wikiteam

Next Steps (Aug 1)

 * (Salt) Make a file mapping URLs to the newly scraped dumps.
 * (Nate with Mako's help) Figure out what's going on in the wiki mapping code
 * (Sneha) Plan for visit in September
 * (Sneha) Continue preliminary analysis with 25 wikis

Retreat Tasks

 * Document and organize the git repository.
 * Data exploration / preliminary analysis.

Next Steps (July 18)

 * (Salt with Nate's help) Add new dumps to wikilist
 * (Nate) Update wikiteam mapping.
 * (Sneha) (Using wikiq data) Check that dumps, even if they are valid XML, contain message wall data.
 * (Sneha) Create a list of subsetting characteristics (inclusion criteria for wikis) for the study.
 * (Sneha) Create exploratory plots for a larger set of wikis of different sizes.
 * (Sneha) Request new dumps for missing wikis.

Next Steps (July 11)

 * (Sneha) (Using wikiq data) Check that dumps, even if they are valid XML, contain message wall data.
 * (Sneha) Create a list of subsetting characteristics (inclusion criteria for wikis) for the study.
 * (Sneha) Create exploratory plots for a larger set of wikis of different sizes.
 * (Sneha) Request new dumps for missing wikis.
 * (Salt) Download wikis available on Special:statistics (Done)
 * (Nate) Scrape admin and bot edits using a script from Mako (Done)

Next Steps (June 27th)

 * (Sneha) (Using wikiq data) Check that dumps, even if they are valid XML, contain message wall data.
 * (Sneha) Look at namespaces 1200-1202 to understand what they mean (Done)
 * (Sneha) Create a list of subsetting characteristics (inclusion criteria for wikis) for the study.
 * (Sneha) Create exploratory plots for a larger set of wikis of different sizes.
 * (Salt) Download wikis available on Special:statistics.
 * (Salt) Request new dumps for missing wikis.
 * (Nate) Scrape admin and bot edits using a script from Mako.
 * (Nate) Finish identifying the wikiteam mapping (Done)
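The "check that dumps contain message wall data" item can be approximated directly on the XML dump rather than on wikiq output. The sketch below streams a MediaWiki XML export and counts pages in namespaces 1200-1202. It assumes, as the task list implies, that message wall content lives in those namespaces; the function names are hypothetical and this is not the project's own code.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: message wall content lives in namespaces 1200-1202.
MESSAGE_WALL_NAMESPACES = {1200, 1201, 1202}


def local_name(tag):
    """Strip the '{namespace-uri}' prefix that MediaWiki export XML tags carry."""
    return tag.rsplit("}", 1)[-1]


def count_message_wall_pages(source):
    """Count <page> elements per message wall namespace in a MediaWiki XML dump.

    `source` is a filename or a binary file object (hypothetical helper).
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        for child in elem:
            if local_name(child.tag) == "ns":
                ns = int(child.text)
                if ns in MESSAGE_WALL_NAMESPACES:
                    counts[ns] += 1
                break
        elem.clear()  # drop the page's children so large dumps stay manageable
    return counts


def has_message_wall_data(source):
    """True if the dump has at least one page in a message wall namespace."""
    return sum(count_message_wall_pages(source).values()) > 0


if __name__ == "__main__":
    sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
      <page><title>Message Wall:Alice</title><ns>1200</ns><id>1</id></page>
      <page><title>Main Page</title><ns>0</ns><id>2</id></page>
    </mediawiki>"""
    print(dict(count_message_wall_pages(io.BytesIO(sample))))
```

A dump can parse as valid XML and still return an empty count here, which is exactly the case the task is meant to catch.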

Next Steps (June 20th)

 * (Nate) Improve the wiki list by identifying wikis that turn the feature off without first turning it on (Done)
 * (Nate) Get Muppet Wiki and Dr. Horrible Wiki edit weeks for Sneha (Done)
 * (Nate) Do brute-force mapping using revision IDs and text hashes (Done)
 * (Sneha) Explore the Dr. Horrible data (Done)
 * (Sneha) Create a list of subsetting characteristics for the study
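One way a brute-force mapping "using revision IDs and hashing texts" could work is sketched below. It is an illustration, not the project's actual mapping code: each dump is reduced to a set of (revision ID, SHA-1 of revision text) pairs, and an unmapped dump is matched to the known wiki whose set overlaps most (Jaccard similarity). The overlap rule, the `min_score` threshold, and all function names are assumptions. Pairing the ID with a text hash matters because each wiki numbers its own revisions, so bare IDs can coincide across unrelated wikis.

```python
import hashlib


def revision_fingerprint(revisions):
    """Return a set of (revision id, SHA-1 of text) pairs for one dump.

    `revisions` is any iterable of (rev_id, text) tuples (hypothetical input format).
    """
    return {
        (int(rev_id), hashlib.sha1(text.encode("utf-8")).hexdigest())
        for rev_id, text in revisions
    }


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_matches(unmapped, known, min_score=0.5):
    """Map each unmapped dump to the known wiki with the most similar fingerprint.

    unmapped, known: dicts of name -> fingerprint set.
    Returns name -> (wiki name, score), or None if no score reaches min_score.
    """
    mapping = {}
    for dump_name, fp in unmapped.items():
        best_name, best_score = None, 0.0
        for wiki_name, known_fp in known.items():
            score = jaccard(fp, known_fp)
            if score > best_score:
                best_name, best_score = wiki_name, score
        if best_name is not None and best_score >= min_score:
            mapping[dump_name] = (best_name, best_score)
        else:
            mapping[dump_name] = None
    return mapping
```

Comparing every dump against every known wiki is quadratic. That is acceptable for a one-off brute-force pass, but an index from (ID, hash) pairs to wikis would scale better.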

Next Steps (June 13th)

 * Build a new dataset of dumps of the ~4800 wikis (Salt/Nate) (generating all the new dumps may take more than a week)
 * Build a msgwall version of the build_edit_weeks file from the anon_edits paper (Nate)
 * Analyze the alt history wiki and post an update (Sneha)
 * Create a list of criteria to identify the wikis we want to use in this study (Sneha)

Next Steps (June 6th)

 * Identify the list of wikis we will analyze from the tsv file.
 * Attempt to obtain a good dump for each of these wikis. See CommunityData:Wikia Dumps for information.
 * This may depend on mapping between the URLs in the tsv file and the dumps. Consider using HTTP redirects from the url under.
 * Modify Wikiq to give an error message if the closing tag is missing.
 * Sneha to take a look at the althistory data from Nate.
 * Nate will write a version of build_edit_weeks for the message wall project.
 * Check back at the next meeting on Tuesday (June 13th).
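Until Wikiq itself reports missing closing tags, a standalone pre-check can flag truncated dumps before they are processed. The sketch below is not a patch to Wikiq. It assumes dumps are either uncompressed or gzip-compressed (`.gz`), and it treats a dump as complete only if it ends with `</mediawiki>`, ignoring trailing whitespace. The function names are hypothetical.

```python
import gzip
import sys

CLOSING_TAG = b"</mediawiki>"
TAIL_SLACK = 1024  # bytes of trailing whitespace tolerated after the closing tag


def open_dump(path):
    """Open a dump as a binary stream, decompressing .gz files transparently."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    return open(path, "rb")


def has_closing_tag(stream, chunk_size=1 << 20):
    """Return True if the stream ends with </mediawiki>, ignoring trailing whitespace.

    Reads the whole stream (needed for compressed files) but only keeps a short tail.
    """
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        tail = (tail + chunk)[-(len(CLOSING_TAG) + TAIL_SLACK):]
    return tail.rstrip().endswith(CLOSING_TAG)


def check_dump(path):
    """Raise ValueError if the dump at `path` lacks its closing tag."""
    with open_dump(path) as stream:
        if not has_closing_tag(stream):
            raise ValueError(f"{path}: missing closing {CLOSING_TAG.decode()} tag; dump may be truncated")


if __name__ == "__main__":
    status = 0
    for path in sys.argv[1:]:
        try:
            check_dump(path)
        except ValueError as err:
            print(err, file=sys.stderr)
            status = 1
    sys.exit(status)
```

Run it as, for example, `python check_dumps.py dumps/*.xml.gz`. The script name is hypothetical. The exit status is nonzero if any dump looks truncated, so the check can gate a batch job.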