CommunityData:Message Walls

* Notes on the code -- Now with a diagram! [[CommunityData:Message Walls Code]]


= Robustness Checks =
* Pre-period matching placebo test
* Normal placebo test

= Task Management =

==Overview==
*By the first week of October, we have a complete dataset
*By the end of October, we build all variables and start analysis
*By the end of November, we conduct and finish data analysis
*Sneha gets a lot of writing done during dissertation boot camp (Dec 4-15)
*By January, have a solid draft ready for the CSCW second round

==Next Steps (Nov 30)==
*(Nate) Update code diagram to say we're using wikiList.3.csv
*(

==Next Steps (Oct 5)==
 
* (Salt) Verify which dumps are good (a rough checking sketch follows this list)
* (Nate, Salt) Make sure Sneha gets a path to the tsv files from wikiq output
* (Sneha) Run inclusion criteria code
* (Sneha) Update code diagram
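
One rough way to check which dumps are good: stream-parse each XML dump and flag any file that fails partway through, since truncated downloads are the usual failure mode. This is only a sketch; the dump directory and the assumption that dumps are already decompressed to .xml files are placeholders.

<syntaxhighlight lang="python">
# check_dumps.py -- sketch: flag dumps that are not well-formed XML.
# Assumes decompressed .xml dumps sit in DUMP_DIR; adjust to the real layout.
import glob
import os
import xml.etree.ElementTree as ET

DUMP_DIR = "dumps"  # hypothetical location

def check_dump(path):
    """Return (ok, n_revisions); ok is False if parsing fails partway through."""
    n_revisions = 0
    try:
        # iterparse streams the file, so a truncated dump fails without
        # loading the whole thing into memory
        for _, elem in ET.iterparse(path, events=("end",)):
            if elem.tag.rsplit("}", 1)[-1] == "revision":
                n_revisions += 1
            elem.clear()
        return True, n_revisions
    except ET.ParseError:
        return False, n_revisions

for path in sorted(glob.glob(os.path.join(DUMP_DIR, "*.xml"))):
    ok, n = check_dump(path)
    print(f"{os.path.basename(path)}\t{'ok' if ok else 'TRUNCATED/BAD'}\t{n} revisions")
</syntaxhighlight>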
 
==Next Steps (Sept 28)==
* (Salt) Verify which dumps are good
* (Salt) Rerun wikiq to get encoded urls (DONE)
* (Salt) Write R code to define edits as either newcomer or non-newcomer
* (Nate) Finish refactoring build wiki list code (DONE)
* (Nate) Make code architecture diagram (DONE)
* (Nate) Continue work on bot and admin scraper (DONE)
* (Nate, from last week) Convert dates to lubridate (DONE)
* (Sneha) Contact Danny Horn to get information on message wall rollouts
* (Sneha) Determine inclusion criteria for wikis, and write python code to subset the ones we want
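
Since the inclusion criteria are still to be determined, the subsetting code can only be sketched in shape for now; the column names and thresholds below are placeholders, not decisions.

<syntaxhighlight lang="python">
# subset_wikis.py -- sketch of the inclusion-criteria filter.
import pandas as pd

wikis = pd.read_csv("wikiList.3.csv")  # wiki list file; adjust name/path as needed

# Example criteria -- replace with whatever we actually decide on:
#   * the wiki has a known message wall transition date
#   * the wiki had some minimum overall activity
included = wikis[
    wikis["msgwall_date"].notna()        # placeholder column name
    & (wikis["total_edits"] >= 1000)     # placeholder column name and threshold
]

included.to_csv("wikis_included.csv", index=False)
print(f"kept {len(included)} of {len(wikis)} wikis")
</syntaxhighlight>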
 
==Next Steps (Sept 20)==
* (Nate) Convert all dates to lubridate
 
==Next Steps (Aug 24)==
*(Sneha) Add list of variables to build to the wiki
*(Salt) Verify whether the wiki dumps are solid
*(Nate) Generate 'contributor experience' variables for every edit in the dataset (one construction is sketched after this list)
*(Nate) Generate bot and admin data for the larger wiki dataset
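
For the contributor experience item above, one common construction is: an editor's experience at a given edit is the number of edits they had already made on that wiki. A pandas sketch, with column names that are guesses at the wikiq tsv schema rather than the real ones:

<syntaxhighlight lang="python">
# contributor_experience.py -- sketch: experience = prior edit count per editor.
# The "editor" and "date_time" column names are assumptions about the wikiq tsvs.
import pandas as pd

edits = pd.read_csv("wikiq_output/example.wiki.tsv", sep="\t")  # hypothetical file
edits = edits.sort_values("date_time")

# cumcount() numbers each editor's edits 0, 1, 2, ... in chronological order,
# which is exactly "edits made before this one"
edits["editor_experience"] = edits.groupby("editor").cumcount()

edits.to_csv("edits_with_experience.tsv", sep="\t", index=False)
</syntaxhighlight>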
 
==Next Steps (Aug 15)==
*(Salt) Edit build wikilist code to map filenames with message wall transition dates
*(Sneha) Continue preliminary analysis with 25 wikis
*(Nate) Continue investigating what dumps we can get from wikiteam
 
==Next Steps (Aug 1)==
*(Salt) Make file with mapping between urls and the newly scraped dumps (one approach is sketched after this list).
*(Nate with Mako's help) Figure out what's going on in the wiki mapping code
*(Sneha) Plan for visit in September
*(Sneha) Continue preliminary analysis with 25 wikis
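
For the url-to-dump mapping item above, one option is to read the <base> URL out of each dump's <siteinfo> header and write a filename-to-url table. The paths and output filename below are made up.

<syntaxhighlight lang="python">
# map_dumps_to_urls.py -- sketch: recover each dump's wiki URL from <siteinfo>.
import csv
import glob
import os
import xml.etree.ElementTree as ET

DUMP_DIR = "new_dumps"  # hypothetical location of the newly scraped dumps

def base_url(path):
    """Return the <base> URL from the dump's <siteinfo> header, or None."""
    for _, elem in ET.iterparse(path, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "base":
            return elem.text
        if tag == "siteinfo":
            break  # header finished without a <base> element
    return None

with open("dump_url_map.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["dump_file", "base_url"])
    for path in sorted(glob.glob(os.path.join(DUMP_DIR, "*.xml"))):
        writer.writerow([os.path.basename(path), base_url(path)])
</syntaxhighlight>

If a wiki has been renamed since the dump was made, it may also help to resolve HTTP redirects on that URL before matching it against the tsv file (see the note under Analysis further down this page).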
 
== Retreat Tasks ==
* Document and organize the git repository.
* Data exploration / preliminary analysis.
 
== Next Steps (July 18) ==
* (Salt with Nate's help) Add new dumps to wikilist
* (Nate) Update wikiteam mapping.
* (Sneha) (Using wikiq data) Check that dumps, even if valid xml, have message wall data (see the sketch after this list).
* (Sneha) create list of subsetting characteristics (inclusion criteria for Wikis) for study.
* (Sneha) create exploratory plots for a larger set of wikis of different sizes.
* (Sneha) Request new dumps for missing wikis.
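
For the message-wall-data check above, a minimal sketch against the wikiq tsv output: count edits whose namespace falls in 1200-1202 (assuming those are the message wall namespaces; see the June 27th item below). The "namespace" column name and the output directory are assumptions.

<syntaxhighlight lang="python">
# has_msgwall_data.py -- sketch: does each wiki's wikiq output contain any
# message wall edits? Column name and paths are assumptions.
import glob
import pandas as pd

MSGWALL_NS = {1200, 1201, 1202}

for path in sorted(glob.glob("wikiq_output/*.tsv")):  # hypothetical path
    edits = pd.read_csv(path, sep="\t", usecols=["namespace"])
    n_wall = int(edits["namespace"].isin(MSGWALL_NS).sum())
    print(f"{path}\t{n_wall} message wall edits")
</syntaxhighlight>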
 
== Next Steps (July 11) ==
* (Sneha) (Using wikiq data) Check that dumps, even if valid xml, have message wall data.
* (Sneha) create list of subsetting characteristics (inclusion criteria for Wikis) for study.
* (Sneha) create exploratory plots for a larger set of wikis of different sizes.
* (Sneha) Request new dumps for missing wikis.
* (Salt) Download wikis available on Special:statistics. '''Done'''
* (Nate) Scrape admin and bot edits using a script from Mako. '''Done'''
 
== Next Steps (June 27th) ==
* (Sneha) (Using wikiq data) Check that dumps, even if valid xml, have message wall data.
* (Sneha) Take a look at namespaces 1200-1202 to understand what they mean. '''Done'''
* (Sneha) create list of subsetting characteristics (inclusion criteria for Wikis) for study.
* (Sneha) create exploratory plots for a larger set of wikis of different sizes.
* (Salt) Download wikis available on Special:statistics.
* (Salt) Request new dumps for missing wikis.
* (Nate) Scrape admin and bot edits using a script from Mako.
* (Nate) Finish identifying wikiteam mapping '''Done'''.
 
==Next Steps (June 20th)==
* (Nate) Improve wiki list by identifying wikis that turn off the feature without turning it on first (Done)
* (Nate) Get <strike>muppet wiki</strike> Dr. Horrible Wiki edit weeks for Sneha (Done)
* (Nate) Do brute force mapping using revision ids and hashing texts (Done; the idea is sketched after this list)
* (Sneha) Will play with Dr. Horrible data (Done)
* (Sneha) create list of subsetting characteristics for study
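
The brute force mapping idea above, sketched: build a small signature for each dump out of (revision id, text hash) pairs, then treat two dumps as the same wiki when their signatures overlap. The paths, signature size, and matching rule are made up for illustration; the code actually used may differ.

<syntaxhighlight lang="python">
# brute_force_map.py -- sketch of matching dumps by revision ids + text hashes.
import glob
import hashlib
import xml.etree.ElementTree as ET

def signature(path, max_revs=200):
    """Hashes of (revision id, revision text) for the first max_revs revisions."""
    sig = set()
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] != "revision":
            continue
        ns = elem.tag[:-len("revision")]  # keep the XML namespace prefix, if any
        revid = elem.findtext(ns + "id", default="")
        text = elem.findtext(ns + "text", default="")
        sig.add(hashlib.sha1(f"{revid}:{text}".encode("utf-8")).hexdigest())
        elem.clear()
        if len(sig) >= max_revs:
            break
    return sig

new_sigs = {p: signature(p) for p in glob.glob("new_dumps/*.xml")}  # hypothetical paths
old_sigs = {p: signature(p) for p in glob.glob("old_dumps/*.xml")}

for new_path, new_sig in new_sigs.items():
    best = max(old_sigs, key=lambda old: len(new_sig & old_sigs[old]), default=None)
    overlap = len(new_sig & old_sigs[best]) if best else 0
    print(f"{new_path}\t{best}\t{overlap} shared revisions")
</syntaxhighlight>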
 
==Next Steps (June 13th)==
 
* Build a new dataset of dumps of the ~4800 wikis (Salt/Nate) (May take more than a week to generate all the new dumps)
* Build a msgwall version of the build_edit_weeks file from the anon_edits paper (Nate)
 
* Do analysis of alt history wiki and update (Sneha)
 
* Create list of criteria to identify wikis we want to use in this study (Sneha)
 
== Next Steps (June 6th)==  


(Updated March 15th)
* Identify list of Wikis we will analyze from the tsv file. 


===Get missing wikis===
* Attempt to obtain a good dump for each of these wikis. See [[CommunityData:Wikia Dumps]] for information.  
*'''ASAP''' Need to use wikilist3.csv to determine which wikis we don't have - Salt (with Mako's help)
*'''ASAP''' Download the rest and put them through wikiq and build edit weeks - Salt (with Mako's help)


===Analysis===
* The analysis may depend on mapping between the urls in the tsv file and the dumps. Consider using HTTP redirects from the url under <siteinfo>.
* Another meeting with full team to go over the results and try to make sense of them (after Sneha takes a first stab)
* Determine any other models we want to run
* Modify Wikiq to give an error message if the closing </mediawiki> tag is missing.
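
For the wikiq item above, the check itself is cheap: before doing any real parsing, look at the tail of the file for the closing </mediawiki> tag. The sketch below is standalone and says nothing about where it would hook into wikiq's actual code.

<syntaxhighlight lang="python">
# Sketch of the truncation check itself (not wikiq internals): a dump that was
# cut off mid-download will be missing the closing </mediawiki> tag.
import sys

def ends_with_closing_tag(path, tail_bytes=1024):
    """True if the last tail_bytes of the file contain </mediawiki>."""
    with open(path, "rb") as f:
        f.seek(0, 2)                          # jump to end of file
        f.seek(max(0, f.tell() - tail_bytes))
        return b"</mediawiki>" in f.read()

if __name__ == "__main__":
    path = sys.argv[1]
    if not ends_with_closing_tag(path):
        sys.exit(f"ERROR: {path} has no closing </mediawiki> tag -- truncated dump?")
</syntaxhighlight>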


===Writing===
* Sneha to take a look at the althistory data from Nate
* Switch from Haythornthwaite to Reader to Leader framing (Sneha)
* knitr integration (Sneha + Nate)
* plots (Salt)
* Better pictures of message walls (Sneha)
* Better explanations of why talk pages suck (Sneha)
* Zotero streamlining


== Archive ==
* Nate will write a version of build_edit_weeks for the message wall project


* [[/Archived_tasks|Past next steps]]
* Check back next meeting Tuesday (June 13th)