CommunityData:Wikia rises and declines tasks

= Issues =

Namespace 4 is a very coarse measure for the amount of governance activity.

 * Option 0: treat all of namespace 4 equally for all wikis (minimal assumptions, maximally generalizable, more unknown measurement error, minimal work).
 * Option 1: treat WP:ANI separately for English Wikipedia (easily done, but opens a can of worms).
 * Option 2: use translation to painstakingly identify the WP:ANI equivalent in other language Wikipedias.
 * Option 3: do a different project that uses templates and patterns of use to identify organizational routines on different language editions of WP.

Go with Option 0 for now, since it treats all wikis the same without starting a new engineering project.

= Build Dataset =
 * 1) Collect list of 2010 wikis with wikiq dumps Done
 * 2) Scrape bot data for wikis Done and add to tables In progress
 * 3) build dataset with variables
 * 4) newcomer 1st edit session
 * 5) is reverted Done
 * 6) is reverted and messaged Done
 * 7) is reverted and messaged on article talk Done
 * 8) is messaged Done
 * 9) number of edits on Wikia overall Done
 * 10) number of edits on wiki Done
 * 11) has edited other Wikia wikis Done
 * 12) Survives (makes an edit 2-6 months after first session) Done
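A minimal sketch of how the "Survives" indicator (item 12) could be computed, assuming a pandas DataFrame `edits` with hypothetical columns `wiki`, `editor`, and `timestamp` (one row per edit); the 2-6 month window is approximated here as 60-180 days:

```python
import pandas as pd

def survives(edits, window_start=60, window_end=180):
    # First edit per (wiki, editor) pair.
    first = (edits.groupby(["wiki", "editor"])["timestamp"]
                  .min().rename("first_edit").reset_index())
    merged = edits.merge(first, on=["wiki", "editor"])
    # Days elapsed since the editor's first edit on this wiki.
    days = (merged["timestamp"] - merged["first_edit"]).dt.days
    merged["in_window"] = days.between(window_start, window_end)
    # An editor "survives" if any of their edits fall in the window.
    return merged.groupby(["wiki", "editor"])["in_window"].any().rename("survives")

# Toy example: editor "a" returns after ~3 months, "b" never does.
edits = pd.DataFrame({
    "wiki": ["w", "w", "w"],
    "editor": ["a", "a", "b"],
    "timestamp": pd.to_datetime(["2010-01-01", "2010-04-01", "2010-01-01"]),
})
print(survives(edits))
```

The window bounds are parameters so the 2-6 month definition can be tightened or loosened without touching the rest of the pipeline.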


 * 1) bots
 * 2) ask mako for script Done
 * 3) tool reverts Done
 * 4) change in tool reverts Done


 * 1) Wiki level rules
 * 2) number of namespace 4 editors Done
 * 3) number of namespace 4 edits Done
 * 4) change in namespace 4 page length Done
 * 5) age of namespace 4 editors Done
 * 6) change in newcomer revert rate Done
 * 7) change in newcomer revert rate without talk page discussion Done
 * 8) change in newcomer revert rate without any message Done
 * 9) newcomer survival rate Done
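Items 6-8 above (changes in newcomer revert rates) could be operationalized as a before/after difference per wiki. A minimal sketch, assuming a DataFrame `newcomer_edits` with hypothetical columns `wiki`, `timestamp`, and a boolean `reverted` flag (the same shape works for the with/without-message variants by filtering rows first):

```python
import pandas as pd

def revert_rate_change(newcomer_edits, split):
    """Per-wiki change in the share of newcomer edits that are reverted,
    comparing edits before vs. on/after the `split` timestamp."""
    before = newcomer_edits[newcomer_edits["timestamp"] < split]
    after = newcomer_edits[newcomer_edits["timestamp"] >= split]
    rate_before = before.groupby("wiki")["reverted"].mean()
    rate_after = after.groupby("wiki")["reverted"].mean()
    return (rate_after - rate_before).rename("revert_rate_change")

# Toy example: revert rate on wiki "w" rises from 0.5 to 1.0.
newcomer_edits = pd.DataFrame({
    "wiki": ["w", "w", "w", "w"],
    "timestamp": pd.to_datetime(["2010-01-01", "2010-02-01",
                                 "2010-07-01", "2010-08-01"]),
    "reverted": [True, False, True, True],
})
print(revert_rate_change(newcomer_edits, pd.Timestamp("2010-06-01")))
```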


 * 1) Wiki level controls
 * 2) Active editors Done
 * 3) edits per time Done
 * 4) newcomer edits per time Done
 * 5) number of articles
 * 6) total wiki length Done
 * 7) Wiki age Done


 * 1) Adding Wikipedia data
 * 2) Download Wikipedia dumps Done
 * 3) Download Wikipedia userroles data Done
 * 4) Merge large Wikipedia dumps Done
 * 5) Integrate Wikipedia and Wikia data In progress

= Reading =
 * 1) Read Geiger 2012

= Models =
 * 1) Newcomer retention
 * 2) rate of newcomer revert
 * 3) rate of rule making
 * 4) rate of tool assisted revert
 * 5) rate of newcomer messaging (following revert)
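One plausible shape for these models (an assumption on my part, not settled here) is a wiki-level regression in which each outcome is predicted by the governance measures plus the wiki-level controls. For example, newcomer retention as a logit:

```latex
\operatorname{logit}\Pr(\text{survives}_{ij} = 1)
  = \beta_0
  + \beta_1\,\Delta\text{NS4 length}_j
  + \beta_2\,\text{NS4 editors}_j
  + \gamma^{\top}\mathbf{X}_j
```

where $i$ indexes newcomers, $j$ indexes wikis, and $\mathbf{X}_j$ collects the wiki-level controls (active editors, edits per time, wiki age, etc.). The rate outcomes (reverts, rule making, tool-assisted reverts, messaging) would take analogous count or rate specifications.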