CommunityData:Message Walls/Archived tasks: Difference between revisions

Revision as of 18:28, 15 March 2018

(December 5 update)

Data collection & analysis

  • Get missing wikis:
    • Salt will use wikiList.3.csv to find which wikis we don't have.
    • Salt will email Mako the list, and he will ask the Spaniards whether they have them
    • Collect any others we need (Salt, Mako)
  • By Dec 6: Gather data for all wikis that switched to message walls in the first wave of migrations (including re-running wikiq, building the new wikilist, and so on) (Nate)
    • Partition Danny Horn csv into train and test sets (Nate) [done]
  • Write down preliminary inclusion criteria for analysis (Sneha, Nate) [done]
  • By Dec 12: Implement current inclusion criteria for analysis on training set [done]
    • Update existing READMEs, and write codebook describing all variables in detail (Sneha) [done]
  • By Jan 25 - Debug editweeks code/identify source of pre-cutoff edits (Nate)
  • By Jan 25 - Start writing analysis code
  • By Feb 15 - Get initial results, make headway on written draft
  • Early April - run analysis on test set
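
The "get missing wikis" step above amounts to a set difference between the wiki list and the dumps already on hand. A minimal sketch, assuming wikiList.3.csv has a header row with a URL column (the column name `url` and the function `missing_wikis` are hypothetical, not taken from the project code):

```python
import csv
import io


def missing_wikis(wikilist_file, have, url_column="url"):
    """Return URLs from the wiki list that are not in `have`.

    `url_column` is an assumed column name; adjust it to the real header.
    File order is preserved and duplicate URLs are reported once.
    """
    have = set(have)
    seen = set()
    missing = []
    for row in csv.DictReader(wikilist_file):
        url = row[url_column].strip()
        if url not in have and url not in seen:
            seen.add(url)
            missing.append(url)
    return missing


# Usage with an in-memory stand-in for wikiList.3.csv (made-up URLs).
sample = io.StringIO(
    "url,other\n"
    "http://a.example.org,1\n"
    "http://b.example.org,2\n"
    "http://a.example.org,3\n"
)
print(missing_wikis(sample, {"http://b.example.org"}))
```

The resulting list is what would be emailed around to check whether anyone else holds those dumps.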

Writing

  • By Dec 15 - Convert draft of framing, literature review and methodology sections to ACM format (Sneha)
  • Week of Jan 8 - Check-in meeting to discuss preliminary results (Sneha)
  • By Jan 15 - Draft results section of the paper, identify changes to be made to framing/intro based on results (Sneha)
  • By Jan 31 - Complete first draft of paper (Sneha)
  • April 16 - CSCW abstract + metadata deadline
  • April 19 - CSCW submission deadline

Next Steps (Nov 30)

  • (Nate) Update code diagram to say we're using wikiList.3.csv
  • (Nate) Write python code to create training set
  • (Nate and Salt) Link missing dumps to online dump sources
  • (Sneha) Write R code for analyzing data at the edit weeks level
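
One way the training set could be created is a seeded shuffle over wikis, so the same split can be regenerated later and the test set stays untouched until April. This is only a sketch: the notes do not say how the Danny Horn csv was actually partitioned, and the function name, the 50/50 default, and the seed are all assumptions.

```python
import random


def split_train_test(rows, test_fraction=0.5, seed=0):
    """Partition `rows` (assumed: one row per wiki) into (train, test).

    A fixed seed makes the partition reproducible across runs.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Usage with placeholder wiki names.
wikis = ["wiki_%d" % i for i in range(10)]
train, test = split_train_test(wikis, test_fraction=0.3, seed=1)
print(len(train), len(test))
```

Splitting at the wiki level rather than the edit level keeps every edit from a held-out wiki out of the training data.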


Next Steps (Oct 5)

  • (Salt) Verify which dumps are good
  • (Nate, Salt) Make sure Sneha gets a path to the tsv files from wikiq output
  • (Sneha) Run inclusion criteria code
  • (Sneha) Update code diagram

Next Steps (Sept 28)

  • (Salt) Verify which dumps are good
  • (Salt) Rerun wikiq to get encoded urls (DONE)
  • (Salt) Write R code to define edits as either newcomer or non-newcomer
  • (Nate) Finish refactoring build wiki list code (DONE)
  • (Nate) Make code architecture diagram (DONE)
  • (Nate) Continue work on bot and admin scraper (DONE)
  • (Nate, from last week) Convert dates to lubridate (DONE)
  • (Sneha) Contact Danny Horn to get information on message wall rollouts
  • (Sneha) Determine inclusion criteria for wikis, and write python code to subset the ones we want
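
For the newcomer / non-newcomer item above, a rough sketch of one possible rule. The notes do not define "newcomer"; the rule here (an edit made within `window_days` of that editor's first edit) and the 30-day default are assumptions for illustration only, and the task itself calls for R rather than Python.

```python
from datetime import datetime, timedelta


def label_newcomer_edits(edits, window_days=30):
    """Label each (editor, timestamp) pair as a newcomer edit or not.

    Assumed rule: an edit counts as a newcomer edit when it falls within
    `window_days` of the same editor's earliest edit in the data.
    """
    edits = list(edits)
    first_seen = {}
    for editor, ts in edits:
        if editor not in first_seen or ts < first_seen[editor]:
            first_seen[editor] = ts
    window = timedelta(days=window_days)
    return [(editor, ts, ts - first_seen[editor] < window) for editor, ts in edits]


# Usage with made-up editors and dates.
sample = [
    ("a", datetime(2017, 1, 1)),
    ("a", datetime(2017, 3, 1)),
    ("b", datetime(2017, 3, 1)),
]
for row in label_newcomer_edits(sample):
    print(row)
```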

Next Steps (Sept 20)

  • (Nate) Convert all dates to lubridate

Next Steps (Aug 24)

  • (Sneha) Add list of variables to build to the wiki
  • (Salt) Verify whether the wiki dumps are solid
  • (Nate) Generate 'contributor experience' variables for every edit in the dataset
  • (Nate) Generate bot and admin data for the larger wiki dataset
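
A sketch of what a per-edit "contributor experience" variable could look like: the number of edits the same editor had made before the current one. The field names `editor` and `timestamp` and the variable name `prior_edits` are hypothetical; the actual variables are whatever the codebook defines.

```python
from collections import defaultdict


def add_prior_edit_counts(edits):
    """Return edits sorted by timestamp, each with a 'prior_edits' count.

    `edits` is a list of dicts with 'editor' and 'timestamp' keys
    (assumed names). Timestamps only need to sort chronologically.
    """
    counts = defaultdict(int)
    out = []
    for edit in sorted(edits, key=lambda e: e["timestamp"]):
        row = dict(edit)  # copy so the input is left unchanged
        row["prior_edits"] = counts[edit["editor"]]
        counts[edit["editor"]] += 1
        out.append(row)
    return out


# Usage with made-up data; ISO date strings sort chronologically.
sample = [
    {"editor": "a", "timestamp": "2017-02-01"},
    {"editor": "b", "timestamp": "2017-01-15"},
    {"editor": "a", "timestamp": "2017-01-01"},
]
for row in add_prior_edit_counts(sample):
    print(row)
```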

Next Steps (Aug 15)

  • (Salt) Edit build wikilist code to map filenames with message wall transition dates
  • (Sneha) Continue preliminary analysis with 25 wikis
  • (Nate) Continue investigating what dumps we can get from wikiteam

Next Steps (Aug 1)

  • (Salt) Make file with mapping between urls and the newly scraped dumps.
  • (Nate with Mako's help) Figure out what's going on in the wiki mapping code
  • (Sneha) Plan for visit in September
  • (Sneha) Continue preliminary analysis with 25 wikis

Retreat Tasks

  • Document and organize the git repository.
  • Data exploration / preliminary analysis.

Next Steps (July 18)

  • (Salt with Nate's help) Add new dumps to wikilist
  • (Nate) Update wikiteam mapping.
  • (Sneha) (Using wikiq data) Check that dumps, even if valid xml, have message wall data.
  • (Sneha) create list of subsetting characteristics (inclusion criteria for Wikis) for study.
  • (Sneha) create exploratory plots for a larger set of wikis of different sizes.
  • (Sneha) Request new dumps for missing wikis.

Next Steps (July 11)

  • (Sneha) (Using wikiq data) Check that dumps, even if valid xml, have message wall data.
  • (Sneha) create list of subsetting characteristics (inclusion criteria for Wikis) for study.
  • (Sneha) create exploratory plots for a larger set of wikis of different sizes.
  • (Sneha) Request new dumps for missing wikis.
  • (Salt) Download wikis available on Special:statistics. (Done)
  • (Nate) Scrape admin and bot edits using a script from Mako. (Done)

Next Steps (June 27th)

  • (Sneha) (Using wikiq data) Check that dumps, even if valid xml, have message wall data.
  • (Sneha) Take a look at namespaces 1200-1202 to understand what they mean. (Done)
  • (Sneha) create list of subsetting characteristics (inclusion criteria for Wikis) for study.
  • (Sneha) create exploratory plots for a larger set of wikis of different sizes.
  • (Salt) Download wikis available on Special:statistics.
  • (Salt) Request new dumps for missing wikis.
  • (Nate) Scrape admin and bot edits using a script from Mako.
  • (Nate) Finish identifying wikiteam mapping (Done)
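
The "valid xml but no message wall data" check could be approximated by counting pages in the namespaces mentioned above. This sketch reads the raw XML dump rather than wikiq output, and it assumes that namespaces 1200-1202 are the ones holding message wall content (the notes suggest this but do not confirm it) and that each `<page>` carries an `<ns>` child, which older export schema versions may lack.

```python
import io
import xml.etree.ElementTree as ET

# Assumption: these namespace numbers hold message wall content.
MESSAGE_WALL_NAMESPACES = {"1200", "1201", "1202"}


def local_name(tag):
    """Strip the '{namespace-uri}' prefix ElementTree adds to tags."""
    return tag.rsplit("}", 1)[-1]


def count_message_wall_pages(xml_file):
    """Count <page> elements whose <ns> is in MESSAGE_WALL_NAMESPACES."""
    count = 0
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        for child in elem:
            if local_name(child.tag) == "ns":
                if (child.text or "").strip() in MESSAGE_WALL_NAMESPACES:
                    count += 1
                break
        elem.clear()  # free page contents once counted
    return count


# Usage with a tiny in-memory dump (made-up pages).
sample = io.BytesIO(
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    b"<page><title>A</title><ns>0</ns></page>"
    b"<page><title>B</title><ns>1200</ns></page>"
    b"<page><title>C</title><ns>1201</ns></page>"
    b"</mediawiki>"
)
print(count_message_wall_pages(sample))
```

A count of zero for a wiki that is listed as having switched to message walls would flag that dump for a closer look.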

Next Steps (June 20th)

  • (Nate) Improve wiki list by identifying wikis that turn off the feature without turning on first (Done)
  • (Nate) Get Muppet Wiki and Dr. Horrible Wiki edit weeks for Sneha (Done)
  • (Nate) Do brute force mapping using revision ids and hashing texts (Done)
  • (Sneha) Will play with Dr. Horrible data (Done)
  • (Sneha) create list of subsetting characteristics for study
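
A sketch of how a brute-force mapping on revision ids and text hashes might work: fingerprint every revision as (revision id, hash of text) and pair a dump with the candidate wiki that shares the most fingerprints. The notes do not describe the method actually used, so the function names, the choice of SHA-1, and the "largest overlap wins" rule are all assumptions.

```python
import hashlib


def fingerprint(revisions):
    """Turn (rev_id, text) pairs into a set of (rev_id, sha1-of-text)."""
    return {
        (rev_id, hashlib.sha1(text.encode("utf-8")).hexdigest())
        for rev_id, text in revisions
    }


def best_match(dump_revisions, candidates):
    """Return (name, overlap) for the candidate sharing the most
    fingerprints with the dump, or (None, 0) if nothing overlaps.

    `candidates` maps a wiki name to its (rev_id, text) pairs.
    """
    target = fingerprint(dump_revisions)
    best_name, best_overlap = None, 0
    for name, revisions in candidates.items():
        overlap = len(target & fingerprint(revisions))
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name, best_overlap


# Usage with made-up revisions.
dump = [(1, "hello"), (2, "world")]
known = {
    "wiki_x": [(1, "hello"), (2, "different text")],
    "wiki_y": [(1, "hello"), (2, "world")],
}
print(best_match(dump, known))
```

Hashing the text as well as the id matters because revision ids alone start from small numbers on every wiki and would collide across unrelated wikis.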

Next Steps (June 13th)

  • Build a new dataset of dumps of the ~4800 wikis (Salt/Nate) (May take more than a week to generate all the new dumps)
  • Build a msgwall version of the build_edit_weeks file from the anon_edits paper (Nate)
  • Do analysis of alt history wiki and update (Sneha)
  • Create list of criteria to identify wikis we want to use in this study (Sneha)

Next Steps (June 6th)

  • Identify list of Wikis we will analyze from the tsv file.
    • This may depend on mapping between the urls in the tsv file and the dumps. Consider using HTTP redirects from the url under <siteinfo>.
  • Modify Wikiq to give an error message if the closing </mediawiki> tag is missing.
  • Sneha to take a look at althistory data from Nate.
  • Nate will write a version of build_edit_weeks for the message wall project
  • Check back next meeting Tuesday (June 13th)
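
The truncated-dump check above can be done cheaply before any parsing by looking at the end of the file. This is a standalone sketch rather than a change to Wikiq itself; it assumes an uncompressed dump opened in binary mode, and the function name and the 1024-byte tail size are hypothetical.

```python
import io
import os


def has_closing_mediawiki_tag(f, tail_bytes=1024):
    """Return True if the file's last non-whitespace bytes are </mediawiki>.

    `f` must be a seekable binary file object; only the final
    `tail_bytes` bytes are read, so large dumps are not loaded.
    """
    f.seek(0, os.SEEK_END)
    size = f.tell()
    f.seek(max(0, size - tail_bytes))
    tail = f.read()
    return tail.rstrip().endswith(b"</mediawiki>")


# Usage with in-memory stand-ins for a complete and a truncated dump.
complete = io.BytesIO(b"<mediawiki><page></page></mediawiki>\n")
truncated = io.BytesIO(b"<mediawiki><page></pa")
for name, dump in (("complete", complete), ("truncated", truncated)):
    if not has_closing_mediawiki_tag(dump):
        print("error: %s dump is missing the closing </mediawiki> tag" % name)
```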