CommunityData:Message Walls/Archived tasks

Next Steps (Dec 5)

Data collection & analysis

  • Get missing wikis:
    • Salt will use wikiList.3.csv to find which wikis we don't have.
    • Salt will email Mako the list, and he will ask the Spaniards whether they have them
    • Collect any others we need (Salt, Mako)
  • By Dec 6: Gather data for all wikis that made the switch within the first wave of migrations to msg walls (including re-running wikiq, a new wikilist, and so on) (Nate)
    • Partition Danny Horn csv into train and test sets (Nate) [done] (sketch after this list)
  • Write down preliminary inclusion criteria for analysis (Sneha, Nate) [done]
  • By Dec 12: Implement current inclusion criteria for analysis on training set [done]
    • Update existing READMEs, and write codebook describing all variables in detail (Sneha) [done]
  • By Jan 25 - Debug editweeks code/identify source of pre-cutoff edits (Nate)
  • By Jan 25 - Start writing analysis code
  • By Feb 15 - Get initial results, make headway on written draft
  • Early April - run analysis on test set
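
A minimal sketch of the train/test partition from the item above, assuming a pandas workflow; the input filename and the 50/50 split are placeholders rather than the project's actual choices.

```python
# Sketch only: split the wiki list into train and test sets.
# "danny_horn_wikis.csv" and the 50/50 split are assumptions, not the
# project's actual filename or split ratio.
import pandas as pd

wikis = pd.read_csv("danny_horn_wikis.csv")

# Draw a reproducible random half of the wikis for the training set.
train = wikis.sample(frac=0.5, random_state=2017)
test = wikis.drop(train.index)

train.to_csv("wikis_train.csv", index=False)
test.to_csv("wikis_test.csv", index=False)
```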

Writing

  • By Dec 15 - Convert draft of framing, literature review and methodology sections to ACM format (Sneha)
  • Week of Jan 8 - Check-in meeting to discuss preliminary results (Sneha)
  • By Jan 15 - Draft results section of the paper, identify changes to be made to framing/intro based on results (Sneha)
  • By Jan 31 - Complete first draft of paper (Sneha)
  • April 16 - CSCW abstract + metadata deadline
  • April 19 - CSCW submission deadline

Next Steps (Nov 30)

  • (Nate) Update code diagram to say we're using wikiList.3.csv
  • (Nate) Write python code to create training set
  • (Nate and Salt) Link missing dumps to online dump sources
  • (Sneha) Write R code for analyzing data at the edit weeks level (sketch below)
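
The analysis code for this item is planned in R; purely as an illustration of the "edit weeks" aggregation, here is a pandas sketch that rolls per-edit wikiq output up into weekly counts indexed relative to a wiki's message wall switch date. The column names (date_time, revid, editor), the file path, and the switch date are all assumptions.

```python
# Sketch (the real analysis is planned in R): aggregate per-edit data into
# edit weeks, i.e. weekly counts relative to the message wall switch date.
# Column names, file path, and the switch date below are assumptions.
import pandas as pd

edits = pd.read_csv("some_wiki.tsv", sep="\t", parse_dates=["date_time"])
switch_date = pd.Timestamp("2012-10-01")  # hypothetical switch date

# Week 0 starts at the switch; negative weeks fall before the switch.
edits["week"] = (edits["date_time"] - switch_date).dt.days // 7

edit_weeks = (edits.groupby("week")
                   .agg(total_edits=("revid", "size"),
                        unique_editors=("editor", "nunique"))
                   .reset_index())
```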


Next Steps (Oct 5)

  • (Salt) Verify which dumps are good
  • (Nate, Salt) Make sure Sneha gets a path to the tsv files from wikiq output
  • (Sneha) Run inclusion criteria code
  • (Sneha) Update code diagram

Next Steps (Sept 28)

  • (Salt) Verify which dumps are good
  • (Salt) Rerun wikiq to get encoded urls (DONE)
  • (Salt) Write R code to define edits as either newcomer or non-newcomer
  • (Nate) Finish refactoring build wiki list code (DONE)
  • (Nate) Make code architecture diagram (DONE)
  • (Nate) Continue work on bot and admin scraper (DONE)
  • (Nate, from last week) Convert dates to lubridate (DONE)
  • (Sneha) Contact Danny Horn to get information on message wall rollouts
  • (Sneha) Determine inclusion criteria for wikis, and write Python code to subset the ones we want (sketch below)
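
The criteria themselves were still to be decided at this point, so the sketch below only shows the shape of such a subsetting script; the column names and the two placeholder criteria (a minimum edit count, and a dump that spans the switch date) are illustrative assumptions, not the project's actual criteria.

```python
# Sketch with PLACEHOLDER inclusion criteria; the real criteria were still
# being decided when this task was written. Column names are assumptions.
import pandas as pd

wikis = pd.read_csv("wikiList.3.csv")
for col in ["dump_start", "dump_end", "wall_switch_date"]:
    wikis[col] = pd.to_datetime(wikis[col])

meets_criteria = (
    (wikis["total_edits"] >= 1000)                        # hypothetical activity floor
    & (wikis["dump_start"] <= wikis["wall_switch_date"])  # dump begins before the switch
    & (wikis["dump_end"] >= wikis["wall_switch_date"])    # dump ends after the switch
)

wikis[meets_criteria].to_csv("wikis_included.csv", index=False)
```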

Next Steps (Sept 20)

  • (Nate) Convert all dates to lubridate

Next Steps (Aug 24)

  • (Sneha) Add list of variables to build to the wiki
  • (Salt) Verify whether the wiki dumps are solid
  • (Nate) Generate 'contributor experience' variables for every edit in the dataset (sketch after this list)
  • (Nate) Generate bot and admin data for the larger wiki dataset
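
One plausible reading of the 'contributor experience' task is a per-edit count of how many edits the same contributor had already made at that point; the sketch below computes that plus a simple first-edit flag, with the file path and column names assumed from typical wikiq output.

```python
# Sketch: per-edit 'contributor experience' = number of earlier edits by the
# same editor, plus a first-edit flag. File path and column names are assumptions.
import pandas as pd

edits = pd.read_csv("some_wiki.tsv", sep="\t", parse_dates=["date_time"])
edits = edits.sort_values("date_time")

# cumcount() gives 0 for an editor's first edit, 1 for their second, and so on.
edits["prior_edits"] = edits.groupby("editor").cumcount()
edits["is_first_edit"] = edits["prior_edits"] == 0
```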

Next Steps (Aug 15)

  • (Salt) Edit build wikilist code to map filenames to message wall transition dates
  • (Sneha) Continue preliminary analysis with 25 wikis
  • (Nate) Continue investigating what dumps we can get from wikiteam

Next Steps (Aug 1)

  • (Salt) Make file with mapping between urls and the newly scraped dumps.
  • (Nate with Mako's help) Figure out what's going on in the wiki mapping code
  • (Sneha) Plan for visit in September
  • (Sneha) Continue preliminary analysis with 25 wikis

Retreat Tasks

  • Document and organize the git repository.
  • Data exploration / preliminary analysis.

Next Steps (July 18)

  • (Salt with Nate's help) Add new dumps to wikilist
  • (Nate) Update wikiteam mapping.
  • (Sneha) (Using wikiq data) Check that dumps, even if valid XML, have message wall data.
  • (Sneha) Create list of subsetting characteristics (inclusion criteria for wikis) for the study.
  • (Sneha) Create exploratory plots for a larger set of wikis of different sizes.
  • (Sneha) Request new dumps for missing wikis.

Next Steps (July 11)

  • (Sneha) (Using wikiq data) Check that dumps, even if valid XML, have message wall data.
  • (Sneha) Create list of subsetting characteristics (inclusion criteria for wikis) for the study.
  • (Sneha) Create exploratory plots for a larger set of wikis of different sizes.
  • (Sneha) Request new dumps for missing wikis.
  • (Salt) Download wikis available on Special:statistics. Done
  • (Nate) Scrape admin and bot edits using a script from Mako. Done

Next Steps (June 27th)

  • (Sneha) (Using wikiq data) Check that dumps, even if valid XML, have message wall data (sketch after this list).
  • (Sneha) Take a look at namespaces 1200-1202 to understand what they mean. Done
  • (Sneha) Create list of subsetting characteristics (inclusion criteria for wikis) for the study.
  • (Sneha) Create exploratory plots for a larger set of wikis of different sizes.
  • (Salt) Download wikis available on Special:statistics.
  • (Salt) Request new dumps for missing wikis.
  • (Nate) Scrape admin and bot edits using a script from Mako.
  • (Nate) Finish identifying wikiteam mapping. Done
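
Namespaces 1200-1202 are the Wikia namespaces tied to the Message Wall feature (Message Wall, Thread, and Message Wall Greeting). A quick way to confirm that a dump really contains message wall activity is to count edits in those namespaces in the wikiq TSV output; the snippet below is a sketch, and the namespace column name and file path are assumptions.

```python
# Sketch: does this wikiq TSV contain any message wall edits?
# Namespaces 1200-1202 are the Message Wall-related namespaces on Wikia;
# the 'namespace' column name and file path are assumptions.
import pandas as pd

WALL_NAMESPACES = {1200, 1201, 1202}

edits = pd.read_csv("some_wiki.tsv", sep="\t", usecols=["namespace"])
wall_edits = edits["namespace"].isin(WALL_NAMESPACES).sum()
print(f"message wall edits: {wall_edits}")
```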

Next Steps (June 20th)

  • (Nate) Improve wiki list by identifying wikis that turn off the feature without turning it on first (Done)
  • (Nate) Get Muppet Wiki / Dr. Horrible Wiki edit weeks for Sneha (Done)
  • (Nate) Do brute force mapping using revision ids and hashing texts (Done) (sketch after this list)
  • (Sneha) Will play with Dr. Horrible data (Done)
  • (Sneha) Create list of subsetting characteristics for the study
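
One way to do the brute-force mapping is to fingerprint each dump by hashing a sample of its revision texts and then look for overlapping fingerprints between candidate dumps; the sketch below assumes the revision texts have already been extracted (how that is done is left open here) and is not the actual mapping code.

```python
# Sketch: fingerprint a dump by hashing some of its revision texts, then
# compare fingerprints across dumps to find ones that describe the same wiki.
# How the revision texts are extracted is left open here.
import hashlib

def text_fingerprints(revision_texts, n=100):
    """SHA-1 hashes of up to n revision texts (a list of strings)."""
    return {hashlib.sha1(t.encode("utf-8")).hexdigest()
            for t in revision_texts[:n]}

def likely_same_wiki(fps_a, fps_b, threshold=0.5):
    """Heuristic: enough overlapping hashes suggests the same underlying wiki."""
    if not fps_a or not fps_b:
        return False
    return len(fps_a & fps_b) / min(len(fps_a), len(fps_b)) >= threshold
```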

Next Steps (June 13th)

  • Build a new dataset of dumps of the ~4800 wikis (Salt/Nate) (May take more than a week to generate all the new dumps)
  • Build a msgwall version of the build_edit_weeks file from the anon_edits paper (Nate)
  • Do analysis of alt history wiki and update (Sneha)
  • Create list of criteria to identify wikis we want to use in this study (Sneha)

Next Steps (June 6th)

  • Identify the list of wikis we will analyze from the tsv file.
    • This may depend on mapping between the urls in the tsv file and the dumps. Consider using HTTP redirects from the url under <siteinfo>.
  • Modify wikiq to give an error message if the closing </mediawiki> tag is missing (sketch after this list).
  • Sneha to take a look at the althistory data from Nate.
  • Nate will write a version of build_edit_weeks for the message wall project.
  • Check back at the next meeting on Tuesday (June 13th).
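
A dump that was cut off mid-download will usually be missing the closing </mediawiki> tag, so the check described above can be approximated by reading just the tail of the file; the function below is a rough sketch, not the actual wikiq change, and assumes an uncompressed XML dump.

```python
# Sketch: a truncated XML dump is usually missing its closing </mediawiki>
# tag; look for it in the last few kilobytes of the file.
import os

def dump_is_complete(path, tail_bytes=4096):
    with open(path, "rb") as f:
        f.seek(max(0, os.path.getsize(path) - tail_bytes))
        return b"</mediawiki>" in f.read()
```

Compressed dumps (7z, gzip, etc.) would need to be decompressed, or read through the matching library, before a check like this.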