CommunityData:Message Walls: Difference between revisions

From CommunityData
 
(65 intermediate revisions by 3 users not shown)
* Notes on Wikia Dumps [[CommunityData:Wikia Dumps]]
* Notes on the code -- Now with a diagram! [[CommunityData:Message Walls Code]]
= Robustness Checks =
* Pre-period matching placebo test
* Normal placebo test


= Task Management =  


==Overview==
 
*By the first week of October, have a complete dataset
*By the end of October, build all variables and start the analysis
*By the end of November, conduct and finish the data analysis
*Sneha gets a lot of writing done during the dissertation boot camp (Dec 4-15)
*By January, have a solid draft ready for the CSCW second round
 
==Next Steps (Nov 30)==
*(Nate) Update code diagram to say we're using wikiList.3.csv
*(
 
 
==Next Steps (Oct 5)==
 
* (Salt) Verify which dumps are good
* (Nate, Salt) Make sure Sneha gets a path to the tsv files from wikiq output
* (Sneha) Run inclusion criteria code
* (Sneha) Update code diagram
 
==Next Steps (Sept 28)==
* (Salt) Verify which dumps are good
* (Salt) Rerun wikiq to get encoded URLs (DONE)
* (Salt) Write R code to define edits as either newcomer or non-newcomer
* (Nate) Finish refactoring build wiki list code (DONE)
* (Nate) Make code architecture diagram (DONE)
* (Nate) Continue work on bot and admin scraper (DONE)
* (Nate, from last week) Convert dates to lubridate (DONE)
* (Sneha) Contact Danny Horn to get information on message wall rollouts
* (Sneha) Determine inclusion criteria for wikis, and write Python code to subset the ones we want
 
==Next Steps (Sept 20)==
* (Nate) Convert all dates to lubridate
 
==Next Steps (Aug 24)==
*(Sneha) Add list of variables to build to the wiki
*(Salt) Verify whether the wiki dumps are solid
*(Nate) Generate 'contributor experience' variables for every edit in the dataset
*(Nate) Generate bot and admin data for the larger wiki dataset
 
==Next Steps (Aug 15)==
*(Salt) Edit build wikilist code to map filenames to message wall transition dates
*(Sneha) Continue preliminary analysis with 25 wikis
*(Nate) Continue investigating what dumps we can get from wikiteam
 
==Next Steps (Aug 1)==
*(Salt) Make a file mapping URLs to the newly scraped dumps.
*(Nate with Mako's help) Figure out what's going on in the wiki mapping code
*(Sneha) Plan for visit in September
*(Sneha) Continue preliminary analysis with 25 wikis
 
== Retreat Tasks ==
* Document and organize the git repository.
* Data exploration / preliminary analysis.
 
== Next Steps (July 18) ==
* (Salt with Nate's help) Add new dumps to wikilist
* (Nate) Update wikiteam mapping.
* (Sneha) (Using wikiq data) Check that dumps, even if they are valid XML, contain message wall data.
* (Sneha) Create a list of subsetting characteristics (inclusion criteria for wikis) for the study.
* (Sneha) Create exploratory plots for a larger set of wikis of different sizes.
* (Sneha) Request new dumps for missing wikis.
 
== Next Steps (July 11) ==
* (Sneha) (Using wikiq data) Check that dumps, even if they are valid XML, contain message wall data.
* (Sneha) Create a list of subsetting characteristics (inclusion criteria for wikis) for the study.
* (Sneha) Create exploratory plots for a larger set of wikis of different sizes.
* (Sneha) Request new dumps for missing wikis.
* (Salt) Download wikis available on Special:statistics. '''Done'''
* (Nate) Scrape admin and bot edits using a script from Mako. '''Done'''
 
== Next Steps (June 27th) ==
* (Sneha) (Using wikiq data) Check that dumps, even if they are valid XML, contain message wall data.
* (Sneha) Take a look at namespaces 1200-1202 to understand what they mean. '''Done'''
* (Sneha) Create a list of subsetting characteristics (inclusion criteria for wikis) for the study.
* (Sneha) Create exploratory plots for a larger set of wikis of different sizes.
* (Salt) Download wikis available on Special:statistics.
* (Salt) Request new dumps for missing wikis.
* (Nate) Scrape admin and bot edits using a script from Mako.
* (Nate) Finish identifying wikiteam mapping '''Done'''.
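The recurring task of checking that a dump actually contains message wall data could be approached roughly as below. This is a minimal sketch, not the project's code: it assumes the wikiq output is a tab-separated file with a header row and a namespace column, here called `namespace` (a hypothetical name; match it to the real wikiq header), and it treats namespaces 1200-1202, the range mentioned in the June 27th list, as the message wall namespaces. The function `has_message_wall_edits` is hypothetical.

```python
import csv
import io

# Namespaces named in the June 27th task, treated here as the message wall range.
MESSAGE_WALL_NAMESPACES = {1200, 1201, 1202}


def has_message_wall_edits(tsv_file, namespace_column="namespace"):
    """Return True if any row of a wikiq-style TSV is in namespaces 1200-1202.

    Assumes a header row containing `namespace_column`; the default name
    "namespace" is hypothetical and should match the real wikiq output.
    """
    reader = csv.DictReader(tsv_file, delimiter="\t")
    for row in reader:
        value = (row.get(namespace_column) or "").strip()
        # Negative namespaces (e.g. -1) fail isdigit() and are simply skipped.
        if value.isdigit() and int(value) in MESSAGE_WALL_NAMESPACES:
            return True
    return False


if __name__ == "__main__":
    sample = "revid\tnamespace\ttitle\n1\t0\tMain Page\n2\t1200\tExample\n"
    print(has_message_wall_edits(io.StringIO(sample)))  # prints True
```

On a real file, open it with `open(path, newline="")` and pass the file object. A dump that is valid XML but returns False here would be a candidate for exclusion or a new dump request.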
 
==Next Steps (June 20th)==
* (Nate) Improve wiki list by identifying wikis that turn off the feature without first turning it on (Done)
* (Nate) Get <strike>muppet wiki</strike> Dr. Horrible Wiki edit weeks for Sneha (Done)
* (Nate) Do brute force mapping using revision IDs and hashing texts (Done)
* (Sneha) Will play with Dr. Horrible data (Done)
* (Sneha) Create a list of subsetting characteristics for the study
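The brute force mapping in the June 20th list could work along these lines. This is a minimal sketch under assumptions, not Nate's implementation: each revision is fingerprinted as its revision ID plus a SHA-1 hash of its text, a reference set of fingerprints with known wiki names is assumed to be available, and a dump is assigned to whichever wiki it matches most often. `revision_key`, `build_index`, `guess_wiki` and the sample wiki names are hypothetical.

```python
import hashlib
from collections import Counter


def revision_key(rev_id, text):
    """Fingerprint a revision as (revision ID, SHA-1 hex digest of its text)."""
    return (rev_id, hashlib.sha1(text.encode("utf-8")).hexdigest())


def build_index(reference):
    """Map every revision fingerprint to the wiki it is known to come from.

    `reference` is a hypothetical dict: wiki name -> iterable of (rev_id, text).
    """
    index = {}
    for wiki, revisions in reference.items():
        for rev_id, text in revisions:
            index[revision_key(rev_id, text)] = wiki
    return index


def guess_wiki(dump_revisions, index):
    """Return the wiki matched by the most dump revisions, or None if none match."""
    votes = Counter()
    for rev_id, text in dump_revisions:
        wiki = index.get(revision_key(rev_id, text))
        if wiki is not None:
            votes[wiki] += 1
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    index = build_index({
        "wiki_a": [(1, "Welcome"), (2, "Second edit")],
        "wiki_b": [(1, "Hello"), (3, "Another page")],
    })
    print(guess_wiki([(1, "Welcome"), (2, "Second edit"), (9, "unseen")], index))  # prints wiki_a
```

Combining the ID with the text hash matters because revision IDs alone repeat across wikis (every wiki has a revision 1), while the pair is far less likely to collide.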
 
==Next Steps (June 13th)==
 
* Build a new dataset of dumps of the ~4800 wikis (Salt/Nate) (May take more than a week to generate all the new dumps)
* Build a msgwall version of the build_edit_weeks file from the anon_edits paper (Nate)
 
* Analyze the alt history wiki and give an update (Sneha)
 
* Create list of criteria to identify wikis we want to use in this study (Sneha)
 
== Next Steps (June 6th)==  


* Identify the list of wikis we will analyze from the TSV file.
(Updated March 15th)


* Attempt to obtain a good dump for each of these wikis. See [[CommunityData:Wikia Dumps]] for information.
===Get missing wikis===
*'''ASAP''' Need to use wikilist3.csv to determine which wikis we don't have - Salt (with Mako's help)
*'''ASAP''' Download the rest and put them through wikiq and build edit weeks - Salt (with Mako's help)


** This may depend on the mapping between the URLs in the TSV file and the dumps. Consider using HTTP redirects from the URL under <siteinfo>.
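One way to approach the mapping between listed URLs and dumps, and so to find the wikis we don't have yet, is to read the <base> URL that MediaWiki XML exports carry inside <siteinfo> and compare host names. This is a minimal sketch under assumptions: reading the URLs out of wikilist3.csv is left to the caller because its column layout isn't documented here, the listed URLs are assumed to include a scheme (http://...), and following HTTP redirects for wikis that moved domains is not shown. All function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def siteinfo_base_url(dump_file):
    """Return the text of <siteinfo><base> from a MediaWiki XML dump, or None.

    iterparse stops early, so only the start of a large dump is read. Tags are
    compared without their XML namespace because export schema versions differ.
    """
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop a "{namespace}" prefix if present
        if tag == "base":
            return (elem.text or "").strip()
        if tag == "siteinfo":
            return None  # <siteinfo> closed without a <base> element
    return None


def host_of(url):
    """Lower-cased host name of a URL (expects a scheme such as http://)."""
    return urlparse(url).netloc.lower()


def missing_wikis(listed_urls, dump_files):
    """Hosts from the wiki list that no dump's <base> URL points to."""
    have = set()
    for dump_file in dump_files:
        base = siteinfo_base_url(dump_file)
        if base:
            have.add(host_of(base))
    return sorted({host_of(url) for url in listed_urls} - have)


if __name__ == "__main__":
    dump = io.BytesIO(
        b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
        b"<siteinfo><sitename>Example</sitename>"
        b"<base>http://example.wikia.com/wiki/Main_Page</base></siteinfo></mediawiki>"
    )
    print(missing_wikis(["http://example.wikia.com/", "http://other.wikia.com/"], [dump]))
    # prints ['other.wikia.com']
```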
===Analysis===
* Another meeting with full team to go over the results and try to make sense of them (after Sneha takes a first stab)
* Modify Wikiq to give an error message if the closing </mediawiki> tag is missing.
* Determine any other models we want to run


* Sneha to take a look at the althistory data from Nate
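The proposed Wikiq change (an error when the closing </mediawiki> tag is missing) could rest on a check like the one below, which reads only the tail of a dump. It is a sketch under assumptions, not Wikiq's code: it expects an uncompressed dump (compressed dumps would have to be decompressed or streamed first), and `ends_with_mediawiki_tag` and `check_dump` are hypothetical names.

```python
import io
import os


def ends_with_mediawiki_tag(fileobj, tail_bytes=4096):
    """Return True if a binary file object ends with </mediawiki>, ignoring trailing whitespace.

    Only the last `tail_bytes` bytes are read, so the check is cheap on large dumps.
    """
    fileobj.seek(0, os.SEEK_END)
    size = fileobj.tell()
    fileobj.seek(max(0, size - tail_bytes))
    return fileobj.read().rstrip().endswith(b"</mediawiki>")


def check_dump(path):
    """Raise ValueError if the uncompressed dump at `path` looks truncated."""
    with open(path, "rb") as f:
        if not ends_with_mediawiki_tag(f):
            raise ValueError(f"{path}: closing </mediawiki> tag is missing; the dump may be truncated")


if __name__ == "__main__":
    print(ends_with_mediawiki_tag(io.BytesIO(b"<mediawiki><page/></mediawiki>\n")))  # prints True
    print(ends_with_mediawiki_tag(io.BytesIO(b"<mediawiki><page>")))  # prints False
```

Checking the tail rather than parsing the whole file keeps the test fast, but it only catches truncation; a dump can end correctly and still be malformed in the middle.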
===Writing===
* Switch from Haythornthwaite to Reader-to-Leader framing (Sneha)
* knitr integration (Sneha + Nate)
* plots (Salt)
* Better pictures of message walls (Sneha)
* Better explanations of why talk pages suck (Sneha)
* Zotero streamlining


* Nate will write a version of build_edit_weeks for the message wall project
== Archive ==


* Check back at the next meeting on Tuesday (June 13th)
* [[/Archived_tasks|Past next steps]]

Latest revision as of 20:31, 14 June 2018

Useful Resources

Robustness Checks

  • Pre-period matching placebo test
  • Normal placebo test

Task Management

Overview

(Updated March 15th)

Get missing wikis

  • ASAP Need to use wikilist3.csv to determine which wikis we don't have - Salt (with Mako's help)
  • ASAP Download the rest and put them through wikiq and build edit weeks - Salt (with Mako's help)

Analysis

  • Another meeting with full team to go over the results and try to make sense of them (after Sneha takes a first stab)
  • Determine any other models we want to run

Writing

  • Switch from Haythornthwaite to Reader-to-Leader framing (Sneha)
  • knitr integration (Sneha + Nate)
  • plots (Salt)
  • Better pictures of message walls (Sneha)
  • Better explanations of why talk pages suck (Sneha)
  • Zotero streamlining

Archive