CommunityData:Message Walls/Archived tasks
Latest revision as of 17:28, 15 March 2018
Next Steps (Dec 5)
Data collection & analysis
- Get missing wikis:
  - Salt will use wikiList.3.csv to find which wikis we don't have.
  - Salt will email Mako the list, and he will ask the Spaniards whether they have them
  - Collect any others we need (Salt, Mako)
- By Dec 6: Gather data for all wikis that made the switch within the first wave of migrations to msg walls (including re-running wikiq, new wikilist, and so on.) (Nate)
  - Partition Danny Horn csv into train and test sets (Nate) [done]
- Write down preliminary inclusion criteria for analysis (Sneha, Nate) [done]
- By Dec 12: Implement current inclusion criteria for analysis on training set [done]
  - Update existing READMEs, and write codebook describing all variables in detail (Sneha) [done]
- By Jan 25 - Debug editweeks code/identify source of pre-cutoff edits (Nate)
- By Jan 25 - Start writing analysis code
- By Feb 15 - Get initial results, make headway on written draft
- Early April - run analysis on test set
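The train/test partition mentioned in the list above could look roughly like the sketch below. It is only an illustration: the column names (`url`, `transition_date`), the sample rows, the 20% test fraction, and the seed are all assumptions, not details of the actual Danny Horn csv or of the code the project used.

```python
import csv
import io
import random


def partition_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    shuffled = list(rows)
    # A dedicated Random instance keeps the split stable across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical stand-in for the real csv: ten made-up wikis.
sample_csv = "url,transition_date\n" + "".join(
    f"wiki{i}.example.com,2012-01-{i + 1:02d}\n" for i in range(10)
)
rows = list(csv.DictReader(io.StringIO(sample_csv)))

train, test = partition_rows(rows, test_fraction=0.2, seed=42)
print(len(train), "training wikis;", len(test), "test wikis")
```

Fixing the seed means the test set can be held back until the final analysis and then regenerated identically.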
Writing
- By Dec 15 - Convert draft of framing, literature review and methodology sections to ACM format (Sneha)
- Week of Jan 8 - Check-in meeting to discuss preliminary results (Sneha)
- By Jan 15 - Draft results section of the paper, identify changes to be made to framing/intro based on results (Sneha)
- By Jan 31 - Complete first draft of paper (Sneha)
- April 16 - CSCW abstract + metadata deadline
- April 19 - CSCW submission deadline
Next Steps (Nov 30)
- (Nate) Update code diagram to say we're using wikiList.3.csv
- (Nate) Write python code to create training set
- (Nate and Salt) Link missing dumps to online dump sources
- (Sneha) Write R code for analyzing data at the edit weeks level
Next Steps (Oct 5)
- (Salt) Verify which dumps are good
- (Nate, Salt) Make sure Sneha gets a path to the tsv files from wikiq output
- (Sneha) Run inclusion criteria code
- (Sneha) Update code diagram
Next Steps (Sept 28)
- (Salt) Verify which dumps are good
- (Salt) Rerun wikiq to get encoded urls (DONE)
- (Salt) Write R code to define edits as either newcomer or non-newcomer
- (Nate) Finish refactoring build wiki list code (DONE)
- (Nate) Make code architecture diagram (DONE)
- (Nate) Continue work on bot and admin scraper (DONE)
- (Nate, from last week) Convert dates to lubridate (DONE)
- (Sneha) Contact Danny Horn to get information on message wall rollouts
- (Sneha) Determine inclusion criteria for wikis, and write python code to subset the ones we want
Next Steps (Sept 20)
- (Nate) Convert all dates to lubridate
Next Steps (Aug 24)
- (Sneha) Add list of variables to build to the wiki
- (Salt) Verify whether the wiki dumps are solid
- (Nate) Generate 'contributor experience' variables for every edit in the dataset
- (Nate) Generate bot and admin data for the larger wiki dataset
Next Steps (Aug 15)
- (Salt) Edit build wikilist code to map filenames with message wall transition dates
- (Sneha) Continue preliminary analysis with 25 wikis
- (Nate) Continue investigating what dumps we can get from wikiteam
Next Steps (Aug 1)
- (Salt) Make file with mapping between urls and the newly scraped dumps.
- (Nate with Mako's help) Figure out what's going on in the wiki mapping code
- (Sneha) Plan for visit in September
- (Sneha) Continue preliminary analysis with 25 wikis
Retreat Tasks
- Document and organize the git repository.
- Data exploration / preliminary analysis.
Next Steps (July 18)
- (Salt with Nate's help) Add new dumps to wikilist
- (Nate) Update wikiteam mapping.
- (Sneha) (Using wikiq data) Check that dumps, even if valid xml, have message wall data.
- (Sneha) create list of subsetting characteristics (inclusion criteria for Wikis) for study.
- (Sneha) create exploratory plots for a larger set of wikis of different sizes.
- (Sneha) Request new dumps for missing wikis.
Next Steps (July 11)
- (Sneha) (Using wikiq data) Check that dumps, even if valid xml, have message wall data.
- (Sneha) create list of subsetting characteristics (inclusion criteria for Wikis) for study.
- (Sneha) create exploratory plots for a larger set of wikis of different sizes.
- (Sneha) Request new dumps for missing wikis.
- (Salt) Download wikis available on Special:statistics. Done
- (Nate) Scrape admin and bot edits using a script from Mako. Done
Next Steps (June 27th)
- (Sneha) (Using wikiq data) Check that dumps, even if valid xml, have message wall data.
- (Sneha) Take a look at namespaces 1200-1202 to understand what they mean. Done
- (Sneha) create list of subsetting characteristics (inclusion criteria for Wikis) for study.
- (Sneha) create exploratory plots for a larger set of wikis of different sizes.
- (Salt) Download wikis available on Special:statistics.
- (Salt) Request new dumps for missing wikis.
- (Nate) Scrape admin and bot edits using a script from Mako.
- (Nate) Finish identifying wikiteam mapping Done.
Next Steps (June 20th)
- (Nate) Improve wiki list by identifying wikis that turn off the feature without turning on first (Done)
- (Nate) Get Dr. Horrible Wiki (originally muppet wiki) edit weeks for Sneha (Done)
- (Nate) Do brute force mapping using revision ids and hashing texts (Done)
- (Sneha) Will play with Dr. Horrible data (Done)
- (Sneha) create list of subsetting characteristics for study
Next Steps (June 13th)
- Build a new dataset of dumps of the ~4800 wikis (Salt/Nate) (May take more than a week to generate all the new dumps)
- Build a msgwall version of the build_edit_weeks file from the anon_edits paper (Nate)
- Do analysis of alt history wiki and update (Sneha)
- Create list of criteria to identify wikis we want to use in this study (Sneha)
Next Steps (June 6th)
- Identify list of Wikis we will analyze from the tsv file.
- Attempt to obtain a good dump for each of these wikis. See CommunityData:Wikia Dumps for information.
  - This may depend on mapping between the urls in the tsv file and the dumps. Consider using HTTP redirects from the url under <siteinfo>.
- Modify Wikiq to give an error message if the closing </mediawiki> tag is missing.
- Sneha to take a look at the althistory data from Nate.
- Nate will write a version of build_edit_weeks for the message wall project
- Check back next meeting Tuesday (June 13th)
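The check for a missing closing </mediawiki> tag, listed above as a Wikiq change, can be sketched as a standalone script. This is not Wikiq's code. The function name, the 1024-byte tail size, and the file names are hypothetical. It also assumes an uncompressed XML dump, so a compressed dump would need to be decompressed first.

```python
import os
import tempfile


def has_closing_mediawiki_tag(path, tail_bytes=1024):
    """Return True if the file's last non-whitespace bytes are </mediawiki>."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Read only the tail so that very large dumps stay cheap to check.
        f.seek(max(0, size - tail_bytes))
        tail = f.read()
    return tail.rstrip().endswith(b"</mediawiki>")


# Demonstration on two tiny made-up files: one complete, one truncated.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.xml")
    bad = os.path.join(tmp, "truncated.xml")
    with open(good, "w") as f:
        f.write("<mediawiki><siteinfo></siteinfo></mediawiki>\n")
    with open(bad, "w") as f:
        f.write("<mediawiki><siteinfo></siteinfo><page>")
    good_ok = has_closing_mediawiki_tag(good)
    bad_ok = has_closing_mediawiki_tag(bad)

print("good.xml complete:", good_ok)
print("truncated.xml complete:", bad_ok)
```

A check like this only detects truncation at the end of the file. It does not confirm that the dump is otherwise valid xml or that it contains message wall data.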