*(Nate and Salt) Link missing dumps to online dump sources
*(Sneha) Write R code for analyzing data at the edit weeks level

==Next Steps (Oct 5)==

* (Salt) Verify which dumps are good
* (Nate, Salt) Make sure Sneha gets a path to the tsv files from wikiq output
* (Sneha) Run inclusion criteria code
* (Sneha) Update code diagram

==Next Steps (Sept 28)==
* (Salt) Verify which dumps are good
* (Salt) Rerun wikiq to get encoded urls (DONE)
* (Salt) Write R code to define edits as either newcomer or non-newcomer
* (Nate) Finish refactoring build wiki list code (DONE)
* (Nate) Make code architecture diagram (DONE)
* (Nate) Continue work on bot and admin scraper (DONE)
* (Nate, from last week) Convert dates to lubridate (DONE)
* (Sneha) Contact Danny Horn to get information on message wall rollouts
* (Sneha) Determine inclusion criteria for wikis, and write python code to subset the ones we want

==Next Steps (Sept 20)==
* (Nate) Convert all dates to lubridate

==Next Steps (Aug 24)==
*(Sneha) Add list of variables to build to the wiki
*(Salt) Verify whether the wiki dumps are solid
*(Nate) Generate 'contributor experience' variables for every edit in the dataset
*(Nate) Generate bot and admin data for the larger wiki dataset

==Next Steps (Aug 15)==
*(Salt) Edit build wikilist code to map filenames with message wall transition dates
*(Sneha) Continue preliminary analysis with 25 wikis
*(Nate) Continue investigating what dumps we can get from wikiteam

==Next Steps (Aug 1)==
*(Salt) Make file with mapping between urls and the newly scraped dumps.
*(Nate with Mako's help) Figure out what's going on in the wiki mapping code
*(Sneha) Plan for visit in September
*(Sneha) Continue preliminary analysis with 25 wikis

== Retreat Tasks ==
* Document and organize the git repository.
* Data exploration / preliminary analysis.

== Next Steps (July 18) ==
* (Salt with Nate's help) Add new dumps to wikilist
* (Nate) Update wikiteam mapping.
* (Sneha) (Using wikiq data) Check that dumps, even if valid xml, have message wall data.
* (Sneha) create list of subsetting characteristics (inclusion criteria for Wikis) for study.
* (Sneha) create exploratory plots for a larger set of wikis of different sizes.
* (Sneha) Request new dumps for missing wikis.

== Next Steps (July 11) ==
* (Sneha) (Using wikiq data) Check that dumps, even if valid xml, have message wall data.
* (Sneha) create list of subsetting characteristics (inclusion criteria for Wikis) for study.
* (Sneha) create exploratory plots for a larger set of wikis of different sizes.
* (Sneha) Request new dumps for missing wikis.
* (Salt) Download wikis available on Special:statistics. '''Done'''
* (Nate) Scrape admin and bot edits using a script from Mako. '''Done'''

== Next Steps (June 27th) ==
* (Sneha) (Using wikiq data) Check that dumps, even if valid xml, have message wall data.
* (Sneha) Take a look at namespaces 1200-1202 to understand what they mean. '''Done'''
* (Sneha) create list of subsetting characteristics (inclusion criteria for Wikis) for study.
* (Sneha) create exploratory plots for a larger set of wikis of different sizes.
* (Salt) Download wikis available on Special:statistics.
* (Salt) Request new dumps for missing wikis.
* (Nate) Scrape admin and bot edits using a script from Mako.
* (Nate) Finish identifying wikiteam mapping '''Done'''.

==Next Steps (June 20th)==
* (Nate) Improve wiki list by identifying wikis that turn off the feature without turning on first (Done)
* (Nate) Get <strike>muppet wiki</strike> Dr. Horrible Wiki edit weeks for Sneha (Done)
* (Nate) Do brute force mapping using revision ids and hashing texts (Done)
* (Sneha) Will play with Dr. Horrible data (Done)
* (Sneha) create list of subsetting characteristics for study

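The brute-force mapping item in the June 20th list (matching wikis to dumps using revision ids and hashed texts) could look roughly like the sketch below. It is only an illustration, not the project's actual code: the function names <code>text_hash</code> and <code>best_dump_match</code>, the in-memory <code>(rev_id, text)</code> pairs, and the choice of SHA-1 are all assumptions.

```python
import hashlib


def text_hash(text):
    """Hash a revision's text so revisions can be compared cheaply (SHA-1 is an assumed choice)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def best_dump_match(wiki_revisions, dumps):
    """Return (dump_name, overlap) for the dump sharing the most revisions with the wiki.

    wiki_revisions: iterable of (rev_id, text) pairs for the wiki we want to map.
    dumps: dict mapping a hypothetical dump name to an iterable of (rev_id, text) pairs.
    A revision counts as shared only if both its id and its text hash match.
    Returns (None, 0) when no dump shares any revision.
    """
    target = {(rev_id, text_hash(text)) for rev_id, text in wiki_revisions}
    scores = {}
    for name, revisions in dumps.items():
        keys = {(rev_id, text_hash(text)) for rev_id, text in revisions}
        scores[name] = len(target & keys)
    if not scores:
        return None, 0
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, 0
    return best, scores[best]
```

Requiring both the revision id and the text hash to match is a deliberately strict choice here, since revision ids alone can collide across unrelated wikis.
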
==Next Steps (June 13th)==

* Build a new dataset of dumps of the ~4800 wikis (Salt/Nate) (May take more than a week to generate all the new dumps)
* Build a msgwall version of the build_edit_weeks file from the anon_edits paper (Nate)
* Do analysis of alt history wiki and update (Sneha)
* Create list of criteria to identify wikis we want to use in this study (Sneha)

== Next Steps (June 6th)==

* Identify list of Wikis we will analyze from the tsv file.
* Attempt to obtain a good dump for each of these wikis. See [[CommunityData:Wikia Dumps]] for information.
** This may depend on mapping between the urls in the tsv file and the dumps. Consider using HTTP redirects from the url under <siteinfo>.
* Modify Wikiq to give an error message if the closing </mediawiki> tag is missing.
* Sneha to take a look at althistory data from Nate.
* Nate will write a version of build_edit_weeks for the message wall project
* Check back next meeting Tuesday (June 13th)
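
For the June 6th item about the missing closing </mediawiki> tag, a check along the following lines might be a starting point. This is a minimal sketch, not Wikiq's actual code: the helper name <code>has_closing_mediawiki_tag</code> and the <code>tail_bytes</code> parameter are hypothetical, and it assumes an uncompressed XML dump on disk.

```python
import os


def has_closing_mediawiki_tag(path, tail_bytes=4096):
    """Return True if the file ends with </mediawiki> (ignoring trailing whitespace).

    Only the last `tail_bytes` bytes are read, so large dumps are not loaded
    into memory. Assumes an uncompressed dump; a compressed file would need
    to be decompressed first.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_bytes))
        tail = f.read()
    return tail.rstrip().endswith(b"</mediawiki>")
```

A caller could print an error and skip the dump when this returns False, which would flag truncated downloads before any parsing starts.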