CommunityData:Message Walls: Difference between revisions
Groceryheist (talk | contribs) (→Next Steps (June 20th): finished mapping between wikia wikis and wikiteam dumps)
Revision as of 20:55, 27 June 2017
Useful Resources
- Notes on Wikia Dumps CommunityData:Wikia Dumps
- Notes on the code CommunityData:Message Walls Code
Task Management
Future Tasks
- Scrape admin and bot edits using a script from Mako
- Check that dumps, even if valid xml, have message wall data.
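One way to sketch the message wall check is to stream a dump and count the pages whose namespace id falls in the range mentioned in these notes. This is a minimal sketch, not the project's code: it assumes the dump uses the standard MediaWiki export schema with an `<ns>` element per page, and the names `MESSAGE_WALL_NAMESPACES` and `count_message_wall_pages` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: namespaces 1200-1202 are the ones that carry message wall data.
MESSAGE_WALL_NAMESPACES = {1200, 1201, 1202}


def local_name(tag):
    """Strip the '{namespace-uri}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def count_message_wall_pages(dump_file):
    """Count <page> elements per message wall namespace in a MediaWiki XML dump.

    dump_file may be a path or a binary file object.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        for child in elem:
            if local_name(child.tag) == "ns" and child.text is not None:
                ns = int(child.text)
                if ns in MESSAGE_WALL_NAMESPACES:
                    counts[ns] += 1
        elem.clear()  # drop the finished page so large dumps stay in memory bounds
    return counts
```

A dump for which `sum(count_message_wall_pages(path).values())` is zero would be a candidate for a fresh dump request, even if it parses as valid XML.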
Next Steps (June 27th)
- Take a look at namespaces 1200-1202 to understand what they mean.
- (Sneha) create list of subsetting characteristics (inclusion criteria for Wikis) for study.
- Download wikis available on Special:Statistics.
- Request new dumps for missing wikis.
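A possible starting point for the namespace question is the `<siteinfo>` header of a dump, which in the standard MediaWiki export format lists each namespace id with its name. The sketch below reads those names for ids 1200-1202; the function name `namespace_names` is hypothetical, and it assumes the dump actually includes the `<namespaces>` block.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace-uri}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def namespace_names(dump_file, keys=range(1200, 1203)):
    """Return {namespace id: name} for the requested ids, read from <siteinfo>."""
    wanted = set(keys)
    names = {}
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        tag = local_name(elem.tag)
        if tag == "namespace":
            key = elem.get("key")
            if key is not None and int(key) in wanted:
                names[int(key)] = elem.text or ""
        elif tag == "siteinfo":
            break  # the header is complete; no need to read the pages
    return names
```

Ids missing from the returned dict are not declared in that dump's header, which is itself a hint that the wiki may not have the feature.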
Next Steps (June 20th)
- (Nate) Improve wiki list by identifying wikis that turn off the feature without turning it on first (Done)
- (Nate) Get Dr. Horrible Wiki edit weeks for Sneha, in place of the struck-out muppet wiki (Done)
- (Nate) Do brute force mapping using revision ids and hashing texts (Done)
- (Sneha) Will play with Dr. Horrible data (Done)
- (Sneha) create list of subsetting characteristics for study
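The brute force mapping item above could look roughly like the following: fingerprint each revision as a (revision id, text hash) pair and pick the dump that shares the most fingerprints with a sample from the wiki. This is an illustrative sketch only; the notes do not say how the mapping was implemented, SHA-1 is an arbitrary choice here, and `fingerprint` and `best_match` are hypothetical names.

```python
import hashlib


def fingerprint(revisions):
    """Turn an iterable of (rev_id, text) pairs into a set of (rev_id, sha1) pairs."""
    return {
        (rev_id, hashlib.sha1(text.encode("utf-8")).hexdigest())
        for rev_id, text in revisions
    }


def best_match(wiki_revisions, dumps):
    """Find the dump sharing the most revision fingerprints with the wiki sample.

    dumps maps a dump name to an iterable of (rev_id, text) pairs.
    Returns (dump name, overlap size), or (None, 0) if nothing overlaps.
    """
    target = fingerprint(wiki_revisions)
    best_name, best_overlap = None, 0
    for name, revisions in dumps.items():
        overlap = len(target & fingerprint(revisions))
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name, best_overlap
```

Pairing the id with the hash avoids false matches between wikis that happen to reuse the same low revision ids with different content.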
Next Steps (June 13th)
- Build a new dataset of dumps of the ~4800 wikis (Salt/Nate) (May take more than a week to generate all the new dumps)
- Build a msgwall version of the build_edit_weeks file from the anon_edits paper (Nate)
- Do analysis of alt history wiki and update (Sneha)
- Create list of criteria to identify wikis we want to use in this study (Sneha)
Next Steps (June 6th)
- Identify list of Wikis we will analyze from the tsv file.
- Attempt to obtain a good dump for each of these wikis. See CommunityData:Wikia Dumps for information.
- This may depend on mapping between the urls in the tsv file and the dumps. Consider using HTTP redirects from the url under <siteinfo>.
- Modify Wikiq to give an error message if the closing </mediawiki> tag is missing.
- Sneha to take a look at the althistory data from Nate.
- Nate will write a version of build_edit_weeks for the message wall project
- Check back next meeting Tuesday (June 13th)
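The truncated-dump check from the list above can be approximated without touching Wikiq itself, by looking at the tail of the file for the closing `</mediawiki>` tag. The sketch below is a standalone assumption about how such a check might work, not Wikiq's behavior; `has_closing_mediawiki_tag` is a hypothetical helper and it expects an uncompressed dump file.

```python
import os


def has_closing_mediawiki_tag(path, tail_bytes=1024):
    """Return True if the file ends with </mediawiki>, ignoring trailing whitespace."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_bytes))  # read only the end of a large dump
        tail = f.read()
    return tail.rstrip().endswith(b"</mediawiki>")
```

A caller could print an error and skip the dump when this returns False, rather than failing partway through parsing.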