Editing CommunityData:Wikia data
== XML Dumps ==
So far, all of our Wikia projects use the XML dumps. These are stored on hyak at /com/raw_data/wikia_dumps.
The 2010 Wikia dumps are the only ones that approach being complete or reliable.
The next most useful dumps are from WikiTeam and were obtained from archive.org.
As of 5-23-2017, the most recent complete dumps [https://archive.org/download/wikia_dump_20141219] from WikiTeam are from December 2014. Mako found some missing data in these dumps and contacted WikiTeam. They released a patch [https://archive.org/details/wikiteam-incomplete-2014-12], which we have yet to validate.
== Knowing if a dump is valid ==
The most common problem with dumps is truncation. Sometimes a tag does not close, and expat-based parsers like python-mediawiki-utils / wikiq will fail. Commonly, dumps are truncated after a revision or page for some unknown reason.
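A cheap first check for truncation is to look at the last bytes of the file. The sketch below assumes an uncompressed XML dump on disk; the function name <code>ends_with_closing_tag</code> and the 1 KiB tail size are illustrative choices, not part of wikiq or any existing tool. It only catches dumps that stop before the closing tag, so a dump that passes may still be malformed elsewhere.

```python
import os


def ends_with_closing_tag(path, closing_tag=b"</mediawiki>", tail_bytes=1024):
    """Return True if the file ends with the closing root tag.

    Trailing whitespace after the tag is ignored. This is a quick
    pre-filter for truncated dumps, not a full validity check.
    """
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        # Read only the tail so multi-gigabyte dumps stay cheap to check.
        f.seek(max(0, size - tail_bytes))
        tail = f.read()
    return tail.rstrip().endswith(closing_tag)
```

Dumps that fail this check can be set aside before spending time on a full parse.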
We assume that an XML dump is an accurate representation of the wiki if it has opening and closing <mediawiki> tags and is otherwise valid XML.
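The assumption above can be sketched as a streaming check with Python's standard library. This is a minimal illustration, not the check that wikiq or python-mediawiki-utils performs: it assumes an uncompressed dump, the function names are hypothetical, and "valid" here means only well-formed XML with a <mediawiki> root element (no schema validation).

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}mediawiki'."""
    return tag.rsplit("}", 1)[-1]


def dump_looks_valid(path):
    """Return (ok, reason) for an uncompressed MediaWiki XML dump.

    ok is True when the file parses as well-formed XML from start to
    finish and its root element is <mediawiki>. A truncated dump fails
    because the parser reaches end of file with elements still open.
    """
    root = None
    try:
        with open(path, "rb") as f:
            for event, elem in ET.iterparse(f, events=("start", "end")):
                if root is None:
                    # The first "start" event is always the root element.
                    root = elem
                    if local_name(root.tag) != "mediawiki":
                        return False, "root element is not <mediawiki>"
                elif event == "end" and local_name(elem.tag) == "page":
                    # Drop finished pages so memory use stays bounded.
                    root.clear()
    except ET.ParseError as err:
        return False, "not well-formed XML: %s" % err
    return True, "ok"
```

Because the check streams through the file, it reads the whole dump once and reports the first parse error it hits, including the position the parser gives for it.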
Sometimes Wikia dumps have funny quirks (for example, they put SHA1s in weird places). We just have to work around these. Consider including and surfacing such fields when building tools for working with MediaWiki data, so that language objects accurately reflect the underlying data.
== Obtaining fresh dumps ==
If the WikiTeam data doesn't suit your needs, you probably need to get a dump yourself.
The first thing to try is to download it straight from Wikia on the Special:Statistics page. Note that you need to be logged in to do this.
1. Visit the Special:Statistics page of the wiki you want to download, e.g. http://althistory.wikia.com/wiki/Special:Statistics
2. Click the link (it's the timestamp) for "current pages and history":
[[File:Wikiadumps.png]]
If this is out of date or doesn't exist, then you will have to request a new dump.
Sometimes there is an error. In this case you have to get a new dump from the API using WikiTeam's dumpgenerator.py [https://github.com/WikiTeam/wikiteam/blob/master/dumpgenerator.py].