CommunityData:ORES

What ORES is and does
ORES is a tool that uses machine learning to make predictions about the quality of Wikipedia edits and articles. It is useful for people building tools that improve maintenance processes on Wikipedia, and also for research about the community: the dynamics of how work gets done, and who does it.

The main site is here: https://ores.wikimedia.org/

Searching Google Scholar for 'ORES' turns up papers on how ORES is used in research. This one is especially good: https://dl.acm.org/citation.cfm?id=3125475

Using ORES is not too hard. See the documentation at https://www.mediawiki.org/wiki/ORES and scroll to the bottom of the page for usage examples. You can see an ORES score just by visiting a specially constructed URL; no coding is required.
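As a sketch of what such a URL looks like, the function below assembles a request against what we assume is the ORES v3 scores endpoint, `https://ores.wikimedia.org/v3/scores/{wiki}/?models=...&revids=...`, with several models or revision IDs joined by `|`. The path layout and parameter names are assumptions based on the public API documentation, so check them there before relying on this. The helper name `ores_score_url` is hypothetical, and the revision IDs in the example are placeholders.

```python
from urllib.parse import urlencode

# Assumed base of the ORES v3 scoring API; verify against the documentation.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def ores_score_url(wiki, models, revids):
    """Build a URL asking ORES to score revisions (hypothetical helper).

    wiki   -- wiki abbreviation, e.g. "enwiki"
    models -- model names, e.g. ["damaging", "goodfaith"]
    revids -- revision IDs to score
    """
    # Multiple values for one parameter are joined with "|".
    params = {
        "models": "|".join(models),
        "revids": "|".join(str(r) for r in revids),
    }
    # safe="|" keeps the pipe readable instead of encoding it as %7C.
    return f"{ORES_BASE}/{wiki}/?{urlencode(params, safe='|')}"


# Placeholder revision IDs; paste the printed URL into a browser to see JSON.
print(ores_score_url("enwiki", ["damaging", "goodfaith"], [123456, 234567]))
```

Building the URL in code is only worthwhile once you have more than a handful of revisions; for a quick look, typing it into the browser is enough.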

ORES is available in multiple languages; you can see the range of linguistic support here: https://tools.wmflabs.org/ores-support-checklist/

Interpretive note: the first column abbreviates each wiki by language, so enwiki is English Wikipedia and dewiki is German Wikipedia. ORES can analyze both individual revisions and whole articles. For revisions, "Basic" support means you can find out whether ORES predicts that an edit will ultimately be reverted. Where "Advanced" support is available, "Basic" no longer appears; instead, you can see whether an edit is predicted to have been made in "good faith" (according to Wikipedia's definition) and whether it is "damaging". The article measures, wp10 and draftquality, consider the page as a whole. The wp10 model uses the "WP 1.0" quality scale (Stub, Start, C, B, GA, FA), where GA is Good Article and FA is Featured Article.
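To make the revision and article models concrete, here is a sketch that reads predictions out of a score response. The nesting used below (wiki, then `scores`, then revision ID, then model, then `score` with `prediction` and `probability`) is our assumption about the ORES v3 JSON layout, and the numbers are made up. The names `SAMPLE_RESPONSE`, `revision_flags` and `wp10_rank` are hypothetical.

```python
# Made-up example in the assumed ORES v3 layout; real responses also carry
# model metadata, omitted here.
SAMPLE_RESPONSE = {
    "enwiki": {
        "scores": {
            "123": {
                "damaging": {"score": {
                    "prediction": False,
                    "probability": {"false": 0.9, "true": 0.1}}},
                "goodfaith": {"score": {
                    "prediction": True,
                    "probability": {"false": 0.05, "true": 0.95}}},
            }
        }
    }
}

# WP 1.0 quality levels, ordered from lowest to highest quality.
WP10_LEVELS = ["Stub", "Start", "C", "B", "GA", "FA"]


def revision_flags(response, wiki, revid):
    """Return (P(damaging), P(good faith)) for one revision in a response."""
    # Revision IDs are JSON object keys, so they arrive as strings.
    scores = response[wiki]["scores"][str(revid)]
    p_damaging = scores["damaging"]["score"]["probability"]["true"]
    p_goodfaith = scores["goodfaith"]["score"]["probability"]["true"]
    return p_damaging, p_goodfaith


def wp10_rank(predicted_class):
    """Position of a wp10 class on the scale: 0 for Stub up to 5 for FA."""
    return WP10_LEVELS.index(predicted_class)


print(revision_flags(SAMPLE_RESPONSE, "enwiki", 123))  # (0.1, 0.95)
print(wp10_rank("GA") > wp10_rank("B"))  # True
```

Reading the two revision models together is what makes them useful: an edit with a high damaging probability and a high good-faith probability looks like an honest mistake, while high damaging with low good faith looks more like vandalism. Turning wp10 classes into ranks lets you compare them, for example to check whether an article moved up the scale between two points in time.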

If you are more interested in articles than in individual revisions, you might want the dataset released here: https://figshare.com/articles/Monthly_Wikipedia_article_quality_predictions/3859800. It gives a monthly view of quality for each article.

Using ORES
I found it challenging to install ORES without root access because of the reliance on C libraries and system dictionaries, but it was trivial to do on a machine where I had root, so I installed it on Nada with pip.

On Nada, I use some scripts I wrote that use the locally installed ORES engine to make queries against the ORES environment run by the analytics team (i.e. when you run code on Nada, you are not just hitting Nada; it makes calls to the Wikimedia Foundation's servers). A more detailed code walkthrough of my damaging-edits script is below, but the basic situation is:
* I have a list of revision IDs in a tab-delimited format, and I want a prediction of whether each revision is damaging.
* ORES expects a command-line invocation, but I have a dataset.
* ORES manages its own connection niceties, and I have to let it do so.
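The first two points amount to a format conversion: turning a tab-delimited file of revision IDs into the input a command-line scorer expects. As a sketch, assuming the ORES client's `score_revisions` command reads one JSON object per line with a `rev_id` field (check the client's own help text for the actual input format and arguments), the conversion could look like this. The `rev_id` column header in the TSV and the function name `tsv_to_jsonlines` are hypothetical.

```python
import csv
import io
import json


def tsv_to_jsonlines(tsv_text, id_column="rev_id"):
    """Turn a tab-delimited table into one {"rev_id": ...} JSON object per line.

    id_column is a hypothetical header name for the revision-ID column.
    Rows with an empty or non-numeric ID are skipped rather than raising.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = []
    for row in reader:
        # Missing or short rows give None for the column; treat that as blank.
        raw = (row.get(id_column) or "").strip()
        if not raw.isdigit():
            continue
        lines.append(json.dumps({"rev_id": int(raw)}))
    # End with a newline so the output can be concatenated or piped safely.
    return "\n".join(lines) + ("\n" if lines else "")


sample = "rev_id\tpage\n123\tFoo\n\tBar\n456\tBaz\n"
print(tsv_to_jsonlines(sample), end="")
```

Writing this output to a file and feeding it to the scoring command leaves the request handling (the "connection niceties") to the ORES client instead of reimplementing it in the script.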