CommunityData:ORES
===What ORES is and does===
ORES is a tool that uses machine learning to make predictions about the quality of Wikipedia edits and articles. It's useful for people building tools to improve maintenance processes on Wikipedia, and also for research about the community: the dynamics of how work gets done, and who does it. The main site is here: https://ores.wikimedia.org/

Searching Google Scholar for 'ORES' will turn up examples of how ORES is used in research. This one is great: https://dl.acm.org/citation.cfm?id=3125475

Using ORES is not too hard. See the documentation here: https://www.mediawiki.org/wiki/ORES (scroll to the bottom of the docs page for usage examples). You can see an ORES score just by hitting a custom URL, no coding required.

ORES is available across multiple languages; you can see the range of linguistic support here: https://tools.wmflabs.org/ores-support-checklist/ Interpretive note: the first column uses abbreviations by language -- enwiki is English Wikipedia, dewiki is German Wikipedia.

ORES can analyze both individual revisions and articles overall. For revisions, "Basic" support means you can find out whether ORES predicts that an edit will ultimately be reverted. When "Advanced" support is available, "Basic" is gone; instead you can see whether an edit is predicted to be made in "good faith" according to Wikipedia's definition, and whether it is "damaging".

For the article measures wp10 and draftquality, ORES considers the page overall. The wp10 model uses the "WP 1.0" quality levels schema (Stub, Start, C, B, GA, FA). GA is Good Article, and FA is Featured Article. If you are more interested in articles than in individual revisions, you might be interested in the dataset release here: https://figshare.com/articles/Monthly_Wikipedia_article_quality_predictions/3859800 -- this gives a monthly view into quality per article.
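The "custom URL" mentioned above can be assembled by hand or with a few lines of Python. The sketch below only builds the URL string and does not contact the server. The <code>/v3/scores/</code> path layout and the pipe-separated <code>models</code> and <code>revids</code> parameters are assumptions based on the usage examples on the ORES docs page, and <code>ores_score_url</code> is a hypothetical helper name, not part of ORES.

```python
from urllib.parse import urlencode

# Main ORES site, as named in the text above.
ORES_HOST = "https://ores.wikimedia.org"


def ores_score_url(wiki, models, revids, host=ORES_HOST):
    """Build a scoring URL for one wiki (hypothetical helper).

    Assumption: the service accepts /v3/scores/<wiki>/ with
    pipe-separated 'models' and 'revids' query parameters.
    """
    query = urlencode(
        {
            "models": "|".join(models),
            "revids": "|".join(str(r) for r in revids),
        },
        safe="|",  # keep the pipe separators readable in the URL
    )
    return f"{host}/v3/scores/{wiki}/?{query}"


# Example: ask for two revision-level predictions on English Wikipedia.
print(ores_score_url("enwiki", ["damaging", "goodfaith"], [456789, 3242342]))
```

Pasting the printed URL into a browser should return the scores as JSON, provided the service still exposes that path.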
==== Using ORES ====
I found it challenging to install ORES without root access because of its reliance on C libraries and system dictionaries, but it was trivial on a machine where I had root, so I installed it on Nada with pip. On Nada, I use some scripts I wrote, which use the locally installed ORES engine to make queries against the ORES environment run by the analytics team (i.e. you are not just hitting Nada when you run code on Nada -- it's making calls to the foundation's servers).

A more detailed code walkthrough of my damaging-edits script is below, but the basic situation is:
* I have a list of revision IDs in a tab-delimited format, and I want a prediction of whether each revision is damaging.
* ORES is expecting a command line invocation, but I have a dataset.
* ORES manages its own connection niceties, but I have to let it do so.

==== Code Example ====
<syntaxhighlight lang="python">
#!/usr/bin/env python3
##########################################################################
## This script runs the ORES scorer against revision ids by assembling many examples of the following shell command:
##   echo -e '{"rev_id": 456789}\n{"rev_id": 3242342}\n{"rev_id": 618882377}' | ores score_revisions https://ores.wikimedia.org enwiki wp10 > thatfile.txt
##
## Inspired by the documentation located here: https://www.mediawiki.org/wiki/ORES
##
## Assumptions:
##
## This script assumes a tab-delimited file with a header, and that one element of that header is 'revid' -- a valid wikipedia revision id
##
##########################################################################
## Warnings:
##
## (A) You will need to edit the command line to reflect the wiki whose revisions you want to score -- enwiki, frwiki, etc.
##     See the comment marked (A) in the code.
##
## (B) The code is designed to allow ORES to load-balance your queries on your behalf. A group of 100 revids will likely result in two
##     parallel threads of 50 revids each, which is the current recommended load. Don't change the way you throttle load without guidance
##     from the development team.
##
##########################################################################
## Components:
## (0) Modal Configs and Process Args
## (1) Read in Revision IDs
## (2) Assemble shell command and run repeatedly on groups of IDs.
##

## (0) Modal Configs and Process Args
#DEBUG=1
DEBUG=0

import argparse
import os
import csv

theList = []

parser = argparse.ArgumentParser(description='Generates a kajillion shell commands and runs them.')
parser.add_argument('-i', help="Infile containing revision IDs to look up.", required=True)
args = parser.parse_args()

## (1) Read in Revision IDs
givenInfile = args.i
with open(givenInfile, 'r') as infileHandle:
    theInfile = csv.DictReader(infileHandle, delimiter="\t", quotechar='"')
    for currentLine in theInfile:
        theList.append(currentLine["revid"])  # makes a list of all the revids in the file

## (2) Assemble shell command and run repeatedly on groups of IDs.
chunkSize = 100  # see note (B); it's not recommended to change this chunk size without guidance
for i in range(0, len(theList), chunkSize):  # iterates over theList in 100-revid chunks
    chunk = theList[i:i+chunkSize]
    if DEBUG:  # change the modal config to DEBUG=1 if you want to see these messages, leave it 0 if you don't
        print(chunk)

    # ORES is expecting a JSON format; we fake it here in a string I call uglyString.
    # Each record ends in a literal backslash-n, which relies on the shell's echo
    # expanding backslash escapes (dash does this by default; bash needs echo -e).
    uglyString = ""
    for revid in chunk:
        uglyString = uglyString + "{\"rev_id\": " + revid
        uglyString = uglyString + "}\\n"
    if DEBUG:
        print(uglyString[-2])
    if uglyString[-2] == "\\":  # we don't need the trailing linebreak
        uglyString = uglyString[:-2]
    if DEBUG:
        print(uglyString)

    # see note (A); this is where you can change the language
    #theCommand = '''echo '%s' | ores score_revisions https://ores.wikimedia.org enwiki damaging >> predictDamaging.txt''' % uglyString
    #theCommand = '''echo '%s' | ores score_revisions https://ores.wikimedia.org ruwiki damaging >> predictDamaging.txt''' % uglyString
    theCommand = '''echo '%s' | ores score_revisions https://ores.wikimedia.org frwiki damaging >> predictDamaging.txt''' % uglyString
    if DEBUG:
        print(theCommand)

    os.system(theCommand)
</syntaxhighlight>
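The script above builds its JSON by string concatenation and pipes it through <code>echo</code>. A possible alternative, sketched here under stated assumptions, is to let <code>json.dumps</code> build each record and to hand the text to the <code>ores</code> command on standard input with <code>subprocess.run</code>, so that no shell quoting is involved. The helper names <code>chunked</code>, <code>to_json_lines</code> and <code>score_chunk</code> are hypothetical. It assumes the <code>ores</code> command is on the PATH and reads one JSON record per line from standard input, as in the shell command quoted in the script header.

```python
import json
import subprocess


def chunked(items, size=100):
    """Yield consecutive slices of at most `size` items (100 is the chunk size used above)."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def to_json_lines(revids):
    """Render revision IDs as one JSON record per line, e.g. {"rev_id": 456789}."""
    return "\n".join(json.dumps({"rev_id": int(r)}) for r in revids)


def score_chunk(revids, wiki="frwiki", model="damaging", outfile="predictDamaging.txt"):
    """Send one chunk to the ores CLI and append its output to outfile.

    Assumption: `ores score_revisions <host> <wiki> <model>` reads JSON
    lines from stdin, as in the shell command the original script builds.
    """
    cmd = ["ores", "score_revisions", "https://ores.wikimedia.org", wiki, model]
    with open(outfile, "a") as out:
        subprocess.run(cmd, input=to_json_lines(revids) + "\n",
                       text=True, stdout=out, check=True)


if __name__ == "__main__":
    # Dry run: show the payloads without calling ORES.
    demo = ["456789", "3242342", "618882377"]
    for chunk in chunked(demo, size=2):
        print(to_json_lines(chunk))
```

To score for real, call <code>score_chunk(chunk)</code> inside the loop instead of printing. The chunk size of 100 should stay as it is, for the load-balancing reasons given in the script's warnings.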