Tor and Wikipedia: Difference between revisions

Revision as of 16:50, 29 March 2018

When can a Tor exit node IP edit Wikipedia?

Theory — It takes some time for an exit node to be added to the list of blocked IPs. As a result, Tor users who happen to be routed through a recently added exit node (in the period before Wikipedia has blocked that exit node's IP) are sometimes allowed to edit.
Theory — Some exit nodes don't get added to Wikipedia's block list automatically through TorBlock. Tor users who are routed through these exit nodes are allowed to edit Wikipedia until an administrator or bot notices and blocks the IP address.
Theory — Some forms of blocking expire after a certain amount of time and, if a Tor node is blocked with an expiry time, then traffic may be allowed through until it is blocked again. This could account for on-and-off patterns of editing coming from Tor nodes.
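The expiry theory can be made concrete with a small interval calculation: given the period during which an IP was an exit node (a "spell") and the block intervals recorded for that IP, list the windows in which the IP was an exit node but not blocked. This is only a sketch. The function name and the tuple representation of spells and blocks are assumptions for illustration, not part of any existing dataset or tool.

```python
def unblocked_windows(spell, blocks):
    """Return the parts of `spell` that no block interval covers.

    `spell` is a (start, end) pair and `blocks` is an iterable of
    (start, end) pairs. Intervals are treated as half-open, and the
    endpoints can be any mutually comparable type (ints, datetimes).
    """
    spell_start, spell_end = spell
    windows = []
    cursor = spell_start  # earliest moment not yet known to be blocked
    for block_start, block_end in sorted(blocks):
        if block_end <= cursor:
            continue  # block ended before the point we care about
        if block_start >= spell_end:
            break  # this and all later blocks start after the spell
        if block_start > cursor:
            windows.append((cursor, block_start))
        cursor = max(cursor, block_end)
        if cursor >= spell_end:
            break
    if cursor < spell_end:
        windows.append((cursor, spell_end))
    return windows
```

If the theory holds, edits from a given Tor IP should fall inside the windows this returns, that is, after one block expires and before the next one starts.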

Identifying IPs that are/were Tor exit nodes

We're pulling data from CollecTor, a service created by the Tor Project that "fetches data from various nodes and services in the public Tor network and makes it available to the world."

How is Tor blocked on Wikipedia?

Extension:TorBlock

There is a MediaWiki extension called Extension:TorBlock. Its source code is in a Phabricator repository.

What we can tell about how it works:

during a period from XXXX to XXXX it read from the Tor Project bulk list service (https://check.torproject.org/cgi-bin/TorBulkExitList.py?ip=)
after XXXX, it pulls from the newer "Onionoo" service (https://onionoo.torproject.org/details?type=relay&running=true&flag=Exit)
pulls periodically, typically from a cron job
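To see what a pull from the Onionoo URL above yields, the following sketch downloads one details document and collects candidate exit IPs from it. It is not TorBlock's code. The field names `relays`, `or_addresses`, and `exit_addresses` follow the Onionoo details format as we understand it, and taking the union of both address fields is our own assumption about what a blocker might want.

```python
import json
import urllib.request

ONIONOO_URL = (
    "https://onionoo.torproject.org/details"
    "?type=relay&running=true&flag=Exit"
)


def fetch_details(url=ONIONOO_URL, timeout=30):
    """Download and decode one Onionoo details document (needs network)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def strip_port(address):
    """Turn '1.2.3.4:9001' or '[2001:db8::1]:9001' into a bare IP string."""
    host, _, _port = address.rpartition(":")
    return host.strip("[]")


def exit_ips(details):
    """Collect IPs from an already-parsed details document.

    Assumption: both the advertised OR addresses and any separately
    listed exit addresses are of interest, so the two are merged.
    """
    ips = set()
    for relay in details.get("relays", []):
        ips.update(relay.get("exit_addresses", []))
        ips.update(strip_port(a) for a in relay.get("or_addresses", []))
    return ips
```

Calling `exit_ips(fetch_details())` from a cron job and comparing successive results would show how quickly newly listed exit nodes appear in this source.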

Blocked by Administrators "by hand"

Blocks are recorded in Special:Log (e.g., ENWP).

Blocked by a Bot

As above, blocks are recorded in Special:Log (e.g., ENWP).

There are at least some bots that seem to automatically find and block open proxies:

ProcseeBot (apparently closed source but we could contact the author/operator Slakr)

Structure of Special:Log XML file

Block entries are included in the XML file, with notes (comments) that might include the template pattern "{{Tor|?.*}}"
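A rough way to find such entries is to scan the log XML for block items whose comment matches that pattern, read as a regular expression with the braces and pipe escaped. The element names used here (`logitem`, `type`, `comment`, `logtitle`, `timestamp`) are assumptions about the export format and should be checked against the actual file. The function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# The "{{Tor|?.*}}" pattern from the notes, escaped for use as a regex.
# Note that it also matches other templates whose names start with "Tor".
TOR_TEMPLATE = re.compile(r"\{\{Tor\|?.*?\}\}")


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://...}logitem'."""
    return tag.rsplit("}", 1)[-1]


def tor_tagged_blocks(xml_text):
    """Return (logtitle, timestamp, comment) for Tor-tagged block entries."""
    root = ET.fromstring(xml_text)
    hits = []
    for item in root.iter():
        if local_name(item.tag) != "logitem":
            continue
        # Flatten the direct children into a name -> text mapping.
        fields = {local_name(child.tag): (child.text or "") for child in item}
        comment = fields.get("comment", "")
        if fields.get("type") == "block" and TOR_TEMPLATE.search(comment):
            hits.append(
                (fields.get("logtitle", ""), fields.get("timestamp", ""), comment)
            )
    return hits
```

This loads the whole document into memory, which is fine for a sample. For a full log dump, a streaming parser such as `xml.etree.ElementTree.iterparse` would likely be needed.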

Open questions

There are notes in Special:Log that suggest that some IP addresses are "confirmed Tor Nodes" (e.g., ???) that are blocked by hand. Why were these not caught by TorBlock?
- Are these exit nodes present in our CollecTor data as well?
Why does the distribution of edits over the periods when IPs are marked as exit nodes (our dataset of Tor exit node "spells") not bunch up near the beginning of each period, when the IP is a new Tor node and seems less likely to be blocked? Why are there Tor exit nodes that seem to have been listed in CollecTor for long periods of time without being blocked by Wikipedia?
If TorBlock identifies a Tor exit node, is its IP address added to or reflected in the Special:Log block log?
Does an IP address need to make an edit first in order to be blocked as an open proxy? Can we find examples of this happening? If so, is it always bots that are doing the blocking?
What bots are involved in detecting and blocking IP addresses that are open proxies (especially Tor)?

@@ Line 3: / Line 3: @@
 * '''Theory''' — It takes some time for an exit node to be added to the list of blocked IPs. As a result, Tor users that randomly happen to be routed out of a recently added exit nodes (in the period before Wikipedia has blocked the exit nodes IP) are sometimes allowed to edit.
 * '''Theory''' — Some exit nodes don't get added to Wikipedia's block list automatically through TorBlock. Tor users who are routed through these exit nodes are allowed to edit Wikipedia until an administrator or bot notices and blocks the IP address.
+* '''Theory''' — Some forms of blocking expire after a certain amount of time and, if a Tor node is blocked with an expiry time, then traffic may be allowed through until it is blocked again. This could account for on-and-off patterns of editing coming from Tor nodes.
 == Identifying IPs that are/were Tor exit nodes ==