Tor and Wikipedia

From CommunityData

Revision as of 21:06, 20 March 2018

Identifying IPs that are/were Tor exit nodes

We're pulling data from CollecTor, a service run by the Tor Project that "fetches data from various nodes and services in the public Tor network and makes it available to the world."
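A minimal Python sketch of turning CollecTor exit lists into per-IP "spells". It assumes the exit-list (TorDNSEL) format, in which each `ExitAddress` line gives an IP address and the time it was seen exiting. The function names, the sample data (placeholder fingerprints, documentation-range IPs) and the 24-hour `max_gap` used to merge sightings into one spell are hypothetical choices, not a description of how our dataset was actually built.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def parse_exit_list(text):
    """Return (ip, observed_at) pairs from one CollecTor exit-list document."""
    observations = []
    for line in text.splitlines():
        parts = line.split()
        # Assumed layout: "ExitAddress <ip> <YYYY-MM-DD> <HH:MM:SS>"
        if len(parts) == 4 and parts[0] == "ExitAddress":
            seen = datetime.strptime(parts[2] + " " + parts[3], "%Y-%m-%d %H:%M:%S")
            observations.append((parts[1], seen))
    return observations

def build_spells(observations, max_gap=timedelta(hours=24)):
    """Merge each IP's sightings into (ip, start, end) spells.

    A new spell starts whenever consecutive sightings of the same IP are more
    than max_gap apart (max_gap is a hypothetical threshold).
    """
    by_ip = defaultdict(list)
    for ip, seen in observations:
        by_ip[ip].append(seen)
    spells = []
    for ip, times in by_ip.items():
        times.sort()
        start = prev = times[0]
        for t in times[1:]:
            if t - prev > max_gap:
                spells.append((ip, start, prev))
                start = t
            prev = t
        spells.append((ip, start, prev))
    return sorted(spells)

# Hypothetical sample in the exit-list layout (placeholder fingerprints,
# documentation-range IP addresses).
SAMPLE = """@type tordnsel 1.0
Downloaded 2018-03-01 13:00:00
ExitNode AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Published 2018-03-01 10:00:00
LastStatus 2018-03-01 11:00:00
ExitAddress 192.0.2.1 2018-03-01 11:05:00
ExitNode BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
Published 2018-03-01 10:30:00
LastStatus 2018-03-01 11:30:00
ExitAddress 198.51.100.7 2018-03-01 12:00:00
"""
```

Feeding every downloaded exit list through `parse_exit_list` and passing all observations to `build_spells` at once gives one list of spells; the gap threshold is the main knob, and a different value would split or merge spells differently.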

How Tor is blocked on Wikipedia

Extension:TorBlock

MediaWiki has an extension called Extension:TorBlock. Its source code is in a Phabricator repository.

What we can tell about how it works:

  • during a period from XXXX to XXXX it read from the Tor Project bulk list service (link?)
  • after XXXX, it pulls from the newer "Onionoo" service
  • pulls the list periodically, typically from a cron job

Blocks by MediaWiki administrators

  • Blocks are recorded in Special:Log (e.g., on English Wikipedia).
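One way to get the same block records programmatically is the MediaWiki Action API's `list=logevents` module with `letype=block`. The sketch below assumes English Wikipedia's endpoint and follows the API's `continue` mechanism; the helper names and the User-Agent string are hypothetical. It keeps only blocks of single IP addresses, so range blocks are skipped.

```python
import ipaddress
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"  # assumed target wiki

def _fetch_json(params):
    """GET one API response; Wikimedia asks clients to send a User-Agent."""
    url = API + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"User-Agent": "tor-wikipedia-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def block_log_batches(fetch=_fetch_json, limit=500):
    """Yield lists of block-log entries, following the API's continuation."""
    params = {"action": "query", "list": "logevents", "letype": "block",
              "leprop": "title|type|timestamp|comment|details",
              "lelimit": limit, "format": "json", "formatversion": 2}
    while True:
        data = fetch(params)
        yield data["query"]["logevents"]
        if "continue" not in data:
            return
        params.update(data["continue"])  # carry lecontinue into the next request

def ip_blocks(entries):
    """Keep 'block' actions whose target is a single IP address."""
    out = []
    for e in entries:
        target = e.get("title", "").split(":", 1)[-1]  # drop the "User:" prefix
        try:
            ip = ipaddress.ip_address(target)
        except ValueError:
            continue  # account names and range blocks
        if e.get("action") == "block":
            out.append((str(ip), e.get("timestamp"), e.get("comment", "")))
    return out
```

Passing a `fetch` function makes it easy to replay saved responses instead of hitting the live API.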

Structure of the XML file

  • Blocks are included in the XML file, with notes that may contain the template pattern "{{Tor|?.*}}"
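Used verbatim as a regular expression, `{{Tor|?.*}}` would not do what the note intends, because `|` means alternation. A hedged Python sketch of an escaped version, assuming the notes contain either `{{Tor}}` or `{{Tor|...}}` and that, as in MediaWiki, only the first letter of the template name is case-insensitive. The pattern name is hypothetical, and parameters containing nested templates are not handled.

```python
import re

# {{Tor}} or {{Tor|...}}: braces and the pipe are escaped so they match
# literally; the "|..." parameter part is optional; whitespace around the
# name is tolerated; "tor" is accepted because the first letter is
# case-insensitive for template names.
TOR_TEMPLATE = re.compile(r"\{\{\s*[Tt]or\s*(?:\|[^{}]*)?\}\}")

def mentions_tor_template(note):
    """True if a block note contains the {{Tor}} template."""
    return TOR_TEMPLATE.search(note) is not None
```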

Open questions

  • Some notes in the log suggest that certain IP addresses blocked through MediaWiki are "confirmed Tor Nodes" (e.g., ???). Why are these not caught automatically? Are they in CollecTor?
  • In our dataset of Tor exit node "spells" (the periods during which an IP is marked as an exit node), why does the distribution of edits not bunch up near the beginning of each spell, when the IP is a new Tor node and seems less likely to be blocked? Why are there Tor exit nodes that seem to have been listed in CollecTor for long periods without being blocked by Wikipedia?
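To look at these questions, a sketch that checks which blocked IPs never appear in any spell, places each edit within its spell as a fraction from 0 (start) to 1 (end), and counts those positions by decile. It assumes spells as `(ip, start, end)` tuples and edits as `(ip, timestamp)` pairs with naive UTC datetimes; all names are hypothetical. If edits did bunch up early in spells, the first bins would dominate.

```python
from bisect import bisect_right
from datetime import datetime

def relative_positions(spells, edits):
    """Position of each in-spell edit as a fraction of its spell (0 = start, 1 = end).

    spells: (ip, start, end) tuples; edits: (ip, timestamp) pairs, naive UTC.
    Edits made outside every spell of their IP are skipped.
    """
    by_ip = {}
    for ip, start, end in spells:
        by_ip.setdefault(ip, []).append((start, end))
    for ranges in by_ip.values():
        ranges.sort()
    positions = []
    for ip, t in edits:
        ranges = by_ip.get(ip, [])
        i = bisect_right(ranges, (t, datetime.max)) - 1  # last spell starting <= t
        if i < 0 or t > ranges[i][1]:
            continue
        start, end = ranges[i]
        length = (end - start).total_seconds()
        positions.append(0.0 if length == 0 else (t - start).total_seconds() / length)
    return positions

def decile_counts(positions, bins=10):
    """Histogram of positions; the last bin also holds 1.0."""
    counts = [0] * bins
    for p in positions:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts

def never_listed(blocked_ips, spells):
    """Blocked IPs that never appear in any exit-node spell."""
    return set(blocked_ips) - {ip for ip, _start, _end in spells}
```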