* Not all anonymous edits get flagged as anon. Checking whether the editor name is an IP address seems to work (not confirmed). (Note: I've never seen a bug with this and I've done a lot of work with anon edits. -kc)

== Samples ==

Kaylea likes to use a script-generating script for wikiq.

Step 1: Create a script-generating script like this:

<nowiki>
#!/usr/bin/env python3
import glob
import os

## this script makes wikiq scripts for a given dump path
dumpHome = '/gscratch/comdata/raw_data/'
outPath = '/gscratch/comdata/output/'
langDump = dumpHome + 'enwiki_20230401'  # customize if needed

## customize output path
outPath = outPath + "wikiq_enwiki_name_this_something_useful/"

archives = glob.glob(langDump + "/*pages-meta-hist*.7z")  # makes a list of all the files, about 800 of them

if not os.path.exists(outPath):  # makes the dir for storing the output
    os.makedirs(outPath)

with open('run_wikiq.sh', 'w') as fh:  # creates a script
    for item in archives:  # select options to customize the below as needed
        # as you see above, wikiq has a ton of options.
        # note that -o requires next field to be outPath; if more cmdline args are added, place before the -o.
        # if you wanted to regex match misinf or disinf in the edit comment field, this is how you'd do it:
        #fh.write(f"wikiq -u -CP '.*(misinf|disinf).*' -CPl comment -n 0 -n 1 -o {outPath} {item}\n")
        # a more normal wikiq invocation is this:
        fh.write(f"wikiq --collapse-user -u -o {outPath} {item}\n") </nowiki>

Step 2: Use the split command to turn your giant run_wikiq.sh script into a bunch of smaller files, which are automatically named xaa, xab, xac, and so on. For example, to make smaller scripts of 40 lines each, run:
<nowiki>
split -l 40 run_wikiq.sh</nowiki>
After running split, typing ls will show the autonamed files, each containing part of your run_wikiq.sh script.
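
To make the chunking and naming concrete, here is a minimal Python sketch of what the split step does. It is an illustration only, not part of the workflow above: chunk_name and split_script are hypothetical helpers, and the sketch only mimics split's default two-letter suffixes, so it assumes fewer than 676 chunks.

```python
from string import ascii_lowercase


def chunk_name(index, prefix="x"):
    """Return the split-style name for chunk number `index` (0 -> xaa, 1 -> xab)."""
    if not 0 <= index < 26 * 26:
        raise ValueError("two-letter suffixes only cover 676 chunks")
    return prefix + ascii_lowercase[index // 26] + ascii_lowercase[index % 26]


def split_script(lines, lines_per_chunk=40):
    """Group `lines` into chunks of at most `lines_per_chunk`, keyed by chunk name."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    chunks = {}
    for start in range(0, len(lines), lines_per_chunk):
        name = chunk_name(start // lines_per_chunk)
        chunks[name] = lines[start:start + lines_per_chunk]
    return chunks


if __name__ == "__main__":
    # Hypothetical stand-in for the lines of run_wikiq.sh (file names are made up).
    commands = [f"wikiq --collapse-user -u -o out/ dump_{i}.7z\n" for i in range(100)]
    for name, chunk in split_script(commands).items():
        print(name, len(chunk))
```

With 100 lines and 40 lines per chunk, this yields xaa and xab with 40 lines each and xac with the remaining 20, which matches what ls shows after running split.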

Step 3: You can now run the subchunks of your script. For example, use tmux to log in to the same node 10-15 times, running sh xaa in the first session, sh xab in the second, and so on. This is more hands-on and not really a proper batch approach, but it lets you sail through certain kinds of disruptions while still getting your output quickly.
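
If opening one tmux session per chunk gets tedious, the same fan-out can be sketched in Python. This swaps the manual tmux approach described above for a thread pool, so treat it as an assumption rather than the method used on this page: run_chunk and run_chunks are hypothetical helpers, and the sketch assumes sh is available and that the chunk files from Step 2 sit in the current directory. Unlike separate tmux sessions, it does not survive a dropped login by itself, so you would likely still start it inside tmux.

```python
import glob
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_chunk(path):
    """Run one chunk script with sh and return (path, exit code)."""
    result = subprocess.run(["sh", path])
    return path, result.returncode


def run_chunks(paths, max_workers=10):
    """Run the chunk scripts concurrently, at most `max_workers` at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_chunk, paths))


if __name__ == "__main__":
    # Matches split's default names (xaa, xab, ...); adjust if you chose another prefix.
    chunk_files = sorted(glob.glob("x[a-z][a-z]"))
    for path, code in run_chunks(chunk_files).items():
        print(path, "exit code", code)
```

Note that sh reports the exit status of the last command in each chunk, so a nonzero code only tells you that the final wikiq call in that chunk failed, not whether earlier ones did.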