CommunityData:Hyak walkthrough
From CommunityData
This file provides a complete, step-by-step walkthrough for how to parse a list of Wikia wikis with wikiq. The same principles can be followed for other tasks.
# Create a users directory for yourself in /com/users
#* You will want to store the output of your script in /com/, or you will run out of space in your personal filesystem (/usr/lusers/...)
#: <code> $ mkdir /com/users/USERNAME # Replace USERNAME with your user name </code>
# Create a batch_jobs directory
#: <code> $ mkdir /com/users/USERNAME/batch_jobs </code>
# Create a symlink from your home directory to this directory (this lets you use the /com storage from the more convenient home directory)
#: <code> ln -s /com/users/USERNAME/batch_jobs ~/batch_jobs </code>
# Create a user in parallel SQL
#: <code> module load parallel_sql </code>
#: <code> sudo pssu --initial </code>
#: <code> [sudo] password for USERID: <Enter your UW NetID password> </code>
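The directory and symlink steps above can also be run as one small shell function. This is only a sketch: <code>setup_batch_jobs</code> is a hypothetical helper name, not something provided on Hyak, and it assumes the storage root and user name are passed in as arguments. It covers only the directory steps, not the parallel SQL user creation.

```shell
# Sketch only: setup_batch_jobs is a hypothetical helper, not part of Hyak.
# Arguments: $1 = storage root (for example /com/users)
#            $2 = user name
#            $3 = directory that should hold the symlink (defaults to $HOME)
setup_batch_jobs() {
    local storage_root="$1"
    local user_name="$2"
    local link_parent="${3:-$HOME}"
    local target="$storage_root/$user_name/batch_jobs"
    local link="$link_parent/batch_jobs"

    # -p creates the user directory and batch_jobs in one step and
    # does not fail if they already exist.
    mkdir -p "$target" || return 1

    # Create the symlink only if nothing is there yet, so that a second
    # run does not place a new link inside the existing directory.
    if [ ! -e "$link" ] && [ ! -L "$link" ]; then
        ln -s "$target" "$link" || return 1
    fi
}

# Example call (commented out so that sourcing this file changes nothing):
# setup_batch_jobs /com/users USERNAME
```

Using <code>mkdir -p</code> and checking for an existing link makes the function safe to run more than once, which the individual commands are not.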
== Project-specific steps (done for each project) ==
#: <code> rm ./output/* </code>
#* and clean up the parallel SQL DB
#: <code> psu --del-com </code>
# Finally, run the jobs over the full set of files
#: <code> cat task_list | psu --load </code>
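For illustration, a task list for the load step could be generated with a short shell function like the one below. This is a sketch under stated assumptions: it assumes the task list holds one shell command per line, that file names contain no whitespace, and that each input file is handled by a per-file script. Both <code>build_task_list</code> and <code>./process_one.sh</code> are hypothetical names, not part of parallel SQL or wikiq.

```shell
# Sketch only: build_task_list and ./process_one.sh are hypothetical names.
# Prints one command line per regular file found in the input directory.
# Arguments: $1 = input directory, $2 = output directory
build_task_list() {
    input_dir="$1"
    output_dir="$2"
    for f in "$input_dir"/*; do
        # Skip subdirectories, and the unexpanded pattern if the
        # directory is empty.
        [ -f "$f" ] || continue
        printf './process_one.sh %s %s\n' "$f" "$output_dir"
    done
}

# Example (commented out): write the task list, then load it as shown above.
# build_task_list ./input ./output > task_list
```

Writing the list to a file first makes it easy to inspect the generated commands before loading them.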