Editing CommunityData:Hyak tutorial
Latest revision | Your text | ||
Line 31: | Line 31: | ||
# Run [https://linux.die.net/man/1/screen screen] or [https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/ tmux] to maintain connections over time ([[CommunityData:tmux|CDSC tmux cheatsheet]]) | # Run [https://linux.die.net/man/1/screen screen] or [https://www.hamvocke.com/blog/a-quick-and-easy-guide-to-tmux/ tmux] to maintain connections over time ([[CommunityData:tmux|CDSC tmux cheatsheet]]) | ||
# | # Four ways to check out nodes: | ||
:* <code>int_machine</code> — interactive machine ( | :* <code>int_machine</code> — interactive machine (shared with the group) '''[USE THIS FIRST!]''' | ||
:* <code>any_machine</code> — dedicated interactive machine | :* <code>any_machine</code> — dedicated interactive machine | ||
:* <code>big_machine</code> — dedicated interactive machine with large amounts of memory | :* <code>big_machine</code> — dedicated interactive machine with large amounts of memory | ||
:* <code>build_machine</code> — interactive machine with an Internet connection for building R modules and so on | :* <code>build_machine</code> — interactive machine with an Internet connection for building R modules and so on | ||
=== Running a job across many cores using GNU R's parallelization features === | === Running a job across many cores using GNU R's parallelization features === | ||
The Mox machines have 28 | The Hyak machines have 16 CPU cores. The Mox machines will have 28! Running your program on all the cores can speed things up a lot. We make heavy use of R for building datasets and for fitting models. Like most programming languages, R uses only one CPU core by default. However, for typical computation-heavy data science tasks it is pretty easy to make R use all the cores. | ||
For fitting models, the installed R should use all cores automatically. This is thanks to OpenBLAS, a numerical library that implements and parallelizes linear algebra routines such as matrix factorization, matrix inversion, and other operations that bottleneck model fitting. | For fitting models, the installed R should use all cores automatically. This is thanks to OpenBLAS, a numerical library that implements and parallelizes linear algebra routines such as matrix factorization, matrix inversion, and other operations that bottleneck model fitting. | ||
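A minimal shell sketch for checking how many cores a node exposes before starting R. <code>OPENBLAS_NUM_THREADS</code> is the environment variable OpenBLAS reads to limit its thread count; whether the R build on Hyak honors it is an assumption, as is the availability of <code>nproc</code> or <code>getconf</code> on the node.

```shell
#!/bin/sh
# Count the cores visible to this shell.
# nproc is GNU coreutils; getconf is used as a fallback if nproc is missing.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "cores available: $cores"

# Assumption: the installed R is linked against OpenBLAS and reads this variable.
# Setting it to the core count asks OpenBLAS to use every core;
# a smaller number caps it, which is polite on a shared node.
OPENBLAS_NUM_THREADS="$cores"
export OPENBLAS_NUM_THREADS
```

Run this in the same shell session that later launches R, since exported variables only reach child processes of that shell.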
Line 106: | Line 87: | ||
=== Setup for running batch jobs on Hyak (only needs to be done once) === | === Setup for running batch jobs on Hyak (only needs to be done once) === | ||
1. Create a user directory for yourself in / | 1. Create a user directory for yourself in /com/users: | ||
You will want to store the output of your script in / | You will want to store the output of your script in /com/, or you will run out of space in your personal filesystem (/usr/lusers/...) | ||
$ mkdir / | $ mkdir /com/users/USERNAME # Replace USERNAME with your user name | ||
2. Create a batch_jobs directory | 2. Create a batch_jobs directory | ||
$ mkdir / | $ mkdir /com/users/USERNAME/batch_jobs | ||
3. Create a symlink from your home directory to this directory (this lets you use the /com storage from the more convenient home directory) | 3. Create a symlink from your home directory to this directory (this lets you use the /com storage from the more convenient home directory) | ||
$ ln -s / | $ ln -s /com/users/USERNAME/batch_jobs ~/batch_jobs | ||
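The directory and symlink steps above can be sketched as one re-runnable shell function. This is only an illustration: <code>setup_batch_dirs</code> is a hypothetical helper name, not something installed on Hyak, and the storage root is passed as an argument because the two revisions compared here use different paths.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hyak): wraps the mkdir and ln -s steps above.
#   $1 = storage root (for example /com/users, as in the text column)
#   $2 = your user name
#   $3 = your home directory
setup_batch_dirs() {
    root="$1"
    user="$2"
    home="$3"

    # Create the user directory and the batch_jobs directory inside it.
    # -p makes this safe to re-run if the directories already exist.
    mkdir -p "$root/$user/batch_jobs" || return 1

    # Symlink from the home directory, unless something is already there.
    if [ ! -e "$home/batch_jobs" ] && [ ! -L "$home/batch_jobs" ]; then
        ln -s "$root/$user/batch_jobs" "$home/batch_jobs" || return 1
    fi
}

# Example call, left commented out so that sourcing this file changes nothing:
# setup_batch_dirs /com/users "$USER" "$HOME"
```

The existence check before <code>ln -s</code> keeps a second run from creating a nested link inside the directory the first link already points to.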
4. Create a user in parallel SQL | 4. Create a user in parallel SQL |