Editing CommunityData:Klone

'''Klone''' is the latest version of hyak, the UW super computing system. We will soon have a larger allocation of machines on Klone than on Mox. The Klone machines have 40 cores and either 384GB or 768GB of RAM. 

== Setting up SSH == 

When you connect to SSH, it will ask you for a key from your token. Typing this in every time you start a connection be a pain. One approach is to create an .ssh config file that will create a "tunnel" the first time you connect and send all subsequent connections to Hyak over that tunnel. Some details [http://wiki.cac.washington.edu/display/hyakusers/Logging+In in the Hyak documentation].

I've added the following config to the file <code>~/.ssh/config</code> on my laptop (you will want to change the username):

  Host klone klone.hyak.uw.edu
      User '''<YOURNETID>'''
      HostName klone.hyak.uw.edu
      ControlPath ~/.ssh/master-%r@%h:%p
      ControlMaster auto
      ControlPersist yes
      Compression yes

{{Note}} If your SSH connection becomes stale or disconnected (e.g., if you change networks) it may take some time for the connection to time out. Until that happens, any connections you make to hyak will silently hang. If your connections to ssh hyak are silently hanging but your Internet connection seems good, look for ssh processes running on your local machine with:

 ps ax|grep klone

If you find any, kill them with <code>kill '''<PROCESSID>'''</code>. Once that is done, you should have no problem connecting to Hyak.


== New Container Setup == 
We will use multiple different singularity containers for different applications to avoid incidentally breaking existing versions of packages during upgrades.  We want to containers that include "soft dependencies" that R or Python libraries might want. 

We haven't built the containers yet, but let's start keeping track of dependencies that we'll need. 
=== R ===

* Node.js
* graphviz


=== Python ===
* spark

To make singularity transparent to users, we will use simple bash executables to alias popular commands. The list of commands to alias includes:

* python
* python3
* R
* Rscript
* jupyter-console
* jupyter-notebook

=== Questions for the group === 

What executable do you want in containers?

== To make a new container alias == 
For example, let's say you want to make a command to run <code>jupyter-console</code> for interactive python work and let's say you know that you want to run this from the <code>cdsc_python.sif</code> container located in <code>/gscratch/comdata/containers/cdsc_python</code>.

1. Ensure that the software you want to execute is installed in the container. Test this by running <code> singularity exec /gscratch jupyter-console</code>.

2. Create an executable file in /gscratch/comdata/containers/bin. The file should look like:
<syntaxhighlight lang='bash'>
#!/usr/bin/env bash

singularity exec /gscratch/comdata/containers/cdsc_python/cdsc_python.sif jupyter-console.

</syntaxhighlight>

== Setup == 
The recommended way to manage software for your research projects on Klone is to use [https://sylabs.io/docs/ Singularity containers].  You can build a singularity container using the linux distribution manager of your choice (i.e., debian, ubuntu, centos). The instructions on this page document how to build the <code>cdsc_base.sif</code> singularity package which provides python, R, julia, and pyspark based on Debian 11 (Bullseye).

Copies of the definition file and a working container are located at <code>/gscratch/comdata/containers/cdsc_base/</code>.

=== Initial .Bashrc ===
Before we get started using our singularity package on klone, we need to start with a <code>.bashrc</code>.

<syntaxhighlight language='bash'>
# .bashrc

# Source global definitions
if [ -f /etc/bashrc ]; then
	. /etc/bashrc
fi
source /gscratch/comdata/env/cdsc_klone_bashrc

# User specific environment
if ! [[ "$PATH" =~ "$HOME/.local/bin:$HOME/bin:" ]]
then
    PATH="$HOME/.local/bin:$HOME/bin:$PATH"
fi
export PATH

# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=

# User specific aliases and functions
umask 007

if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi
# export #SINGULARITY_BIND="/gscratch:/gscratch,/mmfs1:/mmfs1,/xcatpost:/xcatpost,/gpfs:/gpfs,/sw:/sw,/usr:/kloneusr,/bin:/klonebinc"
export APPTAINER_BIND="/gscratch:/gscratch,/mmfs1:/mmfs1,/gpfs:/gpfs,/sw:/sw,/usr:/kloneusr,/bin:/klonebin"

export OMP_THREAD_LIMIT=40
export OMP_NUM_THREADS=40
                                                                                                              
export PATH="$PATH:/gscratch/comdata/users/$(whoami)/bin:/gscratch/comdata/local/spark:/gscratch/comdata/local/bin"
source "/gscratch/comdata/users/nathante/spark_env.sh"
export _JAVA_OPTIONS="-Xmx362g"
export PATH="$PATH:~/.local/bin/"

</syntaxhighlight>

== Installing singularity on your local computer ==
You might find it more convenient to develop your singularity container on your local machine. You'll want singularity version 3.4.2. which is the version installed on klone.  Follow [https://sylabs.io/guides/3.5/admin-guide/installation.html these instructions] for installing singularity on your local linux machine.

== Creating a singularity container == 

Our goal is to write a singularity definition file that will install the software that we want to work with.  The definition file contains instructions for building a more reproducible environment.  For example,  the file <code>cdsc_base.def</code> contains instructions for installing an environment based on debian 11 (bullseye). Once we have the definition file, we just have to run:

'''NOTE:''' For some reason building a container doesn't work on the <code>/gscratch</code> filesystem.  Instead build containers on the <code>/mmfs1</code> filesystem and then copy them to their eventual homes on <code>/gscratch</code>. 

<syntaxhighlight language='bash'>
 singularity build --fakeroot cdsc_base.sif cdsc_base.def
</syntaxhighlight>

On a klone compute node to create the singularity container <code>cdsc_base.sif</code>.  This can take quite awhile to run as it downloads and installs a lot of software!

You can start a shell in the container using:

<syntaxhighlight language='bash'>
singularity shell cdsc_base.sif
</syntaxhighlight>

You can also just execute a single command using:

<syntaxhighlight language='bash'>
singularity exec cdsc_base.sif echo "my command"
</syntaxhighlight>

Sandbox containers don't seem to work consistently.  It's better to just update the definition file and rebuild the container. It's a hassle, but it works. 
<strike>
The <code>.sif</code> container is immutable, but you can modify it by converting it to a sandbox. 

<syntaxhighlight language='bash'>
singularity build --sandbox cdsc_base_sandbox cdsc_base.sif
</syntaxhighlight>

You might run into trouble with exceeding space in your temporary file path. If you do, run 
<syntaxhighlight language='bash'>
sudo export SINGULARITY_TMPDIR=/my/large/tmp
sudo export SINGULARITY_CACHEDIR=/my/large/apt_cache
sudo export SINGULARITY_LOCALCACHEDIR=/my/large/apt_cache
</syntaxhighlight>
before running the build.

For developing a container it's useful to use a <code>sandbox</code> container, which is mutable
so you can continue installing software on it. However, you should add your changes to the definition file so you can build immutable containers that are as reproducible as possible. 

The <code>cdsc_base_sandbox</code> is mutable, so we can continue working on that environment and installing more software as we like.  We just need to build it as a <code>.sif</code> file to use it on klone.  It's also possible to convert the container back into sandbox mode and then modify non-root parts of the container on klone, but this requires running the container in a way that makes the outside klone system invisible! This is useful for installing R or Python packages in userspace within the container.  It's not that useful for working with data outside of the container.
</strike>
So in summary, the workflow is:

# Develop a definition file (<code>cdsc_base.dev</code>) to setup your desired environment.
# Keep the definition file up to date with any modifications you make to the container in sandbox mode so your container is reproducible. 
# Run programs in the container to work with files outside of it (possibly including other packages, allowing us to use debian to bootstrap klone-optimzed binaries).
# If you want to work on you local machine you can use the same definition file to install the container on your local machine.

== Spark == 

To set up a spark cluster using singularity the first step to "run" the container on each node in the cluster:

<syntaxhighlight lang='bash'>
# on the first node
singularity instance start --fakeroot cdsc_base.sif spark-boss
export SPARK_BOSS=$(hostname)
# on the first worker node (typically same as boss node)
singularity instance start --fakeroot cdsc_base.sif spark-worker-1
# second worker node
singularity instance start --fakeroot cdsc_base.sif spark-worker-2
</syntaxhighlight>

The second step is to start the spark services on the instances

<syntaxhighlight lang='bash'>
singularity exec instance://spark-boss /opt/spark/sbin/start_master.sh

singularity exec instance://spark-worker-1 /opt/spark/sbin/start-worker.sh $SPARK_BOSS:7077
</syntaxhighlight>

That should be it. Though in practice it might make more sense to have special containers for the spark boss and workers.

You can now submit spark jobs by running <code>spark-submit.sh</code>.

<syntaxhighlight lang='bash'>
# replace n3078 with the master hostname
singularity exec instance://spark-boss /opt/spark/bin/spark --master spark://n3078.hyak.local:7077
</syntaxhighlight>

Nate's working on wrapping the above nonsense in friendlier scripts.