Klone is the latest version of Hyak, the UW supercomputing system. We will soon have a larger allocation of machines on Klone than on Mox. The Klone machines have 40 cores and either 384GB or 768GB of RAM.
The recommended way to manage software for your research projects on Klone is to use Singularity containers. You can build a Singularity container using the Linux distribution of your choice (e.g., Debian, Ubuntu, CentOS). The instructions on this page document how to build the cdsc_base.sif Singularity package, which provides Python, R, Julia, and PySpark on Debian 11 (Bullseye).
Copies of the definition file and a working container are located at
Before we get started using our singularity package on klone, we need to start with a
# .bashrc
# Stuff that's in there already that you need for working with the cluster.

# Add the following lines
umask 007
module load singularity
export SINGULARITY_BIND="/gscratch:/gscratch,/mmfs1:/mmfs1,/xcatpost:/xcatpost,/gpfs:/gpfs,/sw:/sw"
alias big_machine="srun -A comdata -p compute-bigmem --time=6:00:00 -c 40 --pty bash -l"
alias huge_machine="srun -A comdata -p compute-hugemem --time=6:00:00 -c 40 --pty bash -l"
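After editing .bashrc and logging in again, it can help to confirm that the host side of every pair in SINGULARITY_BIND actually exists on the node you are on, since a bind to a missing directory may cause trouble when a container starts. The following is a minimal sketch; the function name check_binds is made up for this example and is not part of Singularity or Hyak.

```shell
#!/usr/bin/env bash
# check_binds (hypothetical helper): report whether the host side of each
# "src:dest" pair in SINGULARITY_BIND exists on this machine.
# Returns non-zero if the variable is unset or any host path is missing.
check_binds() {
    local binds="${SINGULARITY_BIND:-}"
    if [ -z "$binds" ]; then
        echo "SINGULARITY_BIND is not set"
        return 1
    fi
    local rc=0 pair src
    local IFS=','                  # split the list on commas
    for pair in $binds; do
        src="${pair%%:*}"          # text before the first colon is the host path
        if [ -d "$src" ]; then
            echo "ok: $src"
        else
            echo "missing: $src"
            rc=1
        fi
    done
    return "$rc"
}

# Run the check in the current shell and flag any problem.
check_binds || echo "some bind paths need attention"
```

On a Klone node with the .bashrc above, each of the five bound paths should be reported as ok.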
Installing singularity on your local computer
You might find it more convenient to develop your Singularity container on your local machine. You'll want Singularity version 3.4.2, which is the version installed on Klone. Follow these instructions for installing Singularity on your local Linux machine.
Creating a singularity container
Our goal is to write a Singularity definition file that will install the software that we want to work with. The definition file contains instructions for building a more reproducible environment. For example, the file cdsc_base.def contains instructions for installing an environment based on Debian 11 (Bullseye). Once we have the definition file, we just have to run:
singularity build --fakeroot cdsc_base.sif cdsc_base.def
on a Klone compute node to create the Singularity container cdsc_base.sif. This can take quite a while to run, as it downloads and installs a lot of software!
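Because the build is slow, it can be worth skipping it when the container is already newer than its definition file. A minimal sketch, assuming both files sit in the current directory: needs_rebuild and build_container are hypothetical helper names, and the --force flag (which overwrites an existing image) is an addition to the build command given here.

```shell
#!/usr/bin/env bash
# needs_rebuild DEF SIF: succeed (exit 0) when SIF is missing or older than DEF.
needs_rebuild() {
    local def="$1" sif="$2"
    [ ! -e "$sif" ] || [ "$def" -nt "$sif" ]
}

# build_container DEF SIF: run the build only when it is actually needed.
build_container() {
    local def="$1" sif="$2"
    if needs_rebuild "$def" "$sif"; then
        # --force overwrites an existing, out-of-date image.
        singularity build --fakeroot --force "$sif" "$def"
    else
        echo "$sif is up to date"
    fi
}

# Usage on a Klone compute node:
#   build_container cdsc_base.def cdsc_base.sif
```

The comparison relies on file modification times, so touching the definition file is enough to trigger a rebuild.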
You can start a shell in the container using:
singularity shell cdsc_base.sif
You can also just execute a single command using:
singularity exec cdsc_base.sif echo "my command"
Sandbox containers don't seem to work consistently. It's better to just update the definition file and rebuild the container. It's a hassle, but it works.
A .sif container is immutable, but you can modify it by converting it to a sandbox:
singularity build --sandbox cdsc_base_sandbox cdsc_base.sif
You might run into trouble with exceeding space in your temporary file path. If you do, run
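# export is a shell builtin, so run it without sudo. If you run the build itself
# with sudo, use "sudo -E singularity build ..." so these variables are passed through.
export SINGULARITY_TMPDIR=/my/large/tmp
export SINGULARITY_CACHEDIR=/my/large/apt_cache
export SINGULARITY_LOCALCACHEDIR=/my/large/apt_cache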
before running the build.
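One way to set this up is to create the directories first and look at the free space before starting a long build. In this sketch, use_big_tmp is a hypothetical helper and the base directory is whatever large filesystem you have access to; the three variable names and the tmp and apt_cache directory names are the ones given above.

```shell
#!/usr/bin/env bash
# use_big_tmp BASE: create tmp and cache directories under BASE and point
# Singularity at them for the rest of this shell session.
use_big_tmp() {
    local base="$1"
    mkdir -p "$base/tmp" "$base/apt_cache" || return 1
    export SINGULARITY_TMPDIR="$base/tmp"
    export SINGULARITY_CACHEDIR="$base/apt_cache"
    export SINGULARITY_LOCALCACHEDIR="$base/apt_cache"
    # Show free space (in KiB) on the filesystem that holds BASE.
    df -Pk "$base" | tail -n 1
}

# Usage (the path is a placeholder):
#   use_big_tmp /my/large
#   singularity build --fakeroot cdsc_base.sif cdsc_base.def
```

Because the function uses export, it has to be defined and called in the same shell that later runs the build, not in a separate script.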
For developing a container it's useful to use a sandbox container, which is mutable, so you can continue installing software on it. However, you should add your changes to the definition file so you can build immutable containers that are as reproducible as possible.
cdsc_base_sandbox is mutable, so we can continue working on that environment and installing more software as we like. We just need to build it as a .sif file to use it on Klone. It's also possible to convert the container back into sandbox mode and then modify non-root parts of the container on Klone, but this requires running the container in a way that makes the outside Klone system invisible! This is useful for installing R or Python packages in userspace within the container. It's not that useful for working with data outside of the container.
So in summary, the workflow is:
- Develop a definition file (cdsc_base.def) to set up your desired environment.
- Keep the definition file up to date with any modifications you make to the container in sandbox mode so your container is reproducible.
- Run programs in the container to work with files outside of it (possibly including other packages, allowing us to use Debian to bootstrap Klone-optimized binaries).
- If you want to work on your local machine, you can use the same definition file to build the container there.
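The workflow above could be sketched as the following sequence of commands. This is an illustration rather than a tested recipe: the sandbox directory name follows the cdsc_base_sandbox example on this page, the run and container_workflow helpers are made up here, and the commands are only printed unless DRY_RUN=0 is set. Keep in mind the earlier warning that sandbox containers have not worked consistently.

```shell
#!/usr/bin/env bash
# run CMD...: print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# container_workflow NAME: the develop / record / rebuild cycle for NAME.def.
container_workflow() {
    local name="${1:-cdsc_base}"
    # 1. Build a mutable sandbox from the definition file to experiment in.
    run singularity build --fakeroot --sandbox "${name}_sandbox" "${name}.def"
    # 2. Open a writable shell in the sandbox and try out new installs.
    #    Record whatever you decide to keep in the %post section of the .def file.
    run singularity shell --fakeroot --writable "${name}_sandbox"
    # 3. Rebuild the immutable image from the updated definition file.
    run singularity build --fakeroot --force "${name}.sif" "${name}.def"
}

# Print the three steps for the container described on this page.
container_workflow cdsc_base
```

Rebuilding from the definition file in step 3, rather than from the sandbox directory, is a deliberate choice in this sketch: it checks that the definition file alone is enough to reproduce the environment.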
To set up a Spark cluster using Singularity, the first step is to "run" the container on each node in the cluster:
# on the first node
singularity instance start --fakeroot cdsc_base.sif spark-boss
export SPARK_BOSS=$(hostname)
# on the first worker node (typically same as boss node)
singularity instance start --fakeroot cdsc_base.sif spark-worker-1
# second worker node
singularity instance start --fakeroot cdsc_base.sif spark-worker-2
The second step is to start the Spark services on the instances:
singularity exec instance://spark-boss /opt/spark/sbin/start-master.sh
singularity exec instance://spark-worker-1 /opt/spark/sbin/start-worker.sh spark://$SPARK_BOSS:7077
That should be it. Though in practice it might make more sense to have special containers for the spark boss and workers.
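With more than a couple of workers, the per-instance commands get repetitive. The sketch below only prints the command it would run (set DRY_RUN=0 to execute it). The helper names run, spark_master_url, and start_worker are assumptions made for this example, as is the use of 7077 as the default port, which is Spark's default for a standalone master.

```shell
#!/usr/bin/env bash
# run CMD...: print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# spark_master_url HOST [PORT]: build the URL that workers and jobs connect to.
spark_master_url() {
    echo "spark://${1}:${2:-7077}"
}

# start_worker BOSS_HOST INSTANCE: start the Spark worker service inside
# INSTANCE. Run this on the node where that instance was started.
start_worker() {
    run singularity exec "instance://$2" \
        /opt/spark/sbin/start-worker.sh "$(spark_master_url "$1")"
}

# Example: the URL for the master hostname used elsewhere on this page.
spark_master_url n3078.hyak.local

# Usage on a worker node, with SPARK_BOSS set to the boss hostname:
#   DRY_RUN=0 start_worker "$SPARK_BOSS" spark-worker-2
```

Since SPARK_BOSS is exported only in the shell on the first node, it has to be set again (or passed in explicitly) on every other node before calling start_worker there.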
You can now submit spark jobs by running
# replace n3078 with the master hostname and my_job.py (a placeholder) with your application
singularity exec instance://spark-boss /opt/spark/bin/spark-submit --master spark://n3078.hyak.local:7077 my_job.py
Nate's working on wrapping the above nonsense in friendlier scripts.
Bootstrap: library
From: debian:bullseye

%post
    echo "deb http://mirror.keystealth.org/debian bullseye main contrib" > "/etc/apt/sources.list"
    apt update && apt upgrade -y
    apt install -y gnupg curl

    # Spark: download, verify the signature, install to /opt/spark
    curl -O https://downloads.apache.org/spark/KEYS
    curl -O https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz.asc
    curl -O https://mirror.jframeworks.com/apache/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
    gpg --import KEYS
    ls
    gpg --verify spark-3.1.1-bin-hadoop3.2.tgz.asc spark-3.1.1-bin-hadoop3.2.tgz
    rm KEYS
    export JAVA_HOME=/usr/lib/jvm/default-java
    tar xvf spark-3.1.1-bin-hadoop3.2.tgz
    mv spark-3.1.1-bin-hadoop3.2/ /opt/spark

    # Hadoop: download, verify the signature, install to /opt/hadoop
    curl -O https://mirror.jframeworks.com/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
    curl -O https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz.asc
    curl -O https://downloads.apache.org/hadoop/common/KEYS
    gpg --import KEYS
    ls
    gpg --verify hadoop-3.3.0.tar.gz.asc hadoop-3.3.0.tar.gz
    tar xvf hadoop-3.3.0.tar.gz
    mv hadoop-3.3.0/ /opt/hadoop
    export HADOOP_HOME=/opt/hadoop

    # Distribution packages
    apt install -y libopenblas-base
    apt install -y r-base r-recommended emacs vim python3-sklearn jupyter moreutils julia default-jdk git curl meld xauth python3-venv python3-pip apt-utils ncdu
    apt clean

    # Mount points for the host paths listed in SINGULARITY_BIND
    mkdir mmfs1
    mkdir gscratch
    mkdir xcatpost
    mkdir gpfs
    mkdir sw

    # Remove the downloaded archives and keys
    rm hadoop-3.3.0.tar.gz hadoop-3.3.0.tar.gz.asc KEYS spark-3.1.1-bin-hadoop3.2.tgz spark-3.1.1-bin-hadoop3.2.tgz.asc

%environment
    export JAVA_HOME=/usr/lib/jvm/default-java
    export HADOOP_HOME=/opt/hadoop
    export LC_ALL=C
    export SPARK_HOME=/opt/spark
    export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
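After a build from a definition file like this one, a quick smoke test can confirm that the main tools ended up on the PATH inside the container. The sketch below is an assumption-laden example rather than part of the build: check_tools is a hypothetical helper, and the tool names in the usage line are the commands one would expect the packages and PATH settings above to provide.

```shell
#!/usr/bin/env bash
# check_tools SIF TOOL...: report whether each TOOL is on the PATH inside SIF.
# Returns non-zero if any tool is missing.
check_tools() {
    local sif="$1" tool rc=0
    shift
    for tool in "$@"; do
        # "command -v" runs inside the container and looks the tool up there.
        if singularity exec "$sif" sh -c 'command -v "$1"' sh "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Usage, on a node where singularity is loaded:
#   check_tools cdsc_base.sif python3 R julia spark-submit jupyter
```

Because singularity exec applies the %environment section, spark-submit should resolve through the PATH entry for $SPARK_HOME/bin.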