'''Klone''' is the latest version of hyak, the UW supercomputing system. We will soon have a larger allocation of machines on Klone than on Mox. The Klone machines have 40 cores and either 384GB or 768GB of RAM. You can check storage allocation usage with the <code>hyakstorage</code> command.


== Setting up SSH ==  


When you connect to SSH, it will ask you for a key from your token. Typing this in every time you start a connection can be a pain. One approach is to create an .ssh config file that will create a "tunnel" the first time you connect and send all subsequent connections to Hyak over that tunnel. There are some details [http://wiki.cac.washington.edu/display/hyakusers/Logging+In in the Hyak documentation].


I've added the following config to the file <code>~/.ssh/config</code> on my laptop (you will want to change the username):
 
  Host klone klone.hyak.uw.edu
      User '''<YOURNETID>'''
      HostName klone.hyak.uw.edu
      ControlPath ~/.ssh/master-%r@%h:%p
      ControlMaster auto
      ControlPersist yes
      Compression yes
 
{{Note}} If your SSH connection becomes stale or disconnected (e.g., if you change networks) it may take some time for the connection to time out. Until that happens, any connections you make to hyak will silently hang. If your connections to ssh hyak are silently hanging but your Internet connection seems good, look for ssh processes running on your local machine with:
 
ps ax|grep klone
 
If you find any, kill them with <code>kill '''<PROCESSID>'''</code>. Once that is done, you should have no problem connecting to Hyak.
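Because the config above uses a shared master connection, you can also check on or close the tunnel explicitly with OpenSSH's control commands instead of hunting for processes:

<syntaxhighlight lang='bash'>
# check whether a master connection to klone is currently running
ssh -O check klone
# cleanly close the master connection
ssh -O exit klone
</syntaxhighlight>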
 
 
== Setting up your Environment ==
The recommended way to manage software for your research projects on Klone is to use [https://apptainer.org/docs/user/main/quick_start.html Apptainer containers] (formerly known as Singularity). At first, you probably do not need to know much about containers because we maintain a shared setup described below.  However, before getting to work on Klone, you'll need to set up an environment that provides our containerized commands and a few other conveniences. You do this by creating the following <code>.bashrc</code> file in your home directory (i.e., <code>/mmfs1/home/{your_username}</code>) where you land when you connect to klone.
 
=== Initial .Bashrc ===
Before we get started using our apptainer package on klone, we need to start with a <code>.bashrc</code>. Using a text editor (nano is a good choice if you don't already have a preference), create your <code>.bashrc</code> by pasting in the following code. Then run the command <code>source ~/.bashrc</code> to run the .bashrc and enable the environment.


<syntaxhighlight lang='bash'>
# .bashrc
 
export LOGIN_NODE=$(hostname | grep -q '^klone-login01' ; echo $?)
export SBATCH_EXPORT=BASH_ENV='~/.bashrc'
export SLURM_EXPORT_ENV=BASH_ENV='~/.bashrc'
 
if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi
 
# User specific environment
if ! [[ "$PATH" =~ "$HOME/.local/bin:$HOME/bin:" ]]
then
    PATH="$HOME/.local/bin:$HOME/bin:$PATH"
fi
 
export PATH
 
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
 
source "/gscratch/comdata/env/cdsc_klone_bashrc"
 
if [[ "$LOGIN_NODE" == 0 ]]; then
:
else
 
# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=
 
# User specific aliases and functions
umask 007
export APPTAINER_BIND="/gscratch:/gscratch,/mmfs1:/mmfs1,/gpfs:/gpfs,/sw:/sw,/usr:/kloneusr,/bin:/klonebin"
 
export OMP_THREAD_LIMIT=40
export OMP_NUM_THREADS=40
                                                                                                             
export PATH="$PATH:/gscratch/comdata/users/$(whoami)/bin:/gscratch/comdata/local/spark:/gscratch/comdata/local/bin"
source "/gscratch/comdata/users/nathante/spark_env.sh"
export _JAVA_OPTIONS="-Xmx362g"
fi
</syntaxhighlight>
 
== Connect to a Compute Node ==
When you first SSH into Klone, you will be on a login node. Before you can do computational work or use software installed in our containers (see below), you will need to log into a compute node from the login node. After your <code>~/.bashrc</code> file is set up and sourced, you can do so by running a SLURM job or by using one of the aliases described in https://wiki.communitydata.science/CommunityData:Hyak_tutorial#Interactive_Jobs.
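For example, you can request an interactive session directly from SLURM with <code>salloc</code>; the account and partition names below are placeholders, so substitute the ones described in the tutorial linked above:

<syntaxhighlight lang='bash'>
# request an interactive shell on a compute node for two hours
# (replace <account> and <partition> with our group's allocation)
salloc -A <account> -p <partition> -c 4 --mem=16G --time=2:00:00
</syntaxhighlight>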
 
== About Containers ==
 
We use [https://apptainer.org/docs/user/latest/index.html Apptainer] (formerly known as, and sometimes still referred to as, Singularity) containers to install software on klone. Klone provides a very minimal operating system, so without these containers installing software can be quite labor-intensive.
Our goal has been to make using software installed through apptainer as seamless as possible.  For the most part, once you have your environment configured as above, you shouldn't have to think about the containers unless you need to install something new.
 
We created commands (e.g., <code>python3</code>, <code>Rscript</code>, <code>jupyter-console</code>) that run the containerized version of the program.  The full list of such commands is in <code>/gscratch/comdata/containers/bin</code>.
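For example, you can list what is available and run the wrappers like ordinary programs:

<syntaxhighlight lang='bash'>
# see which containerized commands exist
ls /gscratch/comdata/containers/bin
# they behave like normal commands, e.g.
Rscript --version
python3 --version
</syntaxhighlight>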
 
Importantly, installing packages in R, Python (e.g., using pip), or other programming languages should usually work normally because the containers already have the most common dependencies. Installing packages this way will not update the container; instead, the packages will be installed in your user directory. This is desirable so that different container users do not break each other's environments. An installation may still fail because it requires a dependency that is missing from the operating system. If this happens, you can try to add the dependency to the container as described below. If this seems challenging or complicated, or you need many changes to the container, or changes you don't understand, reach out to the IT team.
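For instance, ordinary user-level installs land in your home directory rather than in the container; the package names here are just examples:

<syntaxhighlight lang='bash'>
# Python: installs under ~/.local (already on your PATH via the .bashrc above)
pip3 install --user pandas

# R: installs into your personal library when the site library isn't writable
Rscript -e 'install.packages("data.table", repos = "https://cloud.r-project.org")'
</syntaxhighlight>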
 
We will use multiple apptainer containers for different applications to avoid accidentally breaking existing versions of packages during upgrades. We want containers that include the "soft dependencies" that R or Python libraries might want.
 
== To make a new container alias ==
For example, let's say you want to make a command that runs <code>jupyter-console</code> for interactive Python work, and you know that you want to run it from the <code>cdsc_python.sif</code> container located in <code>/gscratch/comdata/containers/cdsc_python</code>.
 
1. Ensure that the software you want to execute is installed in the container. Test this by running <code>apptainer exec /gscratch/comdata/containers/cdsc_python/cdsc_python.sif jupyter-console</code>.
 
2. Create an executable file in /gscratch/comdata/containers/bin. The file should look like:
<syntaxhighlight lang='bash'>
#!/usr/bin/env bash

# run jupyter-console from the cdsc_python container, passing along any arguments
apptainer exec /gscratch/comdata/containers/cdsc_python/cdsc_python.sif jupyter-console "$@"
</syntaxhighlight>
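The wrapper only works as a command if the file is executable, so set the execute bit after creating it (use whatever filename you chose for the wrapper):

<syntaxhighlight lang='bash'>
chmod +x /gscratch/comdata/containers/bin/jupyter-console
</syntaxhighlight>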


== Installing apptainer on your local computer ==
You might find it more convenient to develop your apptainer container on your local machine. You'll want apptainer version 3.4.2, which is the version installed on klone. Follow [https://apptainer.org/docs/user/latest/quick_start.html these instructions] for installing apptainer on your local Linux machine.
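Once it is installed, a quick sanity check is to compare version strings locally and on klone:

<syntaxhighlight lang='bash'>
# run this locally and on a klone compute node and compare the output
apptainer --version
</syntaxhighlight>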
 
== Creating an apptainer container ==
 
Our goal is to write an apptainer definition file that will install the software that we want to work with. The definition file contains instructions for building a more reproducible environment. For example, the file <code>cdsc_base.def</code> contains instructions for installing an environment based on Debian 11 (bullseye); a minimal sketch of what such a file looks like appears below. Once we have the definition file, we just have to run:
 
'''NOTE:''' For some reason building a container doesn't work on the <code>/gscratch</code> filesystem.  Instead build containers on the <code>/mmfs1</code> filesystem and then copy them to their eventual homes on <code>/gscratch</code>.


<syntaxhighlight lang='bash'>
apptainer build --fakeroot cdsc_base.sif cdsc_base.def
</syntaxhighlight>


Run this on a klone compute node to create the apptainer container <code>cdsc_base.sif</code>. It can take quite a while, as it downloads and installs a lot of software!
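For reference, apptainer definition files follow a standard layout: a header that names a base image, a <code>%post</code> section with installation commands, and an <code>%environment</code> section that sets variables at runtime. A minimal sketch in the spirit of <code>cdsc_base.def</code> might look like the following; the base image and package list here are illustrative, not the actual contents of our file:

<syntaxhighlight lang='bash'>
Bootstrap: docker
From: debian:bullseye

%post
    # install the software the project needs (illustrative packages only)
    apt-get update && apt-get install -y r-base python3 python3-pip default-jdk git curl

%environment
    # variables set whenever the container runs
    export LC_ALL=C
    export JAVA_HOME=/usr/lib/jvm/default-java
</syntaxhighlight>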
 
You can start a shell in the container using:


<syntaxhighlight lang='bash'>
apptainer shell cdsc_base.sif
</syntaxhighlight>


You can also just execute a single command using:


<syntaxhighlight lang='bash'>
apptainer exec cdsc_base.sif echo "my command"
</syntaxhighlight>
 
Sandbox containers don't seem to work consistently.  It's better to just update the definition file and rebuild the container. It's a hassle, but it works.
<strike>
The <code>.sif</code> container is immutable, but you can modify it by converting it to a sandbox.  


<syntaxhighlight lang='bash'>
apptainer build --sandbox cdsc_base_sandbox cdsc_base.sif
</syntaxhighlight>


You might run into trouble with exceeding space in your temporary file path. If you do, run
<syntaxhighlight lang='bash'>
export APPTAINER_TMPDIR=/my/large/tmp
export APPTAINER_CACHEDIR=/my/large/apt_cache
export APPTAINER_LOCALCACHEDIR=/my/large/apt_cache
</syntaxhighlight>
before running the build.


For developing a container it's useful to use a <code>sandbox</code> container, which is mutable
so you can continue installing software on it. However, you should add your changes to the definition file so you can build immutable containers that are as reproducible as possible.  


The <code>cdsc_base_sandbox</code> is mutable, so we can continue working on that environment and installing more software as we like.  We just need to build it as a <code>.sif</code> file to use it on klone.  It's also possible to convert the container back into sandbox mode and then modify non-root parts of the container on klone, but this requires running the container in a way that makes the outside klone system invisible! This is useful for installing R or Python packages in userspace within the container.  It's not that useful for working with data outside of the container.
</strike>
So in summary, the workflow is:


# Develop a definition file (<code>cdsc_base.def</code>) to set up your desired environment.
# Keep the definition file up to date with any modifications you make to the container in sandbox mode so your container is reproducible.  
# Run programs in the container to work with files outside of it (possibly including other packages, allowing us to use Debian to bootstrap klone-optimized binaries).
# If you want to work on your local machine, you can use the same definition file to build the container there, as sketched below.
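For example, on a local Linux machine with apptainer installed, the same <code>cdsc_base.def</code> gives you a matching local container:

<syntaxhighlight lang='bash'>
# build and test the container locally from the same definition file
apptainer build --fakeroot cdsc_base.sif cdsc_base.def
apptainer shell cdsc_base.sif
</syntaxhighlight>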


== Spark ==  


To set up a spark cluster using apptainer, the first step is to "run" the container on each node in the cluster:
 
<syntaxhighlight lang='bash'>
# on the first node
apptainer instance start --fakeroot cdsc_base.sif spark-boss
export SPARK_BOSS=$(hostname)
# on the first worker node (typically same as boss node)
apptainer instance start --fakeroot cdsc_base.sif spark-worker-1
# second worker node
apptainer instance start --fakeroot cdsc_base.sif spark-worker-2
</syntaxhighlight>
 
The second step is to start the spark services on the instances:
 
<syntaxhighlight lang='bash'>
apptainer exec instance://spark-boss /opt/spark/sbin/start-master.sh
 
apptainer exec instance://spark-worker-1 /opt/spark/sbin/start-worker.sh $SPARK_BOSS:7077
</syntaxhighlight>


That should be it. Though in practice it might make more sense to have special containers for the spark boss and workers.

You can now submit spark jobs by running <code>spark-submit.sh</code>.

<syntaxhighlight lang='bash'>
# replace n3078 with the master hostname, and follow with your application script and its arguments
apptainer exec instance://spark-boss /opt/spark/bin/spark-submit --master spark://n3078.hyak.local:7077
</syntaxhighlight>
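When you are done, you can shut the spark cluster down by stopping the instances with apptainer's instance commands:

<syntaxhighlight lang='bash'>
# list the running instances
apptainer instance list
# stop the workers and the boss
apptainer instance stop --all
</syntaxhighlight>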
Nate's working on wrapping the above nonsense in friendlier scripts.
