CommunityData:Klone

Klone is the latest version of hyak, the UW supercomputing system. We will soon have a larger allocation of machines on Klone than on Mox. The Klone machines have 40 cores and either 384GB or 768GB of RAM.


== Setup ==
The recommended way to manage software for your research projects on Klone is to use [https://sylabs.io/docs/ Singularity containers].  You can build a singularity container using the linux distribution of your choice (e.g., debian, ubuntu, centos). The instructions on this page document how to build the <code>cdsc_base.sif</code> singularity package, which provides python, R, julia, and pyspark based on Debian 11 (Bullseye).
 
=== Initial .Bashrc ===
Before we get started using our singularity package on klone, we need to start with a <code>.bashrc</code> that sets the <code>umask</code> so that other members of the group can edit your files, and that loads singularity.
 
<syntaxhighlight language='bash'>
# .bashrc
# Keep the stuff that's in there already that you need for working with the cluster.
# Then add the following lines.
umask 007
module load singularity
## this makes it so singularity can see the cluster filesystems, including /gscratch/comdata
export SINGULARITY_BIND="/gscratch:/gscratch,/mmfs1:/mmfs1,/xcatpost:/xcatpost,/gpfs:/gpfs,/sw:/sw"
</syntaxhighlight>
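
After re-opening your shell (or running <code>source ~/.bashrc</code>), it's worth checking that singularity is available and the bind paths are set. A quick sanity check, assuming the singularity module exists on your login node:

<syntaxhighlight language='bash'>
# apply the new settings to the current shell
source ~/.bashrc
# confirm singularity is on the path and the bind paths are exported
singularity --version
echo "$SINGULARITY_BIND"
</syntaxhighlight>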
 


== Installing singularity on your local computer ==
You might find it more convenient to develop your singularity container on your local machine. You'll want singularity version 3.4.2, which is the version installed on klone.  Follow [https://sylabs.io/guides/3.5/admin-guide/installation.html these instructions] for installing singularity on your local linux machine.
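
Once it's installed, you can confirm that the version on your machine matches the one on klone. A quick check (on klone, run <code>module load singularity</code> first):

<syntaxhighlight language='bash'>
# run on both your local machine and on klone; the versions should match
singularity --version
</syntaxhighlight>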
   
   
== Creating a singularity container ==


Our goal is to write a singularity definition file that will install the software that we want to work with.  The definition file contains instructions for building a reproducible environment.  For example, the file <code>cdsc_base.def</code> (shown at the bottom of this page) contains instructions for installing an environment based on debian 11 (bullseye). Once we have the definition file, we just have to run:


<syntaxhighlight language='bash'>
singularity build --fakeroot cdsc_base.sif cdsc_base.def
</syntaxhighlight>


on a klone compute node. This creates the singularity container <code>cdsc_base.sif</code> and can take quite a while to run, as it downloads and installs a lot of software!
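
If you don't already have an interactive session on a compute node, you can request one through slurm. A hypothetical example; the account and partition names are placeholders for your group's allocation:

<syntaxhighlight language='bash'>
# request an interactive session on a compute node
# (--account and --partition below are placeholders)
salloc --account=mygroup --partition=compute --time=2:00:00 --mem=8G -c 4
</syntaxhighlight>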
 
You can start a shell in the container using:
 
<syntaxhighlight language='bash'>
singularity shell cdsc_base.sif
</syntaxhighlight>
 
You can also just execute a single command using:
 
<syntaxhighlight language='bash'>
singularity exec cdsc_base.sif echo "my command"
</syntaxhighlight>
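
Because of the bind paths set in your <code>.bashrc</code>, commands run this way can read and write files that live outside the container. A hypothetical example; the script path is a placeholder for one of your own files:

<syntaxhighlight language='bash'>
# run a script stored on the cluster filesystem, outside the container
# (the path below is a placeholder)
singularity exec cdsc_base.sif python3 /gscratch/comdata/users/me/myscript.py
</syntaxhighlight>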
 
 
 
The <code>.sif</code> container is immutable, but you can modify it by converting it to a sandbox.
 
<syntaxhighlight language='bash'>
singularity build --sandbox --fakeroot cdsc_base_sandbox cdsc_base.sif
</syntaxhighlight>
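
You can then open a writable shell in the sandbox to install more software. A sketch: <code>--writable</code> only works on sandbox (directory) containers, and modifying root-owned paths requires root or <code>--fakeroot</code>:

<syntaxhighlight language='bash'>
# open a writable shell in the sandbox; --fakeroot lets you act as root
# inside the container so you can install packages
singularity shell --writable --fakeroot cdsc_base_sandbox
</syntaxhighlight>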


You might run into trouble if you exceed the space available in your temporary file path. If you do, run
<syntaxhighlight language='bash'>
export SINGULARITY_TMPDIR=/my/large/tmp
export SINGULARITY_CACHEDIR=/my/large/apt_cache
export SINGULARITY_LOCALCACHEDIR=/my/large/apt_cache
</syntaxhighlight>
before running the build.


For developing a container it's useful to use a <code>sandbox</code> container, which is mutable so that you can continue installing software on it. However, you should switch to an immutable container to ensure that your container is reproducible. We can convert the mutable container to an immutable one by running the build again.


<syntaxhighlight language='bash'>
sudo singularity build cdsc_base.sif cdsc_base_sandbox
</syntaxhighlight>

Copy <code>cdsc_base.sif</code> to your user directory under <code>/gscratch</code> on klone.
You can open a shell in the container by running:


<syntaxhighlight language='bash'>
singularity shell --no-home cdsc_base.sif
</syntaxhighlight>
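
The <code>--no-home</code> flag stops singularity from binding your home directory into the container, so the shell sees only the container's own files plus the paths listed in <code>SINGULARITY_BIND</code>.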


The potentially confusing thing about using singularity on klone stems from the fact that you have to be root to modify the root directories of a container. This is why you have to install software in the container locally. However, once you have made the immutable <code>cdsc_base.sif</code> file, you can use the software installed in the container to do work outside of the container!
The <code>cdsc_base_sandbox</code> is mutable, so we can continue working on that environment and installing more software as we like. We just need to build it as a <code>.sif</code> file to use it on klone. It's also possible to convert the container back into sandbox mode and then modify non-root parts of the container on klone, but this requires running the container in a way that makes the outside klone system invisible! This is useful for installing R or Python packages in userspace within the container. It's not that useful for working with data outside of the container.

So in summary, the workflow is:

# Install software into a sandbox container on your local machine.
# Keep the <code>cdsc_base.def</code> file up to date so your container is reproducible.
# Convert the sandbox container to an immutable <code>.sif</code> container.
# Copy the immutable container to klone.
# Run programs in the container to work with files outside of it (possibly including other packages, allowing us to use debian to bootstrap klone-optimized binaries).
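
Putting the commands together, here is a recap of the workflow above. The <code>scp</code> destination is a placeholder for your own directory under <code>/gscratch</code>:

<syntaxhighlight language='bash'>
# on your local machine: build a mutable sandbox and iterate on it
sudo singularity build --sandbox cdsc_base_sandbox cdsc_base.def
singularity shell --writable --fakeroot cdsc_base_sandbox

# convert the sandbox into an immutable image
sudo singularity build cdsc_base.sif cdsc_base_sandbox

# copy the image to klone (destination path is a placeholder)
scp cdsc_base.sif klone.hyak.uw.edu:/gscratch/comdata/users/me/
</syntaxhighlight>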


== cdsc_base.def ==

<syntaxhighlight language='bash'>
Bootstrap: library
from: debian:11

%post
	apt update && apt upgrade -y
	apt install -y gnupg
	# add the CRAN apt repository so we get a recent R
	apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF'
	echo "deb http://cloud.r-project.org/bin/linux/debian bullseye-cran40/" > /etc/apt/sources.list.d/cloud-r-project-org.list
	apt update && apt upgrade -y
	apt install -y libopenblas-base
	apt install -y r-base r-recommended emacs vim python3-sklearn jupyter moreutils julia default-jdk git curl meld xauth
	# download spark and verify it against the apache release keys
	curl -O https://downloads.apache.org/spark/KEYS
	curl -O https://downloads.apache.org/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz.asc
	curl -O https://mirror.jframeworks.com/apache/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
	gpg --import KEYS
	gpg --verify spark-3.1.1-bin-hadoop3.2.tgz.asc spark-3.1.1-bin-hadoop3.2.tgz
	rm KEYS
	export JAVA_HOME=/usr/lib/jvm/default-java
	tar xvf spark-3.1.1-bin-hadoop3.2.tgz
	mv spark-3.1.1-bin-hadoop3.2/ /opt/spark
	# download hadoop and verify it the same way
	curl -O https://apache.claz.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz
	curl -O https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz.asc
	curl -O https://downloads.apache.org/hadoop/common/KEYS
	gpg --import KEYS
	gpg --verify hadoop-3.3.0.tar.gz.asc hadoop-3.3.0.tar.gz
	tar xvf hadoop-3.3.0.tar.gz
	mv hadoop-3.3.0/ /opt/hadoop
	export HADOOP_HOME=/opt/hadoop
	apt clean
	# create mount points for the klone filesystems listed in SINGULARITY_BIND
	mkdir mmfs1
	mkdir gscratch
	mkdir xcatpost
	mkdir gpfs
	mkdir sw

%environment
	export JAVA_HOME=/usr/lib/jvm/default-java
	export HADOOP_HOME=/opt/hadoop
	export LC_ALL=C
	export SPARK_HOME=/opt/spark
	export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
	export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
</syntaxhighlight>
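
After building, you can sanity-check that spark landed on the container's <code>PATH</code> (a quick test, assuming the build completed):

<syntaxhighlight language='bash'>
# spark-submit comes from $SPARK_HOME/bin, which %environment adds to PATH
singularity exec cdsc_base.sif spark-submit --version
</syntaxhighlight>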