CommunityData:Hyak software installation

Remember that you can't install software on a login node! Make sure you check out a compute node in the ways described on CommunityData:Hyak first!

You will often find that Hyak is missing software that you need to do your work.

There are special ways to install Python and R packages described below. The recommended way to manage other software for your research projects on Klone is to use Apptainer containers (formerly known as Singularity). At first, you probably do not need to know much about containers because we maintain a shared setup described below.

R packages

Installing R packages should work the way you expect.

  1. Make sure to check out a compute node as described on CommunityData:Hyak since you cannot build/install R packages on login nodes
  2. After that, you can install a package in the usual way like this:
> install.packages('lme4')

Python modules

There are a small number of basic Python modules that are installed in the CDSC Hyak Apptainer (see below). Python packages will need to be installed either per project or per user. Make sure to check out a compute node as described on CommunityData:Hyak since you cannot build/install venv python packages on login nodes.

One other small note about Python:

Python projects on Hyak should never have a Python path hardcoded in the "shebang" (the first line that starts with #!). If that line exists in your file, it must be #!/usr/bin/env python3. If you download somebody else's Python code and it has hardcoded something like /usr/bin/python3, you will need to change it.
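When you pull in somebody else's code, it can help to scan for offending shebangs before running anything. The following is a minimal sketch, not a CDSC-provided tool: find_bad_shebangs is a hypothetical helper name, and it assumes the scripts of interest end in .py.

```shell
# Report Python scripts whose shebang is not "#!/usr/bin/env python3".
# find_bad_shebangs is a hypothetical helper name, not a CDSC command.
# Usage: find_bad_shebangs [directory]   (defaults to the current directory)
find_bad_shebangs() {
    dir="${1:-.}"
    find "$dir" -type f -name '*.py' | sort | while IFS= read -r f; do
        first_line=$(head -n 1 "$f")
        case "$first_line" in
            '#!/usr/bin/env python3') ;;   # already portable, nothing to report
            '#!'*) printf '%s: %s\n' "$f" "$first_line" ;;   # any other shebang
        esac
    done
}
```

Files without any shebang line are left out of the report, since the rule only applies when the line exists.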

Installing Python Modules Per Project (with venvs)

You must set up Python venvs in the following way (i.e., not in the usual python -m venv <DIRECTORY> way). This is because the containerized Python we use breaks some things that Python assumes:

 
cd <PROJECT DIRECTORY>
create_cdsc_venv <DIRECTORY NAME>

So, if you created a venv called venv you would use it like:

 
cd <PROJECT DIRECTORY>
source venv/bin/activate

After that, you can install packages in the normal way with pip like:

 
pip install gensim pandas

Once you have activated your venv, you can also just run your Python scripts in the normal way, and they will "see" the modules you've installed.
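If a script cannot "see" a module you installed, a quick first check is whether the venv is actually active in the current shell. The sketch below assumes that the activate script produced by create_cdsc_venv sets the VIRTUAL_ENV variable the way a standard venv does; venv_status is a hypothetical helper name.

```shell
# Print which venv (if any) is active in this shell.
# Assumption: activation exports VIRTUAL_ENV, as standard venvs do.
# venv_status is a hypothetical helper, not part of the CDSC setup.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        printf 'venv active: %s\n' "$VIRTUAL_ENV"
    else
        printf 'no venv active\n'
        return 1
    fi
}
```

Running venv_status right after source venv/bin/activate should then report the path of your project's venv directory.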

While upgrading a package, pip might decide it needs to uninstall a version that is installed system-wide. If you really need the updated version, you can run:

pip install --ignore-installed <libname>

Installing Python Modules User-Level

If you install Python packages per user, this will work for all your Python projects. That said, it's more likely to lead to conflicts when one package needs one version of a library and another needs an older/newer one. To do so is pretty simple (be sure to run from a login machine):

pip install --break-system-packages <PACKAGENAME>

Containers with Apptainer

We use Apptainer containers (formerly known as, and sometimes still referred to as, Singularity) to install software on Klone. Hyak provides a minimal operating system, so without these containers, installing software can be quite labor-intensive.

Our goal has been to make using software installed through Apptainer as seamless as possible. For the most part, once you have your environment configured as above, you shouldn't have to think about the containers unless you need to install something new.

We created commands (e.g., python3, R, Rscript, jupyter-console) that run the containerized version of the program. The full list of such commands is in /gscratch/comdata/containers/bin.
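To confirm that a command in your shell really is one of these wrappers (rather than, say, a system binary earlier on your PATH), you can check where it resolves. This is a minimal sketch; resolves_to_wrapper is a hypothetical helper name, and the default directory is the one named above.

```shell
# Check whether a command on PATH resolves into the container wrapper directory.
# resolves_to_wrapper is a hypothetical helper name.
# Usage: resolves_to_wrapper <command> [wrapper-directory]
resolves_to_wrapper() {
    cmd="$1"
    bindir="${2:-/gscratch/comdata/containers/bin}"
    resolved=$(command -v "$cmd") || { printf '%s: not found\n' "$cmd"; return 2; }
    case "$resolved" in
        "$bindir"/*) printf '%s -> %s (container wrapper)\n' "$cmd" "$resolved" ;;
        *) printf '%s -> %s (not a wrapper)\n' "$cmd" "$resolved"; return 1 ;;
    esac
}
```

For example, resolves_to_wrapper python3 tells you whether python3 is picked up from the wrapper directory or from somewhere else on your PATH.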

Importantly, installing packages in R, Python (e.g., using pip) or other programming languages should usually work normally because the containers already have the most common dependencies. Installing packages this way will not update the container. Instead the packages will be installed in your user directory. This is desirable so that different container users do not break each other's environments. It may happen that an installation fails because it requires a missing dependency from the operating system. If this happens you can try to add the dependency to the container as described below. If this seems challenging or complicated or you need many changes to the container, or changes you don't understand, reach out to the IT team.

We will use multiple different Apptainer containers for different applications to avoid accidentally breaking existing versions of packages during upgrades. We want containers that include "soft dependencies" that R or Python libraries might want.

To make a new container alias

For example, suppose you want to make a command that runs jupyter-console for interactive Python work, and you know that you want to run it from the cdsc_python.sif container located in /gscratch/comdata/containers/cdsc_python.

1. Ensure that the software you want to execute is installed in the container. Test this by running apptainer exec /gscratch jupyter-console.

2. Create an executable file in /gscratch/comdata/containers/bin. The file should look like:

#!/usr/bin/env bash

apptainer exec /gscratch/comdata/containers/cdsc_python/cdsc_python.sif jupyter-console "$@"
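Creating the wrapper file and marking it executable can be done in one step. The function below is a sketch of that, not an existing CDSC tool: make_container_alias is a hypothetical name, it assumes neither path contains spaces, and it refuses to overwrite an existing wrapper.

```shell
# Write a wrapper script like the one above and make it executable.
# make_container_alias is a hypothetical helper name.
# Usage: make_container_alias <command> <path-to-sif> [bin-directory]
make_container_alias() {
    cmd="$1"
    sif="$2"
    bindir="${3:-/gscratch/comdata/containers/bin}"
    target="$bindir/$cmd"
    if [ -e "$target" ]; then
        printf 'refusing to overwrite %s\n' "$target" >&2
        return 1
    fi
    # $sif and $cmd are expanded now; \$@ stays literal in the wrapper.
    cat > "$target" <<EOF
#!/usr/bin/env bash

apptainer exec $sif $cmd "\$@"
EOF
    chmod +x "$target"
}
```

With the paths from the example, the call would be make_container_alias jupyter-console /gscratch/comdata/containers/cdsc_python/cdsc_python.sif.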

Creating a new Apptainer container

Our goal is to write an Apptainer definition file that will install the software that we want to work with. The definition file contains instructions for building a more reproducible environment. For example, the file cdsc_base.def contains instructions for installing an environment based on Debian 13 (trixie).

NOTE: For some reason building a container doesn't work on the /gscratch filesystem. Instead build containers on the /mmfs1 filesystem and then copy them to their eventual homes on /gscratch.

Once we have the definition file, we just have to run the following on a klone compute node to create the Apptainer container cdsc_base.sif:

 apptainer build --fakeroot cdsc_base.sif cdsc_base.def

This can take quite a while to run as it downloads and installs a lot of software!
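The advice to build on /mmfs1 and then copy to /gscratch can be wrapped in a small function so the two steps are not forgotten. This is a sketch under assumptions: build_then_install is a hypothetical helper name, and the build and install directories are whatever you pass in (nothing here checks which filesystem they live on).

```shell
# Build a container in one directory, then copy the result to its final home.
# build_then_install is a hypothetical helper name.
# Usage: build_then_install <definition-file> <build-dir> <install-dir>
build_then_install() {
    def="$1"
    build_dir="$2"
    install_dir="$3"
    name=$(basename "$def" .def)          # cdsc_base.def -> cdsc_base
    mkdir -p "$build_dir" "$install_dir" || return 1
    # Same build command as in the text, with the output placed in build_dir.
    apptainer build --fakeroot "$build_dir/$name.sif" "$def" || return 1
    cp "$build_dir/$name.sif" "$install_dir/$name.sif"
}
```

You would pass a build directory somewhere under /mmfs1 and an install directory under /gscratch, and run it from a compute node.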

You can start a shell in the container using:

apptainer shell cdsc_base.sif

You can also just execute a single command using:

apptainer exec cdsc_base.sif echo "my command"

Sandbox containers don't seem to work consistently. It's better to just update the definition file and rebuild the container. It's a hassle, but it works.
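Since the definition file is the thing you edit and rebuild from, it helps to see the general shape of one. The function below writes a small illustrative definition file. It is not the contents of cdsc_base.def: the Bootstrap and From lines (pulling the debian:trixie image from Docker Hub) and the package list are assumptions chosen for illustration, and write_example_def is a hypothetical helper name.

```shell
# Write a minimal, illustrative Apptainer definition file.
# The contents are hypothetical and are NOT the real cdsc_base.def.
# Usage: write_example_def [output-file]   (defaults to example.def)
write_example_def() {
    out="${1:-example.def}"
    cat > "$out" <<'EOF'
Bootstrap: docker
From: debian:trixie

%post
    # Commands here run inside the container at build time.
    apt-get update
    apt-get install -y --no-install-recommends python3 python3-venv
    apt-get clean

%runscript
    # Run whatever command is passed to the container.
    exec "$@"
EOF
}
```

Adding a missing operating system dependency then means adding a package to the %post section and rebuilding with apptainer build.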

So in summary, the workflow is:

  1. Develop a definition file (e.g., cdsc_base.def) to set up your desired environment.
  2. Keep the definition file up to date with any modifications you make to the container in sandbox mode so your container is reproducible.
  3. Run programs in the container to work with files outside of it (possibly including other packages, allowing us to use Debian to bootstrap klone-optimized binaries).
  4. If you want to work on your local machine, you can use the same definition file to install the container there.

Installing Apptainer on your local computer

You might find it more convenient to develop your Apptainer container on your local machine. You'll want the same version of Apptainer that is installed on Hyak (you can find this with apptainer --version). Follow these instructions for installing Apptainer on your local Linux machine.
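A quick way to compare the two installations is to compare the version strings directly. The sketch below assumes apptainer --version prints a single line whose last word is the version number (for example "apptainer version X.Y.Z"); version_of and versions_match are hypothetical helper names.

```shell
# Compare two "apptainer --version" output lines.
# Assumption: the version number is the last whitespace-separated field.
# version_of and versions_match are hypothetical helper names.
version_of() {
    printf '%s\n' "$1" | awk '{print $NF}'
}

versions_match() {
    [ "$(version_of "$1")" = "$(version_of "$2")" ]
}
```

On your local machine you could paste in the line reported on Hyak and run versions_match "$(apptainer --version)" "<line from Hyak>" && echo same.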

Update the R Container

We are using R from the Rocker project. To download and install the latest version you just run something like:

apptainer pull docker://rocker/r-base
ls -l r-base_latest.sif
cdsc_R.def  cdsc_R.sif  
apptainer exec r-base_latest.sif R --version
mv r-base_latest.sif /gscratch/comdata/containers/cdsc_R/r-VERSION.sif
ln -sf /gscratch/comdata/containers/cdsc_R/r-VERSION.sif /gscratch/comdata/containers/cdsc_R/cdsc_R.sif
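The last step, pointing cdsc_R.sif at the new versioned image, fails with a plain ln -s when the link already exists. A slightly more defensive version of that step might look like the sketch below. repoint_symlink is a hypothetical helper name, and the sketch assumes cdsc_R.sif is meant to be a symlink rather than a regular file.

```shell
# Point an existing (or new) symlink at a new target, without clobbering
# a regular file that happens to have the link's name.
# repoint_symlink is a hypothetical helper name.
# Usage: repoint_symlink <new-target> <link-path>
repoint_symlink() {
    new_target="$1"
    link="$2"
    if [ ! -e "$new_target" ]; then
        printf 'missing target: %s\n' "$new_target" >&2
        return 1
    fi
    if [ -e "$link" ] && [ ! -L "$link" ]; then
        printf '%s exists and is not a symlink; leaving it alone\n' "$link" >&2
        return 1
    fi
    # -f replaces an existing link; -n treats the link itself as the destination.
    ln -sfn "$new_target" "$link"
}
```

Keeping the versioned .sif files around means that rolling back is just another call with the previous version as the target.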