CommunityData:Hyak migration (ikt to mox)

Revision as of 23:16, 30 July 2024 by Benjamin Mako Hill (moved page CommunityData:Hyak migration to CommunityData:Hyak migration (ikt to mox) without leaving a redirect)

This page is a list of things that we want to do to migrate from ikt to mox.

  1. Copy data (only raw data, data that we are using in current and future projects)
  2. Back up other data?
  3. Copy code (everyone copies their own user directory)
  4. Create a shared .bashrc that everyone will load. This will provide us with a shared environment (python, R, other packages, useful aliases).
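
The copy steps above could be done with rsync. The following is only a minimal sketch: the helper name `sync_dir` is hypothetical, and no particular hostnames or directory layout on either cluster are assumed.

```shell
# Hypothetical helper: copy the contents of one directory tree to a
# destination with rsync. The destination may be a local path or a remote
# spec of the form user@host:/path.
# Usage: sync_dir SRC DEST [extra rsync options...]
sync_dir() {
    if [ "$#" -lt 2 ]; then
        echo "usage: sync_dir SRC DEST [rsync options]" >&2
        return 2
    fi
    local src="$1" dest="$2"
    shift 2
    # -a preserves permissions and timestamps, -v lists files as they are
    # copied, -h prints human-readable sizes, and --partial keeps partially
    # transferred files so an interrupted copy can be resumed.
    # The trailing slash on the source copies its contents into DEST.
    rsync -avh --partial "$@" "${src%/}/" "$dest"
}
```

Passing `--dry-run` as an extra option (for example `sync_dir SRC DEST --dry-run`) lists what would be copied without transferring anything, which is a cheap way to check that only raw data and current project data are selected.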

Hyak Migration Working Group

  1. Mako
  2. Kaylea
  3. Nate
  4. Sayamindu
  5. Jeremy
  6. Jim?

Time Table

Hyak migration schedule
Date        Task to be done                                    Who needs to do it
2020-01-17  Collect list of software to install on Mox         Everyone
2020-02-01  Install software on Mox and set up shared .bashrc  Nate (possibly with help from Sayamindu + Mako)
2020-02-15  Test setup on Mox                                  Everyone (especially Kaylea)
2020-03-01  Start primarily using Mox                          Everyone
2020-05-01  Evacuate Ikt! It's the end!                        Everyone

Shared environment design

We will use custom modules to maintain installations of the software that we use. When the Hyak team already provides a module that we need (e.g. an up-to-date R or Python), we should prefer it so that we don't have to do the work of compiling and packaging the module ourselves. But if we want to be on the cutting edge of Python and R, we'll be in the business of building our own.

Since I (Nate) typically develop code on my laptop before running it on Hyak, I think it is ideal if our Hyak environment maintains versions of software that are equivalent to those included in Debian Buster whenever possible. Ideally we will even provide modules for important R and Python packages (e.g. spark, ggplot, pandas) so that we can keep versions consistent and stable over time.

We'll create a list of packages that people can expect to be loaded in their environments and load them in the shared .bashrc.
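
A shared .bashrc along these lines might look roughly like the sketch below. The modulefile directory, the module names, and the `cdsc_load_modules` function are all assumptions made for illustration; they are not the group's actual configuration.

```shell
# Sketch of a shared .bashrc fragment. Every path and module name here is a
# placeholder (an assumption), to be replaced with the real ones.

# Hypothetical directory holding group-maintained modulefiles.
CDSC_MODULE_DIR="${CDSC_MODULE_DIR:-/path/to/shared/modulefiles}"

# Hypothetical list of modules everyone can expect to have loaded.
CDSC_DEFAULT_MODULES="${CDSC_DEFAULT_MODULES:-r/3.6.2 spark/2.4}"

cdsc_load_modules() {
    local m
    # Do nothing where the `module` command is unavailable (e.g. a laptop).
    if ! type module >/dev/null 2>&1; then
        return 0
    fi
    # Make the group's own modulefiles visible alongside the system ones.
    if [ -d "$CDSC_MODULE_DIR" ]; then
        module use "$CDSC_MODULE_DIR"
    fi
    # Load each default module, warning (but not failing) if one is missing.
    for m in $CDSC_DEFAULT_MODULES; do
        module load "$m" || echo "could not load module $m" >&2
    done
    return 0
}

cdsc_load_modules
```

Each person would then source this file from their own `~/.bashrc`, so that changing the module list in one place updates everyone's environment.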

We'll also provide a shared .bash_aliases that provides common commands for interacting with Slurm.
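
As an illustration, a shared .bash_aliases could contain shortcuts like the ones below. The alias names, the `int_shell` function, and the partition name `comdata` are hypothetical, and the sketch assumes that the Slurm account has the same name as the partition. The underlying `squeue`, `sinfo`, and `srun` options are standard Slurm.

```shell
# Hypothetical Slurm partition name (assumed to match the account name);
# replace with the group's real one.
CDSC_PARTITION="${CDSC_PARTITION:-comdata}"

# My own queued and running jobs.
alias sq='squeue -u "$USER"'
# All jobs on the group partition.
alias sqp='squeue -p "$CDSC_PARTITION"'
# Node states for the group partition.
alias si='sinfo -p "$CDSC_PARTITION"'

# Start an interactive shell on a compute node.
# Any extra arguments (e.g. --time=2:00:00) are passed through to srun.
int_shell() {
    srun -p "$CDSC_PARTITION" -A "$CDSC_PARTITION" --pty "$@" /bin/bash
}
```

A function is used for the interactive shell rather than an alias so that extra `srun` options can be inserted before the command being run.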

List of modules we'll maintain on Hyak (WIP)

We can get a list of packages from /gscratch/com/local/bin on Ikt.

Add packages you want below!

[X] zsh

[X] Spark 2.4

[X] Python 3.7 (installed Anaconda and created a minimal Anaconda environment to speed up startup time; this seems like the easiest way to get an optimized Python installation)

[X] R 3.6.2

[X] moreutils 0.62 (at least some of the moreutils tools seem to be broken, e.g. parallel)

[X] emacs 25

[X] p7zip 16.02

[X] htop 2.2.0

[X] pandoc 2.2.1

[X] gcc 4.9+

RStudio Server

It could be nice to run an RStudio server on the interactive node to provide a nicer IDE for working interactively on hyak compared to Jupyter notebooks or editing in the terminal. If this isn't feasible then we should use kibo for this purpose instead.

Etherpad link

https://etherpad.wikimedia.org/p/cdsc_hyak_migration_todo

Scheduler Options

It might be a good idea to ask the Hyak team to configure the scheduler for our partition so that we can request specific quantities of memory or CPUs in our jobs. (Hyak Wiki)

This might be particularly useful if we don't get more nodes soon, because it would let us divide each node into smaller pieces that several jobs can share.
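
If the scheduler were configured this way, a batch script could request part of a node instead of a whole one. The following is a minimal sketch using standard `sbatch` options; the job name and resource amounts are arbitrary examples, and whether our partition honors per-job memory and CPU requests is exactly the open question above.

```shell
#!/bin/bash
# Hypothetical job name.
#SBATCH --job-name=example_job
# Request 4 CPU cores rather than a whole node.
#SBATCH --cpus-per-task=4
# Request 16 GB of memory.
#SBATCH --mem=16G
# Wall-clock limit of two hours.
#SBATCH --time=02:00:00

# Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task is given. Fall back to
# 1 so the script also runs outside of Slurm (e.g. when testing on a laptop).
NCPUS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with ${NCPUS} CPU(s)"
```

The script would be submitted with `sbatch`, adding the partition and account options for our group. Code inside the job can read `NCPUS` to decide how many worker processes to start.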

Getting more nodes

We have funding in the ecology grant (and maybe other sources of funding as well) that we can use to purchase additional Mox capacity. Let's keep track of plans around that to try to minimize the gap between the time we lose Ikt nodes and the time we get more Mox nodes.