Clusters¶
The Oden Institute has a number of small clusters that are owned by centers and are used only by those affiliated with the respective center.
CRIOS sverdrup¶
Note
Sverdrup and its associated storage node (sverdrup-nas) both underwent upgrades beginning in late March 2024.
System information:
OpenHPC cluster running Rocky Linux 9.3 (https://openhpc.community/)
$HOME is an NFS file system -> /home (2.0 TB)
/scratch is 105TB
/scratch2 is 125TB
/opt/apps/ is an NFS file system -> 100GB
Queuing system is SLURM 22.05. One queue available -> normal (35 Intel Omni-Path nodes -> 28 cores/node, 64 GB/node, 980 cores); a quick Slurm check of this partition is sketched after the node composition list.
Node composition:
Dual-socket 14-core Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz (Haswell)
64 GB of RAM per node
Omni-Path high-performance communications HBA (100 Gb/s)
10 Gb/s networking
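As a quick check of this configuration after logging in, the standard Slurm query commands can be used (a minimal sketch; normal is the partition name listed above):
# List the nodes and their state in the normal partition
sinfo -p normal
# Show the full partition configuration (limits, node list, etc.)
scontrol show partition normal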
Wish list¶
Requests from research staff in the CRIOS group.
The CRIOS cluster is getting a major upgrade from March 25 to April 5 and will be unreachable during that time. Below is a list of requested software and features that Patrick can pass to Oden RT, in no particular order:
git >= 2.31
Allow all users to ssh into compute nodes they are using interactively
Update the GNU Compiler Collection to >= 12.2, while keeping the currently installed GNU collection
tmux >= 2.7
Keep z-shell as a shell option
Keep the feature to connect a Jupyter Notebook/Lab session from a compute node to the local browser
gdb on all compute nodes
Ability to use Vim and compile codes on compute nodes (this may mean having the same environment as the login node)
Singularity (an open-source container platform similar to Docker; TACC has it)
Okular is not supported; use evince or xpdf instead.
The following base packages provided with Rocky Linux have been installed:
git-2.39
zsh-5.8
tmux-3.2a
GNU 12.2 and GNU 13.1, available as OpenHPC builds via modules
gdb 10.2, installed from the base OS
apptainer-1.3 (formerly singularity)
Modules¶
Lmod has been installed. Use the module command to view and load available modules, as shown below.
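For example, a minimal sketch (the exact module names on sverdrup may differ; gnu13 is an assumption):
# List all modules available on the system
module avail
# Load one of the OpenHPC GNU toolchain modules (module name assumed)
module load gnu13
# Show what is currently loaded
module list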
Intel compilers¶
Intel OneAPI compilers have been installed and are available via the module command.
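A minimal sketch of loading them (the module name intel is an assumption; check module avail for the exact name):
# Load the Intel oneAPI compiler module (module name assumed)
module load intel
# Verify the C and Fortran compilers are on the PATH
icx --version
ifx --version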
Apptainer¶
Apptainer, formerly Singularity, has been installed on all compute nodes using packages provided by Rocky Linux. The installed version is 1.3. No module needs to be loaded to use apptainer.
Note
Apptainer is not installed on the login node.
Create a job script, job.apptainer
#!/bin/bash
#SBATCH -J test # Job name
#SBATCH -o job.%j.out # Name of stdout output file (%j expands to jobId)
#SBATCH -N 2 # Total number of nodes requested
#SBATCH -n 16 # Total number of mpi tasks requested
#SBATCH -t 01:30:00 # Run time (hh:mm:ss) - 1.5 hours
# Launch apptainer job
/usr/bin/apptainer run docker://busybox uname -r
Submit the job:
[stew@sverdrup]$ sbatch job.apptainer
Then check the job's log file for the output:
INFO: Using cached SIF image
5.14.0-362.24.1.el9_3.x86_64
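To monitor the job and inspect its output, the usual Slurm commands apply (a minimal sketch; job.<jobid>.out is the stdout file named in the script above):
# Show your queued and running jobs
squeue -u $USER
# Once the job finishes, read the stdout file written by Slurm
cat job.<jobid>.out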
Compute node access¶
A request was made to allow ssh access to the compute nodes without having to go through Slurm.
There is an advantage to using srun when reserving nodes: the environment is exported properly to the node(s), which is not the case for regular ssh sessions.
Warning
It is not recommended to ssh into the nodes unless you own a job running there that you have queued via Slurm. It is strongly recommended that you use srun to reserve a node rather than ssh'ing into it; a minimal example is sketched below. The underlying queueing engine is not aware of ssh sessions, so the node could be allocated to someone else, or a job could be queued to a node that someone has ssh'd into. This could cause a resource conflict on the node.
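A minimal sketch of reserving a node interactively with srun instead of ssh'ing in (the partition name normal and the time limit are assumptions):
# Request one node with one task for an hour and start an interactive shell on it
srun -p normal -N 1 -n 1 -t 01:00:00 --pty /bin/bash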
Jupyter Notebook¶
Jupyter Notebook was installed into the home directory using a Python virtual environment. It was possible to connect to the Jupyter server over ssh to both the login node and a compute node using a proxy jump. This should work the same as before.
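A minimal sketch of that workflow, assuming a compute node named c001, port 8888, and the login node sverdrup as the jump host (all placeholder names):
# On sverdrup: create a virtual environment and install Jupyter Notebook
python3 -m venv ~/jupyter-env
source ~/jupyter-env/bin/activate
pip install notebook
# On the reserved compute node: start the server without opening a browser
jupyter notebook --no-browser --port=8888
# On the local machine: forward the port through the login node (proxy jump)
ssh -L 8888:localhost:8888 -J username@sverdrup username@c001
# Then open http://localhost:8888 in the local browser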