.. vim: syntax=rst

.. include:: ../global.rst

.. _linux-clusters:

========
Clusters
========

The Oden Institute has a number of small clusters that are owned by centers and are used only by those affiliated with the respective center.

--------------
CRIOS sverdrup
--------------

.. Note:: Sverdrup and its associated storage node (sverdrup-nas) both underwent upgrades beginning in late March 2024.

System information:

- OpenHPC cluster running Rocky Linux 9.3 (https://openhpc.community/)
- $HOME is an NFS file system -> /home (2.0 TB)
- /scratch is 105 TB
- /scratch2 is 125 TB
- /opt/apps/ is an NFS file system -> 100 GB
- Queueing system is Slurm 22.05. One queue is available -> normal (35 Intel Omni-Path nodes, 28 cores/node, 64 GB/node, 980 cores total)

Node composition:

- Dual-socket 14-core Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz (Haswell)
- 64 GB of RAM per node
- Omni-Path high-performance communications HBA (100 Gb/s)
- 10 Gb/s networking

Wish list
^^^^^^^^^

Requests from research staff in the CRIOS group. The CRIOS cluster is getting a major reboot from March 25 to April 5 and will be unreachable during that time. Below is a list of requested software and features Patrick can pass to Oden RT, in no particular order:

- git >= 2.31
- Allow all users to ssh into compute nodes they are using interactively
- Update the GNU compiler collection to >= 12.2, while keeping all of the current GNU collection
- tmux >= 2.7
- Keep z-shell as a shell option
- Keep the ability to connect a Jupyter Notebook/Lab session on a compute node to a local browser
- gdb on all compute nodes
- Ability to use Vim and compile codes on compute nodes (presumably meaning the compute nodes have the same environment as the login node)
- Singularity (an open-source container platform similar to Docker; TACC has it)
- Okular is not supported; use evince or xpdf instead

The following base packages provided with Rocky Linux have been installed:

- git-2.39
- zsh-5.8
- tmux-3.2a
- GNU 12.2 and GNU 13.1, OpenHPC builds available as modules
- gdb 10.2 from the base OS
- apptainer-1.3 (formerly singularity)

Module
^^^^^^

Lmod has been installed. Use the ``module`` commands to view and load available modules.

Intel compilers
^^^^^^^^^^^^^^^

Intel OneAPI compilers have been installed and are available via the ``module`` command.

Apptainer
^^^^^^^^^

Apptainer, formerly Singularity, has been installed on all the compute nodes using packages provided by Rocky Linux. The installed version is 1.3. No module is needed to load apptainer.

.. Note:: Apptainer is not installed on the login node.

Create a job script, ``job.apptainer``:

.. Code::

   #!/bin/bash
   #SBATCH -J test           # Job name
   #SBATCH -o job.%j.out     # Name of stdout output file (%j expands to jobId)
   #SBATCH -N 2              # Total number of nodes requested
   #SBATCH -n 16             # Total number of mpi tasks requested
   #SBATCH -t 01:30:00       # Run time (hh:mm:ss) - 1.5 hours

   # Launch apptainer job
   /usr/bin/apptainer run docker://busybox uname -r

Submitting the script yields:

.. Code::

   [stew@sverdrup]$ sbatch job.apptainer

Check the status of the log file:

.. Code::

   INFO:    Using cached SIF image
   5.14.0-362.24.1.el9_3.x86_64

Compute node access
^^^^^^^^^^^^^^^^^^^

A request was made to allow ssh access to the compute nodes without having to go through Slurm. There is an advantage to using srun when reserving nodes: the environment is exported properly to the node(s) at submission time. This is not the case for regular ssh sessions to the nodes.

.. Warning:: ssh'ing into the nodes is not recommended unless you own a job running there that you queued via Slurm. It is strongly recommended that you use srun to reserve a node rather than ssh'ing into it. The underlying queueing engine is not aware of ssh sessions, so the node could be allocated to someone else, or a job could be queued to a node that someone has ssh'd into. This could cause a conflict over resources on the node.
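As a minimal sketch of the recommended workflow, an interactive shell on a compute node can be reserved through Slurm with ``srun`` instead of ssh. The partition name ``normal`` is the single queue listed above; the time limit is only an example.

.. Code::

   # Request an interactive shell on a compute node in the "normal"
   # partition for up to two hours (single task; the time limit is
   # just an example).
   srun -p normal -N 1 -n 1 -t 02:00:00 --pty /bin/bash

   # The shell now runs on the allocated compute node, with the
   # login-node environment exported by Slurm.
   hostname

   # Exiting the shell releases the allocation.
   exit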
Jupyter Notebook
^^^^^^^^^^^^^^^^

Using a Python virtual environment, jupyter-notebook was installed into the home directory. It was possible to connect to the Jupyter server over ssh to both the login node and to a compute node using a proxy jump. This should work the same as it did before the upgrade.
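A minimal sketch of that workflow is shown below. The environment name ``jupyter-env``, the port 8888, the user name ``username``, and the compute node name ``c001`` are all placeholders, not values fixed by the system.

.. Code::

   # On the sverdrup login node: create a virtual environment in $HOME
   # and install Jupyter into it (the environment name is a placeholder).
   python3 -m venv ~/jupyter-env
   source ~/jupyter-env/bin/activate
   pip install notebook

   # On a compute node (reserved via srun as shown above): activate the
   # environment and start the server without a browser on an example port.
   source ~/jupyter-env/bin/activate
   jupyter notebook --no-browser --port=8888

   # On your local machine: forward the port to the compute node through
   # the login node with a proxy jump, then open http://localhost:8888
   # in a local browser ("c001" stands for the allocated node).
   ssh -J username@sverdrup -L 8888:localhost:8888 username@c001

Connecting to a Jupyter server running on the login node works the same way, without the ``-J`` proxy jump.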