Compiling on Roihu
Compiling applications in Roihu
Info
Roihu has separate CPU and GPU partitions with different CPU architectures:
- Roihu-CPU nodes use AMD (x86) processors
- Roihu-GPU nodes use NVIDIA Grace (ARM) processors
Binaries compiled for one architecture are not usable on the other. Accordingly, software should be compiled on the same side where it will be run:
- Compile for CPU nodes on the Roihu-CPU login node
- Compile for GPU nodes on the Roihu-GPU login node
General instructions
- Whenever possible, use the local disk on the login node for compiling software.
    - Compiling on the local disk is much faster and shifts load from the shared file system.
    - The local disk is cleaned frequently, so please move your files elsewhere after compiling.
- Please see the page on available HPC libraries for using common libraries (BLAS, FFTW, ...) and linking them to your applications.
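The local-disk workflow described above could look roughly like the following sketch. It assumes that $TMPDIR points at the login node's local disk, which this page does not confirm, and falls back to /tmp otherwise. The install_dir destination is hypothetical; replace it with a directory on a shared file system.

```shell
# ASSUMPTION: $TMPDIR is on the login node's local disk (not confirmed here);
# /tmp is used as a fallback.
build_dir=$(mktemp -d "${TMPDIR:-/tmp}/build.XXXXXX")

# Hypothetical destination; point INSTALL_DIR at a shared file system.
install_dir="${INSTALL_DIR:-$PWD/install}"

cd "$build_dir"
# ... unpack the sources and run the build here ...

# Copy the results away before the local disk is cleaned, then tidy up.
mkdir -p "$install_dir"
cp -r "$build_dir"/. "$install_dir"/
cd - >/dev/null
rm -rf "$build_dir"
```

Using a fresh mktemp directory for every build keeps parallel builds from colliding and makes the final cleanup a single rm.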
Compiling on Roihu-CPU
Info
When compiling for the CPU nodes on Roihu, make sure you use Roihu's CPU login nodes. Binaries compiled on Roihu-GPU are not compatible with Roihu-CPU nodes.
Roihu-CPU provides GNU and AMD AOCC compiler environments for building C/C++ and Fortran applications. These environments are available under the following modules:
| Compiler suite | Modules |
|---|---|
| GNU 15.2.0 | gcc/15.2.0 openmpi/5.0.10 |
| AMD AOCC 5.0.0 | aocc/5.0.0 openmpi/5.0.10 |
The first compiler suite is loaded by default. You can change the environment by loading the listed modules, for example,
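A minimal sketch of switching to the AMD AOCC environment, using the module names from the table above. The guard around the module command is an addition that only keeps the snippet harmless on machines where the command does not exist.

```shell
# Module names taken from the table above
modules="aocc/5.0.0 openmpi/5.0.10"

if command -v module >/dev/null 2>&1; then
    module load $modules   # unquoted on purpose: two separate module names
    module list            # confirm what is loaded now
fi
```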
List all available versions of the compiler suites:
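One way to do this, sketched under the assumption that the standard module avail subcommand is what is wanted here, is to query each suite by name. The suite names are the module names from the table above.

```shell
# Module names of the two compiler suites on Roihu-CPU
suites="gcc aocc"

if command -v module >/dev/null 2>&1; then
    for suite in $suites; do
        module avail "$suite"   # lists the installed versions of this suite
    done
fi
```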
The compiler executables are as follows:
| Compiler suite | C | C++ | Fortran |
|---|---|---|---|
| GNU | gcc | g++ | gfortran |
| AMD | clang | clang++ | flang |
For applications that depend on MPI, use the compiler wrappers described in the MPI section below instead.
Compiler options differ between the suites. The recommended basic optimization flags are listed in the table below. Start from the safe level and move up to intermediate or aggressive only after checking that the results remain correct and that the program's performance has actually improved.
| Optimization level | GNU | AMD (clang) |
|---|---|---|
| Safe | -O2 -march=znver5 | -O2 -march=znver5 |
| Intermediate | -O3 -march=znver5 | -O3 -march=znver5 |
| Aggressive | -O3 -march=znver5 -ffast-math -funroll-loops |
Example of compiling a non-MPI C program in GNU environment:
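As a sketch, the following builds a one-file C program with the flags from the "Safe" row of the table above. The file hello.c and its contents are hypothetical stand-ins, and the guard and fallback message are additions so that the snippet also behaves sensibly where gcc is missing or does not know the znver5 target.

```shell
# hello.c is a hypothetical one-file program used only for illustration
cat > hello.c <<'EOF'
#include <stdio.h>
int main(void) { printf("Hello from Roihu-CPU\n"); return 0; }
EOF

# "Safe" flags from the table above
CFLAGS="-O2 -march=znver5"

if command -v gcc >/dev/null 2>&1; then
    # A gcc that does not know znver5 rejects the flag, hence the fallback
    gcc $CFLAGS -o hello hello.c || echo "build failed: check the gcc version and target"
fi
```

Keeping the flags in a variable makes it easy to rerun the same build at the intermediate or aggressive level and compare results.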
A detailed list of options for the GNU and AMD compilers can be found in the man
pages (man gcc/gfortran) when the corresponding programming
environment is loaded, or in the compiler manuals:
- GNU
- AMD AOCC
We recommend testing and profiling your application with both compiler suites to see which one works best for your use case.
Building MPI applications
The MPI environment in Roihu is OpenMPI. You may use one of the MPI compiler wrappers
mpicc (C), mpicxx (C++), or mpif90 (Fortran) when compiling MPI applications.
These wrappers end up calling the compiler from your currently loaded compiler suite
(GNU or AMD) and work in both compiler suites.
Example:
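A minimal sketch of an MPI build with the C wrapper. The source file name mpi_hello.c is hypothetical, and the flags are borrowed from the "Safe" row of the optimization table. The guard is an addition that skips the build where the wrapper or the source file is missing.

```shell
# Flags follow the "Safe" row of the optimization table
CFLAGS="-O2 -march=znver5"

if command -v mpicc >/dev/null 2>&1 && [ -f mpi_hello.c ]; then
    mpicc --showme                          # Open MPI: print the underlying compiler command
    mpicc $CFLAGS -o mpi_hello mpi_hello.c  # compile and link in one step
fi
```

The --showme call is optional; it is a quick way to check which compiler suite the wrapper currently resolves to.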
List all available versions of OpenMPI (one is always loaded by default):
Building OpenMP and hybrid applications
An additional compiler and linker flag is needed when building an OpenMP or a hybrid MPI+OpenMP application:
| Compiler suite | OpenMP flag |
|---|---|
| GNU and AMD | -fopenmp |
Example compilation of a hybrid MPI+OpenMP application:
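A sketch of such a build, combining the MPI wrapper with the OpenMP flag from the table above. The file name hybrid.c is hypothetical and the optimization flags are borrowed from the "Safe" row of the earlier table. Because compiling and linking happen in one command here, a single -fopenmp covers both stages.

```shell
# Optimization flags plus the OpenMP flag, which is the same for GNU and AMD
CFLAGS="-O2 -march=znver5 -fopenmp"

if command -v mpicc >/dev/null 2>&1 && [ -f hybrid.c ]; then
    mpicc $CFLAGS -o hybrid hybrid.c
fi
```

If you compile and link in separate steps, pass -fopenmp to both.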
Compiling on Roihu-GPU
Info
When compiling for the GPU nodes on Roihu, make sure you use Roihu's GPU login nodes. Binaries compiled on Roihu-CPU are not compatible with Roihu-GPU nodes.
Roihu-GPU provides GNU and NVIDIA-HPC compiler environments for building C/C++ and Fortran applications under the following modules:
| Compiler suite | Modules |
|---|---|
| GNU 14.3.0 + CUDA 12.9.1 | gcc/14.3.0 cuda/12.9.1 openmpi/5.0.8 openblas/0.3.30 |
| GNU 15.2.0 + CUDA 13.1.1 | gcc/15.2.0 cuda/13.1.1 openmpi/5.0.8 openblas/0.3.30 |
| NVIDIA HPC 26.3 | nvhpc/26.3 |
The first compiler suite is loaded by default. You can change the environment by loading the listed modules, for example,
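A minimal sketch of switching to the newer GNU and CUDA combination, using the module names from the table above. The guard is an addition that keeps the snippet harmless where the module command does not exist.

```shell
# Module names taken from the second row of the table above
modules="gcc/15.2.0 cuda/13.1.1 openmpi/5.0.8 openblas/0.3.30"

if command -v module >/dev/null 2>&1; then
    module load $modules   # unquoted on purpose: four separate module names
    module list            # confirm what is loaded now
fi
```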
About the nvhpc module
Note that the nvhpc module includes CUDA, MPI, and BLAS implementations,
so you don't need to load these modules separately when using the nvhpc module.
For this reason, loading the module may print a notice about inactive modules.
To avoid leaving inactive modules, you can purge modules before loading the environment:
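A sketch of that sequence, with the module name taken from the table above. The guard is an addition for machines without the module command.

```shell
# NVIDIA HPC module name from the table above
nvhpc_module="nvhpc/26.3"

if command -v module >/dev/null 2>&1; then
    module purge                  # unload everything, including the default GNU suite
    module load "$nvhpc_module"   # brings its own CUDA, MPI, and BLAS
fi
```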
List all available versions of the compiler suites:
The compiler executables are as follows:
| Compiler suite | C | C++ | Fortran |
|---|---|---|---|
| GNU | gcc | g++ | gfortran |
| NVIDIA HPC | nvc | nvc++ | nvfortran |
In addition, the CUDA nvcc compiler is available for building GPU kernel code. See the CUDA section below.
Compiling CUDA applications
CUDA is the recommended programming model for NVIDIA GPUs and it is
provided in the cuda and nvhpc modules.
The CUDA compiler (nvcc) compiles the CUDA kernel code for the target
GPU device and passes the rest to the currently loaded host compiler, such as gcc or the NVIDIA HPC compiler.
To generate code for a given target device, tell the CUDA
compiler what compute capability the target device supports. On Roihu, the
GPUs (Hopper 200) support compute capability 9.0. Specify this using
-gencode arch=compute_90,code=sm_90. Alternatively, you may use compute_90a
or sm_90a to enable Hopper-specific extension features that may produce more
performant code.
For example, compiling a CUDA kernel (example.cu) on Roihu:
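A minimal sketch using the portable Hopper target described above. The output name example is an arbitrary choice, and the guard is an addition that skips the build where nvcc or example.cu is missing.

```shell
# Portable Hopper target; use arch=compute_90a,code=sm_90a instead
# for the Hopper-specific variant
GENCODE="-gencode arch=compute_90,code=sm_90"

if command -v nvcc >/dev/null 2>&1 && [ -f example.cu ]; then
    nvcc $GENCODE -o example example.cu
fi
```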
Info
Code generated with arch=compute_90a or code=sm_90a is not backwards or forwards
compatible with other GPU architectures. If this is a concern for you, use the
more generic arch=compute_90,code=sm_90 options.
Compiling MPI+CUDA applications
Both the GNU and the NVIDIA HPC compiler environments include a CUDA-aware MPI library.
If the structure of the MPI+CUDA application allows, you can build it in parts:
- Compile CUDA kernels to object files with nvcc -c
- Compile host code to object files with the MPI compiler wrappers that will call the loaded host compiler (mpicc -c, mpicxx -c, or mpif90 -c)
- Link all the object files with the MPI compiler wrapper (mpicc, mpicxx, or mpif90)
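The three steps above can be sketched as follows for a C++ host code. The file names kernels.cu and main.cpp and the output name app are hypothetical. Linking against the CUDA runtime with -lcudart is an assumption about what the MPI wrapper does not add by itself; depending on how the cuda module sets up the environment, an explicit -L path may also be needed.

```shell
GENCODE="-gencode arch=compute_90,code=sm_90"

if command -v nvcc >/dev/null 2>&1 && command -v mpicxx >/dev/null 2>&1 &&
   [ -f kernels.cu ] && [ -f main.cpp ]; then
    nvcc $GENCODE -c kernels.cu -o kernels.o   # 1. CUDA kernels to an object file
    mpicxx -c main.cpp -o main.o               # 2. host code via the MPI wrapper
    mpicxx main.o kernels.o -o app -lcudart    # 3. link (ASSUMPTION: -lcudart is needed)
fi
```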
It is also possible to compile the whole codebase with nvcc, but then
you need to pass the necessary MPI compile and link options
to the underlying host compiler called by nvcc.
This can be done with the -Xcompiler and -Xlinker flags as follows:
```shell
# Parse MPI options for compiler
Xcompiler="-Xcompiler $(mpicxx --showme | tr ' ' '\n' | sed '/^-Wl,/d;1d' | paste -sd, -)"
# Parse MPI options for linker
Xlinker="-Xlinker $(mpicxx --showme | tr ' ' '\n' | sed -n 's/^-Wl,//p' | paste -sd, -)"
# Compile MPI code using nvcc
nvcc -gencode arch=compute_90a,code=sm_90a $Xcompiler $Xlinker mpi_cuda_code.cu
```
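To see what the two pipelines do, it can help to run them on a fixed string instead of live wrapper output. The showme value below is a made-up, simplified stand-in for what mpicxx --showme prints; real output on Roihu will contain different paths and probably more options.

```shell
# HYPOTHETICAL sample of wrapper output; not taken from Roihu
showme='g++ -I/opt/openmpi/include -L/opt/openmpi/lib -Wl,-rpath -Wl,/opt/openmpi/lib -lmpi'

# Drop the compiler name (first word) and all -Wl, options, join the rest with commas
compile_opts=$(echo "$showme" | tr ' ' '\n' | sed '/^-Wl,/d;1d' | paste -sd, -)

# Keep only the -Wl, options, strip that prefix, join with commas
link_opts=$(echo "$showme" | tr ' ' '\n' | sed -n 's/^-Wl,//p' | paste -sd, -)

printf '%s\n' "-Xcompiler $compile_opts"
# prints: -Xcompiler -I/opt/openmpi/include,-L/opt/openmpi/lib,-lmpi
printf '%s\n' "-Xlinker $link_opts"
# prints: -Xlinker -rpath,/opt/openmpi/lib
```

The comma-joined form is what nvcc expects after -Xcompiler and -Xlinker, which is why the options are joined with paste rather than left space-separated.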
Warning
When running the application, load the same modules that were used for compiling so that the correct MPI library is used at runtime.
Compiling applications using OpenMP offload, OpenACC, and C++ standard parallelism
Warning
It is recommended to use the NVIDIA HPC compilers for compiling codes using OpenMP offload, OpenACC, and C++ standard parallelism.
Start by loading the NVIDIA HPC compilers (the nvhpc/26.3 module listed above).
The compiler options for enabling different GPU programming models are as follows:
| Programming model | Compiler option |
|---|---|
| OpenMP offload | -mp=gpu |
| OpenACC | -acc=gpu |
| C++ stdpar | -stdpar=gpu (nvc++ only) |
To generate efficient code for the GH200 superchips on Roihu, specify the target compute capability with the -gpu=cc90 option.
Example compilation commands:
| Programming model | C | C++ | Fortran |
|---|---|---|---|
| OpenMP offload | nvc -O3 -mp=gpu -gpu=cc90 example.c | nvc++ -O3 -mp=gpu -gpu=cc90 example.cpp | nvfortran -O3 -mp=gpu -gpu=cc90 example.F90 |
| OpenACC | nvc -O3 -acc=gpu -gpu=cc90 example.c | nvc++ -O3 -acc=gpu -gpu=cc90 example.cpp | nvfortran -O3 -acc=gpu -gpu=cc90 example.F90 |
| C++ stdpar | N/A | nvc++ -O3 -stdpar=gpu -gpu=cc90 example.cpp | N/A |
The compilers also support codes that combine multiple programming models. As an example, compile a C++ code that contains OpenMP offload, OpenACC, and C++ parallel algorithms with:
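A sketch of such a command, simply combining the three options from the table above on one nvc++ command line. The output name example is an arbitrary choice, and the guard is an addition that skips the build where nvc++ or example.cpp is missing.

```shell
# All three GPU programming models enabled at once
MODELS="-mp=gpu -acc=gpu -stdpar=gpu"

if command -v nvc++ >/dev/null 2>&1 && [ -f example.cpp ]; then
    nvc++ -O3 $MODELS -gpu=cc90 -o example example.cpp
fi
```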
Compiling MPI applications using OpenMP offload, OpenACC, and C++ standard parallelism
The nvhpc module is bundled with a GPU-aware MPI implementation and
the usual compiler wrappers. MPI applications can be compiled
as above, replacing nvc, nvc++, and nvfortran with
mpicc, mpicxx, and mpif90, respectively:
| Programming model | C | C++ | Fortran |
|---|---|---|---|
| OpenMP offload | mpicc -O3 -mp=gpu -gpu=cc90 example.c | mpicxx -O3 -mp=gpu -gpu=cc90 example.cpp | mpif90 -O3 -mp=gpu -gpu=cc90 example.F90 |
| OpenACC | mpicc -O3 -acc=gpu -gpu=cc90 example.c | mpicxx -O3 -acc=gpu -gpu=cc90 example.cpp | mpif90 -O3 -acc=gpu -gpu=cc90 example.F90 |
| C++ stdpar | N/A | mpicxx -O3 -stdpar=gpu -gpu=cc90 example.cpp | N/A |