ncu: GPU CUDA Kernel Profiler


Puhti: 2020.2.0.18
Mahti: 2020.3.1.0


NVIDIA Nsight Compute is a CUDA kernel profiler that provides detailed performance data and offers guidance for optimizing your CUDA kernels. Its command-line interface, ncu, collects and displays profiling data from the command line. It is a low-level tool that gathers performance metrics for individual CUDA kernel launches, rather than an application-wide timeline of CUDA activity. Profiling results are printed to the console once the data has been collected, and they can also be saved to a report file for later viewing with the ncu-ui tool.

To use ncu, first load a CUDA environment. Start with the appropriate gcc module:

module load gcc/9.1.0
on Puhti, or
module load gcc/10.3.0
on Mahti. Then load the CUDA and nsight-compute modules:
module load cuda
module load nsight-compute
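
The module sequence above can be wrapped in a small helper so that the same job script works on both systems. This is only a sketch: the function names `gcc_module_for` and `load_ncu_env` are hypothetical, and the system name is passed in by hand rather than detected automatically.

```shell
#!/bin/bash
# Hypothetical helper: map a CSC system name to the gcc module named in the text.
gcc_module_for() {
    case "$1" in
        puhti) echo "gcc/9.1.0" ;;
        mahti) echo "gcc/10.3.0" ;;
        *)     echo "unknown system: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: load gcc, then cuda, then nsight-compute, in that order.
load_ncu_env() {
    gcc_mod="$(gcc_module_for "$1")" || return 1
    # 'module' exists only where environment modules are installed.
    if command -v module >/dev/null 2>&1; then
        module load "$gcc_mod" && module load cuda && module load nsight-compute
    else
        echo "module command not found; would load: $gcc_mod cuda nsight-compute"
    fi
}

# Example call (commented out so that sourcing this file has no side effects):
# load_ncu_env puhti
```

Calling `load_ncu_env mahti` would pick gcc/10.3.0 instead; any other name is rejected with an error.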

To profile a CUDA program, prefix the normal launch command with ncu. Running is otherwise similar to that of any other CUDA job on Puhti or Mahti.

An example of usage of ncu:

ncu --set full -o myreport ./a.out
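
As a rough sketch, the same command could be placed in a batch job script. The Slurm settings below (partition `gpu`, one V100 via `--gres`, the time limit) are assumptions for Puhti, and `project_XXXXXXX` is a placeholder for your own project; check the current system documentation before using them.

```shell
#!/bin/bash
#SBATCH --job-name=ncu_profile
#SBATCH --account=project_XXXXXXX   # placeholder: replace with your own project
#SBATCH --partition=gpu             # assumption: GPU partition on Puhti
#SBATCH --gres=gpu:v100:1           # assumption: one V100 GPU
#SBATCH --ntasks=1
#SBATCH --time=00:15:00

# The profiling command from the example above.
profile_cmd="ncu --set full -o myreport ./a.out"

# Load the environment described earlier (Puhti variant).
if command -v module >/dev/null 2>&1; then
    module load gcc/9.1.0
    module load cuda
    module load nsight-compute
fi

# Run under srun only when both srun and ncu are available.
if command -v srun >/dev/null 2>&1 && command -v ncu >/dev/null 2>&1; then
    # Left unquoted on purpose so the string splits into separate arguments.
    srun $profile_cmd
else
    echo "Not in a GPU job environment; would run: srun $profile_cmd"
fi
```

On Mahti the gcc module, partition and GPU type would differ.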
Next, the resulting report is analysed with ncu-ui, either on the CSC supercomputers or on the user's local machine. The performance of the program can be compared to the theoretical peak (speed-of-light) performance or to a custom baseline, for example a report from a previous release of the code.
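
A saved report can also be inspected in the terminal without the graphical tool. The sketch below assumes that ncu appended the `.ncu-rep` extension to the name given with `-o`, and it uses the `--import` option to print the stored results; it does nothing if ncu or the report file is missing.

```shell
#!/bin/bash
report="myreport"                 # same name as passed to -o when profiling
report_file="${report}.ncu-rep"   # assumption: extension added by ncu

if command -v ncu >/dev/null 2>&1 && [ -f "$report_file" ]; then
    # Print the stored results to the console instead of opening ncu-ui.
    ncu --import "$report_file" --page details
else
    echo "ncu or $report_file not available; nothing to show"
fi
```

The same file can be copied to a local machine and opened there with ncu-ui.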

ncu supports many useful options and is highly customizable. Use the command-line arguments --list-metrics and --query-metrics to check which metrics will be collected and to query which metrics are available on the current platform. For more details, please check the NVIDIA documentation.
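
A minimal sketch of narrowing a profiling run follows. The two metric names are examples chosen for illustration and may not exist on every GPU architecture, so confirm them against the `--query-metrics` output first. The commands need a GPU, so they are meant to run inside a GPU job.

```shell
#!/bin/bash
# Example metric names for illustration; verify them with --query-metrics.
metrics="sm__throughput.avg.pct_of_peak_sustained_elapsed,dram__bytes_read.sum"
app="./a.out"

if command -v ncu >/dev/null 2>&1; then
    # Show the predefined section sets, such as the 'full' set.
    ncu --list-sets
    # Show the first lines of the metrics available on this platform.
    ncu --query-metrics | head -n 20
    if [ -x "$app" ]; then
        # Collect only the chosen metrics, and only for the first kernel launch.
        ncu --metrics "$metrics" -c 1 "$app"
    fi
else
    echo "ncu not found; load the nsight-compute module first"
fi
```

Restricting the metrics and the number of profiled launches keeps the profiling overhead and the report size small compared with `--set full`.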

Last edited Fri Aug 13 2021