Scalasca is an open-source software tool that supports the performance optimization of parallel programs by measuring and analyzing their runtime behavior via event traces. The analysis identifies potential performance bottlenecks - in particular those concerning communication and synchronization - and offers guidance in exploring their causes. Scalasca supports applications using MPI, OpenMP, POSIX threads, or MPI+OpenMP/Pthreads parallelization.
- Puhti: 2.6
- Mahti: 2.6
Usage is possible for both academic and commercial purposes.
Using Scalasca involves three steps:
- Instrumentation
- Execution measurement collection & analysis
- Analysis report examination
Instrumentation
Scalasca uses the Score-P measurement infrastructure for instrumentation of the target application. Score-P can also be used as a stand-alone tool without Scalasca.
To instrument an application, recompile it with the Score-P instrumentation command scorep, which is added as a prefix to the original compile and link commands:

    module load scorep
    scorep mpicc -o my_prog my_prog.c

This example is for C/C++ codes; codes in other languages are instrumented similarly by prefixing their compile and link commands.
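Because instrumentation only changes the compile and link commands, a build script can make it optional. The following is a minimal sketch, assuming a hypothetical INSTRUMENT switch (not a Score-P setting) that decides whether the scorep prefix is added:

```shell
# INSTRUMENT is a hypothetical switch for this sketch, not a Score-P setting.
# It defaults to 1 (instrumented build).
INSTRUMENT=${INSTRUMENT:-1}

if [ "$INSTRUMENT" = "1" ]; then
    PREP="scorep "   # prefix the compiler with the Score-P command
else
    PREP=""          # plain, uninstrumented build
fi

# Print the resulting command; in a real build script the prefix would be
# placed directly in front of the compiler call.
build_cmd="${PREP}mpicc -o my_prog my_prog.c"
echo "$build_cmd"
```

With INSTRUMENT=0 the same script would print the uninstrumented compile command.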
Measurement collection and analysis
While applications instrumented by Score-P can be executed directly with a measurement configuration defined via environment variables, the scan convenience command provided by Scalasca can be used to control certain aspects of the Score-P measurement environment during the execution of the target application. To produce a performance measurement using an instrumented executable, prefix the srun command with the scan command in the batch job script:

    ...
    #SBATCH --ntasks=40

    module load scalasca
    scan srun ./my_app

Event trace data can also be collected. As tracing can produce huge amounts of data, it is recommended to first estimate the size of the trace and, if needed, filter some functions out of the measurement. An estimate can be obtained with the scorep-score command:

    scorep-score -r scorep_my_app_40_sum/profile.cubex

    Estimated aggregate size of event trace:                   1022kB
    Estimated requirements for largest trace buffer (max_buf): 129kB
    Estimated memory requirements (SCOREP_TOTAL_MEMORY):       4097kB
    (hint: When tracing set SCOREP_TOTAL_MEMORY=4097kB to avoid intermediate flushes
     or reduce requirements using USR regions filters.)

    flt     type max_buf[B] visits time[s] time[%] time/visit[us]  region
             ALL    131,431 20,196   12.81   100.0         634.28  ALL
             MPI     95,054  8,076    8.65    67.5        1071.04  MPI
             USR     24,168  8,056    3.38    26.3         418.96  USR
             COM     12,168  4,056    0.78     6.1         193.47  COM
          SCOREP         41      8    0.00     0.0          48.00  SCOREP

             MPI     94,000  8,000    0.17     1.3          20.97  MPI_Sendrecv
             USR     12,000  4,000    0.00     0.0           0.25  swap_fields
             COM     12,000  4,000    0.00     0.0           0.53  exchange
             USR     12,000  4,000    3.33    26.0         832.89  evolve
             MPI        826     14    0.01     0.1         823.21  MPI_Recv
    ...

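The hint in the output suggests reducing the requirements with USR region filters. As a minimal sketch, the USR regions and their buffer requirements can be pulled out of saved scorep-score output with awk. The file name score_sample.txt and the helper list_usr_regions are hypothetical, and the rows are copied from the output above (where the flt column is empty):

```shell
# Sample rows copied from the scorep-score output above (flt column is empty).
cat > score_sample.txt <<'EOF'
         USR     24,168  8,056    3.38    26.3         418.96  USR
         COM     12,168  4,056    0.78     6.1         193.47  COM
         MPI     94,000  8,000    0.17     1.3          20.97  MPI_Sendrecv
         USR     12,000  4,000    0.00     0.0           0.25  swap_fields
         COM     12,000  4,000    0.00     0.0           0.53  exchange
         USR     12,000  4,000    3.33    26.0         832.89  evolve
         MPI        826     14    0.01     0.1         823.21  MPI_Recv
EOF

# Print "region max_buf" for every USR region, skipping the per-type summary
# row (where the region column repeats the type) and removing the thousands
# separators from the byte counts.
list_usr_regions() {
    awk '$1 == "USR" && $7 != "USR" { gsub(",", "", $2); print $7, $2 }' "$1"
}

list_usr_regions score_sample.txt
```

For the sample rows this lists swap_fields and evolve, the two regions that are candidates for the filter file.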
To filter out the measurement of the swap_fields and evolve functions, one can create a file scorep.filter with the contents:

    SCOREP_REGION_NAMES_BEGIN
      EXCLUDE
        swap_fields
        evolve
    SCOREP_REGION_NAMES_END

and check the effect of filtering with

    scorep-score -f scorep.filter -r scorep_my_app_40_sum/profile.cubex

    Estimated aggregate size of event trace:                   835kB
    Estimated requirements for largest trace buffer (max_buf): 105kB
    ...

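When the list of excluded regions changes between experiments, the filter file can be generated instead of edited by hand. The sketch below writes the same file layout as shown above; the helper name write_filter is hypothetical:

```shell
# write_filter FILE REGION...
# Writes a Score-P filter file that excludes the given region names.
write_filter() {
    out=$1
    shift
    {
        echo "SCOREP_REGION_NAMES_BEGIN"
        echo "  EXCLUDE"
        for region in "$@"; do
            echo "    $region"
        done
        echo "SCOREP_REGION_NAMES_END"
    } > "$out"
}

# Reproduce the filter used in the text.
write_filter scorep.filter swap_fields evolve
cat scorep.filter
```

The generated file can then be checked with scorep-score -f as before.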
One could now proceed with the trace collection by setting the SCOREP_FILTERING_FILE environment variable and by passing the options -q and -t to scan:

    ...
    #SBATCH --ntasks=40

    module load scalasca
    export SCOREP_FILTERING_FILE=scorep.filter
    scan -q -t srun ./my_app

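The scorep-score hint shown earlier recommends setting SCOREP_TOTAL_MEMORY when tracing. As a sketch, assuming the scorep-score output has been saved to a file (the name score_header.txt is hypothetical), the suggested value could be extracted and exported in the job script before the scan line:

```shell
# Sample line copied from the scorep-score output shown earlier.
cat > score_header.txt <<'EOF'
Estimated memory requirements (SCOREP_TOTAL_MEMORY):       4097kB
EOF

# Take the last whitespace-separated field of the estimate line. The pattern
# ends with a closing parenthesis so that the "(hint: ...)" line, which also
# mentions the variable, would not match.
mem=$(awk '/SCOREP_TOTAL_MEMORY\)/ { print $NF; exit }' score_header.txt)

export SCOREP_TOTAL_MEMORY="$mem"
echo "SCOREP_TOTAL_MEMORY=$SCOREP_TOTAL_MEMORY"
```

Note that the estimate depends on the filter in use, so the value should be taken from the scorep-score run that matches the filter file passed to the measurement.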
After the trace collection is finished, Scalasca will carry out a trace analysis to identify various performance bottlenecks. With tracing enabled, the experiment directory would be scorep_my_app_40_trace.
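The experiment directory names on this page (scorep_my_app_40_sum and scorep_my_app_40_trace) combine the executable name, the number of tasks, and the measurement mode. A small sketch of building the expected name in a script follows; the helper expt_dir is hypothetical, and the pattern is inferred only from the examples here, so other configurations (for example threaded runs) may be named differently:

```shell
# expt_dir EXECUTABLE NTASKS MODE
# Builds the experiment directory name following the pattern seen in the
# examples on this page. MODE is "sum" for a profile or "trace" for a trace.
expt_dir() {
    echo "scorep_$(basename "$1")_${2}_${3}"
}

expt_dir ./my_app 40 sum
expt_dir ./my_app 40 trace
```

Such a helper can be used, for instance, to check that the directory exists after the job has finished.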
Analysis report examination
The Scalasca analysis report explorer square cannot currently be run on CSC supercomputers. However, users may install Scalasca on their local workstation and copy the experiment directory there for analysis:

    rsync -r puhti.csc.fi:/scratch/.../rundir/scorep_my_app_40_trace .
    square scorep_my_app_40_trace

The OTF2-formatted event trace can also be analyzed with Intel Trace Analyzer.