Usage policy
Additional information
General Terms of Use for CSC's Services for Research and Education
Login nodes
When you log in to CSC supercomputers, you end up on one of the login nodes of the cluster. These login nodes are shared by all users and are not intended for heavy computing.
The login nodes should be used only for:
- compiling
- managing batch jobs
- moving data
- light pre- and postprocessing
Here, light means single-core jobs that finish in minutes and require less than 1 GiB of memory. All other tasks must be run on compute nodes, either as normal batch jobs or as interactive batch jobs. Programs that do not adhere to these rules will be terminated without warning.
Important
The login nodes are not meant for long or heavy processes.
Disk cleaning
Each project has disk space in the directory /scratch/<project>. This fast parallel scratch space is intended for data that is in active use. To ensure that the parallel disk system does not run out of storage space and to keep performance acceptable, CSC automatically removes files in Puhti scratch that have not been accessed in a long time. The performance of a parallel file system starts to degrade when it fills up, and the more it fills up, the slower the performance will get.
This cleaning will happen regularly, and each time users are informed at least 1 month in advance. CSC also provides lists of files that are about to be removed and instructions for how one can transfer important files to more suitable disk systems.
The cleaning is stricter for projects with larger quotas:
- For projects that have a scratch quota of 5 TiB or more, files that have not been accessed (opened, read, modified) in the last 90 days will be deleted.
- For other projects with smaller scratch quotas, files that have not been accessed (opened, read, modified) in the last 180 days will be deleted.
You can use the csc-workspaces command to see which cleaning cycle your projects are subject to.
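To get a rough preview of which files might fall under the 90-day or 180-day rule, the access times can be inspected directly. The following is a minimal sketch, not CSC's cleaning tool: the function name `stale_files` is hypothetical, and it assumes that the access time reported by the file system (`st_atime`) matches what the cleaning process looks at. The file lists provided by CSC remain the authoritative source.

```python
import os
import time


def stale_files(root, max_age_days, now=None):
    """Yield (path, days_since_access) for files under root whose last
    access time is older than max_age_days.

    Assumption: st_atime reflects the access time used by the cleaning
    process; this is an illustration, not CSC's actual implementation.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                atime = os.lstat(path).st_atime
            except OSError:
                continue  # file disappeared or is not readable; skip it
            if atime < cutoff:
                yield path, (now - atime) / 86400
```

Calling `stale_files` with your project's scratch directory and 90 or 180 (depending on the cleaning cycle reported by csc-workspaces) gives the candidate paths. Note that walking a large directory tree is itself a metadata-heavy operation on a parallel file system, so such a scan should be run sparingly.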
Mahti: A similar procedure will be introduced on Mahti if the disk usage grows enough to warrant it. The policy is still that users should keep only actively used data in scratch.
GPU nodes
Puhti and Mahti GPUs should only be used for workloads that benefit greatly from GPU capacity compared to CPUs, or that cannot be run on CPUs at all. In particular, AI/ML workloads are prioritized, since many of them cannot be done at all on CPUs. A good rule of thumb is to compare the Billing Unit (BU) usage of the job on GPUs against CPUs (e.g. with seff) and select the one using less. One CPU BU and one GPU BU are equal in terms of cost.
For Puhti and Mahti, this means that a full node of CPU cores roughly equals one GPU. However, since Puhti and Mahti have more CPU capacity than GPU capacity, you might get access to CPUs with less queuing. Note that LUMI has a lot of GPU capacity, which is also "cheaper" as measured in BUs, so on LUMI it is better to use GPUs for your research if possible. In any case, always make sure you use resources efficiently.
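The rule of thumb above amounts to a simple comparison once the same job has been run on both resource types and its BU consumption noted down. A minimal sketch, in which the function name `cheaper_resource` is hypothetical and the BU figures are supplied by you (for example, read off manually from the job reports):

```python
def cheaper_resource(cpu_bu, gpu_bu):
    """Return "CPU" or "GPU", whichever run consumed fewer Billing Units.

    Assumption: both figures describe the same workload. Ties go to CPU,
    since CPU partitions may have shorter queues on Puhti and Mahti.
    """
    if cpu_bu < 0 or gpu_bu < 0:
        raise ValueError("BU usage cannot be negative")
    return "GPU" if gpu_bu < cpu_bu else "CPU"
```

The tie-breaking choice is a design decision of this sketch, not a CSC rule.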
Conda installations
Due to performance issues of Conda-based environments on parallel file systems, CSC has deprecated the direct usage of Conda installations. This means that any Conda environments you intend to use must be installed within a container. See Conda best practices for more information.
Tykky
Please consider the Tykky container wrapper for easy containerization of Conda and pip environments.
Running out of Billing Units
When a project runs out of Billing Units, the ability to use the service will be limited in three phases. If you are still actively using the project you can lift the limitations by applying for more Billing Units.
In the first phase the ability to submit new jobs is limited:
- If you run out of Storage BUs, no new jobs can be submitted to any partition
- If you run out of CPU BUs, no new jobs can be submitted to CPU partitions
- If you run out of GPU BUs, no new jobs can be submitted to GPU partitions
In other words, running out of CPU or GPU BUs only affects the corresponding partition type, while Storage BUs affect all. Jobs that are running are not interrupted and will run until completion/timeout.
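The first-phase rules can be summarized as a small decision function. This is an illustrative sketch only: the function name and the labels "cpu" and "gpu" are hypothetical, and "running out" is assumed here to mean a balance of zero or less.

```python
def blocked_partition_types(cpu_bu, gpu_bu, storage_bu):
    """Return the sorted list of partition types that accept no new jobs.

    Assumption: a balance <= 0 counts as having run out of that BU type.
    """
    blocked = set()
    if storage_bu <= 0:
        # Running out of Storage BUs blocks submission everywhere.
        blocked.update({"cpu", "gpu"})
    if cpu_bu <= 0:
        blocked.add("cpu")
    if gpu_bu <= 0:
        blocked.add("gpu")
    return sorted(blocked)
```

For example, a project with CPU and Storage BUs left but no GPU BUs gets `["gpu"]`, while a project without Storage BUs gets `["cpu", "gpu"]` regardless of its other balances.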
In the second phase, data access is limited. When you run out of Storage BUs, a 30-day grace period starts, after which access to the /projappl and /scratch folders is disabled. No data is deleted; only access is disabled. Data will, however, still be removed from /scratch during the normal cleaning process. Note that a negative balance of CPU or GPU BUs does not trigger this phase, only a negative Storage BU balance.
If you are not using a project actively we encourage you to migrate any data that you still need within the 30-day grace period and then close the project in MyCSC.
In the third phase, the project is closed: if it still has a negative balance of Billing Units of any type after a 60-day grace period, it will be closed.
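To keep track of the grace periods, the relevant dates can be computed from the day a balance went negative. The sketch below is hedged: the function name `grace_deadlines` and its parameters are hypothetical, and it assumes that both the 30-day and the 60-day period are counted from the day the corresponding balance ran out, which the text above does not state explicitly. Check MyCSC for the actual dates.

```python
from datetime import date, timedelta


def grace_deadlines(storage_out=None, compute_out=None):
    """Estimate when the second and third limitation phases take effect.

    storage_out: date the Storage BU balance ran out, or None.
    compute_out: date the CPU or GPU BU balance ran out, or None.

    Assumption: each grace period starts on the day the balance ran out.
    """
    # Only a negative Storage BU balance triggers the data-access phase.
    access_disabled = (
        storage_out + timedelta(days=30) if storage_out is not None else None
    )
    # Any BU type running out starts the 60-day period before closure.
    starts = [d for d in (storage_out, compute_out) if d is not None]
    project_closed = min(starts) + timedelta(days=60) if starts else None
    return {
        "data_access_disabled": access_disabled,
        "project_closed": project_closed,
    }
```

Applying for more Billing Units before these dates lifts the limitations, so the function is only meaningful while the balance stays negative.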
Slurm job management by CSC
- CSC will not change job parameters like length or priority.
- CSC can terminate jobs that misuse resources, e.g. if resources (CPU cores, GPUs, memory) are severely underutilized or I/O is overloading the storage system.