Machine learning guide

This guide aims to help users who wish to do machine learning using CSC's computing resources.

Machine learning guide subsections

In addition to this page, this guide contains the following subsections:

What CSC service to use?

CSC offers several services which might be relevant for machine learning users:

Roihu and LUMI supercomputers are multi-user clusters and offer the highest computing performance, including GPU acceleration, in a centrally controlled software environment,
Pouta offers your own virtual server with full control of the software environment, but restricted computing performance compared to supercomputers,
Rahti offers a more automated container-based cloud environment, useful in particular for deploying web services.

We recommend using CSC's supercomputers unless you need a very complicated software environment or work with sensitive data. In those cases Pouta might be the right choice, and for cases with high security requirements we also offer the ePouta variant.

If you are developing a service, for example if you want to deploy a trained model as a service, then Pouta or Rahti might be most relevant for you.

If you are unsure about the right service to use, don't hesitate to contact our service desk and explain your computing needs.
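The recommendations above can be summarised as a small decision helper. This is only an illustrative sketch: the `Needs` class, its field names, and the `suggest_service` function are hypothetical and not part of any CSC tool, and the result is a first suggestion rather than a replacement for asking the service desk.

```python
from dataclasses import dataclass


@dataclass
class Needs:
    """Hypothetical description of a project's requirements."""
    complex_environment: bool = False  # very complicated software environment
    sensitive_data: bool = False       # work involves sensitive data
    high_security: bool = False        # high security requirements
    deploys_service: bool = False      # e.g. serving a trained model


def suggest_service(needs: Needs) -> str:
    """Return a rough first suggestion based on the guide's recommendations."""
    if needs.high_security:
        return "ePouta"
    if needs.sensitive_data or needs.complex_environment:
        return "Pouta"
    if needs.deploys_service:
        return "Pouta or Rahti"
    # Default recommendation for most machine learning work.
    return "supercomputer (Roihu or LUMI)"


if __name__ == "__main__":
    print(suggest_service(Needs()))
    print(suggest_service(Needs(deploys_service=True)))
    print(suggest_service(Needs(sensitive_data=True, high_security=True)))
```

The order of the checks is a design choice of this sketch: security requirements are tested first because they rule out the other options, and the supercomputers are the fallback because they are the recommended default.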

CSC's supercomputers

For most machine learning needs CSC's supercomputers are the way to go. These are clusters of hundreds (or thousands) of computers, some of which offer GPU acceleration. The supercomputers are multi-user systems, so individual users have limited rights to install software, and as with any shared resource one must follow the usage policy so that the service remains usable for everyone.

CSC hosts the national supercomputer Roihu and the European LUMI supercomputer. If you are unsure which supercomputer to choose, read the discussion here.

If you are a new user, please read how to access Roihu and how to submit computing jobs. If you have opted for LUMI, read the LUMI Get Started page. LUMI users may also be interested in the AI Software Environment provided by the LUMI AI Factory.

All supercomputers also provide a web user interface, through which one can easily launch, for example, a Jupyter Notebook session with PyTorch. Note that GPU access is more restricted for interactive sessions.
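Because GPU access may be limited in an interactive session, it can be useful to check what PyTorch actually sees before running anything heavy. The following is a minimal sketch that could be pasted into a notebook cell; the function name `describe_accelerator` is made up for this example, and it assumes nothing about the session beyond Python being available. On ROCm builds of PyTorch the same `torch.cuda` calls report AMD GPUs.

```python
def describe_accelerator() -> str:
    """Summarise which compute device PyTorch can see in this session."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"

    if not torch.cuda.is_available():
        return "No GPU visible; computations will run on the CPU"

    count = torch.cuda.device_count()
    name = torch.cuda.get_device_name(0)
    return f"{count} GPU(s) visible, first device: {name}"


print(describe_accelerator())
```

If no GPU is reported, the session was most likely started without a GPU allocation, and the code will run on the CPU only.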

Also check the subsections on efficient GPU utilization, working with big data sets, and multi-GPU and multi-node jobs.

Cloud services

Pouta

There are some use cases where the supercomputers are not the right solution, and you may need a virtual server on Pouta. Typical examples include:

very complex software environment,
need for root access,
computation involving sensitive data.

With Pouta you get your own virtual server, where you have root or administrator access. HPC and GPU flavors are available for heavy computing needs. However, the computing resources will always be smaller than in a supercomputer.

For computation involving highly sensitive data we also offer the ePouta variant which is suited for cases with high security requirements. With ePouta the virtual server will be integrated into your existing network infrastructure.

See our Pouta documentation pages on how to apply for access.
