Python programming language in CSC's Supercomputers Puhti and Mahti.
- Puhti: various 2.x and 3.x versions
- Mahti: various 2.x and 3.x versions
System Python is available by default both in Puhti and Mahti without loading
any module. Python 2 (= 2.7.5) is available as
python (= 2.7.5) Python
3 (= 3.6.8) as
python3. The default system Python does not include any optional Python
packages. However, you can install simple packages for yourself by the methods
In Puhti there are several Python modules available that include different sets of scientific libraries:
- python-env - Anaconda Python with conda tools
- python-data - for data analytics and machine learning
- MXNet - MXNet deep learning framework
- PyTorch - PyTorch deep learning framework
- RAPIDS - for data analytics and machine learning on GPUs
- TensorFlow - TensorFlow deep learning framework
- Bioconda - conda package manger for installing bioinformatics software
- BioPython - biopython and other bioinformatics related Python libraries
- geoconda - for spatial data anlysis
- Solaris - deep learning pipeline for geospatial imagery
- and several other modules may include Python...
- python-env - anaconda Python with conda tools
- python-singularity - Singularity-based Python
To use any of the above mentioned modules, just load the appropriate module, for example:
module load python-env
Typically, after activating the module, you can continue using the commands
python3 but these will now point to different versions of
Python with a wider set of Python packages available. For more details, check
the corresponding application documentation (when available).
Installing Python packages to existing modules
If you find that some package is missing from an existing module, you can often
install it yourself with:
pip install [newPythonPackageName] --user
The packages are by default installed to your home directory under
x.y is the version of Python being
used). If you would like to change the installation folder, for example to make
a project-wide installation instead of a personal one, you need to define the
PYTHONUSERBASE environment variable with the new installation local. For
When later using those libraries you need to remember to add that path to
PYTHONPATH or use the same
PYTHONUSERBASE definition as above. Naturally,
this also applies to slurm job scripts.
Alternatively you can create a separate virtual environment with venv, for example:
module load python-data python -m venv --system-site-packages my-venv source my-venv/bin/activate pip install my_package_to_install
venv, you can keep separate environments for each program. The next time
you wish to activate the environment you only need to run
venv does not work with Python modules installed with Singularity.
If you think that some important package should be included in a module provided by CSC, you can send an email to firstname.lastname@example.org.
Creating your own Python environments
It is also possible to create your own Python environments. The main options are conda and Singularity. Singularity should be preferred at least when you know of a suitable ready-made Singularity or Docker container. Conda is easy to use and flexible, but it might create a huge number of files which is inefficient with shared file systems. This can cause very slow library imports and in the worst case slowdowns in the whole file system.
Please, see our Singularity documentation:
- Running Singularity containers
- Creating Singularity containers, including how to convert Docker container to Singularity container.
- CSC conda tutorial describes in detail what conda is and how to use it.
- Bioconda provides conda tools preinstalled.
Python development environments
Python code can be edited with a console-based text editor directly on the supercomputer. Codes can also be edited on your local machine and copied to the supercomputer with scp or graphical file transfer tools. You can also edit Python scripts in Puhti from your local PC with some code editors like Visual Studio Code.
Finally, several graphical programming environments can be used directly on the supercomputer, such as Jupyter Notebooks and Spyder.
Jupyter Notebook allows one to run Python code via a web browser running on a local PC. The notebooks can combine code, equations, visualizations and narrative text in a single document. Many of our modules, including python-env, python-singularity, python-data, the deep learning modules and geoconda include Jupyter notebook package. See the tutorial how to set up and connect to a Jupyter Notebook for using Jupyter in CSC environment.
Python parallel jobs
Python has several different packages for parallel processing:
multiprocessing package is likely the easiest to use and as it is part of the
Python standard library it is included in all Python installations.
some more flexibility.
joblib are suitable for one
node (max 40 cores).
dask is the most versatile and has several options for
parallelization. Please see CSC's Dask tutorial
which includes both single-node (max 40 cores) and multi-node examples.
See our GitHub repository for some examples for using the different parallelization options with Puhti.
mpi4py is not included in the current Python environments in CSC supercomputers,
however, for multinode jobs with non-trivial parallelization it is generally the most
efficient option. For a short tutorial on
mpi4py along with other approaches to improve
performance of Python programs see the free online course
Python in High Performance Computing
Python packages usually are licensed under various free and open source licenses (FOSS). Python itself is licensed under the PSF License, which is also open source.
Last edited Wed May 5 2021