Python programming language in CSC's Supercomputers Puhti and Mahti.
- Puhti: various 2.x and 3.x versions
- Mahti: 3.x versions
System Python is available by default on both Puhti and Mahti without loading
any module. Python 2 (version 2.7.5) is available as python (Puhti only), and
Python 3 (version 3.6.8) as python3. The default system Python does not include
any optional Python packages. However, you can install simple packages for
yourself using the methods explained below.
In Puhti there are several Python modules available that include different sets of scientific libraries:
- python-env - Anaconda Python with conda tools
- python-data - for data analytics and machine learning
- MXNet - MXNet deep learning framework
- PyTorch - PyTorch deep learning framework
- RAPIDS - for data analytics and machine learning on GPUs
- TensorFlow - TensorFlow deep learning framework
- JAX - Autograd and XLA for high-performance machine learning
- Bioconda - conda package manager for installing bioinformatics software
- BioPython - biopython and other bioinformatics-related Python libraries
- geoconda - for spatial data analysis
- and several other modules may include Python...
To use any of the above-mentioned modules, load the appropriate module, for example:
module load python-env
Typically, after activating the module, you can continue using the commands
python and python3, but these will now point to different versions of
Python with a wider set of Python packages available. For more details, check
the corresponding application documentation (when available).
Installing Python packages to existing modules
If you find that some package is missing from an existing module, you can often
install it yourself with:
pip install [newPythonPackageName] --user
By default, the packages are installed to your home directory under
.local/lib/pythonx.y/site-packages (where x.y is the version of Python being
used). If you would like to change the installation folder, for example to make
a project-wide installation instead of a personal one, you need to define the
PYTHONUSERBASE environment variable with the new installation location. For
example, to add the package whatshap to the python-data module:
module load python-data
export PYTHONUSERBASE=/projappl/<your_project>/my-python-env
pip install --user whatshap
In the example, the package is now installed inside the my-python-env
directory in the project's projappl directory. Run
unset PYTHONUSERBASE if you
wish to later install into your home directory again.
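As a rough check of where pip install --user will place packages, the standard-library site module reports the active user base, which follows PYTHONUSERBASE when that variable is set. The sketch below assumes a POSIX directory layout. The name user_site_for is a hypothetical helper, and the projappl path is the placeholder from the example above.

```python
import site
import sys


def user_site_for(base):
    """Return the site-packages directory under `base` for the running Python.

    Hypothetical helper: mirrors the POSIX layout
    <base>/lib/pythonX.Y/site-packages used by `pip install --user`.
    """
    version = f"python{sys.version_info.major}.{sys.version_info.minor}"
    return f"{base}/lib/{version}/site-packages"


if __name__ == "__main__":
    # site.getuserbase() honours PYTHONUSERBASE if it is set in the environment.
    print("user base:", site.getuserbase())
    print("user site:", site.getusersitepackages())
    # Placeholder path; replace <your_project> with a real project name.
    print("project-wide:", user_site_for("/projappl/<your_project>/my-python-env"))
```

Running this once with and once without PYTHONUSERBASE set should show the user base switching between the project directory and the home directory.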
When later using those libraries, you need to remember to add the
site-packages path to PYTHONPATH (or use the same
PYTHONUSERBASE definition as above).
Naturally, this also applies to Slurm job scripts. For example:
module load python-data
export PYTHONPATH=/projappl/<your_project>/my-python-env/lib/python3.9/site-packages/
python3 -c "import whatshap"  # this should now work!
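To confirm that Python picks the package up from the intended directory, one option is to ask the import system where it would load it from. This is a minimal sketch using only the standard library. The name locate is a hypothetical helper, and json is used only as a package that is always present.

```python
import importlib.util


def locate(package):
    """Return the file a top-level package would be imported from.

    Returns None if the package is not found on the current import path.
    (Namespace packages also report None, as they have no single origin.)
    """
    spec = importlib.util.find_spec(package)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # With PYTHONPATH set as above, whatshap should resolve to a path under
    # the project's my-python-env directory; None means it is not on the path.
    for name in ("json", "whatshap"):
        print(name, "->", locate(name))
```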
Note that if the package you installed also contains executable files, these may not work, because they refer to the Python path internal to the container (most of our Python modules are installed with containers):
$ whatshap --help
whatshap: /CSC_CONTAINER/miniconda/envs/env1/bin/python3.9: bad interpreter: No such file or directory
You can fix this either by editing the first line of the executable to point to
the real Python interpreter (check with
which python3) or by running it via
the Python interpreter, for example:
$ python3 -m whatshap --help
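To see which interpreter an installed executable points to before editing it, the first line (the shebang) can be read directly. This is a small sketch that assumes the executable is a text script. The name shebang_of is a hypothetical helper.

```python
import shutil
import sys


def shebang_of(path):
    """Return the interpreter named on the first line of a script, or None."""
    with open(path, "rb") as handle:
        first_line = handle.readline().decode("utf-8", errors="replace").strip()
    return first_line[2:].strip() if first_line.startswith("#!") else None


if __name__ == "__main__":
    script = shutil.which("whatshap")  # None if the executable is not on PATH
    if script is not None:
        print("script points to:", shebang_of(script))
    # Compare with the interpreter that is actually running this code.
    print("current interpreter:", sys.executable)
```

If the two paths differ and the first one does not exist outside the container, that explains the "bad interpreter" error.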
Alternatively, you can create a separate virtual environment with venv. However, this approach doesn't work with modules installed with Singularity, which is now the default approach at CSC.
If you think that some important package should be included in a module provided by CSC, you can send an email to email@example.com.
Creating your own Python environments
It is also possible to create your own Python environments.
The easiest option is to use Tykky for conda or pip installations.
Custom Singularity container
In some cases, for example if you know of a suitable ready-made Singularity or Docker container, using a custom Singularity container is also an option.
Please see our Singularity documentation:
- Running Singularity containers
- Creating Singularity containers, including how to convert a Docker container to a Singularity container.
Conda is easy to use and flexible, but it might create a huge number of files, which is inefficient on shared file systems. This can cause very slow library imports and, in the worst case, slowdowns in the whole file system. Therefore, CSC has deprecated the use of Conda installations on CSC supercomputers.
- CSC conda tutorial describes in detail what conda is and how to use it. (Some parts of this tutorial may also be helpful for Tykky installations.)
Python development environments
Python code can be edited with a console-based text editor directly on the supercomputer. Code can also be edited on your local machine and copied to the supercomputer with scp or graphical file transfer tools. You can also edit Python scripts on Puhti from your local PC with code editors such as Visual Studio Code.
Finally, several graphical programming environments can be used directly on the supercomputer, such as Jupyter Notebooks and Spyder.
Jupyter Notebook allows one to run Python code via a web browser running on a local PC. The notebooks can combine code, equations, visualizations and narrative text in a single document. Many of our modules, including python-env, python-singularity, python-data, the deep learning modules and geoconda, include the Jupyter Notebook package. See the tutorial on how to set up and connect to a Jupyter Notebook for using Jupyter in the CSC environment.
Python parallel jobs
Python has several different packages for parallel processing:
- The multiprocessing package is likely the easiest to use and, as it is part of the Python standard library, it is included in all Python installations.
- joblib provides some more flexibility.
- multiprocessing and joblib are suitable for one node (max 40 cores).
- dask is the most versatile and has several options for parallelization. Please see CSC's Dask tutorial, which includes both single-node (max 40 cores) and multi-node examples.
See our GitHub repository for some examples for using the different parallelization options with Puhti.
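As a minimal single-node illustration of the multiprocessing option described above, the sketch below spreads a simple computation over a pool of worker processes. It is not taken from CSC's examples. The helper names worker_count and parallel_sqrt_sum are hypothetical, and sizing the pool from the SLURM_CPUS_PER_TASK environment variable is an assumption about how the job was submitted (Slurm sets this variable only when --cpus-per-task is given).

```python
import math
import multiprocessing
import os


def worker_count(env=None, default=1):
    """Pick the number of worker processes from the Slurm allocation.

    Assumes the job was submitted with --cpus-per-task, in which case Slurm
    sets SLURM_CPUS_PER_TASK; falls back to `default` otherwise.
    """
    env = os.environ if env is None else env
    return int(env.get("SLURM_CPUS_PER_TASK", default))


def parallel_sqrt_sum(values, processes):
    """Compute sum(sqrt(v)) by spreading the sqrt calls over a process pool."""
    with multiprocessing.Pool(processes=processes) as pool:
        return sum(pool.map(math.sqrt, values))


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    print(parallel_sqrt_sum(range(100_000), worker_count()))
```

For real workloads the function passed to pool.map would be your own, defined at the top level of the module so that worker processes can import it.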
mpi4py is not included in the current Python environments on CSC supercomputers;
however, for multi-node jobs with non-trivial parallelization it is generally the most
efficient option. For a short tutorial on
mpi4py, along with other approaches to improve the
performance of Python programs, see the free online course
Python in High Performance Computing.
Python packages are usually licensed under various free and open source software (FOSS) licenses. Python itself is licensed under the PSF License, which is also open source.
Last edited Fri Jun 10 2022