Roihu data migration guide
About this guide
This guide is divided into four parts:
- General guidelines and prerequisites
- Recommended data migration methods
- Special cases
- Discouraged methods
Please read the General guidelines and prerequisites section before migrating any data to Roihu. If your data migration needs are small and simple, checking the Basic rsync example may suffice. If you have a lot of data or other special requirements, please also read the other sections carefully.
Mahti and Puhti shutdown in Fall 2026
Mahti and Puhti are being decommissioned by October 2026.
- Puhti computing services will be shut down 31 July 2026 at 12:00 EEST.
- Mahti computing services will be shut down 31 August 2026 at 12:00 EEST.
- Puhti and Mahti storage and login nodes are planned to remain accessible until 15 October 2026 at 12:00 EEST.
Puhti and Mahti storage services will be decommissioned on 15 October 2026 at 12:00 EEST, but they are not covered by service contracts after the end of August. Because of this, aim to complete your data transfers from Mahti and/or Puhti by the end of August 2026, or by 15 October at the very latest.
1. General guidelines and prerequisites
1.1 Review and clean up your data before migration
- Roihu scratch disk is not intended for long-term data storage, but should only be used for data that is in active use. Thus, only move data that you truly need.
  - Good data hygiene reduces transfer time and load on the file system, and eliminates the risk of moving redundant or duplicate data. Roihu will implement a disk cleaning policy similar to that of Puhti, meaning that files that have not been accessed in 180 days will be deleted.
- We recommend using the LUE tool to identify where you have lots of data. Avoid using tools such as du, as they may cause a lot of load on the file system. Run lue -h to see the available options.
- Other tips:
  - Remove or exclude temporary files (cached data, intermediate results, logs, unnecessary checkpoint files, core dumps, etc.).
  - Apptainer containers built for CPUs can be moved to Roihu. Do not move containers targeting GPUs or any native installations from Puhti or Mahti to Roihu. These must be rebuilt from scratch for best performance, or in order for them to work at all (GPU nodes have ARM CPU architecture). More about installing software on Roihu.
1.2 Ensure that you have enough disk space on Roihu
- Once you have identified the data you need to transfer, check that it fits within the default disk quotas on Roihu:
| Disk area | Path | Default size | Max. size | Default file number limit | Max. file number limit |
|---|---|---|---|---|---|
| Home | /users/$USER | 15 GiB | 15 GiB | 150k | 150k |
| ProjAppl | /projappl/<project> | 15 GiB | 250 GiB | 150k | 2.5M |
| Dataset | /dataset/<project> | 0 GiB | case-by-case | 0 | case-by-case |
| Scratch | /scratch/<project> | 250 GiB | 100 TiB | 500k | 10M |
- Please note that existing quota extensions on Puhti/Mahti will not automatically carry over to Roihu, so you must separately apply for increased disk quota via MyCSC beforehand if your data does not fit within the default limits.
New Dataset disk area on Roihu
Users may apply for new "dataset projects" (cf. regular computing projects) to get access to a new disk area on Roihu – Dataset. This disk area allows storing datasets on the disk for a longer time (no cleaning, lifetime is limited by the data project lifetime). Read access to the data can be shared globally, or with specific project IDs.
Dataset projects and Dataset quotas are applied for and managed via MyCSC. Dataset quota consumes Storage BUs.
1.3 Add Roihu service access to your CSC project
- Like any other CSC service, access to Roihu must be enabled for your project via MyCSC.
- Note also that users must have at least a medium level of identity assurance (LoA) to be able to access Roihu. You can check your LoA on your profile page in MyCSC, and elevate it if needed following these instructions.
1.4 Transfer your data directly from Puhti/Mahti to Roihu
- It is not recommended to transfer data to Roihu via Allas or your local workstation. Instead, CSC recommends using command-line tools such as rsync to directly transfer data from Puhti/Mahti/LUMI to Roihu.
Extremely important
1.5 Connecting to Roihu requires SSH certificates
- In addition to SSH keys, a signed SSH certificate is required to connect to Roihu over SSH. Read the instructions for getting and using SSH certificates here.
- To transfer data directly from Puhti/Mahti to Roihu, you must forward your SSH agent when connecting to the system where you launch the data transfer process.
2. Recommended data migration methods
- rsync is the preferred tool for transferring data from Puhti or Mahti to Roihu. Read more about rsync here.
- We will use Puhti as an example, but the exact same steps apply to Mahti and LUMI as well. Simply replace all occurrences of puhti in host names etc. with mahti.
- All examples require that you've forwarded your SSH agent, including your SSH keys and a valid SSH certificate, to Puhti when connecting.
- Before starting the data transfer, ensure that the target directory on Roihu exists and is writable.
Help! What to do if I struggle to add my SSH certificate to the SSH agent?
Alternatively, you may log in to Roihu and pull data from Puhti. Because connecting to Puhti does not require an SSH certificate, it is enough that the forwarded SSH agent holds your SSH keys.
Note that you will still need a valid SSH certificate when connecting to Roihu in the first place, but it does not have to be added to your SSH agent.
2.1 Basic rsync
- Obtain an SSH certificate.
- Add your SSH keys and certificate to your SSH agent.
- Log in to Puhti with SSH agent forwarding turned on.
- On the login node, transfer directory /scratch/project_2001234/my-data from Puhti to directory /scratch/project_2001234/ on Roihu using rsync with the options -a and -P.
| Option | Description |
|---|---|
| -a | Use archive mode: copy files and directories recursively, preserve access permissions, timestamps and symbolic links. |
| -P | Keep partially transferred files and show progress during transfer. |
- Alternatively, if you've connected to Roihu and are pulling data from Puhti, run the corresponding rsync command on Roihu with the Puhti directory as the remote source.
The rsync -aP command is suitable if:
- The number of files to transfer is small (<1000) or the files are large enough (>1 MB on average).
  - If not, please archive and, optionally, compress the data before transfer.
- You are transferring your own files or resulting file ownership on Roihu does not matter.
  - You will own all files that you transfer to Roihu irrespective of who the owner on Puhti is.
Note! The trailing / character has a meaning in rsync commands!
A trailing / character affects what gets transferred from the source.
If the source path ends with /, then all contents of the directory will
get copied, but not the directory itself. To transfer the directory itself
(and the contents), leave out the trailing / as in the previous example.
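The trailing-slash rule is easy to try out locally before touching real data. The sketch below is only an illustration: it uses temporary directories as stand-ins for the Puhti source and the Roihu destination, and the file name file.txt is made up for the demonstration.

```shell
# Temporary directories stand in for the source (Puhti) and destination (Roihu).
work=$(mktemp -d)
mkdir -p "$work/src/my-data" "$work/dest-a" "$work/dest-b"
echo "hello" > "$work/src/my-data/file.txt"

# No trailing slash on the source: the directory my-data itself is copied.
rsync -a "$work/src/my-data" "$work/dest-a/"

# Trailing slash on the source: only the contents of my-data are copied.
rsync -a "$work/src/my-data/" "$work/dest-b/"

ls "$work/dest-a"   # my-data
ls "$work/dest-b"   # file.txt

# The same rule applies to a remote destination. With the host name used
# elsewhere in this guide, a push from Puhti would look roughly like:
#   rsync -aP /scratch/project_2001234/my-data $USER@roihu-cpu.csc.fi:/scratch/project_2001234/
```

Remove the temporary directory with rm -rf "$work" when you are done experimenting.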
How long will my data migration take?
The table below can be used as a rough reference for how long certain data
transfers using rsync will take.
| Number of files | Average file size | Total size | Duration | Notes |
|---|---|---|---|---|
| 1 | 1 GB | 1 GB | 6 s | |
| 10 | 100 MB | 1 GB | 6 s | |
| 100 | 10 MB | 1 GB | 6 s | |
| 1000 | 1 MB | 1 GB | 11 s | Small-file overhead increases, please archive! |
| 10000 | 100 kB | 1 GB | 45 s | Small-file overhead increases, please archive! |
| 1 | 10 GB | 10 GB | ~1 min | |
| 10 | 1 GB | 10 GB | ~1 min | |
| 100 | 100 MB | 10 GB | ~1 min | |
| 1000 | 10 MB | 10 GB | ~1 min | |
| 1 | 100 GB | 100 GB | ~11 min | |
| 1 | 1 TB | 1 TB | ~2 h | |
Please note that the actual performance may vary based on the current system load. If you need to transfer thousands of small files (<1 MB), pack them into a single archive file for better performance.
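For sizes that are not listed in the table, a back-of-the-envelope estimate can be scripted. The sketch below assumes a sustained rate of about 150 MB/s, which is an approximation read off the large-file rows of the table and not a guaranteed figure. The function name estimate_minutes is made up for this example, and the estimate does not account for small-file overhead.

```shell
# Rough transfer time estimate in whole minutes (rounded up).
# Usage: estimate_minutes SIZE_IN_GB [RATE_IN_MB_PER_S]
# The default rate of 150 MB/s is an assumption, not a measured guarantee.
estimate_minutes() {
    local size_gb=$1
    local rate_mb_per_s=${2:-150}
    # 1 GB = 1000 MB; integer arithmetic, seconds rounded up to minutes
    echo $(( (size_gb * 1000 / rate_mb_per_s + 59) / 60 ))
}

estimate_minutes 100    # 12
estimate_minutes 1000   # 112
```

If the system is under heavy load, pass a lower rate as the second argument to get a more pessimistic estimate.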
2.2 Performing a dry run
It may be useful to perform a dry run before starting the actual rsync
process. To do so, add the option -n to your rsync command (for example,
rsync -aPn). The command then does not transfer anything; it simply shows what
would happen if you were to run rsync without the -n option.
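A dry run can be rehearsed locally to see what its output looks like. The following is a minimal sketch that uses temporary directories in place of the real source and destination; the file name is made up for the demonstration.

```shell
# Temporary directories stand in for the source (Puhti) and destination (Roihu).
work=$(mktemp -d)
mkdir -p "$work/src/my-data" "$work/dest"
echo "hello" > "$work/src/my-data/file.txt"

# -n performs a dry run: rsync lists what it would transfer but copies nothing.
rsync -aPn "$work/src/my-data" "$work/dest/"

# The destination is still empty after the dry run, so this prints nothing.
ls -A "$work/dest"
```

Running the same command again without -n performs the actual copy.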
Note! A dry run will not catch errors that would be caused by insufficient permissions
An rsync dry run will not catch errors caused by insufficient
permissions. In other words, it assumes that:
- You have read permission for all files and execute permission for all directories that you are trying to migrate from Puhti.
- You have write permission on the destination (Roihu).
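To list files and directories that you cannot transfer due to insufficient permissions, search the source directory with find for files that you cannot read and directories that you cannot read or enter.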
To check if the destination is writable, try:
ssh $USER@roihu-cpu.csc.fi "touch /scratch/project_2001234/.test && rm /scratch/project_2001234/.test"
Missing write permissions will cause a Permission denied error. If the
destination does not exist, you will get a No such file or directory
error.
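The source-side permission check could be sketched as below. This is one possible approach, not necessarily the exact command intended by this guide. It assumes GNU find, whose -readable and -executable tests check the permissions of the user running the command. The function name list_untransferable is made up for this example.

```shell
# Print files that the current user cannot read, and directories that the
# current user cannot read or enter. Error messages from find itself (for
# example "Permission denied" while descending) are hidden, because the
# affected directories are already part of the listing.
list_untransferable() {
    find "$1" \( -type f ! -readable \) -o \
              \( -type d \( ! -readable -o ! -executable \) \) 2>/dev/null
}

# Example call on Puhti (path as used elsewhere in this guide):
#   list_untransferable /scratch/project_2001234/my-data
```

An empty listing means that no permission problems were found on the source side. Note that traversing a very large directory tree with find also puts load on the file system, so run the check on the specific directory you plan to migrate.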
2.3 Migrating data with large amounts of small files
If the data you need to migrate contains thousands of small files, it is recommended to archive the data before transferring it, i.e. pack all files into a single file. Most data transfer tools handle one large file far better than thousands of small ones.
- Assuming you want to migrate the directory /scratch/project_2001234/my-data from Puhti to Roihu, create (c) an archive of it with tar.
- Transfer the archived dataset my-data.tar to Roihu using rsync.
- Extract (x) the data on Roihu with tar.
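The archive, transfer and extract steps can be rehearsed locally before running them on real data. In the sketch below, temporary directories stand in for the Puhti and Roihu scratch areas, and a plain cp takes the place of the rsync transfer; the file name is made up for the demonstration.

```shell
# Local stand-ins for the Puhti source and the Roihu destination.
work=$(mktemp -d)
mkdir -p "$work/puhti/my-data" "$work/roihu"
echo "sample" > "$work/puhti/my-data/file.txt"

# 1. Create (c) an archive file (f). -C changes into the parent directory
#    first, so the archive stores the relative path my-data/.
tar cf "$work/puhti/my-data.tar" -C "$work/puhti" my-data

# 2. Transfer the archive (a local copy here; rsync over SSH in practice).
cp "$work/puhti/my-data.tar" "$work/roihu/"

# 3. Extract (x) the archive in the destination directory.
tar xf "$work/roihu/my-data.tar" -C "$work/roihu"

ls "$work/roihu/my-data"   # file.txt
```

Once the data has been extracted and verified, the archive file can be deleted on both systems to free disk quota.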
Mind your disk quota!
Archiving creates new data on the disk. If your dataset is large, you may end up running out of disk quota since the operation will essentially double your disk usage (unless the archive is also compressed).
A trick to avoid creating new data on the Puhti disk is to pipe the output of
tar to Roihu directly over SSH instead of writing the archive to disk first.
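Piping tar through SSH can be sketched as follows. To keep the example runnable anywhere, the pipe is shown between two local temporary directories. The remote variant in the comment is an assumption about one possible form, in which the receiving side extracts the stream on the fly; writing the stream to a .tar file with cat is an equally valid alternative.

```shell
# Local stand-ins for the Puhti source and the Roihu destination.
work=$(mktemp -d)
mkdir -p "$work/puhti/my-data" "$work/roihu"
echo "sample" > "$work/puhti/my-data/file.txt"

# The first tar writes the archive to standard output (f -), the second tar
# reads it from standard input and extracts it. No archive file is created.
tar cf - -C "$work/puhti" my-data | tar xf - -C "$work/roihu"

ls "$work/roihu/my-data"   # file.txt

# Over SSH, the receiving tar would run on Roihu, roughly like this:
#   tar cf - -C /scratch/project_2001234 my-data | \
#       ssh $USER@roihu-cpu.csc.fi 'tar xf - -C /scratch/project_2001234'
```

Unlike rsync, a tar pipe cannot resume an interrupted transfer, so it has to be restarted from the beginning if the connection drops.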
3. Special cases
3.1 Data compression
Data compression can be useful to save storage space and make data transfer faster, but it may take a lot of time. Data compression is CPU intensive and compressing large datasets may easily take several hours.
The compressibility of files depends on their content. Certain file formats are already highly compressed (e.g., images) and trying to compress these further is counter-productive. On the other hand, data compression can be beneficial if transferring, for example, many small plain text files, or large text-based datasets.
| File types that compress well | File types that do not compress well |
|---|---|
| Plain text | Media (JPG, PNG, GIF, MP3, MP4, WAV, etc.) |
| CSV, XML, JSON, YAML, etc. | Pre-compressed archives (ZIP, gzip, etc.) |
| Source code (Python, C, etc.) | Binary blobs (e.g., compiled executables) |
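If you are unsure which column your data falls into, you can compress a small sample and compare the sizes before committing hours of CPU time. The sketch below is a deliberately extreme illustration: it generates a highly repetitive text file and a file of random bytes, where the random bytes stand in for already compressed formats. The file names are made up for the demonstration.

```shell
work=$(mktemp -d)

# A repetitive plain-text file, and random bytes that behave like
# pre-compressed data such as JPG or ZIP files.
yes "sample,row,of,plain,text" | head -n 20000 > "$work/text.csv"
head -c 500000 /dev/urandom > "$work/random.bin"

# Compare the original size with the gzip-compressed size of each file.
for f in "$work/text.csv" "$work/random.bin"; do
    original=$(wc -c < "$f")
    compressed=$(gzip -c "$f" | wc -c)
    echo "$(basename "$f"): $original -> $compressed bytes"
done
```

The text file shrinks to a small fraction of its size, while the random file does not shrink at all. Real datasets usually fall somewhere between these two extremes.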
rsync provides built-in functionality for on-the-fly compression and
decompression via the -z option (for example, rsync -azP).
Alternative methods to maximize performance
rsync uses the zlib library for compressing data during transfer. The
performance is comparable to gzip, but there are even faster options
available if needed. One such is
zstd compression.
zstd compression can be combined with using tar over SSH. To transfer
the directory my-data from Puhti to Roihu, run:
tar c -I zstd -C /scratch/project_2001234 my-data | ssh $USER@roihu-cpu.csc.fi 'cat > /scratch/project_2001234/my-data.tar.zst'
In cases where compression is not beneficial, you can also use plain tar
over SSH as explained previously. The performance can be better than rsync,
especially if your dataset contains a huge number of tiny files.
3.2 Running long transfer processes safely
One of the strengths of rsync is that interrupted transfers can be easily
resumed – just run the same rsync command again. rsync will compare the
source and destination, skip already transferred files (copies only what's
missing) and resume partially transferred files (as long as option -P or
--partial is used as instructed above).
However, to avoid failures caused by interrupted SSH sessions altogether, you
may run your data migration process in a screen session.
- On Puhti, start a named screen session (screen -S followed by a session name of your choice).
- Start your rsync command inside the screen session.
- Now you may detach (by default with Ctrl-a followed by d) and leave the rsync process running. The data migration process will keep running safely in the background. You may log out from Puhti if you want.
- Reattach the session with screen -r followed by the session name. If you forgot the name of the session, try screen -ls.
- When the data transfer has finished, terminate the session by typing exit inside the screen session.
Using screen is useful if your data transfer will take several hours. You
can, for example, power off your computer and leave the rsync process running
overnight.
Use tmux on Roihu
The screen command is not available on Roihu, but you can
do all data transfers to Roihu from the Mahti and/or Puhti login nodes.
If you need to keep a longer terminal session open on Roihu, use tmux instead.
See instructions for tmux in the Roihu tmux tutorial.
3.3 Using checksums to verify data integrity
rsync ensures data integrity using internal checksum mechanisms by default.
It is therefore not necessary to verify data integrity separately.
If you're not using rsync, you may calculate a checksum for files using e.g.
md5sum.
- Assuming you've got a dataset archive data.tar on Puhti, calculate a checksum for it with md5sum and store it in the file data.tar.md5. Note that calculating checksums for huge datasets can take some time, especially if the current disk load is high.
- Transfer the dataset and the data.tar.md5 checksum file to Roihu.
- With the data.tar and data.tar.md5 files in the same directory, verify the checksum with md5sum -c. If any byte changed during transfer, the file will not match, and you will see data.tar: FAILED. Otherwise you should get data.tar: OK.
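The checksum workflow above can be sketched as follows. The example runs entirely in a temporary directory, and the file data.tar is only a small placeholder and not a real archive.

```shell
work=$(mktemp -d)
cd "$work"
echo "example payload" > data.tar   # placeholder contents for the demonstration

# On the source system: calculate the checksum and store it next to the file.
md5sum data.tar > data.tar.md5

# On the destination, with both files in the same directory: verify.
md5sum -c data.tar.md5   # prints "data.tar: OK"
```

Because the checksum file stores the file name, run md5sum -c in the directory that contains data.tar, and do not rename the file in between.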
3.4 If file ownership matters
You will be set as the owner of all files that you transfer from Puhti to Roihu. This is important to realize when migrating data from shared project directories where you may have read access to data owned by your colleagues.
There is no way for users to move their colleagues' data in such a way that the ownerships would be preserved. If this is important to your project, then please ensure that each member moves only their own data.
In case you later notice incorrect file ownerships, Roihu system administrators may fix them for you. Please contact CSC Service Desk with the details on which files and/or directories are affected and who should be set as the owner.
4. Discouraged methods
4.1 scp
scp has many drawbacks compared to rsync. It cannot resume interrupted
transfers, has limited metadata preservation capabilities, no built-in
integrity checks and inferior performance. Using it to migrate data to Roihu is
therefore not recommended, unless your dataset is very small and simple (<10
GB, <100 files).
4.2 Using the web interfaces to migrate data
Unfortunately, there is no good way to use the Puhti or Mahti web interfaces to move data directly to Roihu. There are some indirect ways, but none of them are efficient, which is why we primarily recommend the command-line based approaches above. The following options should therefore be considered "last resort" choices.
- Use the Puhti/Mahti web interface file browser to first download your data locally, and then upload it to Roihu via the Roihu web interface. Note that there is a limit of 10 GB for individual file uploads, so data larger than this must be split into suitable chunks. Alternatively, you could use graphical file transfer utilities to upload the data to Roihu since you've already downloaded it locally.
- If you have a LUMI project, you could use the Puhti/Mahti web interface Cloud storage configuration app to set up a connection to LUMI-O, upload your data there, and then fetch it from LUMI-O to Roihu.
Don't use Allas for migrating data to Roihu!
It is strongly discouraged to use Allas for migrating data to Roihu because Allas is running out of capacity. Please prefer LUMI-O if you must migrate data to Roihu via object storage.