Preparing data for Roihu: Data management and temporary storage options
Roihu schedule
Roihu is not yet available for use, and cannot be added as a service yet in MyCSC.
The target for Roihu general availability is end of June 2026.
Mahti and Puhti storage services will shut down end of August 2026.
This tutorial helps you prepare your data for the transition from Mahti and Puhti to Roihu. The main recommendation is to plan the migration in advance, review what data you need to keep, and transfer actively used data directly from Mahti or Puhti to Roihu after Roihu is available.
Direct transfer from Mahti/Puhti to Roihu
Preliminary instructions for direct data transfer from Mahti or Puhti to Roihu are available in the Roihu data migration guide.
Use the direct transfer guide as the primary migration instructions once Roihu is available.
If your research group cannot wait until Roihu is available, you might consider utilizing Allas or LUMI-O as a temporary storage service to host your data.
Using Allas or LUMI-O might be applicable, for example, if key project members will be unavailable during the period between Roihu general availability and the shutdown of Mahti and Puhti storage servers.
Another possible reason for utilizing Allas or LUMI-O temporarily is that Roihu's default storage quotas are smaller than on Mahti and Puhti. If your data does not fit within the default quotas, first review and clean up your data, and then consider whether you need a quota increase on Roihu for your project. See CSC documentation for applying for more disk quota.
The default disk quotas on Roihu are:
| Capacity | Number of files | |
|---|---|---|
| home | 15 GiB | 150 000 files |
| projappl | 15 GiB | 150 000 files |
| scratch | 250 GiB | 500 000 files |
The recommended temporary storage options are:
- A: Allas, if your CSC project has enough available Allas quota
- B: LUMI-O, if you have a project that has access to LUMI-O, and enough available storage quota there
After Roihu becomes available, you can copy the data from Allas or LUMI-O to Roihu.
Only use Allas or LUMI-O if strictly needed
Your primary approach in data migration from Mahti and Puhti to Roihu should be a direct copy from one machine to the other. Use Allas or LUMI-O only if you cannot complete the data transfer during the period between Roihu availability and the Mahti/Puhti storage shutdown.
Limited capacity in Allas
Allas is running out of capacity. Use Allas only if your project has existing quota there. Do not apply for new Allas quota in your project if you need object storage. Instead apply for LUMI-O access.
See the short talk below for instructions on applying for a LUMI-O project.
See a CSC short talk for how to use object storage and how to apply for a LUMI-O project, if needed:
Recommended migration plan
- Review and clean up your data now. Decide what must be preserved, what can be deleted, and what should be rebuilt or regenerated on Roihu.
- Plan where the data should go on Roihu. Only data that you are actively processing should be moved directly to Roihu's working disk areas.
- Transfer data directly from Mahti or Puhti to Roihu when Roihu is available. A detailed guide for direct transfers will be published after Roihu is available, though you can already investigate the WIP version of the guide.
- Use Allas or LUMI-O only if direct transfer is not possible in time. These services can be used as temporary storage if you cannot complete the migration between Roihu general availability and the Mahti/Puhti storage shutdown.
- Verify the copied data before deleting anything. Keep the original data on Mahti or Puhti until the migration has been fully completed and verified.
When should you use the Allas or LUMI-O instructions in this tutorial?
Use the Allas or LUMI-O instructions if:
- you have data on Mahti or Puhti that must be preserved before the storage servers are shut down (end of August 2026)
- you cannot wait until Roihu is generally available before starting the transfer (end of June 2026), or you cannot complete a direct transfer to Roihu before the Mahti/Puhti storage shutdown
- you need temporary object storage for the transition period
Do not use the Allas or LUMI-O instructions to automatically move everything to Allas or LUMI-O. Before transferring data, review what you actually need to keep.
Before you start
1. Clean up your data
Only move data that you still need. In particular, avoid transferring:
- temporary files
- cache directories
- intermediate results that can be recomputed
- old log files
- unnecessary checkpoint files
- duplicate datasets
- software installations and executables that should be rebuilt on Roihu
For large directories, check the data volume before transferring. On CSC supercomputers, prefer tools intended for checking disk usage (e.g. LUE) instead of running heavy recursive commands on large directory trees.
How to check disk usage on a directory
We recommend using the LUE tool to identify where you have lots of data.
Avoid using tools such as du as they may cause a lot of load on the file system.
Simple usage example (run lue -h for other options):
2. Choose the target storage service
Choose Allas if:
- your CSC project already has Allas access
- the amount of data is moderate (Few terabytes)
- your project has enough available Allas quota
Choose LUMI-O if:
- you have a project that has LUMI-O access
- you need object storage for a larger amount of data (> 10 TB)
- your group is already using LUMI or can apply for suitable access
Allas and LUMI-O are object storage services. They are useful for storing and transferring data during the transition to Roihu, but they are not replacements for /scratch or /projappl.
Object storage works best for data that can be uploaded and downloaded as complete files or archive packages. It is not suitable for storing full software installations, or working with large numbers of frequently changing small files.
3. Consider packaging many small files
If you have many small files, transferring and accessing them in an object storage service one by one can be slow and inefficient. Consider creating archive files before upload.
For example, to package the dataset/ directory in /scratch/project_2000000/mydata:
Then transfer the archive file instead of the full directory tree.
Later after you copy the data back, e.g. onto Roihu, unpack the archive there with:
Keep the original directory until you have verified that the uploaded archive can be downloaded and unpacked successfully.
Option A: Move data from Puhti or Mahti to Allas
Use Allas if you have existing Allas quota in your project.
If your sole purpose is to keep data in Allas for the transfer period to Roihu, please delete it without delay from Allas after you have moved the data to Roihu and verified the copy.
1. Log in to Puhti or Mahti
Log in to the system where your data is currently stored. E.g. to Puhti:
2. Configure Allas access
Load the Allas module and configure access, with S3 enabled:
Type in your CSC password when prompted.
The command can take a while to access Allas and process all information, please be patient.
This configures an Allas connection for the selected CSC project. With S3 mode enabled, the rclone remote used in this tutorial is:
3. Create a bucket
Check existing buckets in the project:
To create a new bucket, choose a bucket name that includes your project number or another project-specific identifier. Bucket names must be unique.
Example (replace project-2000000 as your project ID):
Creating a unique identifier
In the above example, the bucket is created with the name
project-2000000-roihu-transfer-${USER}. This gives the bucket your username as a suffix, which is a good way
to distinguish your own bucket from buckets
that other users in the project might create.
${USER} is an environment variable, and you do not need to
replace it in the tutorial commands, if you want to use your username
as a unique bucket identifier.
4. Copy data to Allas
To copy a single file:
To copy a directory:
rclone copy -P /scratch/project_2000000/mydata s3allas:project-2000000-roihu-transfer-${USER}/mydata
Use copy rather than move for the first transfer. This keeps the original data on Mahti or Puhti until you have verified that the upload succeeded.
5. Check the uploaded data
List the bucket contents:
For a more detailed listing:
You can also compare source and destination:
For large transfers, consider writing the output to a log file:
rclone copy /scratch/project_2000000/mydata s3allas:project-2000000-roihu-transfer-${USER}/mydata \
--progress \
--log-file roihu-transfer-to-allas.log
6. Downloading data later to Roihu from Allas
After Roihu is available, log in to Roihu and configure access to the object storage service that contains your data, following instructions that will be provided in the tutorial for using Allas in CSC supercomputers. The approach is very similar to Mahti and Puhti.
Then copy the data from object storage to the appropriate Roihu disk area.
On Roihu, check which available buckets your project has in Allas with rclone lsd:
[kkayttaj@roihu-cpu-login2 kkayttaj]$ rclone lsd s3allas:
3268222761 2020-10-03 10:01:42 8 2000000-genomes
2576778428 2020-10-03 10:01:42 4 2000000-mahti-SCRATCH
Copy the appropriate data to your directory on Roihu:
rclone copy -P s3allas:project-2000000-roihu-transfer-${USER}/mydata /scratch/project_2000000/mydata
After copying the data to Roihu, verify the transfer:
If you uploaded compressed archive files, unpack them only after confirming that the archive was transferred successfully:
7. Remove data from Allas after migration
After the data has been copied to Roihu and verified, remove the temporary copy from Allas if it is no longer needed.
Be careful: removal commands delete data from Allas. Do not run them before you have confirmed that the data exists in its final location.
To remove one object:
To remove all objects under a prefix:
Check that the bucket is empty:
Lastly, remove the empty bucket you were using:
Option B: Move data from Puhti or Mahti to LUMI-O
If you have an existing account and a project on LUMI, you might consider LUMI-O as a temporary storage service.
The same general rules apply as with Allas. Do not transfer everything blindly to LUMI-O. Carefully review what you actually need to preserve, and remove unnecessary files before transfer. If you only need temporary storage during the transition to Roihu, delete the files from LUMI-O after you have copied and verified them on Roihu.
1. Create an rclone setup for copying data from Mahti or Puhti directly to LUMI-O
First, you need credentials for LUMI-O and an access key to your project.
Object storage related tools are initialized in Puhti and Mahti with the command:
Connections to LUMI-O are configured with command:The configuration process asks you to login to https://auth.lumidata.eu where you can create an access key pair for your LUMI-project ( instructions for generating keys ).
You can then copy the project number, access key and secret key to the configuration process in Puhti or Mahti.
The configuration process creates four new rclone endpoints:
- lumi-o: and lumi-proj-number-private: refer to the non-public area of the LUMI-O project
- lumi-pub: and lumi-proj-number-public: to the public area of the LUMI-O project.
Use the private remotes for normal data transfer. Do not use a public remote unless the data is intentionally public.
In case of a-commands you can add option --lumi to the command in order to make use LUMI-O. For example:
Note that the Lumi-O keys have a validity time, defined in the authentication interface. Thus, you may need to update the connection configuration every now and then.
2. Create a bucket for your data
Before copying data to LUMI-O, create a bucket for the transfer.
In a Mahti/Puhti terminal:
See that the bucket is visible by listing the existing buckets in your project:
Replace 46500XXXX with your actual LUMI project ID.
Creating a unique identifier
In the above example, the bucket is created with the name
roihu-transfer-${USER}. This gives the bucket your username as a suffix, which is a good way
to distinguish your own bucket from buckets
that other users in the project might create.
${USER} is an environment variable, and you do not need to
replace it in the tutorial commands, if you want to use your username
as an unique bucket identifier.
3. Copy data to LUMI-O
Now you're ready to transfer data to LUMI-O.
On Mahti/Puhti:
Copy a single file:
Copy a directory:
For large transfers, run the command inside tmux or screen, and write a log file:
rclone copy /scratch/project_2000000/mydata lumi-46500XXXX-private:roihu-transfer-${USER}/mydata \
--progress \
--log-file roihu-transfer-to-lumio.log
4. Verify the uploaded data
List the uploaded data:
Compare the source directory and the uploaded copy:
For large datasets, save the verification output to a log file:
rclone check /scratch/project_2000000/mydata lumi-46500XXXX-private:roihu-transfer-${USER}/mydata \
--log-file roihu-transfer-lumio-check.log
5. Copy data back from LUMI-O later
After Roihu is available, configure LUMI-O access on Roihu and copy the data from LUMI-O to the appropriate Roihu disk area.
Example:
Verify the copied data:
If you uploaded compressed archive files, check and unpack them only after confirming that the archive was transferred successfully.
7. Remove data from LUMI-O after migration
After the data has been copied to Roihu and verified, remove the temporary copy from LUMI-O if it is no longer needed.
To remove one object:
To remove all objects under a prefix:
To remove an empty bucket:
Running long transfers safely
Large transfers may take a long time. Do not run them in a normal SSH session without protection, because the transfer may stop if the connection is interrupted.
Use screen or tmux, for example, in a session in Puhti/Mahti:
Start the transfer inside the tmux session. You can detach from the session with:
Later, reconnect with:
If the transfer is interrupted, you can safely run the same rclone copy command again.
rclone copy will skip files that already exist at the destination and continue copying missing or changed files.
Important notes
Do not delete the original data too early
Keep the original data on Mahti or Puhti until:
- the upload has completed
- the uploaded data has been checked
- the data is no longer needed on Mahti or Puhti
Be careful with rclone sync
rclone sync makes the destination match the source. This means it can delete files from the destination. Use rclone copy unless you are sure that synchronization is needed.
If you use sync, first run a dry run:
rclone sync /scratch/project_2000000/mydata s3allas:project-2000000-roihu-transfer-${USER}/mydata --dry-run
Object storage is not a working directory
Allas and LUMI-O are suitable for storing, staging, sharing, and transferring data. They are not replacements for scratch or project directories on a supercomputer.
Never run applications directly against object storage as if it were a normal filesystem.
Protect sensitive data
Do not upload sensitive data unless the storage service and your project’s data management plan allow it. If needed, encrypt data before uploading.
Document what was moved
To follow best practices, create a small README file and upload it to Allas/LUMI-O with the data. For example:
Dataset: Example simulation outputs
Original location: /scratch/project_2000000/example
Source system: Puhti
Uploaded by: <name>
Upload date: YYYY-MM-DD
Temporary storage: Allas bucket project-2000000-roihu-transfer-<my-username>
Intended destination: Roihu /scratch/project_2000000/example
Notes:
Upload the README:
or: