Using rsync for data transfer and synchronization
SSH certificates are required to connect to Roihu over SSH
To connect to Roihu, users must sign their public key in MyCSC to obtain a time-based SSH certificate. Each certificate is valid for 24 hours, and once it expires, a new one must be generated by signing the public key again.
Rsync is a data transfer tool that can be used much like the scp command.
When transferring data, rsync checks the difference between the source and
target files and only transfers the parts that have changed. This makes rsync
suitable for:
- Synchronizing folders. Using `scp` or `cp` would copy and transfer everything, while `rsync` will only copy and transfer the modifications.
- Transferring large files. `rsync` can be set to save progress, so if the transfer is interrupted, it can be resumed at the same point.
The basic command syntax of rsync is:
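In its general form, the command takes the options first, then the source, then the target. The sketch below shows this order; the wrapper name `copy_with_rsync` is hypothetical, and `-a` is used only as an example option.

```bash
# General form: rsync [options] source target
# Hypothetical wrapper: $1 = source, $2 = target
copy_with_rsync() {
    rsync -a "$1" "$2"
}

# Example call with placeholder paths (not run here):
# copy_with_rsync /path/to/source /path/to/target
```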
If the data source or target location is a remote site, it is defined with the syntax:
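A remote location takes the form username, `@`, host, `:`, and then the path on that host. The hypothetical helper `remote_location` below only assembles and prints such a string; the username, host, and path in the example call are placeholders.

```bash
# Remote location form: <username>@<host>:/path/on/host
# Hypothetical helper: $1 = username, $2 = host, $3 = path on the host
remote_location() {
    printf '%s@%s:%s\n' "$1" "$2" "$3"
}

# Prints: myusername@puhti.csc.fi:/path/to/target
remote_location myusername puhti.csc.fi /path/to/target
```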
However, both the source and the target can also be located on the same machine. In that case, simply give the directory paths of the source and the target.
The table below lists the most commonly used options:
| Option | Argument | Description |
|---|---|---|
| `-r` | | Recurse into directories |
| `-a` | | Use archive mode: copy files and directories recursively and preserve access permissions and timestamps |
| `-v` | | Verbose mode |
| `-z` | | Compress data during transfer |
| `-e` | `ssh` | Specify the remote shell to use |
| `-n` | | Show what files would be transferred |
| `--partial` | | Keep partially transferred files |
| `--progress` | | Show progress during transfer |
| `-P` | | Same as `--partial --progress` |
| `-u` | | Skip files that are newer on the receiver |
Warning
rsync will by default overwrite any changes made to the target, even if
they are newer than the source! Use option -u to avoid this.
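One cautious workflow is to preview an update-only synchronization before running it. The sketch below combines options from the table above; the function name `preview_update` is hypothetical.

```bash
# Hypothetical helper: $1 = source, $2 = target
preview_update() {
    # -r recurse into directories, -v list the files,
    # -u skip files that are newer on the target,
    # -n dry run: only show what would be transferred
    rsync -rvun "$1" "$2"
}

# Example call with placeholder paths (not run here):
# preview_update /path/to/source /path/to/target
```

Dropping `-n` from the options performs the actual transfer.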
Using rsync to transfer data between your local computer and Puhti
For example, the command for transferring a local folder to Puhti, while showing progress and keeping partially transferred files, would be:
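A minimal sketch of such a transfer, wrapped in a hypothetical helper named `push_to_puhti`. The host name `puhti.csc.fi` is assumed here, and the username and paths in the example call are placeholders.

```bash
# Hypothetical helper: upload a local folder to Puhti.
#   $1 = CSC username, $2 = local folder, $3 = target directory on Puhti
push_to_puhti() {
    # -r recurses into directories
    # -P keeps partially transferred files and shows progress
    rsync -rP "$2" "$1@puhti.csc.fi:$3"
}

# Example call (not run here):
# push_to_puhti myusername /path/to/local/folder /path/to/target
```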
This would either:
- Create a folder on Puhti at /path/to/target/folder if the folder was not present before. In this case, everything in the local folder will be transferred.
- Synchronize the source and target folders if the folder already exists on Puhti. In this case, only changes we have made will be transferred.
And the same thing in reverse:
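For the reverse direction, the remote location becomes the source and a local path becomes the target. A sketch under the same assumptions (hypothetical helper name `pull_from_puhti`, assumed host name `puhti.csc.fi`, placeholder username and paths):

```bash
# Hypothetical helper: download a folder from Puhti.
#   $1 = CSC username, $2 = folder on Puhti, $3 = local target directory
pull_from_puhti() {
    # Same options as for uploading; only source and target swap places
    rsync -rP "$1@puhti.csc.fi:$2" "$3"
}

# Example call (not run here):
# pull_from_puhti myusername /path/to/target/folder /path/to/local
```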
Note
If you have stored your SSH key and/or certificate file with a non-default
name or in a non-default location (somewhere else than
~/.ssh/id_<algorithm> or ~/.ssh/id_<algorithm>-cert.pub), you can
specify where rsync should look for the key using the -e option. For
example:
rsync -rP -e "ssh -i /path/to/private/key -i /path/to/certificate" /path/to/local/folder <username>@<host>:/path/to/target
Note that SSH certificates are required for connecting to Roihu only.
Using rsync to transfer data directly between CSC supercomputers
To transfer data directly between CSC supercomputers, you must be able to access the SSH keys you've set up on your local workstation for authenticating to CSC supercomputers. For Roihu, a valid SSH certificate is also needed. This is accomplished by forwarding your SSH agent including your SSH keys (and certificate) to the supercomputer you're first connecting to.
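Agent forwarding is enabled with the `-A` option of `ssh`. A minimal sketch, assuming your keys are already loaded in an SSH agent on your local workstation; the function name `login_with_agent` is hypothetical, and the username and host in the example call are placeholders.

```bash
# Hypothetical helper: $1 = CSC username, $2 = host of the first supercomputer
login_with_agent() {
    # -A forwards the local SSH agent to the remote host
    ssh -A "$1@$2"
}

# Example call (not run here):
# login_with_agent myusername puhti.csc.fi
#
# Before connecting, `ssh-add -l` lists the keys the local agent holds.
```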
After this, rsync can be used to transfer data directly between CSC
supercomputers using the same syntax as above. For example, to copy a directory
/scratch/project_2001234/myfiles on Puhti to the corresponding path on Mahti:
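A sketch of such a command, to be run on Puhti after logging in with agent forwarding. The function name `copy_to_mahti` is hypothetical, the host name `mahti.csc.fi` is assumed, and the username is a placeholder.

```bash
# Hypothetical helper, run on Puhti: $1 = CSC username
copy_to_mahti() {
    # The source has no trailing slash, so the directory "myfiles" itself
    # is created under /scratch/project_2001234 on Mahti.
    rsync -rP /scratch/project_2001234/myfiles \
        "$1@mahti.csc.fi:/scratch/project_2001234"
}

# Example call (not run here):
# copy_to_mahti myusername
```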