SD Connect (Sensitive Data Connect)

Before you start

  • According to CSC policies and the general terms of use, sensitive data must always be encrypted when it is uploaded to or stored in CSC services for sensitive data. This section explains how to encrypt a copy of your data with the CSC encryption key and Crypt4GH. For general information about Crypt4GH, see the Data encryption for data sharing paragraph or the crypt4gh GIT site.

  • SD Connect is a user interface for Allas, CSC's cloud storage solution, that facilitates working with sensitive data. By default, a project can store up to 10 TiB of data. The storage space remains available as long as the CSC project is active. CSC does not make backups of the data in SD Connect, so you need to make your own backups of important datasets.

Note

SD Connect and SD Desktop have not yet been security audited. For this reason, users may not process any personal data granted by Findata for the purposes of the Act on the Secondary Use of Health and Social Data (552/2019).

Login

To access SD Connect go to MyCSC and:

Login to SD Connect is currently possible only with Haka (a user identity federation system) and CSC credentials at:

The interface is compatible with all modern web browsers.

User Interface pages

Once you log in to SD Connect, you land on the default front page: Browser.

On this page you can:

  • view all the buckets available in your CSC project, in which you can store encrypted sensitive data. Buckets can be created, downloaded, deleted or shared using the appropriate icons. Note: SD Connect also displays all data uploaded to Allas through CSC interfaces for non-sensitive data management;

  • list your CSC projects and select one from the drop-down menu (top left corner) to view the buckets belonging to that project;

  • open any bucket (double click) and view its content (uploaded files or folders). Any file can be downloaded or shared using the download link. From this view, you can also download the entire bucket, delete files or upload new files and folders;

  • click Edit to type in and add appropriate tags that describe buckets or files.

In the User information page you can:

  • in Currently Consumes, view statistics about the resource usage of the selected CSC project: billing unit consumption and total project storage usage (default storage 10 TiB);

  • in Project usage, view the SD Connect Project Identifier, an ID associated with your CSC project. This ID is required when you want to share containers with other CSC projects using the SD Connect user interface. It does not contain sensitive information, so it can be shared with your colleagues or collaborators via email;

  • access the Sharing API tokens, through which you can generate a temporary token. The token is necessary for uploading data programmatically with the Swift client (for more information, see below).

In the Shared page:

  • in Shared to the project you can view the buckets that other CSC projects (belonging to your colleagues or collaborators) have shared with you. Next to the bucket name, under Bucket Owner, the page displays the ID of the CSC project to which the bucket belongs (also called SD Account). Double-click a bucket to open it and view its content (if you have read access) or add files to the container (if you have editing rights).

Note

All the buckets listed here are owned by other users, who can decide when to revoke your access. You will not be able to access the files from SD Desktop until you make a copy of the bucket.

  • in Shared with the project you can view the buckets that you have shared with other CSC projects. In this case you own the shared buckets and you can decide when to revoke access.

Sensitive data encryption and upload (less than 1 GB)

SD Connect allows you to encrypt and upload files directly from your web browser. With the following workflow you can automatically encrypt the data with the Sensitive Data services public encryption key (Encrypt files before upload: on; Ephemeral private key: on).

As this is a simplified workflow, it is designed to allow easy and safe encryption, with automated decryption only in SD Desktop (for data analysis) or in other SD services components. If you are interested in using your own encryption key pair or in sharing the data with a collaborator, check the following paragraph.

1- To upload data to SD Connect, either:

  • use the drag and drop function, or

  • click on the upload icon in the SD Connect browser window.

2- You will be redirected to a new page displaying the default encryption options.

3- Here, you can specify the name of the bucket to which the data should be uploaded. If you don't fill in a specific name, the user interface will automatically create a bucket named upload-nnn (where nnn is replaced with a 13-digit number based on creation time). Note that it is not possible to rename buckets.

4- If you create a new bucket, use the following suggestions to name it:

  • Bucket names must be unique across all existing buckets in all projects in SD Connect and Allas. If you can't create a new bucket, it's possible that some other project is already using the name you would like to use. To avoid this, it is good practice to include a project-specific identifier (e.g. project ID number or acronym) in the bucket name.

  • Avoid using spaces and special characters in bucket names. Preferred characters are Latin letters (a-z), numbers (0-9), dash (-), underscore (_) and dot (.). SD Connect can cope with other characters too, but they may cause problems in some other interfaces.

  • All bucket names are public, so please do not include any confidential information in bucket names.
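The character recommendations above can be checked before you settle on a name. The sketch below is a hypothetical helper (the function name is_preferred_bucket_name is not part of SD Connect or Allas) that accepts only the preferred characters listed above, reading the (a-z) range literally so that uppercase letters count as non-preferred. It cannot tell whether a name is already used by another project.

```shell
# Use the C locale so that the a-z range matches lowercase ASCII letters only.
LC_ALL=C
export LC_ALL

# Hypothetical helper: succeeds only if the name is non-empty and consists
# solely of the preferred characters a-z, 0-9, dash, underscore and dot.
is_preferred_bucket_name() {
    case "$1" in
        "" | *[!a-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Example: a name prefixed with a project-specific identifier (placeholder).
if is_preferred_bucket_name "2001234_my-data.v1"; then
    echo "name uses only preferred characters"
fi
```

A name such as "my data" (with a space) is rejected by this check.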

5- Click the icon Click to add files that will be uploaded to open a file browser window in which you can select and add more files.

6- Next click on Encrypt and upload: each file will be automatically encrypted and uploaded to the bucket in SD Connect.

7- Once the process is completed, you can return to the SD Connect browser window. The encrypted files will show the extension .c4gh.

Sensitive data encryption and upload (less than 100 GB)

As the workflow described above is still under development, files larger than 1 GB need to be encrypted and uploaded in two separate steps. For this reason, we have developed a simple encryption tool (Crypt4ghsds GUI) that facilitates data encryption with the Sensitive Data public encryption key. With this tool it is possible to encrypt only one file at a time. If you need to encrypt large datasets, check the instructions below on how to encrypt files programmatically with the Crypt4GH CLI.

Note

As this is a simplified workflow, it is designed to allow easy and safe encryption and automated decryption only within the Sensitive Data services. This workflow does not allow you to include your own encryption keys. Thus, you will not be able to decrypt this copy of the data except when analysing it in SD Desktop. If you are interested in using your own encryption key pair, check the following paragraph.

1- First, download the encryption application specific to your operating system from the GitHub repository:

2- Verify that the program has been digitally signed by CSC - IT Center for Science. After downloading and unzipping the file, you can find the Crypt4GH application in your download folder. When you open the application you might encounter an error message. In this case, click on More info and verify that the publisher is CSC-IT Center for Science (or in Finnish CSC-Tieteen tietotekniikan keskus Oy) and then click on Run anyway.

3- To encrypt a file, open the encryption tool and press the Select File button. This opens a file browser that you can use to select the file to be encrypted. When the file is selected, press the Encrypt button. This encrypts the selected file.

4- The tool creates a new encrypted file in the same folder as the original file, named by appending the extension .c4gh to the original name. For example, encrypting the file my_data1.csv produces a new, encrypted file named my_data1.csv.c4gh. Currently, the Crypt4GH application does not provide a progress bar, and if the file is large the encryption can take up to a few minutes.

5- To upload the encrypted file (or a folder containing encrypted data) to SD Connect, either:

  • use the drag and drop function, or

  • click on the upload icon in the SD Connect browser window.

6- You will then be redirected to a new page. As you have already encrypted the data, you can deselect the option Encrypt file before upload.

7- Next, you can specify the name of the bucket to which the data should be uploaded. If you don't fill in a specific name, the user interface will automatically create a bucket named upload-nnn (where nnn is replaced with a 13-digit number based on creation time). Note that it is not possible to rename buckets.

8- If you create a new bucket, use the following suggestions to name it:

  • Bucket names must be unique across all existing buckets in all projects in SD Connect and Allas. If you can't create a new bucket, it's possible that some other project is already using the name you would like to use. To avoid this, it is good practice to include a project-specific identifier (e.g. project ID number or acronym) in the bucket name.

  • Avoid using spaces and special characters in bucket names. Preferred characters are Latin letters (a-z), numbers (0-9), dash (-), underscore (_) and dot (.). SD Connect can cope with other characters too, but they may cause problems in some other interfaces.

  • All bucket names are public, so please do not include any confidential information in bucket names.

9- Next, click on Upload. A progress bar shows the status of the upload. Once the process is completed, you can return to the SD Connect browser window. The encrypted files will show the extension .c4gh.

Data encryption and upload with Sensitive Data encryption key - Command Line Interface

Note

Files that have been encrypted with the CSC Sensitive Data Services public key can be decrypted only when imported into SD Desktop, that is, only within CSC Sensitive Data Services. If you wish to encrypt the data in order to transfer them to other services, you need to plan the encryption in advance and use your own encryption key pair. For more information, check the Data Sharing section below and the Data encryption for data sharing paragraph.

For general information about using Crypt4GH at CSC check: crypt4gh GIT site

Step 1: Install the latest version of Crypt4GH encryption tool

Python 3.6 or later is required to use the crypt4gh encryption utility. To install Python: https://www.python.org/downloads/release/python-3810/

If you have a working Python installation and you have permission to add libraries to it, you can install Crypt4GH with the command:

pip install crypt4gh

Step 2: Download CSC Sensitive Data services Public key

Download the CSC Sensitive Data Services public key from the link here, or copy/paste the three lines from the box below into a new file. The file should be saved in text-only format. Here we assume that the key file is named csc-sd-services.pub.

-----BEGIN CRYPT4GH PUBLIC KEY-----
dmku3fKA/wrOpWntUTkkoQvknjZDisdmSwU4oFk/on0=
-----END CRYPT4GH PUBLIC KEY-----
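
Copying and pasting can silently damage the key file (missing characters, extra formatting). As a quick sanity check, the sketch below verifies the three-line layout and that the middle line is Base64 that decodes to 32 bytes, which is the size of a Crypt4GH public key. It assumes a POSIX shell with the sed, base64 and wc utilities; check_c4gh_pubkey is a hypothetical helper name, not part of the crypt4gh tool.

```shell
# Hypothetical helper: checks the three-line layout of a Crypt4GH public key
# file and that the middle line decodes to exactly 32 bytes.
check_c4gh_pubkey() {
    keyfile=$1
    [ "$(sed -n 1p "$keyfile")" = "-----BEGIN CRYPT4GH PUBLIC KEY-----" ] || return 1
    [ "$(sed -n 3p "$keyfile")" = "-----END CRYPT4GH PUBLIC KEY-----" ] || return 1
    # Note: some older macOS versions of base64 expect -D instead of -d.
    nbytes=$(sed -n 2p "$keyfile" | base64 -d 2>/dev/null | wc -c | tr -d ' ')
    [ "$nbytes" = "32" ]
}

# Example with the file name assumed in this step.
if [ -f csc-sd-services.pub ]; then
    if check_c4gh_pubkey csc-sd-services.pub; then
        echo "key file looks well-formed"
    else
        echo "key file is malformed, download or copy it again" >&2
    fi
fi
```

The check only confirms that the file is well-formed; it does not confirm that the key is the genuine CSC key, so compare it against the lines shown above.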

Step 3: Encrypt a file

Crypt4GH is able to use several public keys for encryption. This can be very handy in cases where the encrypted data needs to be used by several users or services. Unfortunately, SD Connect is not yet compatible with encryption with multiple keys. Because of that, if you plan to upload the data to SD Connect, you must encrypt it using only the CSC Sensitive Data Services public key. In this case the syntax of the encryption command is:

crypt4gh encrypt --recipient_pk public-key < input > output

For example:

crypt4gh encrypt --recipient_pk csc-sd-services.pub < my_data1.csv > my_data1.csv.c4gh

The encrypted file (my_data1.csv.c4gh) can now be uploaded to SD Connect and will be automatically decrypted when imported into your own private computing environment in SD Desktop.
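
To encrypt a larger dataset, the same command can be run in a loop, one file at a time. The following is a minimal sketch, assuming crypt4gh is installed and the key file from Step 2 is available; encrypt_dir is a hypothetical helper name and my_data a placeholder directory. It writes each encrypted copy next to the original, skips files that already end in .c4gh, and does not descend into subdirectories.

```shell
# Hypothetical helper: encrypt every regular file in a directory with the
# given public key, writing <name>.c4gh next to each original.
encrypt_dir() {
    dir=$1
    pubkey=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue               # skip subdirectories and an unmatched glob
        case "$f" in *.c4gh) continue ;; esac # skip files that are already encrypted
        crypt4gh encrypt --recipient_pk "$pubkey" < "$f" > "$f.c4gh" || return 1
    done
}

# Example (placeholder directory name).
if [ -d my_data ]; then
    encrypt_dir my_data csc-sd-services.pub
fi
```

The original, unencrypted files are left in place, so take care to upload only the .c4gh copies.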

Data encryption and upload with Allas help tool: a-put

The Allas client utilities are a set of command line tools that can be installed and used on Linux and macOS machines. If you have these tools, you can use the data upload command a-put with the command line option --sdx to upload data to Allas/SD Connect so that the uploaded files are automatically encrypted with the CSC Sensitive Data Services public key before the upload. The public key is included in the tool, so you don't need to download your own copy of the key.

You can upload a single file with a command like:

a-put --sdx my_data1.csv

By default, a-put --sdx uploads the encrypted file into a bucket named project-number-SD_CONNECT.

You can also upload complete directories and define a specific target bucket. For example the command below will encrypt and upload all the files in directory my_data to SD Connect into bucket 1234_SD_my_data.

a-put --sdx my_data -b 1234_SD_my_data
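
When several directories need to go into separate buckets, the command above can be repeated in a loop. The sketch below is illustrative only: upload_dirs_sdx, the APUT variable and the bucket naming scheme (a project number, then _SD_, then the directory name, mirroring the example above) are assumptions made for this example, not features of a-put. Only the --sdx and -b options shown above are used.

```shell
# Command used for the upload; set APUT=echo beforehand for a dry run.
APUT=${APUT:-a-put}

# Hypothetical helper: upload each given directory with --sdx into a bucket
# named <project-number>_SD_<directory-name>.
upload_dirs_sdx() {
    project=$1
    shift
    for d in "$@"; do
        bucket="${project}_SD_$(basename "$d")"
        "$APUT" --sdx "$d" -b "$bucket" || return 1
    done
}
```

For example, upload_dirs_sdx 1234 my_data other_data would run a-put twice, targeting the buckets 1234_SD_my_data and 1234_SD_other_data.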

Programmatic data upload and download with SD Connect

To upload encrypted data to SD Connect programmatically, you need to use your CSC credentials (CSC username and password).

SD Connect is a user interface for CSC Allas object storage. In practice this means that any data which you can access in Allas, can also be imported to SD Desktop with SD-Connect Downloader.

Thus you can use any of the Allas-compatible clients to upload your data to SD Connect programmatically. However, as SD Connect is based on the Swift protocol, it is recommended that you use upload tools that are based on the Swift protocol.

These include:

Note that if you use these tools, you must encrypt your sensitive data, before you upload it to SD Connect.
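
Because such tools upload files exactly as they are, it can help to verify that a file really is in Crypt4GH format before sending it. The following is a minimal sketch, assuming the python-swiftclient command line tool (swift) is installed and already authenticated against Allas; the helper names and the bucket name in the usage note are hypothetical. Crypt4GH files begin with the 8-byte magic string crypt4gh, which is what the check looks for.

```shell
# Succeeds if the file starts with the Crypt4GH magic string "crypt4gh".
is_crypt4gh_file() {
    magic=$(dd if="$1" bs=8 count=1 2>/dev/null | tr -d '\000')
    [ "$magic" = "crypt4gh" ]
}

# Hypothetical helper: upload a file to a bucket only if it is encrypted.
upload_if_encrypted() {
    bucket=$1
    file=$2
    if ! is_crypt4gh_file "$file"; then
        echo "refusing to upload unencrypted file: $file" >&2
        return 1
    fi
    swift upload "$bucket" "$file"
}
```

A call such as upload_if_encrypted 1234_SD_my_data my_data1.csv.c4gh uploads the file, whereas passing my_data1.csv is refused. The check confirms only the file format, not which public key was used for the encryption.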

Data Sharing

Note

For more information about encryption with private keys check: Data encryption for data sharing.

SD Connect user interface provides a simple way of sharing containers between different projects.

To share a container with another CSC project (and thus one of your colleagues or collaborators) you need to:

  • know in advance the SD Account of the CSC project you want to share a container with (see above in User Interface paragraph, where this can be found)

  • in the browser page click on the share button on the row of the container in the container listing

Clicking the button takes you to the Share the container view, in which you need to specify the project or projects the container will be shared with, and which rights to grant:

  • select Grant read permission if you want your colleagues to be able to see the files and folders inside the container and download them

  • also select Grant write permissions if you want your colleagues to be able to add files and folders to the shared container. If you select only this option, your colleague or collaborator will be able to add files to the container, but will not be able to see its content.

  • in Project Identifiers to share with, add the SD Connect Project Identifier of the project you want to share the container with

  • Next click on Share

At this point the user interface will redirect you to the Shared page and the container will be listed under Shared from project. Here you can stop the sharing at any time by clicking on Revoke container access.

Troubleshooting

|  | Problem | Possible solution |
| --- | --- | --- |
| Decryption | I cannot decrypt the data I downloaded from CSC services. | You can decrypt the data only if you have used your own public key for the encryption. If you used the CSC Sensitive Data Services public key for the encryption, the data can be decrypted only in SD Desktop. In that case, the decryption is automatic. If you used your collaborator’s public key to encrypt the data, only they can decrypt the data with their private key. |
| Encryption | Encryption takes a long time. | For large files and datasets, the encryption can take up to a few minutes. |
| Folder encryption | I cannot select the folder I want to encrypt with the Crypt4GH graphical user interface. | It is not possible to encrypt an entire folder, only single files. |

|  | Problem | Possible solution |
| --- | --- | --- |
| Data upload | I am trying to upload a big file/folder with the user interface and the upload is stuck. | To upload files or folders that are larger than 200 GB, the data should be uploaded programmatically. |
|  | Low upload speed (programmatically) | Average upload speed can go from 100 to 200 MiB/s. Specific scripts can be used to optimize the upload of large files. |
| Bucket | I am not able to create a new bucket. | 1) Check in the MyCSC portal that your current project has service access for Allas. 2) Try to use a bucket name that is unique and doesn’t contain special characters. 3) Select the correct project in the SD Connect user interface. |
|  | I cannot find my bucket. | Check if the bucket is stored under a different project. If someone has shared the bucket with you, you can find it under the ‘Shared to’ section and copy it. If someone has shared the bucket with you, they could have revoked the sharing. |
|  | I cannot upload data into my bucket. | Check that your project still has storage space left. |
| Shared bucket | I cannot upload data into a shared bucket. | Your colleague didn’t add editing rights when they shared the bucket. |
|  | I cannot see the content of a shared bucket. | Your colleague didn’t add reading rights when they shared the bucket. |

Last edited Thu Jan 13 2022