Common use cases

Processing data in HPC systems

The computing environments in Taito and Puhti use Lustre, an open-source parallel file system. In these file systems, files are automatically removed after 90 days. One of the main use cases of Allas is storing data that is not in active use on the HPC systems. Before starting your work, stage the data in, i.e. copy it from Allas to the HPC system. When the data is no longer in active use, stage it out back to Allas.
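To help decide what to stage out before the automatic cleanup, the following Python sketch lists files under a directory that have not been modified for a given number of days. It is only an illustration: it assumes the cleanup is driven by file modification time, which may differ from the actual policy, and `find_stale_files` and its `min_age_days` parameter are hypothetical names. Copying the listed files to Allas is left to your upload client.

```python
import os
import time

SECONDS_PER_DAY = 86400


def find_stale_files(root, min_age_days=60, now=None):
    """List files under `root` not modified for at least `min_age_days` days.

    Hypothetical helper. Assumption: the automatic cleanup is based on
    modification time; check the actual policy before relying on this.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)
```

Choosing a threshold below 90 days, such as the default of 60 used here, leaves time to stage the data out before it is removed.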

Note

We recommend using the Swift protocol on Allas. It is important not to mix Swift and S3, as these protocols are not fully mutually compatible.

Sharing data

The object storage makes it easy to share data such as datasets or research results. You can share data either with a limited audience, e.g. other projects, or with everyone by making it public.
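With the Swift protocol, this kind of sharing is controlled with bucket (container) read ACLs. The Python sketch below only composes an ACL string. It assumes Swift's ACL syntax, where `<project_id>:*` grants read access to all users of a project, `.r:*` allows anyone to read objects, and `.rlistings` additionally allows anyone to list the bucket. `build_read_acl` is a hypothetical helper, and the project IDs in the tests are placeholders.

```python
def build_read_acl(project_ids=(), public=False, allow_listing=False):
    """Compose a Swift container read ACL string (hypothetical helper).

    project_ids   -- projects whose users may all read the bucket
    public        -- add '.r:*' so anyone can read objects
    allow_listing -- with public, also add '.rlistings' so anyone can list
    """
    entries = [f"{project_id}:*" for project_id in project_ids]
    if public:
        entries.append(".r:*")
        if allow_listing:
            entries.append(".rlistings")
    return ",".join(entries)
```

The resulting string could then be applied with the Swift client, e.g. `swift post -r "<acl>" <bucket>`. Setting a read ACL this way replaces any read ACL the bucket already has.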

The data can be accessed and shared in a variety of ways:

Static web content

A common use of the object storage is storing static web content, such as images, videos, audio, PDFs or other downloadable files, and linking to it from a web page that can be hosted either in Allas or elsewhere.

Uploading data to Allas can be done with any of the following clients: web client, a_commands, Swift or S3.
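Links to such content are plain URLs. The Python sketch below builds an HTML list of links to objects in a public bucket. It assumes public objects are served at `https://a3s.fi/<bucket>/<object>`; check the URL your bucket actually uses. `object_url` and `html_links` are hypothetical helpers.

```python
from html import escape
from urllib.parse import quote

# Assumption: public objects are served at https://a3s.fi/<bucket>/<object>.
BASE_URL = "https://a3s.fi"


def object_url(bucket, object_name, base_url=BASE_URL):
    """Build the public URL of an object, percent-encoding unsafe characters."""
    # quote() keeps "/" by default, so pseudo-folder paths stay readable.
    return f"{base_url}/{quote(bucket)}/{quote(object_name)}"


def html_links(bucket, object_names, base_url=BASE_URL):
    """Return an HTML <ul> with one link per object, for embedding in a web page."""
    items = [
        f'<li><a href="{escape(object_url(bucket, name, base_url))}">'
        f"{escape(name)}</a></li>"
        for name in object_names
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Spaces and other unsafe characters in object names are percent-encoded, while the link text shows the name as it is.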

Storing data for distributed use

In some cases, the same data must be accessed in several locations. Instead of using a shared file storage, you can then stage the data in from the object storage to each individual server or computer.
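When several servers stage in the same data, each one can skip objects it already has. The Python sketch below compares local files against the checksums of the remote objects and returns the names that still need to be downloaded. It assumes you already have a mapping from object name to ETag, e.g. from a bucket listing, and that the ETag is the MD5 of the object content. That holds for ordinary Swift objects but not for segmented ones, which this sketch would always report as needed. `md5_hex` and `objects_to_stage_in` are hypothetical helpers.

```python
import hashlib
import os


def md5_hex(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def objects_to_stage_in(remote_etags, local_dir):
    """Return object names that are missing locally or whose content differs.

    `remote_etags` maps object name -> ETag (assumed to be the content MD5).
    """
    needed = []
    for name, etag in sorted(remote_etags.items()):
        local_path = os.path.join(local_dir, name)
        # ETags may be reported with surrounding quotes; strip them first.
        if not os.path.isfile(local_path) or md5_hex(local_path) != etag.strip('"'):
            needed.append(name)
    return needed
```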

Accessing the same data via multiple CSC platforms

Since the data in the object storage is accessible from anywhere, you can access it from both the CSC clusters and the cloud services. This makes the object storage a good place to store data as well as intermediate and final results when a workflow uses both, e.g. the Puhti cluster and a cloud virtual machine.

Collecting data from different sources

It is easy to push data to the object storage from several different sources. This data can then later be processed as needed.

For example, several data collectors, such as scientific instruments, meters, or software that harvests social media streams for scientific analysis, can push their data into the object storage. Virtual machines and computing jobs on Puhti can then process the data later.
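When many collectors write to the same bucket, a consistent naming scheme keeps their objects apart and makes later processing easier. The Python sketch below shows one possible convention, not a requirement of Allas: each object name starts with a hypothetical source ID followed by a UTC date path, so a processing job can select one source and day by prefix (for example with `swift list --prefix`). `collector_object_name` and `day_prefix` are hypothetical helpers.

```python
from datetime import datetime, timezone


def collector_object_name(source_id, filename, when=None):
    """Name an object as <source_id>/<YYYY>/<MM>/<DD>/<HHMMSS>_<filename>.

    `when` should be a UTC datetime; it defaults to the current UTC time.
    Two uploads from the same source with the same file name in the same
    second would still collide; add a sequence number if that can happen.
    """
    when = when or datetime.now(timezone.utc)
    return f"{source_id}/{when:%Y/%m/%d}/{when:%H%M%S}_{filename}"


def day_prefix(source_id, day):
    """Prefix that selects all objects from one source on one day."""
    return f"{source_id}/{day:%Y/%m/%d}/"
```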

Self-service backups of data

The object storage is also often used as a location for storing backups. It is a convenient place to push copies of database dumps.

allas-backup is part of a_commands. It creates backup copies of files in Allas. Please note: allas-backup is not a real backup service. It only copies the data to another bucket in Allas, where it can easily be removed or overwritten by any authenticated user.
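If database dumps are pushed regularly, old copies accumulate. The Python sketch below picks which dump objects to delete so that only the newest few remain. It assumes a hypothetical naming convention where every dump name shares a prefix and embeds an ISO date, so that alphabetical order equals chronological order. `dump_object_name` and `backups_to_delete` are hypothetical helpers, and the actual deletion would be done with your client of choice.

```python
def dump_object_name(prefix, day, suffix=".sql.gz"):
    """Hypothetical convention: <prefix>-<YYYY-MM-DD><suffix>."""
    return f"{prefix}-{day:%Y-%m-%d}{suffix}"


def backups_to_delete(object_names, keep=7):
    """Return the dump objects to remove so that the `keep` newest remain."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(object_names)  # oldest first, given the naming convention
    # ordered[:-0] would be empty, so keep == 0 is handled separately.
    return ordered[:-keep] if keep else ordered
```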

Files larger than 5 GB

Files larger than 5 GB must be divided into smaller segments before uploading.
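The Swift command-line client can segment large files itself with `swift upload --segment-size`. To show what segmenting involves, the Python sketch below computes the byte ranges and writes numbered segment files. It assumes "5 GB" means 5 GiB, and `segment_ranges` and `split_file` are hypothetical helpers.

```python
import os

# Assumption: the 5 GB limit is read as 5 GiB; pick a smaller size for margin.
MAX_SEGMENT_SIZE = 5 * 1024**3


def segment_ranges(file_size, segment_size=MAX_SEGMENT_SIZE):
    """Return (offset, length) pairs that cover `file_size` bytes."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    return [
        (offset, min(segment_size, file_size - offset))
        for offset in range(0, file_size, segment_size)
    ]


def split_file(path, out_dir, segment_size=MAX_SEGMENT_SIZE, chunk_size=1 << 20):
    """Write segments named <file>.000000, <file>.000001, ... and return their paths."""
    base = os.path.basename(path)
    ranges = segment_ranges(os.path.getsize(path), segment_size)
    paths = []
    with open(path, "rb") as src:
        # Segments are read sequentially, so each one starts at its offset.
        for index, (_offset, length) in enumerate(ranges):
            seg_path = os.path.join(out_dir, f"{base}.{index:06d}")
            with open(seg_path, "wb") as dst:
                remaining = length
                while remaining > 0:
                    # Copy in small chunks so a whole segment never sits in memory.
                    data = src.read(min(chunk_size, remaining))
                    if not data:
                        break  # the file shrank while it was being read
                    dst.write(data)
                    remaining -= len(data)
            paths.append(seg_path)
    return paths
```

Because the index is zero-padded, the segment names sort in order, so concatenating the segments in that order restores the original file.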

Viewing storage usage

If you are using the s3cmd client, check your project's object storage usage:

s3cmd du -H

If you use the Swift client:

swift stat

Display how much space a bucket has used:

swift stat $bucketname
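To use these numbers in a script, the output of `swift stat` can be parsed. The Python sketch below assumes the output consists of `Key: value` lines, as `swift stat` prints them, and converts the `Bytes` value into binary units. `parse_swift_stat` and `human_size` are hypothetical helpers.

```python
UNITS = ("B", "KiB", "MiB", "GiB", "TiB", "PiB")


def parse_swift_stat(text):
    """Turn the 'Key: value' lines printed by `swift stat` into a dict.

    Only the first colon splits a line, so values such as '.r:*' survive.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def human_size(num_bytes):
    """Format a byte count in binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    for unit in UNITS[:-1]:
        if size < 1024:
            break
        size /= 1024
    else:
        unit = UNITS[-1]  # 1024 TiB or more
    return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
```

For example, `parse_swift_stat(subprocess.run(["swift", "stat", bucketname], capture_output=True, text=True).stdout)["Bytes"]` would give the size of a bucket in bytes, which `human_size` can then format.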

Please contact servicedesk@csc.fi if you have questions.