Common use cases
Processing data in HPC systems
The computing environments in Taito and Puhti use the open source parallel file system Lustre. In these file systems, files are automatically removed after 90 days. One of the main use cases of Allas is to store data that is not in active use in the HPC systems. Before starting to process the data, stage it in from Allas. When the data is no longer actively used, stage it out back to Allas.
We recommend using the Swift protocol on Allas. It is important not to mix Swift and S3, as these protocols are not fully mutually compatible.
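The stage-in and stage-out steps described above could be sketched with the Swift command line client roughly as follows. This is only a minimal sketch: it assumes the `swift` client is installed and already configured with valid credentials, and the function names, bucket name and directory names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: assumes an authenticated `swift` client is available.
# Function, bucket and directory names are illustrative assumptions.

# Stage in: download all objects of a bucket into a local directory.
stage_in() {
    local bucket="$1" dest_dir="$2"
    mkdir -p "$dest_dir"
    swift download --output-dir "$dest_dir" "$bucket"
}

# Stage out: upload a local file or directory to a bucket.
stage_out() {
    local bucket="$1" src_path="$2"
    swift upload "$bucket" "$src_path"
}

# Example usage (hypothetical names):
#   stage_in  my-project-data ./input
#   stage_out my-project-data ./results
```

Wrapping the two steps in functions makes it easier to call them at the start and at the end of a batch job script.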
Sharing data, e.g. datasets or research results, is easy in the object storage. You can share these either with a limited audience, e.g. other projects, or allow access for everybody by making the data public.
The data can be accessed and shared in a variety of ways:
Private - default: By default, if you do not specify anything else, contents of buckets can only be accessed by authenticated members of your project. Private/Public settings can be managed with:
Access Control Lists: Access control lists (ACLs) work on buckets, not objects. With ACLs, you can share your data with other projects in a limited way. For example, you can grant a collaboration project authenticated read access to your datasets.
Public: You can also have ACLs granting public read access to the data, which is useful for e.g. sharing public scientific results or public datasets.
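As an illustration of the three access levels above, read ACLs could be set on a bucket with the Swift client along these lines. This is a sketch under assumptions: the `swift` client is expected to be authenticated, the function names are hypothetical, and the project identifier passed to `share_with_project` is a placeholder for the ID of the project you want to share with.

```shell
#!/usr/bin/env bash
# Sketch only: assumes an authenticated `swift` client is available.
# Function names and all bucket/project identifiers are illustrative assumptions.

# Grant all members of another project read access to a bucket.
share_with_project() {
    local bucket="$1" project_id="$2"
    swift post --read-acl "${project_id}:*" "$bucket"
}

# Grant public read access and allow listing the bucket contents.
make_public() {
    local bucket="$1"
    swift post --read-acl ".r:*,.rlistings" "$bucket"
}

# Clear the read ACL so the bucket is private again (the default).
make_private() {
    local bucket="$1"
    swift post --read-acl "" "$bucket"
}

# Example usage (hypothetical names):
#   share_with_project my-datasets other_project_id
#   make_public my-public-results
```

Because ACLs apply to buckets rather than objects, it is usually simplest to keep data with different audiences in separate buckets.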
Static web content
A common way to use the object storage is storing static web content, such as images, videos, audio, PDFs or other downloadable content, and adding links to it on a web page, which can run either inside Allas or somewhere else.
Storing data for distributed use
There are several cases where you need to access the same data in several locations. In these cases, you can stage the data in from the object storage to the individual servers or computers instead of using a shared file storage.
Accessing the same data via multiple CSC platforms
Since the data in the object storage is available anywhere, you can access the data via both the CSC clusters and cloud services. This makes the object storage a good place to store data as well as intermediate and final results in cases where the workflow requires the use of e.g. both Allas and Puhti.
Collecting data from different sources
It is easy to push data to the object storage from several different sources. This data can then later be processed as needed.
For example, several data collectors may push data to be processed, e.g. scientific instruments, meters, or software that harvests social media streams for scientific analysis. They can push their data into the object storage, and later virtual machines and computing jobs on Puhti can process the data.
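A data collector could push its files with something like the following sketch. The naming scheme (a per-source prefix plus a UTC timestamp) is an assumption made here so that uploads from different sources do not overwrite each other; it is not an Allas requirement. The function name, bucket name and source identifier are hypothetical, and the `swift` client is assumed to be authenticated.

```shell
#!/usr/bin/env bash
# Sketch only: assumes an authenticated `swift` client is available.
# The object naming scheme and all names are illustrative assumptions.

# Upload one file under "<source_id>/<UTC timestamp>-<file name>".
push_measurement() {
    local bucket="$1" source_id="$2" file="$3"
    local stamp
    stamp="$(date -u +%Y%m%dT%H%M%SZ)"
    swift upload --object-name "${source_id}/${stamp}-$(basename "$file")" "$bucket" "$file"
}

# Example usage (hypothetical names):
#   push_measurement sensor-data meter01 /var/spool/meter/reading.csv
```

A processing job can later list or download the objects under one source prefix without having to know when each file arrived.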
Self-service backups of data
The object storage is also often used as a location for storing backups. It is a convenient place to push copies of database dumps.
allas-backup is part of the a_commands. It is a tool for creating backup copies of files in Allas. Please note: allas-backup is not a real backup service. It only copies the data to another bucket in Allas, where it can easily be removed or overwritten by any authenticated user.
Files larger than 5 GB
Files larger than 5 GB must be divided into smaller segments before uploading.
The a_commands tool a-put splits large files automatically.
With Swift, you can upload large files as a Static Large Object.
s3cmd put splits large files automatically.
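With the Swift client, a segmented upload could be sketched as below. The assumptions here are that "5 GB" is taken as 5 * 1024^3 bytes, that a 1 GiB segment size is acceptable, and that the `swift` client is authenticated; the function and variable names are hypothetical. The `--use-slo` option asks the client to create a Static Large Object, and `--segment-size` gives the maximum segment size in bytes.

```shell
#!/usr/bin/env bash
# Sketch only: assumes an authenticated `swift` client is available.
# Threshold, segment size and all names are illustrative assumptions.

LIMIT_BYTES=$((5 * 1024 * 1024 * 1024))   # assumed 5 GB limit, in bytes
SEGMENT_BYTES=$((1024 * 1024 * 1024))     # assumed segment size: 1 GiB

# Upload a file, switching to a segmented (SLO) upload above the limit.
upload_file() {
    local bucket="$1" file="$2" size
    size=$(( $(wc -c < "$file") ))
    if [ "$size" -gt "$LIMIT_BYTES" ]; then
        swift upload --use-slo --segment-size "$SEGMENT_BYTES" "$bucket" "$file"
    else
        swift upload "$bucket" "$file"
    fi
}

# Example usage (hypothetical names):
#   upload_file my-project-data big_simulation_output.tar
```

Files below the limit are uploaded as ordinary objects, so the same function can be used for all uploads.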
If you are using the s3cmd client, check your project's object storage usage:
s3cmd du -H
If you use the Swift client, display how much space a bucket has used:
swift stat $bucketname
Please contact firstname.lastname@example.org if you have questions.