Introduction to the Allas storage service

What is Allas?

Allas is CSC's general-purpose research data storage service. It is part of the CSC storage portfolio and can be accessed from the CSC servers as well as from anywhere on the Internet. Allas can be used both for static research data that needs to be available for analysis and for collecting and hosting accumulating or changing data. A CSC project is required to import data to Allas. Allas can host the data for as long as the CSC project is active.

From a technical point of view, Allas is a modern object storage system. It provides S3 and Swift interfaces on top of Ceph storage. In practice, this means that instead of files, the data is stored as objects in buckets. A bucket is a container for objects, and it may also include metadata describing the bucket.

The stored objects can be of any data type, such as images or compressed data files. In general, objects are similar to files. The object storage can be used for a variety of purposes. It has benefits but also limitations.

Benefits

Limitations

Different ways to use Allas

You cannot mount Allas directly to a computer. This means that in order to use Allas, you need software tools to access it. There are four main ways to access Allas:

Allas access clients

  1. In the CSC computing environment (e.g. Puhti), there are ready-to-use tools provided by CSC to access Allas. These are mostly the same tools that can also be installed in any Linux environment, e.g. a virtual machine in cPouta or a local Linux server.
    In the CSC computing environment, Allas should be used to store any data that needs to be preserved for longer than a few weeks. The supercomputer's own storage has a policy of deleting idle data, so the data must be moved to Allas after computing. See Computing disk environment.

  2. Web access to Allas is provided by the web interface of the cPouta cloud environment, https://pouta.csc.fi. No special software is required to access Allas with a browser, which makes this by far the simplest way to access Allas. On the other hand, the browser interface has a number of limitations compared to other clients, most notably lower performance and the ability to upload or download only a single file at a time. Instructions for accessing and using Allas with a browser: OpenStack Horizon web interface.

  3. To access Allas from the command line, client software supporting the Swift or S3 protocol is required. This is the most flexible way to access Allas, but it requires more effort than the other access methods. Instructions for using a command line client: Accessing Allas with Linux.

  4. To access Allas with a GUI client, a suitable GUI client is required. The client needs to be capable of using the Swift or S3 access protocol. Instructions for using a GUI client: Accessing Allas with Windows and Mac.

See also the common Use cases.

Billing and quotas

Allas is used with project-based storage quotas. The default quota for a new project is 10 TiB, but it can be increased if needed. Allas is the preferred storage site for any large datasets in the CSC environment, so do not hesitate to request a larger quota if you work with larger datasets.

All project members have equal access rights to the storage area that has been granted for the project. In practice, this means that if one project member uploads data to Allas, all other project members can also read, edit and delete the data. Allas itself does not store any information about who has uploaded the data to Allas.

The default quotas for projects:

Resource              Limit
Storage amount        10 TiB
Buckets per project   1 000
Objects per bucket    500 000
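
As a rough illustration of how the default quotas in the table above combine, the following sketch checks a project's usage against them. The function name, its inputs and the message texts are hypothetical and only serve the example; Allas itself enforces the quotas on the server side.

```python
# A minimal sketch, assuming the default quotas listed in the table above.
# The function and its inputs are hypothetical, not part of any Allas client.
TIB = 1024 ** 4  # bytes in one tebibyte

DEFAULT_QUOTAS = {
    "storage_bytes": 10 * TIB,
    "buckets_per_project": 1000,
    "objects_per_bucket": 500_000,
}


def quota_violations(stored_bytes, bucket_object_counts, quotas=DEFAULT_QUOTAS):
    """Return a message for every default quota that the usage exceeds.

    bucket_object_counts maps a bucket name to its number of objects.
    """
    problems = []
    if stored_bytes > quotas["storage_bytes"]:
        problems.append("storage amount exceeds quota")
    if len(bucket_object_counts) > quotas["buckets_per_project"]:
        problems.append("too many buckets in project")
    # Sort the bucket names so the messages come out in a stable order.
    for name, count in sorted(bucket_object_counts.items()):
        if count > quotas["objects_per_bucket"]:
            problems.append(f"bucket {name!r} holds too many objects")
    return problems
```

For a project whose quota has been increased, a modified copy of DEFAULT_QUOTAS can be passed as the quotas argument.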

Storing data in Allas consumes billing units (BU). Billing is based on the amount of data stored in Allas. The rate is 1 BU/TiBh, i.e. 1 TiB of data stored in Allas consumes 24 BU in a day and 8760 BU in a year.
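
The billing arithmetic can be written out as a short sketch. The function name and parameters are illustrative only; the sole figure taken from the text is the rate of 1 BU per TiB per hour.

```python
# A minimal sketch of the billing arithmetic, assuming the rate of
# 1 BU per TiB per hour stated above. Names are illustrative only.
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 24 * 365  # 8760 hours in a non-leap year


def billing_units(stored_tib, hours, rate_bu_per_tib_hour=1.0):
    """Billing units consumed by storing stored_tib TiB for the given hours."""
    return stored_tib * hours * rate_bu_per_tib_hour


print(billing_units(1, HOURS_PER_DAY))   # 1 TiB for a day
print(billing_units(1, HOURS_PER_YEAR))  # 1 TiB for a year
```

Because the rate is linear in both size and time, storing 10 TiB for a year consumes ten times the yearly figure for 1 TiB.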

Unlike most other object storage providers, CSC does not charge for object storage network transfers or API calls.

Protocols

The object storage service is provided over two different protocols, Swift and S3. From the user's perspective, one of the main differences between S3 and Swift is authentication. The token-based Swift authentication used in Allas remains valid for eight hours at a time, whereas with the key-based S3 the connection can stay open permanently. The permanent connection of S3 is practical in many ways, but it carries a security risk: if the server where Allas is used is compromised, the object storage space is compromised as well.
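
A script that runs for a long time over Swift has to take the eight-hour token lifetime into account. The sketch below shows one way to test whether a token should be renewed; it assumes the script recorded the time at which the token was issued, and the function name is hypothetical rather than part of any Swift client.

```python
# A minimal sketch, assuming the eight-hour Swift token lifetime stated
# above and that the caller recorded when the token was issued.
from datetime import datetime, timedelta, timezone

TOKEN_LIFETIME = timedelta(hours=8)


def token_expired(issued_at, now=None, lifetime=TOKEN_LIFETIME):
    """Return True once the token issued at issued_at is no longer valid."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - issued_at >= lifetime
```

In practice it is safer to renew somewhat before the deadline, for example by passing a shorter lifetime.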

Due to this security concern, Swift is the recommended protocol for multi-user servers, such as Mahti and Puhti. For example, the CSC-specific a-commands and the standard rclone configuration in Puhti are based on Swift. However, in some cases the permanent connections provided by the S3 protocol may be the most reasonable option, for example in personal virtual machines running in cPouta.

The Swift and S3 protocols are not fully compatible when handling objects. For small objects that do not need to be split during upload, the protocols can be used interchangeably, but split objects can be accessed only with the protocol that was used for uploading them. The size limit for splitting an object depends on the client settings and on the protocol. The limit is typically between 500 MB and 5 GB.
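
The compatibility rule can be expressed as a small decision function. This is only a sketch: the function name is hypothetical, the split limit must be supplied by the caller because it depends on the client and its settings, and treating an object of exactly the limit size as unsplit is an assumption.

```python
# A minimal sketch of the compatibility rule described above.
# Assumption: an object is split only when it is larger than the limit.
MB = 10 ** 6
GB = 10 ** 9


def readable_protocols(size_bytes, upload_protocol, split_limit_bytes):
    """Return the set of protocols that can be used to read the object."""
    if upload_protocol not in ("swift", "s3"):
        raise ValueError("upload_protocol must be 'swift' or 's3'")
    if size_bytes <= split_limit_bytes:
        # Unsplit objects can be used interchangeably with both protocols.
        return {"swift", "s3"}
    # Split objects are accessible only with the protocol used for upload.
    return {upload_protocol}
```

For example, a 6 GB object uploaded over Swift with a 5 GB limit would be reported as readable with Swift only.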

Generic recommendations for selecting the protocol:

Clients

Allas is accessed via client software that takes care of moving data to and from Allas and managing data objects. There are several different kinds of client software for accessing the object storage servers. Allas can be used with any object storage client that is compatible with the Swift or S3 protocol.

Client                     Notes
web client                 Use via https://pouta.csc.fi. Provides basic functions.
a-commands                 Provides easy-to-use tools for basic use. Requires Rclone, Swift and OpenStack.
swift python-swiftclient   The recommended Swift client.
s3cmd                      The recommended S3 client (version 2.0.2 or later).
python-swift-library
rclone                     Useful with supercomputers.
libs3
python-openstackclient
aws-cli                    aws-cli and the boto3 Python library.
curl                       Extremely simple to use with public objects and temporary URLs.
wget                       Same as curl.

Client operations

The web client is suitable for the basic functions. The a-commands offer easy-to-use functions for using Allas from either a personal computer or a supercomputer. Power users might want to consider the clients rclone, Swift and s3cmd. The table displays the core functions of the power clients concerning data management in Allas.

    web client     a-commands      swift         s3cmd  
Usage Basic Basic Power Power
Create buckets
Upload objects
List
       objects
       buckets
Download
       objects
       buckets
Remove
       objects
       buckets •• •• ••
Managing access rights
       public/private
       read/write access
       to another project
       temp URLs
Move objects
Edit metadata
Upload large files (over 5 GB)
Download whole project
Remove whole project
• Only one object at a time
•• Only empty buckets

System Characteristics

In Allas, objects are stored in buckets. A bucket is a container for data objects. Buckets should not be confused with Docker containers or other computing containers. A bucket functions similarly to a file system directory, except that there can only be one level, i.e. buckets cannot contain other buckets.

Figure: Data structure in Allas (projects and buckets)

Naming buckets

Each bucket has a name that must be unique across all Allas users. If another user has a bucket called "test", another bucket called "test" cannot be created. All bucket names are public, so please do not include any confidential information in the bucket name. You may, for example, use your project ID, e.g. 2000620-raw-data.

Object URLs can be in the DNS format, e.g. https://a3s.fi/bucketname/objectname. Please use a valid DNS name (RFC 1035) as the bucket name. We recommend not using upper case or non-ASCII (ä, ö etc.) characters.
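
The naming recommendations can be checked before a bucket is created. The sketch below is not the validation Allas itself performs. It assumes a single DNS label of at most 63 lower case ASCII letters, digits and hyphens that starts and ends with a letter or digit; a leading digit is allowed so that the example name 2000620-raw-data passes. Both function names are hypothetical.

```python
# A minimal sketch, assuming the naming rule described in the lead-in.
# This is an illustration, not the check that Allas performs.
import re
from urllib.parse import quote

# One DNS label: 1-63 characters, lower case ASCII letters, digits, hyphens,
# with a letter or digit at both ends.
_LABEL_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_recommended_bucket_name(name):
    """Return True if name follows the recommendations sketched above."""
    return _LABEL_RE.fullmatch(name) is not None


def object_url(bucket, obj, endpoint="https://a3s.fi"):
    """Build an object URL of the form endpoint/bucketname/objectname."""
    if not is_recommended_bucket_name(bucket):
        raise ValueError(f"not a recommended bucket name: {bucket!r}")
    # Percent-encode the object name but keep '/' characters as they are.
    return f"{endpoint}/{bucket}/{quote(obj)}"


print(object_url("2000620-raw-data", "objectname"))
```

Whether such a URL can be opened depends on the access rights of the bucket; the function only constructs the address.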

It is not possible to rename a bucket.

The data is spread across various servers, which protects against disk and server failures. Please note: This does not protect the data from e.g. accidental deletion. Please make regular backups of important data.