
Using Allas with S3 using Python boto3 library

boto3 is a Python library for working with S3 storage and other AWS services. boto3 works with Allas over the S3 protocol.

In general, for analyzing Allas data with Python:

  • Save input data to Allas, possibly using other Allas tools.
  • Download the data from Allas to the local computer (incl. supercomputers) with boto3.
  • Analyze the data using the local copy of the data.
  • Write your results to the local disk.
  • Upload the new files to Allas with boto3.

Some Python libraries may also support direct reading and writing over S3, for example AWS SDK for Pandas and GDAL-based Python libraries for spatial data analysis.

This page shows how to:

  • Install boto3
  • Set up S3 credentials
  • Create a boto3 resource
  • List buckets and objects
  • Create a bucket
  • Upload and download an object
  • Remove buckets and objects

Note that the S3 and SWIFT APIs should not be mixed.

Installation

boto3 is available for Python 3.8 and higher and can be installed with pip or conda.

pip install boto3
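
boto3 is also available from conda channels, for example conda-forge; the channel choice below is an example, adjust it to your own conda setup:

conda install -c conda-forge boto3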

boto3 on CSC supercomputers

Some existing Python modules, for example geoconda, have boto3 pre-installed. To other modules, boto3 can be added with pip.
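
For example, an existing module can be loaded and boto3 installed into the user's own packages. This is a minimal sketch; the module name is only an assumption, use the module you actually work with:

module load python-data          # example module name, pick the one you use
python3 -m pip install --user boto3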

Configuring S3 credentials

If you have not used Allas with S3 before, first create S3 credentials. The credentials are saved to the ~/.aws/credentials file, so they need to be set only once on a new computer or when changing the project. The credentials file can also be copied from one computer to another.

On CSC supercomputers, the allas module can be used with allas-conf --mode s3cmd to configure the credentials.
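
After configuration, the ~/.aws/credentials file contains an access key and a secret key, roughly in the standard AWS credentials file layout sketched below. The profile name and the placeholder values depend on your own setup; boto3 reads the [default] profile unless told otherwise.

[default]
aws_access_key_id = <your-access-key>
aws_secret_access_key = <your-secret-key>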

boto3 usage

Create a boto3 resource

For all the following steps, a boto3 resource must first be created.

import boto3
s3_resource = boto3.resource('s3', endpoint_url='https://a3s.fi')
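
If the credentials are not available in ~/.aws/credentials, they can also be passed directly when creating the resource. This is a minimal sketch; the key values are placeholders, and keeping keys in code is not recommended, reading them from environment variables is a safer option.

import boto3

s3_resource = boto3.resource(
    's3',
    endpoint_url='https://a3s.fi',
    aws_access_key_id='<your-access-key>',      # placeholder
    aws_secret_access_key='<your-secret-key>',  # placeholder
)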

Create a bucket

Create a new bucket using the following script:

s3_resource.create_bucket(Bucket="examplebucket")
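
Bucket creation may fail, for example if the bucket name is already in use. A minimal sketch for catching the error, assuming the botocore exception class that boto3 uses:

from botocore.exceptions import ClientError

try:
    s3_resource.create_bucket(Bucket="examplebucket")
except ClientError as error:
    # For example, the bucket name may already be taken
    print("Bucket creation failed:", error)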

List buckets and objects

List all buckets belonging to a project:

for bucket in s3_resource.buckets.all():
    print(bucket.name)

And all objects belonging to a bucket:

my_bucket = s3_resource.Bucket('examplebucket')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)
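
The listing can also be limited to objects whose names start with a given prefix, which is useful for large buckets. A short sketch; the prefix value is only an example.

for my_bucket_object in my_bucket.objects.filter(Prefix='data/'):
    print(my_bucket_object.key)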

Download an object

Download an object:

s3_resource.Object('examplebucket', 'object_name_in_allas.txt').download_file('local_file.txt')
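
To download all objects of a bucket, the listing and download calls can be combined. A minimal sketch, assuming the files are written under a local downloads directory and that object names may contain /-separated "subdirectories":

import os

my_bucket = s3_resource.Bucket('examplebucket')

for obj in my_bucket.objects.all():
    if obj.key.endswith('/'):
        continue  # skip possible directory marker objects
    target = os.path.join('downloads', obj.key)
    # Create local subdirectories if the object name contains '/'
    os.makedirs(os.path.dirname(target), exist_ok=True)
    my_bucket.download_file(obj.key, target)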

Upload an object

Upload a local file local_file.txt to the bucket examplebucket as the object object_name_in_allas.txt:

s3_resource.Object('examplebucket', 'object_name_in_allas.txt').upload_file('local_file.txt')
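
For large files, boto3 splits the upload into multiple parts (multipart upload) automatically, and the thresholds can be tuned with a TransferConfig object if needed. A sketch; the file names and the 100 MiB values are only examples.

from boto3.s3.transfer import TransferConfig

# Start multipart uploads for files over 100 MiB and use 100 MiB parts
config = TransferConfig(multipart_threshold=100 * 1024 * 1024,
                        multipart_chunksize=100 * 1024 * 1024)

s3_resource.Object('examplebucket', 'big_object.dat').upload_file(
    'big_local_file.dat', Config=config)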

Remove buckets and objects

Delete all objects from a bucket:

my_bucket = s3_resource.Bucket('examplebucket')
my_bucket.objects.all().delete()
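
A single object can also be deleted without emptying the whole bucket:

s3_resource.Object('examplebucket', 'object_name_in_allas.txt').delete()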

Delete a bucket (the bucket must be empty):

s3_resource.Bucket('examplebucket').delete()


Last update: January 16, 2024