GDAL (Geospatial Data Abstraction Library) is a GIS translator library for accessing and transforming geospatial data. It is most commonly used for converting between file formats or coordinate systems.
GDAL is available in Puhti with the following versions:
The stand-alone version does not have Python bindings installed, so e.g. gdal_calc works only in the geoconda installations. The supported file formats also vary slightly between the GDAL installations. For instance, the PostGIS driver is not available in the stand-alone gdal module but is included in the geoconda versions.
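To see which of these differences apply to the installation in use, a check along the following lines may help once a module has been loaded. This is a minimal sketch, not part of the CSC tooling: the helper names `has_driver` and `has_python_bindings` are hypothetical, and it assumes that `ogrinfo` and `python` are on the PATH after loading the module and that the PostGIS vector driver appears in the format list as "PostgreSQL/PostGIS".

```shell
# Hypothetical helper: succeed if the format list read from stdin
# mentions the given driver name (case-insensitive match).
has_driver() {
    grep -qi -- "$1"
}

# Hypothetical helper: succeed if the GDAL Python bindings can be imported.
has_python_bindings() {
    command -v python >/dev/null 2>&1 &&
        python -c "from osgeo import gdal" >/dev/null 2>&1
}

# Query GDAL only when its tools are available in this session.
if command -v ogrinfo >/dev/null 2>&1; then
    if ogrinfo --formats | has_driver "PostGIS"; then
        echo "PostGIS vector driver: available"
    else
        echo "PostGIS vector driver: not available"
    fi
fi

if has_python_bindings; then
    echo "Python bindings: available"
else
    echo "Python bindings: not available"
fi
```

Raster drivers can be checked the same way by piping `gdalinfo --formats` into `has_driver`.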
GDAL is included in the modules listed above, so it can be used when any of these modules is loaded, or it can be loaded separately with:
module load geoconda
If you need to use a stand-alone version of GDAL or plan to build software on top of GDAL, you can load GDAL with:
module load gdal
By default, the latest gdal module is loaded. If you want a specific version, you can specify the version number:
module load gdal/<VERSION>
You can test whether GDAL loaded successfully with:
gdalinfo --version
In the r-env module, GDAL commands can be used as:
apptainer_wrapper exec gdalinfo --version
Using files directly from Allas
It is possible to read files from Allas directly with GDAL, but not to write. Write results first to Puhti scratch and move them to Allas later. The virtual drivers mentioned below are also supported in many GDAL-based tools. The setup is the same as below, but instead of the example gdalinfo command, open the file from a Python or R script. In R and Python it is also possible to write to Allas directly from the script. We have tested successfully:
Reading data directly from Allas is slower than reading from scratch or other Puhti Lustre disks. For example, reading a ~500 MB file from scratch takes ~1 second, but from Allas ~10 seconds. In most cases, compared to the full duration of an analysis in Puhti, these seconds are not important.
Public files in Allas can be read with
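One way to do this, assuming the public object is reachable over plain HTTP(S), is GDAL's /vsicurl/ virtual file system, which takes the full URL of the file as its path. The sketch below only builds such a path. The helper name `vsicurl_path` is hypothetical, and the URL is a placeholder rather than a real Allas address; replace it with the public URL of your own object.

```shell
# Hypothetical helper: prefix an HTTP(S) URL with GDAL's /vsicurl/ handler.
vsicurl_path() {
    printf '/vsicurl/%s\n' "$1"
}

# Placeholder URL, not a real object: use the public URL of your own file.
public_url="https://example.org/<name_of_your_bucket>/<name_of_your_file>"

# Print the path that GDAL tools accept in place of a local file name.
vsicurl_path "$public_url"
```

The printed path can be given to GDAL tools like any file name, for example `gdalinfo "$(vsicurl_path "$public_url")"`.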
Private files can be read with the SWIFT or S3 API. SWIFT is more secure, but the credentials need to be renewed after 8 hours. S3 has permanent keys and is therefore a little easier to use, but less secure. Both have a random reading and streaming API.
SWIFT. Set up the connection in Puhti and then read the files with
module load allas
allas-conf
export SWIFT_AUTH_TOKEN=$OS_AUTH_TOKEN
export SWIFT_STORAGE_URL=$OS_STORAGE_URL
gdalinfo /vsiswift/<name_of_your_bucket>/<name_of_your_file>
The export commands are needed because GDAL looks for different environment variables than the ones allas-conf sets. These commands need to be given each time you start working with Puhti, because the token is valid for only 8 hours. Inside batch jobs, use allas-conf -k.
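Because the exports silently produce empty values if allas-conf has not been run in the current shell, it can be useful to guard them. The following is a minimal sketch under that assumption; the function name `export_swift_for_gdal` is hypothetical, and only the four variable names come from the commands above.

```shell
# Hypothetical helper: copy the variables written by allas-conf
# to the names that GDAL's /vsiswift/ driver reads.
export_swift_for_gdal() {
    if [ -z "${OS_AUTH_TOKEN:-}" ] || [ -z "${OS_STORAGE_URL:-}" ]; then
        echo "OS_AUTH_TOKEN or OS_STORAGE_URL is not set; run allas-conf first" >&2
        return 1
    fi
    export SWIFT_AUTH_TOKEN="$OS_AUTH_TOKEN"
    export SWIFT_STORAGE_URL="$OS_STORAGE_URL"
}

# Report success only if both variables could be exported.
if export_swift_for_gdal 2>/dev/null; then
    echo "SWIFT variables for GDAL are set in this shell"
else
    echo "SWIFT variables for GDAL are not set"
fi
```

After a successful call, /vsiswift/ paths can be used as in the gdalinfo example above until the token expires.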
S3. Set up the connection in Puhti and then read the files with vsis3-driver:
module load allas
allas-conf --mode s3cmd
gdalinfo /vsis3/<name_of_your_bucket>/<name_of_your_file>
module load allas sets the AWS_S3_ENDPOINT environment variable, so it needs to be run each time S3 is used.
The allas-conf command saves your credentials to the .aws/credentials file in your home directory. It needs to be run only once before first use, or when you want to switch to another CSC project.
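The two prerequisites described above (the endpoint variable and the credentials file) can be checked before starting a longer analysis. This is a minimal sketch; the function name `s3_ready_for_gdal` is hypothetical, and it only tests that the variable and the file exist, not that the stored keys are valid.

```shell
# Hypothetical helper: succeed only if both S3 prerequisites are in place.
s3_ready_for_gdal() {
    s3_status=0
    if [ -z "${AWS_S3_ENDPOINT:-}" ]; then
        echo "AWS_S3_ENDPOINT is not set; run 'module load allas'" >&2
        s3_status=1
    fi
    if [ ! -f "$HOME/.aws/credentials" ]; then
        echo "$HOME/.aws/credentials not found; run 'allas-conf --mode s3cmd'" >&2
        s3_status=1
    fi
    return "$s3_status"
}

# Report the result; the helper's own messages on stderr name what is missing.
if s3_ready_for_gdal; then
    echo "S3 prerequisites for GDAL are in place"
else
    echo "S3 prerequisites for GDAL are missing"
fi
```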
With large quantities of raster data (also in Allas), the most convenient method of accessing them might be GDAL virtual rasters.
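As an illustration, a virtual raster over files in Allas could be built with gdalbuildvrt from a list of /vsis3/ paths. The sketch below assumes the S3 connection has already been set up as described above; the helper name `vsis3_list`, the tile names, and the output file names are hypothetical placeholders.

```shell
# Hypothetical helper: turn object names into /vsis3/ paths, one per line.
# Usage: vsis3_list <bucket> <object> [<object> ...]
vsis3_list() {
    bucket="$1"
    shift
    for object in "$@"; do
        printf '/vsis3/%s/%s\n' "$bucket" "$object"
    done
}

# Placeholder bucket and tile names; replace with your own.
vsis3_list "<name_of_your_bucket>" tile_1.tif tile_2.tif > vrt_inputs.txt

# Build the virtual raster only if GDAL is available in this session.
if command -v gdalbuildvrt >/dev/null 2>&1; then
    gdalbuildvrt -input_file_list vrt_inputs.txt mosaic.vrt ||
        echo "gdalbuildvrt failed; check the bucket name and S3 setup" >&2
fi
```

The resulting .vrt file is a small XML description that refers to the original files, so it can be opened like a single raster without copying the data from Allas.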
License and acknowledgement
GDAL is licensed under an MIT/X style license.
Please acknowledge CSC and Geoportti in your publications; this is important for project continuation and funding reports. As an example, you can write "The authors wish to thank CSC - IT Center for Science, Finland (urn:nbn:fi:research-infras-2016072531) and the Open Geospatial Information Infrastructure for Research (Geoportti, urn:nbn:fi:research-infras-2016072513) for computational resources and support".