GDAL (Geospatial Data Abstraction Library) is a GIS translator library for accessing and transforming geospatial data. It is most commonly used for converting between file formats or coordinate systems.
GDAL is available in Puhti in the following versions:
- 3.0.4 via conda: geoconda-3.8
- 3.0.2 via conda: geoconda-3.7
- 2.4.3 via conda: snap
- 2.4.2 via conda: mapnik
- 2.4.1 via conda: solaris and Orfeo ToolBox
- 3.0.1 stand-alone: gdal module
- 2.4.2 stand-alone: gdal module and r-env. FORCE and Saga-GIS also use this GDAL, but the GDAL command-line tools are not included in these modules.
- 2.4.2 in the r-env-singularity Singularity container
The stand-alone versions don't have the Python bindings installed, so tools that depend on them, e.g. gdal_calc, work only in the conda installations. The supported file formats also vary slightly between the gdal installations. For instance, the PostGIS driver is not available in gdal/3.0.1 but is included in the conda versions.
GDAL is included in the modules listed above, so it can be used when any of these modules is loaded, or it can be loaded separately with:
module load geoconda
If you need to use a stand-alone version of gdal or plan to build software on top of gdal, you can load it with:
module load gcc/9.1.0 gdal
By default the latest gdal module is loaded. If you want a specific version, specify the version number:
module load gcc/9.1.0 gdal/<VERSION>-omp
You can test that gdal loaded successfully by running one of its command-line tools, for example gdalinfo --version.
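A minimal sketch of such a check, assuming a module that ships the GDAL command-line tools is loaded. The --version and --formats options are standard GDAL options. The has_driver function is a hypothetical helper introduced here for illustration, and PostgreSQL is the name under which OGR lists the PostGIS vector driver.

```shell
# has_driver is a hypothetical helper, not part of GDAL.
# $1: tool that lists formats (gdalinfo for raster, ogrinfo for vector)
# $2: text to look for in the driver list (case-insensitive)
has_driver() {
    "$1" --formats 2>/dev/null | grep -qi -- "$2"
}

if command -v gdalinfo >/dev/null 2>&1; then
    # Prints the version of the GDAL installation that is first in PATH.
    gdalinfo --version
    # Check vector driver support, e.g. the PostgreSQL/PostGIS driver.
    if has_driver ogrinfo PostgreSQL; then
        echo "PostGIS vector driver: available"
    else
        echo "PostGIS vector driver: not available"
    fi
else
    echo "gdalinfo not found; load a GDAL module first"
fi
```

Running the same check after loading a different module shows quickly whether that installation supports the format you need.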
Using files directly from Allas
GDAL can read files directly from Allas, but it cannot write to it. Write results first to Puhti scratch and move them to Allas afterwards. The virtual drivers described below are also supported in many GDAL-based tools. The setup is the same as below, but instead of running the example gdalinfo command, open the file from a Python or R script. From R and Python scripts it is also possible to write directly to Allas. We have tested successfully:
Reading data directly from Allas is slower than reading from scratch or other Puhti Lustre disks. For example, reading a ~500 MB file from scratch takes ~1 second, but from Allas ~10 seconds. In most cases, compared to the full duration of an analysis in Puhti, these extra seconds are not significant.
Public files in Allas can be read over HTTPS with GDAL's /vsicurl/ driver.
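As a sketch, a public object can be opened through GDAL's /vsicurl/ virtual file system, which reads files over HTTP(S). The URL form https://a3s.fi/<bucket>/<object> is an assumption about how Allas serves public objects, and allas_public_path is a hypothetical helper, so check both against your own bucket.

```shell
# Build a /vsicurl/ path for a public Allas object.
# Assumption: public objects are reachable at https://a3s.fi/<bucket>/<object>.
# allas_public_path is a hypothetical helper, not part of GDAL or allas.
allas_public_path() {
    printf '/vsicurl/https://a3s.fi/%s/%s\n' "$1" "$2"
}

# Placeholders: replace with your own bucket and file name.
path=$(allas_public_path "<name_of_your_bucket>" "<name_of_your_file>")
echo "$path"

# Run gdalinfo only if a GDAL module is loaded; it fails for the placeholder path.
if command -v gdalinfo >/dev/null 2>&1; then
    gdalinfo "$path" || echo "could not open $path"
fi
```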
Private files can be read through the SWIFT or S3 API. SWIFT is more secure, but the credentials need to be renewed after 8 hours. S3 has permanent keys and is therefore a little easier to use, but less secure. Both support random reading and streaming.
SWIFT. Set up the connection in Puhti and then read the files with the vsiswift driver:
module load allas
allas-conf
export SWIFT_AUTH_TOKEN=$OS_AUTH_TOKEN
export SWIFT_STORAGE_URL=$OS_STORAGE_URL
gdalinfo /vsiswift/<name_of_your_bucket>/<name_of_your_file>
The export commands are needed because GDAL looks for different environment variables than the ones allas-conf sets. These commands must be run each time you start working in Puhti, because the token is valid for 8 hours. Inside batch jobs, use allas-conf -k.
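The variable mapping above can be wrapped in a small guard that fails early when allas-conf has not been run in the current session. This is only a sketch: use_swift_for_gdal is a hypothetical helper name, and the check tests that the variables are set, not whether the token has expired.

```shell
# use_swift_for_gdal is a hypothetical helper, not part of allas or GDAL.
# It copies the variables written by allas-conf to the names GDAL reads.
use_swift_for_gdal() {
    if [ -z "${OS_AUTH_TOKEN:-}" ] || [ -z "${OS_STORAGE_URL:-}" ]; then
        echo "OS_AUTH_TOKEN or OS_STORAGE_URL is not set; run allas-conf first" >&2
        return 1
    fi
    export SWIFT_AUTH_TOKEN="$OS_AUTH_TOKEN"
    export SWIFT_STORAGE_URL="$OS_STORAGE_URL"
}

if use_swift_for_gdal; then
    echo "SWIFT variables set for GDAL"
fi
```

After a successful call, gdalinfo /vsiswift/<name_of_your_bucket>/<name_of_your_file> can be run as before.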
S3. Set up the connection in Puhti and then read the files with the vsis3 driver:
module load allas
allas-conf --mode s3cmd
gdalinfo /vsis3/<name_of_your_bucket>/<name_of_your_file>
module load allas sets the AWS_S3_ENDPOINT environment variable, so it needs to be run each time S3 is used.
The allas-conf command saves your credentials to the .aws/credentials file in your home directory. It needs to be run only once, before the first use or when you want to switch to another CSC project.
With large quantities of raster data (also in Allas), the most convenient method of accessing them might be GDAL virtual rasters. More information here.
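As a sketch of the virtual raster approach, the following writes a list of /vsis3/ paths and passes it to gdalbuildvrt with its -input_file_list option. The bucket name, tile names, output file names, and the vsis3_list helper are all hypothetical, and the S3 connection is assumed to be set up as described above.

```shell
# vsis3_list is a hypothetical helper: prints one /vsis3/ path per file name.
# $1: bucket name, remaining arguments: object names in that bucket
vsis3_list() {
    bucket=$1
    shift
    for name in "$@"; do
        printf '/vsis3/%s/%s\n' "$bucket" "$name"
    done
}

# Hypothetical bucket and tile names; replace with your own.
vsis3_list "<name_of_your_bucket>" tile_1.tif tile_2.tif > vrt_sources.txt

# Build the virtual raster only if the GDAL tools are available.
if command -v gdalbuildvrt >/dev/null 2>&1; then
    gdalbuildvrt -input_file_list vrt_sources.txt mosaic.vrt \
        || echo "gdalbuildvrt could not read the listed files"
fi
```

The resulting mosaic.vrt can then be opened like any other raster, for example with gdalinfo mosaic.vrt.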
License and citing
GDAL/OGR is licensed under an MIT/X-style license.
In your publications, please also acknowledge oGIIR and CSC, for example: “The authors wish to acknowledge for computational resources CSC – IT Center for Science, Finland (urn:nbn:fi:research-infras-2016072531) and the Open Geospatial Information Infrastructure for Research (oGIIR, urn:nbn:fi:research-infras-2016072513).”