Earth Observation guide
This guide aims to help researchers work with Earth Observation (EO) data using CSC's computing resources. It gives an overview of the available options, making it easier to decide whether CSC has suitable services for your EO research. It also helps you find the right data and tools for raster-based EO tasks. The guide focuses on spaceborne platforms, but many tools and concepts also apply to airborne platforms. If you are interested in the fundamentals of EO, please see the Resources and further reading section.
What are the benefits of using EO data?
- Possibility to observe wide areas at the same time
- The same sensor covers different parts of the world, making it easy to compare different areas
- Time series show changes across seasons and years
Raster data format
Why should I use CSC computing resources for EO?
For working with EO data in general, there are three main options:
1) EO-specific services, which provide both data and advanced, ready-to-use processing environments. These usually offer a better user experience and efficiency. However, they may be limited in computing power, available tools and options for adding your own data. They often charge usage fees. Examples are Google Earth Engine, SentinelHub and Microsoft Planetary Computer.
2) Cloud services with access to EO data. In practice, the data is often stored in object storage and can be accessed as an independent service. These services also provide general computing resources, such as virtual machines, on which the end user must install the EO tools. These options usually involve fees, mainly for processing. Downloading data may be free of charge or have a small cost, depending on the amount of data needed. Examples are Data and Information Access Services (DIAS) and Amazon Web Services. Microsoft Planetary Computer also partly fits this category.
3) Own computing environment: PC, local cluster or virtual machines. Data must be downloaded and all tools must be installed on this system. On the other hand, this gives more freedom in selecting the tools and set-up. Usually this does not cause any extra costs, but the computing power is often rather limited.
CSC services do not fit neatly into this categorization, as they combine features of all three. CSC computing services provide a lot of computing power and storage space, and they are free of charge for Finnish researchers for academic or educational use.
At CSC, EO data can be processed and analyzed on a supercomputer, such as Puhti, or on a virtual machine in the cPouta cloud service. Few other EO services can match Puhti's computing capacity, in terms of both processing power and available memory. Both Puhti and cPouta also have GPU resources, which are especially useful for large simulations and deep learning.
Puhti also has many pre-installed applications, making it a ready-to-use environment. cPouta virtual machines are similar to commercial cloud services, where the end user does all set-up and installations. In general, both services support only Linux software.
At CSC, some Finnish EO datasets are available for direct use. In many cases, however, downloading EO data from other services (see list of EO data download services) is a required step in the process. Puhti and cPouta provide local storage of about 1-20 TB. For more storage space, Allas object storage can be used.
Using CSC computing services requires basic Linux skills and the ability to use a scripting language (for example Python, R or Julia) or command-line tools. In addition, supercomputers and virtual machines involve some specific concepts, so getting started takes a few hours. The Puhti web interface makes the start considerably easier. It provides a desktop environment in the web browser, which enables the use of tools with Graphical User Interfaces (GUI). It also offers tools like RStudio and JupyterLab for an easy start with R, Python and Julia.
What data do I need?
When starting a task that requires EO data, there are multiple factors to consider. Which factors matter most depends heavily on the task and the available resources. The following list summarizes what to consider when defining your data needs:
- Sensor: Different sensors cover different intervals of the electromagnetic (EM) spectrum and therefore show different properties of the observed areas. Sensors can be active or passive:
  - Multispectral: multiple intervals around the visible part of the EM spectrum are observed at the same time
  - Hyperspectral: more, and usually narrower, intervals of the EM spectrum are observed at the same time
  - RADAR (Radio Detection and Ranging), SAR (Synthetic Aperture Radar): active sensing in the microwave/radio frequencies of the EM spectrum
  - LiDAR (Light Detection and Ranging): active sensing using a laser as the energy source in the optical part of the EM spectrum
  - Note that, depending on the wavelengths observed, clouds, ground conditions and atmospheric artifacts may result in data gaps
- Resolution:
  - Temporal: when and how often a certain area is revisited
  - Spatial: the area on the ground that each pixel covers, which determines the size of the smallest feature that can be detected
  - Spectral: the parts of the EM spectrum that are observed and the spectral width of each band
  - Radiometric: number of bits used to represent the recorded energy (bit depth)
- Cost:
  - Some EO data is freely available as open data
  - Some commercial datasets may be available for free or at a reduced price for research
- Preprocessing level:
  - Raw data: can have different levels and often needs to be processed before it can be used for reliable analysis
  - Different levels of preprocessed data: make sure you are aware of what kind of preprocessing has been performed on your data
  - Analysis ready data (ARD)
- User experience and knowledge:
  - Appropriate background knowledge is required for many tasks
  - ARD is "ready to go", but be aware of what preprocessing has been performed on your data
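Spatial, radiometric and temporal resolution directly affect how much data you will have to store and process. The following back-of-the-envelope sketch makes this concrete. It assumes a rectangular area, uncompressed storage and whole bytes per pixel (for example 12-bit values stored as 16-bit integers). Real products add compression, tile overlaps and metadata, so treat the numbers as rough estimates only.

```python
import math


def pixel_count(width_m: float, height_m: float, pixel_size_m: float) -> int:
    """Number of square pixels needed to cover a width_m x height_m area."""
    return math.ceil(width_m / pixel_size_m) * math.ceil(height_m / pixel_size_m)


def grey_levels(bit_depth: int) -> int:
    """Number of distinct values a pixel can store with the given radiometric resolution."""
    return 2 ** bit_depth


def band_size_bytes(width_m: float, height_m: float, pixel_size_m: float, bit_depth: int) -> int:
    """Uncompressed size of a single band, assuming each pixel is stored in whole bytes."""
    bytes_per_pixel = math.ceil(bit_depth / 8)  # e.g. 12-bit values stored as 16-bit integers
    return pixel_count(width_m, height_m, pixel_size_m) * bytes_per_pixel


def max_acquisitions_per_year(revisit_days: float) -> int:
    """Upper bound on acquisitions of one area per year; clouds etc. reduce the usable ones."""
    return math.floor(365 / revisit_days)
```

Here is an example for a 100 km x 100 km area. `band_size_bytes(100_000, 100_000, 10, 12)` gives 200,000,000 bytes (about 200 MB) per band, and 20 m pixels need a quarter of that. A 6-day revisit time allows at most `max_acquisitions_per_year(6)`, i.e. 60 acquisitions per year.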
Some widely used EO datasets
| Name | Max resolution, m | Revisit time, days | Years of operation | Open data |
|---|---|---|---|---|
| Planet, several satellites | 0.5-5 | - | 2009-present | No* |
| ESA, Sentinel-1 | 5 | 6 | 2014-present | Yes |
* See Planet's education and research page for limited, non-commercial access to PlanetScope and RapidEye imagery.
A database of all EO missions and instruments can be found in the CEOS EO Handbook database. See also the EOReader band mapping graphics for an overview of the wavelength intervals observed by different optical sensors.
Where can I find the data?
Commercial datasets are usually available from the data provider, while open datasets may be available in different processing stages from different services. Where possible, consider processing options located close to the data, which offer direct access or faster downloads. Graphical browse and download services give a good overview of the data and are easy to use. However, downloading large amounts of data is considerably easier with a bulk downloader or a download API (Application Programming Interface).
Many data providers offer a SpatioTemporal Asset Catalog (STAC) of their datasets. These catalogs help you find available data by time and location, with multiple additional filters such as cloud cover and resolution. The STAC Index provides a good overview of available catalogs from all over the world, including Paituli STAC. The STAC Index page also includes many resources for learning and using STAC. See also CSC's examples for using STAC from Python and from R.
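A STAC API search is a plain HTTP request, which shows what "filtering by time, location and cloud cover" means in practice. The sketch below uses only the Python standard library. It builds the JSON body for the STAC API Item Search endpoint (`POST <api_root>/search`) and sends it. Cloud cover filtering uses the STAC API Query extension. This extension is an assumption about the catalog you use, so check its conformance classes first. The API root URL and collection IDs must be taken from the catalog itself.

```python
import json
import urllib.request
from datetime import date


def build_search_body(collections, bbox, start: date, end: date,
                      max_cloud_cover=None, limit=100):
    """Return a JSON-serialisable body for a STAC API Item Search request."""
    body = {
        "collections": list(collections),
        # [min_lon, min_lat, max_lon, max_lat] in WGS84 degrees
        "bbox": list(bbox),
        # closed RFC 3339 interval covering whole days in UTC
        "datetime": f"{start.isoformat()}T00:00:00Z/{end.isoformat()}T23:59:59Z",
        "limit": limit,
    }
    if max_cloud_cover is not None:
        # STAC API Query extension on the eo:cloud_cover property; not every catalog supports it.
        body["query"] = {"eo:cloud_cover": {"lt": max_cloud_cover}}
    return body


def search(api_root: str, body: dict) -> dict:
    """POST the search and return the first page of results (a GeoJSON FeatureCollection)."""
    request = urllib.request.Request(
        api_root.rstrip("/") + "/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `build_search_body(["sentinel-2-l2a"], (24.5, 60.0, 25.5, 60.5), date(2022, 6, 1), date(2022, 8, 31), max_cloud_cover=10)` describes summer 2022 scenes with less than 10 % cloud cover in the Helsinki region. The collection ID is only an example and differs between catalogs. Further result pages are linked from the response's `links` with `rel` set to `next`. Client libraries such as pystac-client (Python) and rstac (R) help with paging and are usually more convenient than raw requests.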
EO data at CSC
Some Finnish EO datasets are available locally at CSC. A STAC catalog for all spatial data available at CSC is being developed. You can find more information about it and its current content on the Paituli STAC page.
- Sentinel and Landsat mosaics of Finland in Puhti. Accessing data in Puhti requires a CSC user account and a project with the Puhti service enabled. All Puhti users have read access to these datasets. There is no need to copy the files, as they can be used directly. If you need to modify them, make your own copy first.
- Sentinel-2 L2A data of Finland in Allas. These files are public, so anyone can download them, also to their own computer or to other services.
- More information and a list of all spatial datasets in the CSC computing environment
EO data download services
SYKE/FMI, Finnish image mosaics: Sentinel-1, Sentinel-2 and Landsat mosaics for several time periods per year. Some, but not all, of them are available in Puhti. FMI also provides a STAC catalog for these mosaics.
The European Space Agency's SciHub provides the main Sentinel-1, -2 and -3 products worldwide. It requires free registration. A large part of the data is in the "Long term archive" and cannot be downloaded directly; it must be requested first. Some tools can do this automatically, so check the documentation of the tool of your choice. Downloads are limited to 2 concurrent processes per user. Please note that between the end of January and July 2023 this system is being updated to become the Copernicus Data Space Ecosystem. SciHub will continue full operations until the end of June 2023.
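A per-user limit on concurrent downloads, like the 2-process limit of SciHub, is easy to respect in your own bulk download scripts with a bounded worker pool. This is a minimal sketch using Python's `concurrent.futures`. `download_one` is a hypothetical placeholder for whatever function actually fetches a single product (for example a wrapper around your download tool), not part of any library.

```python
from concurrent.futures import ThreadPoolExecutor

# Per-user limit of concurrent downloads, as stated for SciHub.
MAX_CONCURRENT_DOWNLOADS = 2


def download_all(product_ids, download_one, max_concurrent=MAX_CONCURRENT_DOWNLOADS):
    """Call download_one(product_id) for every product, with at most max_concurrent calls at once.

    download_one is a hypothetical placeholder for the function that fetches one product.
    Results come back in the same order as product_ids; if download_one raises,
    the exception is re-raised here when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(download_one, product_ids))
```

A thread pool fits here because downloading mostly waits on the network rather than on the CPU. Services without such a limit can simply be given a larger `max_concurrent`.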
The Copernicus Data Space Ecosystem provides access to all Sentinel data, with new features for visualisation and data processing. Follow the news for the latest information on available services, and see the Copernicus Data Space Ecosystem roadmap for the full release of all functionalities.
FinHub is the Finnish national mirror of SciHub; other national mirrors also exist. It covers Finland and the Baltics and offers Sentinel-2 L1C (but not L2A) and Sentinel-1 SLC, GRD and OCN products. It requires a separate registration. FinHub has neither concurrent download limits nor a "Long term archive".
Both SciHub and FinHub provide a similar Graphical User Interface (GUI) and Application Programming Interface (API) for accessing the data. You can also use, for example, the sentinelsat tool to download data from the ESA open access hubs. See also the CSC examples for SciHub and FinHub data download.
USGS EarthExplorer provides US-related datasets, among others, as well as worldwide Landsat mission datasets. It requires free registration. Data can be browsed and downloaded via a web interface and via bulk download. USGS is the main provider of the new Landsat Collection 2 data.
Amazon Web Services (AWS) open EO data is a collection of worldwide EO datasets provided by different organizations, including Landsat and Sentinel. Some of the data can be downloaded only on a "requester pays" basis. Currently, Sentinel-2 L2A Cloud-Optimized GeoTIFFs are available for free, also via STAC.
Microsoft Planetary Computer provides a STAC catalog of all its data, including Sentinel, Landsat and MODIS. It is currently available in preview.
Terramonitor provides preprocessed, analysis-ready Sentinel-2 data of Finland from 2018-2020. It is a commercial service.
Other geospatial datasets
To find other geospatial datasets, check out CSC's list of open spatial datasets.
How can I process EO data at CSC?
Information about geocomputing with CSC resources and how to get started can be found on the CSC geocomputing pages. These pages include links for creating user accounts and all other practical information.
What to consider when choosing software?
No single piece of software is perfect for every task and taste. The right software depends as much on the task at hand as on the preferences and skill set of the user. The following list summarizes things to consider when choosing software.
- Functionality: Does the software provide the tools you need to reach your goal?
- Interaction type: How do you want to interact with the software?
  - Graphical User Interface (GUI)
  - Command Line Interface (CLI)
- Technical aspects:
  - Reproducibility: Does the tool make it possible to record work steps?
  - Supported operating systems: Can the tool be installed on the operating system available to you?
  - Automation possibility: Can the tool execution be automated for big data processing, if needed?
  - Combination possibility: Can you combine the tool with other tools?
  - Computational efficiency: Does the tool make good use of the available computational resources (especially GPUs)?
  - Support for parallel computing or batch processing
- Open source vs proprietary:
  - Proprietary tools need licenses, which may be expensive and/or restrict the use of the tool
  - FOSS (free and open source software) allows users to inspect the source code, giving detailed insight into its functionality
What applications are available on Puhti?
GDAL (OGR) - Geospatial Data Abstraction Library, a collection of command-line tools for accessing and transforming geospatial data. It is relatively fast and requires few computational resources. GDAL can read data directly from the Internet or from object storage, and many other tools use it for reading and writing data. See the GDAL example for Puhti.
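GDAL reads remote data through virtual file system prefixes. `/vsicurl/` is for files behind an HTTP(S) URL, and `/vsis3/` is for S3-compatible object storage. For cloud-optimized GeoTIFFs, GDAL then fetches only the parts of the file it needs. This sketch just builds such paths. It assumes that public Allas objects are served from `https://a3s.fi/<bucket>/<object>`. The bucket and object names are hypothetical placeholders.

```python
from urllib.parse import quote

# Assumption: publicly readable Allas objects are served over HTTPS from this host.
ALLAS_HOST = "a3s.fi"


def allas_public_url(bucket: str, key: str) -> str:
    """HTTPS URL of a public object in Allas (bucket and key are placeholders)."""
    return f"https://{ALLAS_HOST}/{bucket}/{quote(key)}"


def vsicurl_path(url: str) -> str:
    """GDAL /vsicurl/ path: GDAL reads the remote file over HTTP(S) without downloading it first."""
    return "/vsicurl/" + url


def vsis3_path(bucket: str, key: str) -> str:
    """GDAL /vsis3/ path for S3-compatible storage.

    For Allas, GDAL also needs the AWS_S3_ENDPOINT configuration option pointing
    to the Allas endpoint, plus S3 credentials.
    """
    return f"/vsis3/{bucket}/{key}"


# Example use with the GDAL Python bindings (needs network access and an existing object):
# from osgeo import gdal
# dataset = gdal.Open(vsicurl_path(allas_public_url("my-bucket", "tiles/example.tif")))
```

The same strings work with the GDAL command-line tools, for example as the input of `gdalinfo`.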
Matlab - you can run Matlab jobs on Puhti conveniently from your own computer's Matlab installation.
Orfeo Toolbox (OTB) - offers a wide variety of applications, from ortho-rectification and pansharpening all the way to classification, SAR processing and much more. Orfeo Toolbox is available as a CLI, a GUI and via a Python interface.
Python:
- The geoconda module provides many useful Python packages for raster data processing and analysis, such as xarray and tools for working with STAC.
- Machine learning modules provide some common machine learning frameworks, including ones for deep learning.
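Band math is a typical first raster analysis step with these Python packages. This sketch computes the Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red), with NumPy. For Sentinel-2, red is band B04 and near-infrared is band B08. The sketch assumes both bands are already read into arrays of the same shape and scale, with no additive offset. Otherwise, convert them to reflectance first. The same arithmetic can be written on xarray DataArrays.

```python
import numpy as np


def ndvi(red, nir):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); pixels where NIR + Red == 0 become NaN."""
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    total = nir + red
    # Suppress divide-by-zero warnings; those pixels are replaced with NaN below.
    with np.errstate(divide="ignore", invalid="ignore"):
        index = (nir - red) / total
    return np.where(total == 0, np.nan, index)
```

NDVI values range from -1 to 1. Dense green vegetation gives high positive values, while water typically gives negative values.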
QGIS - an open source GUI tool for working with spatial data, with limited multispectral image processing capabilities. It supports batch processing and has a Python interface. It is used, for example, for visualization, map algebra and other raster processing. Many plug-ins are available. For EO data processing, check out the QGIS Semi-Automatic Classification Plugin.
R - the Puhti R installation includes many geospatial packages, several of which are useful for EO data processing, such as rstac for working with STAC catalogs.
Sen2Cor - a command-line tool for Sentinel-2 Level 2A product generation and formatting.
Sen2mosaic - a command-line tool to download, preprocess and mosaic Sentinel-2 data.
SNAP - ESA's Sentinel Application Platform, a tool for processing Sentinel data that also supports other data sources. It has GUI, CLI (Graph Processing Tool, GPT) and Python interfaces. See the SNAP GPT example for Puhti.
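SNAP GPT runs processing graphs saved as XML files. Variables written as `${name}` in a graph can be set on the command line with `-Pname=value`, so one graph can be reused for many input files. This minimal sketch generates and runs one `gpt` command per input. The graph file and its `input`/`output` variable names are hypothetical and must match the graph you actually use. The `.dim` output is SNAP's BEAM-DIMAP format.

```python
import shlex
import subprocess
from pathlib import Path


def gpt_command(graph_xml, input_file, output_file, gpt="gpt"):
    """Argument list for running a SNAP graph on one input file.

    Assumes a hypothetical graph whose Read and Write nodes refer to the
    variables ${input} and ${output}; gpt fills them in from -P<name>=<value>.
    """
    return [gpt, str(graph_xml), f"-Pinput={input_file}", f"-Poutput={output_file}"]


def batch_commands(graph_xml, input_files, out_dir):
    """One gpt command per input, writing <input stem>_processed.dim into out_dir."""
    out_dir = Path(out_dir)
    return [
        gpt_command(graph_xml, f, out_dir / f"{Path(f).stem}_processed.dim")
        for f in input_files
    ]


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        print("Running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

On Puhti, heavy processing like this belongs in batch jobs rather than on the login nodes. See the SNAP GPT example for Puhti for the CSC-specific set-up.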
If you need further applications, you can ask CSC to install them for you.
Machine Learning with EO data
Machine learning is one example of advanced EO data use. If you are interested in the topic, you can find many examples in the CSC Machine learning with spatial data course materials. For practical guidelines, see also the CSC machine learning guide.
Alternative processing services
The following alternative EO processing services may be useful when a task requires so much data that downloading all of it to CSC is not feasible.
Microsoft Planetary Computer offers JupyterHub together with Dask Gateway, with both CPUs and GPUs available. It is currently available in preview.
Data and Information Access Services (DIAS) offer cloud-based Virtual Machines (VMs), dedicated bare-metal servers, containers, and operating system and software images. These services specialize in EO and have user support available. All of them are commercial services. The new Copernicus Data Space Ecosystem will combine some of the DIASes into one service, which will also offer free trials. See the Copernicus Data Space Ecosystem roadmap for the full release of all functionalities.
Sentinelhub is a commercial service that offers several different APIs.
Commercial clouds: Amazon, Google Cloud and Microsoft Azure all provide virtual machines and other processing services, and all of them host some EO data (see the links above).
Where can I get help?
If you are interested in using CSC services for your EO research, please make yourself familiar with the services:
- Visit a course, seminar or workshop; you can find all upcoming and past events in the CSC training calendar
- To get started, go through the CSC Computing Environment - Self Learning course
- Find information about services and how to use them in CSC's documentation pages
- For information on geocomputing in the CSC environment, check out the collection of CSC's geocomputing learning materials and the CSC geocomputing examples on GitHub
You can find all the ways to get help from CSC specialists on the CSC contact page. We are happy to help with technical problems related to our services. We are also open to suggestions on which software should be installed on Puhti, what kind of courses should be offered, or what materials and examples should be prepared. Please also let us know if you would like to add a service to this page or find anything unclear.
If you find any mistakes or outdated links, have suggestions for improvement or want to add more information about a certain topic, please let us know. You can add them to our GitHub issue for improving the EO guide, send a pull request to the CSC documentation on GitHub, or contact us via any of the channels listed on the CSC contact page. Thank you!
Resources and further reading
If you are interested in the fundamentals of EO, take a look at these excellent resources:
- Fundamentals of remote sensing tutorial by Canada Centre for Mapping and Earth Observation, Natural Resources Canada; an "interactive module is intended as an overview at a senior high school or early university level and touches on physics, environmental sciences, mathematics, computer sciences and geography."
- Echoes in space - Introduction to RADAR remote sensing by the European Space Agency; "a detailed insight into the history of Radar technology, including all the basics that are needed to understand how electromagnetic waves work and a unique hands-on experience to work with Radar data in diverse application scenarios."
- Newcomers guide to Earth Observation by the European Space Agency, "a guide to help non-experts in providing a starting point in the decision process for selecting an appropriate Earth Observation (EO) solution."
- Earthdatascience intro to multispectral data
- CSC geocomputing seminar materials, especially materials of the 2022 EO-workshop
- ESA tutorials
- Awesome EO code, a long list of EO tools
- Overview of big EO data management and analysis platforms (from 2020)