Data access

The first step when working with satellite data, or geospatial data in general, is to read it properly. That means reading not only the values of a raster, for example, but also all the accompanying information such as coordinates and metadata. While there are reference file formats for EO data (as in other domains), there is no real standard for the numerous associated metadata or for how to store them, which often makes them complicated to handle.

Reference formats for raster data include:

  • GeoTIFF: a public domain metadata standard that enables georeferencing information to be embedded within an image file. Limited to 2-dimensional rasters.
  • JPEG2000: primarily a compression standard, based on the wavelet transform.
  • NetCDF: not limited to rasters, and often used for 3D or 4D climate datasets.

Even though those files contain metadata, such as their Coordinate Reference System, they are most often delivered with additional XML or JSON files containing other metadata that describe the product or the sensor conditions.

There are also reference formats for vector data like points and shapes:

  • Shapefile: an Esri vector data file format commonly used for geospatial analysis.
  • GeoJSON: a format for encoding a variety of geographic data structures.
  • KML: a file format used to display geographic data in an Earth browser such as Google Earth.
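
As an illustration of how simple a vector format can be, here is a minimal sketch that builds a GeoJSON `FeatureCollection` with one point, serializes it, and reads it back using only the Python standard library. The place name and coordinates are illustrative values, not taken from any dataset referenced here.

```python
import json

# A GeoJSON Feature: one geometry plus free-form properties.
# The name and coordinates are illustrative values only.
feature = {
    "type": "Feature",
    "geometry": {
        "type": "Point",
        # GeoJSON stores positions as [longitude, latitude].
        "coordinates": [1.4442, 43.6047],
    },
    "properties": {"name": "Toulouse"},
}

collection = {"type": "FeatureCollection", "features": [feature]}

# Serialize to text (what a .geojson file would contain), then parse it back.
text = json.dumps(collection, indent=2)
parsed = json.loads(text)

for feat in parsed["features"]:
    lon, lat = feat["geometry"]["coordinates"]
    print(feat["properties"]["name"], lon, lat)
```

Because GeoJSON is plain JSON, any JSON parser can read it; dedicated geospatial libraries mainly add geometry operations and reprojection on top.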

Below is a selection of tools and useful links to start working with satellite data.

Low level libraries (open satellite images in a generic way)

Here, we deal only with the geospatial file formats and the metadata they contain, that is, basic spatial metadata.

  • GDAL: THE reference tool for reading, writing, and transforming geospatial data. A great many other tools rely on it. Written in C and C++, it also provides a Python API.
  • Rasterio: because GDAL is not easy to use from Python, Rasterio provides a nice Python API to GDAL, along with other useful extensions.

Advanced libraries

As mentioned in the introduction, reading a single file is often not enough to get complete information on a product, or a high-level representation of it in memory. The following tools provide much more functionality when it comes to accessing specific satellite products.

  • Sensorsio: a Python library that provides convenient functions to load Sentinel-2 (Level 2A, MAJA format) and data from other sensors into numpy and xarray.
  • EOReader: an open-source remote-sensing Python library for reading optical and SAR constellations, and for loading and stacking bands, clouds, DEM and spectral indices in a sensor-agnostic way.
  • EOSets: aims to simplify any process working with sets of EO data handled by EOReader, that is, temporal or spatial stacks.
  • Orfeo Toolbox: also reads plenty of different sensors (optical or SAR imagery), listed here.

Data provider facilities

Before reading the data, you must be able to search for it and find it! Geospatial data is often distributed and cataloged by data hubs in a uniform way, using standards like STAC or OpenSearch. You will need to query those catalogs to find data of interest, and then either download the products or, if working on a facility like the CNES computing center, find the data paths and access the products directly.

The following tools can help you with that:

  • EODag (Earth Observation Data Access Gateway): a command line tool and a Python package for searching and downloading remotely sensed images, offering a unified API for data access regardless of the data provider.
  • PySTAC: a library for working with SpatioTemporal Asset Catalogs (STAC) in Python 3.
  • See also the data hub pages.
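
To give an idea of what querying a catalog involves, here is a hand-rolled sketch, not the API of EODag or PySTAC. It relies on the fact that a STAC Item is a GeoJSON Feature carrying a `bbox`, a `properties.datetime`, and a set of `assets`. The item ids, footprints, dates, and URLs are hypothetical, and the `search` helper is an assumption made for illustration: it filters a small in-memory list by area and time range, which a real catalog would do on the server side.

```python
from datetime import datetime, timezone

# Minimal STAC-like items. Ids, footprints, dates and hrefs are hypothetical.
items = [
    {"id": "scene-a", "bbox": [1.0, 43.0, 2.0, 44.0],
     "properties": {"datetime": "2023-06-01T10:30:00Z"},
     "assets": {"data": {"href": "https://example.com/scene-a.tif"}}},
    {"id": "scene-b", "bbox": [1.0, 43.0, 2.0, 44.0],
     "properties": {"datetime": "2023-09-15T10:30:00Z"},
     "assets": {"data": {"href": "https://example.com/scene-b.tif"}}},
    {"id": "scene-c", "bbox": [5.0, 45.0, 6.0, 46.0],
     "properties": {"datetime": "2023-06-02T10:30:00Z"},
     "assets": {"data": {"href": "https://example.com/scene-c.tif"}}},
]


def parse_utc(text):
    """Parse a timestamp such as '2023-06-01T10:30:00Z' as an aware UTC datetime."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def intersects(a, b):
    """Return True if two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(items, bbox, start, end):
    """Keep the items whose footprint overlaps bbox and whose date is in [start, end]."""
    hits = []
    for item in items:
        when = parse_utc(item["properties"]["datetime"])
        if intersects(item["bbox"], bbox) and start <= when <= end:
            hits.append(item)
    return hits


area = [1.3, 43.5, 1.6, 43.7]  # area of interest: west, south, east, north
start = datetime(2023, 5, 1, tzinfo=timezone.utc)
end = datetime(2023, 7, 1, tzinfo=timezone.utc)

found = search(items, area, start, end)
for item in found:
    # The asset href is what you would then download or open directly.
    print(item["id"], item["assets"]["data"]["href"])
```

Only `scene-a` matches both the area and the time range. In practice, the tools listed above send such filters to the catalog and return the matching items, so you rarely need to implement this logic yourself.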