Getting started \ CNES Topics
At CNES, there are several awesome services that will help you to either develop or contribute to a library,
launch processing chains at every scale, access to massive datasets or even put your code into production.
This page gives an overview of the most important ones and links to some documentation and tips.
Prerequisite: Scientific Information System account creation
First thing first: some of these services are accessible with a default enterprise account (default to CNES agent, not necessarily to contractors), but many also require to have a SIS account, that must be asked by a CNES agent through our enterprise ticketing system.
Software factory
Any development project needs a version control system and associated Continuous Integration (CI). At CNES a full team is dedicated to provide such an environment built on top of Gitlab and Artifactory for the most part. Full documentation of all this tooling can be found here.
Gitlab and CI
Gitlab is a service accessible to any people with an enterprise or SIS account. It is accessible at: https://gitlab.cnes.fr from inside or outside CNES facilities. From outside, there are some tricks to follow to properly access this gitlab with SSH protocol. From inside CNES network, this is just gitlab as you now it.
Continuous Integration is now handled by gitlab runners. It's pretty well documented in the Software factory documentation, just beware, for the time being, default shared runners do not have access to Computing Center environment, you'll need a project runner for this. There is one available for Data Campus division.
Artifactory: Package repository
The Artifactory package repository enables two things:
- Proxying pip, conda, docker and other public packages repository, allowing fast and easy access to these repo from CNES infrastructures
- Storing your own packages built with those technologies for internal sharing.
When working from a CNES infrastructure like our HPC system, the first thing is often to generate credentials for automatically using this service, especially when working with Python environments or Docker.
See the Pluto Trex tip explaining this shortly, more information can be found in Software factory documentation.
Computing Center
CNES Computing center offers a set of coherent services around computing and data processing at scale. It comprises an HPC (or HPDA) system, an associated Datalake, and Jupyterhub service on top of it for interactive development. These services are described in Confluence, and HPDA part also have its own complete documentation.
Datalabs for interactive analysis
The Datalabs is a custom Jupyterhub, enabling users with a SIS account to access computing power through Jupyter notebooks. It is based on the Pangeo stack, and is accessible at
- https://jupyterhub.sis.cnes.fr from inside CNES network,
- https://jupyterhub.cnes.fr from outside CNES network.
A complete documentation is also available from our Confluence instance.
It comes with prebaked Virtual Research Environment (VRE) which contains all the necessary libraries and toolset for easy development experience. But you can also use it with your own environments.
HPDA Platform: Trex
Trex is a modern HPC system, with boosted storage and GPU access, which makes it a High Performance Data Analitycs (HPDA) system. It provides access to computing resources through Slurm scheduling system, and a Spectrum Scale storage platform.
It can be accessed with an SIS account through the Datalabs, or using standard SSH access, optionnaly using VNC protocol, at:
- trex.sis.cnes.fr from inside cnes network,
- trex.cnes.fr from outside.
Other HPC platforms exist for more confidential data and algorithms.
Datalake storage platform
Three tiers of storage can be found at CNES:
- Hot storage, which is the one directly provided by the HPDA platform, based on Spectrum Scale technology with NVMe and spinning drives,
- Warm storage, based on S3 protocol on spinning hard drives,
- Cold storage, also accessible with S3 protocol, but with tapes backend.
The Datalake infrastructure provides second and third spaces above, with a unified S3 API. Main documentation of this service can be found on our Confluence instance.
It is mainly used to store public collections like a mirror of Sentinel 1 and 2 datasets from Copernicus, and also all the data produced by our own production centers, like Hydrology products (Surf Water, Let it Snow) which we talk about below.
On site data collections
CNES hosts several data collections. They are made available to the public through public portals described at Data Hub page, mainly Theia, Geodes, and Hydroweb.next.
When working on CNES infrastructure, it is important to note that every product hosted at CNES are directly accessible to users through standard POSIX or S3 interfaces. The S3 paths for every product will soon be available directly in the associated catalogs. Meanwhile, you can find products on Datalake S3 buckets:
flatsim
hydroweb
hysope2-cog
muscate
postel
sentinel1-grd
sentinel1-l2b-sw-single-sprid
sentinel1-l3b-sw-monthly-sprid
sentinel1-l3b-sw-yearly-sprid
sentinel1-ocn
sentinel1-s2l1c-sprid
sentinel1-slc
sentinel2-l1b
sentinel2-l1c
sentinel2-l2a-grs-sprid
sentinel2-l2a-peps
sentinel2-l2a-sprid
sentinel2-l2b-obs2co-sprid
sentinel2-l2b-snow-sprid
sentinel2-l2b-sw-single-sprid
sentinel2-l3a-sprid
sentinel2-l3b-snow-sprid
sentinel2-l3b-sw-monthly-sprid
sentinel2-l3b-sw-yearly-sprid
sentinel3-sral
sentinel6-l1a
swh-l1a
take5
On HPC system storage, there is also a /work/datalake
hosting some interesting data. One really useful is
/work/datalake/static_aux/
, with predownloaded auxiliary data like MNT.
Other services
CNES IT teams also provide a lot of other services which can be useful, but more when developing a full service or a production center:
- Virtual machines,
- Mutualised Kubernetes cluster,
- NAS Storage,
- Management tools like Confluence and JIRA.
And plenty others.