Table of contents¶
- Introduction
- Installation on Météo-France HPC systems
- How to use CrocO
- Simulation outputs
Introduction¶
General Introduction¶
Snowpack models suffer from large errors and uncertainties, which limit their use, in particular for spatialised applications. CrocO is an ensemble data assimilation system designed to tackle this issue. In this framework, an ensemble of models quantifies snowpack modelling errors. These errors are reduced by assimilating snowpack observations with a Particle Filter (PF). Several innovative versions of the PF are developed within CrocO to address spatialisation issues [1].
Spatialised applications of large ensembles are computationally intensive, and require both the parallelization of the ensemble members and an efficient handling of data transfers. For this reason, CrocO is tailored to Météo-France's research HPC systems (beaufix/belenos and the hendrix archive).
This page provides a quick description of the CrocO assimilation sequence, a guide for installation on Météo-France HPC systems, and a user guide to launch CrocO simulations.
For technical documentation (code version, main technical developments, new files and options), have a look at CrocO_technical_doc.
Finally, a guide for developers can be downloaded for further details on the implementation.
CrocO assimilation sequence¶
CrocO is a sequential data assimilation system: observations are assimilated date after date, as the ensemble advances over time.
- observation files and dates are known and prepared beforehand
- an ensemble of simulations (OFFLINE executables) is launched between consecutive observation dates, using ESCROC, the ensemble version of the Crocus snowpack model (Lafaysse et al., 2017)
- on each observation date, a Particle Filter (SODA executable) is used to correct the ensemble simulation with the observations (see Fig. 1)
In the example of Fig. 1, the ESCROC ensemble is used to propagate 3 particles (the full state vectors of the ensemble members) until observation date t1. At t1, the Particle Filter resamples the particles in order to bring the ensemble closer to the observation. The ensemble is then reinitialized at t1 with these new initial states and propagated until the next observation date (t2). A conceptual sketch of this cycle is given below.
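This sketch is an illustration only: the real orchestration is handled by vortex/s2m, and the function names below are hypothetical placeholders.
# Conceptual sketch of the CrocO assimilation cycle (illustration only;
# run_offline_ensemble and run_soda are hypothetical placeholders).
prev_date=$BEGIN_DATE
for obs_date in "${OBS_DATES[@]}"; do
    run_offline_ensemble "$prev_date" "$obs_date"  # propagate all members (OFFLINE)
    run_soda "$obs_date"                           # PF analysis against the observations (SODA)
    prev_date=$obs_date                            # analysed states initialize the next window
done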
More details¶
- Snowpack modelling errors are accounted for by combining a meteorological ensemble (stochastic perturbations) with an ensemble of snowpack models (ESCROC), as described in [2] and Fig. 2:
- a run is a unique combination of a forcing F* and an ESCROC configuration M*
The combination F*-M* is fixed for the duration of the experiment
- all the runs are initialized from the same spinup X_0
- the total number of runs is defined by the parameter nmembers
- nmembers also defines the number of different ESCROC configurations M*
- nforcing (<= nmembers) defines the number of different forcings F* to use
If nforcing < nmembers, the forcings are repeated until all runs have a forcing (see the sketch after this list).
- In spatialised applications, the PF has to ingest a large number of observations, which leads it to replicate only one particle, an issue called degeneracy (Snyder et al., 2009). Within SODA, two alternatives are developed to tackle this issue: inflation of the observation errors (inspired by Larue et al., 2018) and k-localisation, which uses the ensemble background correlation patterns to localise the Particle Filter.
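As an illustration of the forcing repetition rule mentioned above, the sketch below assigns forcings in a round-robin fashion; this is only one plausible mapping (the actual member-to-forcing assignment is implemented in snowtools):
# Illustration only: round-robin recycling of nforcing forcings over nmembers runs.
nmembers=35
nforcing=10
for (( m=0; m<nmembers; m++ )); do
    echo "run $m uses forcing $(( m % nforcing ))"
done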
A thorough description of CrocO is given in [1].
A guide for developers is in progress.
References¶
- [1] Cluzet, B., Lafaysse, M., Cosme, E., Albergel, C., Meunier, L.-F., and Dumont, M.: CrocO_v1.0: a Particle Filter to assimilate snowpack observations in a spatialised framework (submitted), https://doi.org/10.5194/gmd-2020-130.
- [2] Cluzet, B., Revuelto, J., Lafaysse, M., Tuzet, F., Cosme, E., Picard, G., Arnaud, L., and Dumont, M.: Towards the assimilation of satellite reflectance into semi-distributed ensemble snowpack simulations, Cold Regions Science and Technology, 170, 102918, 2020.
- Deschamps-Berger et al. (in prep.)
- Revuelto et al. (in prep.)
Installation on Météo-France HPC systems¶
Dependencies¶
CrocO depends on several open-source codes distributed by the CNRM (Centre National de Recherches Météorologiques) using git:
- OFFLINE and SODA executables are embedded within the SURFEX fortran modelling platform
https://redmine.umr-cnrm.fr/projects/surfex_git2
- the snowtools_git code (python) handles the user interface and some pre/post-processing tools
https://redmine.umr-cnrm.fr/projects/snowtools_git
- vortex (python) is used to embed CrocO in Météo-France's HPC system (beaufix/belenos supercomputers and hendrix archive)
https://redmine.umr-cnrm.fr/projects/vortex
More details on these libraries can be found in the technical documentation: CrocO technical doc
- Optionally, CrocO_toolbox features tools to prepare the observations, pre/post-process simulations and launch CrocO locally (using Météo-France's HPC system is however highly recommended, as simulations are computationally expensive).
https://github.com/bertrandcz/CrocO
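For example, the toolbox can be retrieved with:
git clone https://github.com/bertrandcz/CrocO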
Prerequisites (not exclusive):¶
- access to beaufix/belenos and hendrix
- access to SURFEX code
- read the snowtools_git wiki page
https://redmine.umr-cnrm.fr/projects/snowtools_git/wiki/
- be familiar with the Crocus snowpack model and its ensemble version ESCROC (Lafaysse et al., 2017):
Multiphysics
- if you're going to assimilate reflectances (Tuzet et al., 2017):
Explicit_representation_of_impurities
- CrocO doesn't handle the generation of initial conditions and forcings: you need to generate (or reuse preexisting) an ensemble of forcings and a spinup.
Standard users only need to install SURFEX on beaufix/belenos; snowtools_git and VORTEX are already installed there for them.
Install SURFEX¶
- If you're not developing in SURFEX, download the cen branch directly on beaufix/belenos.
Otherwise, you'd better install it locally and synchronize your local modifications to belenos with the rsync command (taking inspiration from rsync_SURFEX_V81_beaufix; see the sketch below).
- Compile it in the NOMPI-O2 configuration (the sequential ensemble application case in the following link):
belenos: Compile_SURFEX_on_Belenos
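A minimal sketch of such a synchronization is given below; the local path, remote host and remote path are examples to adapt to your own installation (see rsync_SURFEX_V81_beaufix for the reference script):
# Example only: push local SURFEX modifications to belenos (adapt the paths).
rsync -av --exclude '*.o' ~/SURFEX/cen/src/ belenos:SURFEX/cen/src/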
Don't forget to update your code version frequently by running git pull in your code repository.
Set CrocO environment¶
Install (developers)¶
Developers in snowtools_git and VORTEX (for CrocO) must properly install snowtools_git and vortex (possibly with rsync, taking inspiration from rsync_snowtools_git and rsync_vortex), following:
Install and Install_VORTEX
Install (standard users)¶
Standard users don't need to install VORTEX or snowtools_git: they will use the versions maintained by the main developer.
- To use snowtools and VORTEX, they just need to modify their .bash_profile on belenos:
# vortex
export MTOOLDIR=$WORKDIR
export VORTEX=$HOME/common/vortex/vortex-cen
export PYTHONPATH=$VORTEX
export PYTHONPATH=$PYTHONPATH:$VORTEX/bin
export PYTHONPATH=$PYTHONPATH:$VORTEX/site
export PYTHONPATH=$PYTHONPATH:$VORTEX/src
export PYTHONPATH=$PYTHONPATH:$VORTEX/project
# snowtools_git
export SNOWTOOLS_CEN=$HOME/common/snowtools_git
export PYTHONPATH=$PYTHONPATH:$SNOWTOOLS_CEN/snowtools
alias s2m="python $SNOWTOOLS_CEN/snowtools/tasks/s2m_command.py"
- In order to upload files directly to hendrix and sxcen, standard users must also configure file transfers with the hendrix archive and sxcen (see Install_VORTEX).
Setting new geometries (developers and standard users)¶
Any CrocO experiment is associated with a given VORTEX geometry (location), which is also used in the archive paths and filenames. Some geometries are already defined in $VORTEX/conf/geometries.ini. If you need to define a new geometry (region), you must define it in a new file $HOME/.vortexrc/geometries.ini containing the following lines. For example, the Grandes-Rousses massif has a region_id of 12 and its region_name is grandes_rousses:
[region_id]
info = Describe here your new geometry
kind = unstructured
area = region_name
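For instance, based on the values quoted above, a Grandes-Rousses entry would read as follows (a sketch built from the template; check $VORTEX/conf/geometries.ini for real examples):
[12]
info = Grandes Rousses massif
kind = unstructured
area = grandes_rousses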
Fetching simulation outputs to sxcen.cnrm (developers and standard users)¶
For post-processing and development purposes, it is possible to fetch simulation outputs directly to sxcen.cnrm, in the NO_SAVE directory (see the --writesx argument in CrocO_user_doc), thus avoiding having to download them manually from hendrix.
In order to configure that, build the following symbolic link on sxcen.cnrm:
cd /cnrm/cen/users/NO_SAVE/
mkdir <username>
cd <username>
mkdir vortex
ln -s /cnrm/cen/users/NO_SAVE/<username>/vortex/ $HOME/vortex
How to use CrocO¶
Once the installation is done, you're almost ready to perform a CrocO experiment on belenos/taranis, Météo-France's supercomputers.
CrocO experiments are launched by calling the s2m command on belenos/taranis. Before looking at this command, let's see what it can do and how to prepare an experiment.
Definition of a CrocO experiment¶
3 different types of experiments can be launched with CrocO:
- openloop : no assimilation. Useful as a reference or to generate synthetic observations
- synthetic : assimilation of synthetic observations (e.g. from a previous openloop run)
In that case, if the ensemble setup is the same as the openloop, you MUST remove the synthetic truth member from your ensemble (see below).
- real : assimilation of real observations
Prepare an experiment¶
- initial conditions (PGD and PREP), forcing ensemble and observation files: remember that CrocO doesn't handle their generation; you need to generate them and archive them on hendrix beforehand.
Detailed instructions can be found in CrocO technical doc.
- namelist: Prepare a basic namelist. It will be used to configure the PF algorithm, and SURFEX I/O behavior. s2m will parse it and populate it with its relevant arguments (e.g. -b). This namelist will then be used as a mother namelist for ESCROC's multiphysics scheme (see Multiphysics).
- Carefully check the list of output variables in PRO files (see the example below).
Bear in mind that the number of output variables considerably increases the time spent in writing (during computation), as well as transfer and storage needs:
&NAM_WRITE_DIAG_SURFn
CSELECT = 'time','ASNOW_VEG','TALB_ISBA','TS_ISBA','WSN_T_ISBA','DSN_T_ISBA','SNOWDZ','WSN_VEG','SPECMOD','SNOWSSA','SNOWIMP1','SNOWIMP2'
/
- You might also need to activate reflectance outputs (see Explicit_representation_of_impurities).
- set the assimilation parameters, following changes_in_namelist in CrocO technical doc.
- assimilation configuration file: following the CrocO technical doc, prepare a configuration file used to set the assimilation dates and possibly the ids of the ESCROC members to run.
If the ESCROC membersId are not specified in the configuration file (the default case for the ESCROC subensemble E1*), they will be randomly drawn at the beginning of the simulation and written in a copy of the configuration file (which will be archived on hendrix to ensure traceability). You must specify them if you need to ensure reproducibility (for twin experiments, for example). A sketch of such a file is given below.
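As a purely illustrative sketch (the key names below are assumptions derived from the description above, not the actual syntax; refer to the CrocO technical doc), such a configuration file could look like:
# hypothetical assimilation configuration file (key names are assumptions)
[DEFAULT]
# dates on which observations are assimilated (yyyymmddhh)
assimdates = 2013121006,2014011006,2014030106
# optional: ids of the ESCROC members to run (randomly drawn if omitted)
membersId = 1,17,23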
Launch a CrocO experiment with s2m¶
The CrocO assimilation sequence is run through a single s2m command (a snowtools_git command) on belenos/taranis.
First of all, if you're not familiar with the s2m command and vortex, read the following page:
Run_a_SURFEX-Crocus_experiment_without_vortex
Among the s2m arguments, you will only need the following. Optional ones are in parentheses:
-m safran : set -m to safran
-r <region_id> : your geometry, consistent with your vortex path and PGD filenames
-b <yyyymmddhh> -e <yyyymmddhh> : begin and end dates of the simulation
-f <xpid_forcing> or <xpid_forcing@username> : vortex xpid where the forcings are stored on hendrix
-o <xpid> : name of the directory where the outputs will be stored
-n <path_to_your_namelist> : default namelist is not provided
(-x <yyyymmddhh> : date of the spinup PREP (if it is not equal to -b))
A few supplementary arguments are necessary to run CrocO.
--croco=<your_path_to_assimilation_configuration_file> : path to the assimilation configuration file
--escroc=<escrocsubensemble> : ESCROC subensemble to use ("E1tartes", "E1notartes", "E2")
--obsxpid=<xpid_obs> or <xpid_obs@username> : vortex xpid where observations are stored on hendrix
--nmembers=N : number of members to run/draw among the subensemble
--nforcing=Nf : number of different forcings to use
--nnodes : number of nodes on which to parallelize
--walltime : estimated duration of the parallelized experiment (minutes); your simulation will be terminated past that duration
--sensor : name of the observation sensor/synthetic experiment (free, default is MODIS)
--openloop : activate openloop mode
OR --synth <mbid> : assimilation of synthetic data; remove and replace the <mbid> member (synthetic truth)
OR --real : assimilation of real data
(--writesx : activate output to sxcen.cnrm in NO_SAVE/)
(--grid : specify if you're performing gridded simulations)
Example:
s2m -n ~lafaysse/croco/OPTIONS_MOTHER_DEP.nam -r postes_12_csv -b 2013080106 -e 2014063006 -x 20160801 \
  --escroc=E1notartes -o test0l --nmembers=35 --nforcing=35 --croco=~lafaysse/croco/conf.ini \
  -f forcing_20132014B_31D_11_t1500_160@fructusm -m safran --real \
  -s /home/cnrm_other/cen/mrns/lafaysse/SURFEX/cen/exe_mpi --obsxpid=obs@fructusm --sensor=12
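As an illustration only (reusing the paths and xpids of the example above), an openloop counterpart of this experiment would drop the observation-related arguments and replace --real with --openloop:
# illustration only: openloop variant of the example above
s2m -n ~lafaysse/croco/OPTIONS_MOTHER_DEP.nam -r postes_12_csv -b 2013080106 -e 2014063006 -x 20160801 \
  --escroc=E1notartes -o test0l_openloop --nmembers=35 --nforcing=35 --croco=~lafaysse/croco/conf.ini \
  -f forcing_20132014B_31D_11_t1500_160@fructusm -m safran --openloop \
  -s /home/cnrm_other/cen/mrns/lafaysse/SURFEX/cen/exe_mpi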
Simulation outputs¶
Once the simulation has finished, its outputs are stored in the vortex path on hendrix
(see Simulation outputs storing in CrocO technical doc).
Now it's up to you to post-process them!