3. Experiment configuration
Each experiment directory includes a config.yml file, which defines the configuration of the experiment.
3.1. Global Options
The global-options section of config.yml specifies options that should, by default, apply to every module (but can be overwritten by module-level configuration). Common global options include:
nsamps: number of samples to run
scenario: climate scenario to use
pyear_start: year in which to start projection
pyear_step: step size between projection years
pyear_end: year in which to end projections
baseyear: zero point for sea level projections
pipeline_file: default pipeline file name (optional; defaults to ‘pipeline.yml’). See pipeline_file under experiment step configuration.
In addition, this section can specify:
rcfg-name: the name of the resource configuration file to be used. FACTS will look in the resources/ directory for a file named resource_(rcfg_name).yml. (If not specified, resource.yml will be used.)
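Putting these together, a minimal global-options section (the values here are illustrative) might look like:

```yaml
global-options:
    nsamps: 100           # number of samples to run
    scenario: ssp585      # climate scenario to use
    pyear_start: 2020     # first projection year
    pyear_step: 10        # step size between projection years
    pyear_end: 2100       # last projection year
    baseyear: 2005        # zero point for sea level projections
    rcfg-name: localhost  # selects resources/resource_localhost.yml
```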
Global options can be overwritten when runFACTS.py is invoked from the command line by passing a dictionary along with the --global_options parameter; for example:
python3 runFACTS.py --global_options '{"rcfg-name": "localhost" }' experiments/dummy/
3.2. Experiment step and module configuration
All other top-level entries in config.yml specify experiment steps. Their labels are arbitrary, though common labels are climate_step, sealevel_step, totaling_step, and esl_step.
The second-level entries under the experiment steps specify the modules to be run. Module labels are arbitrary, and are used together with the experiment name to label output files.
The following third-level entries are used under the module label:
module_set (required): the name of the module set (the first level of the hierarchy in the modules/ directory)
module (required): the name of the module (the second level of the hierarchy, within the module set's directory)
options: Entries under this header are passed as options to the module if they match options defined in the module's pipeline.yml. These overwrite global options.
options_allowoverwrite: Entries under this header are passed as options to the module if they match options defined in the module's pipeline.yml. These can be overwritten by global options. Options that make use of magic variables should be specified here. A particularly useful example is the specification
climate_data_file: "%CLIMATE_DATA_FILE%"
which allows the identifier of the climate data file generated in the climate stage to be passed on to sea level modules that use it as input.
input_data_file: Identifies a data file to be uploaded into the sandbox from the input/ subdirectory of the experiment directory.
input_compressed_data_file: Identifies a .tgz file to be uploaded into the sandbox from the input/ subdirectory of the experiment directory and extracted.
stages: Specifies stages from the pipeline.yml file to be run. Defaults to ‘preprocess’, ‘fit’, ‘project’, ‘postprocess’.
pipeline_file: Pipeline file name. Defaults to ‘pipeline.yml’. Alternatives can be useful for special cases (e.g., using only Antarctic ice sheet output from a module that produces both Greenland and Antarctic ice sheet output.) Many modules include a ‘pipeline.global.yml’ file for running the module without producing localized output.
include_in_workflow: A list of all workflows the module output should be included in at the totaling steps.
loop_over_workflows: If defined, replicate the module for all workflows defined to date.
loop_over_scales: If defined, replicate the module for both global and local scale (e.g., for a totaling module).
generates_climate_output: Module generates climate output data. Globally sets climate_output_data based on the module's pipeline configuration.
climate_output_data, global_total_files, local_total_files, totaled_files: See definitions in the pipeline configuration file specification. Can be used together with the facts/dummy module and input_data_file to upload outputs produced by prior experiment steps and have them filed in an appropriate directory for subsequent modules to find.
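As a sketch of the latter two points (the module label and file name here are hypothetical), a sea level module can receive the climate data file through options_allowoverwrite, while a facts/dummy module can re-upload output produced by a prior experiment step:

```yaml
sealevel_step:
    ocean:
        module_set: "tlm"
        module: "sterodynamics"
        options_allowoverwrite:
            # magic variable resolved to the climate stage's output file
            climate_data_file: "%CLIMATE_DATA_FILE%"
    climate_reupload:
        module_set: "facts"
        module: "dummy"
        input_data_file:
            - "climate.nc"       # hypothetical file from a prior experiment step
        climate_output_data:
            - "climate.nc"       # filed where subsequent modules will look for it
```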
3.2.1. Example experiment file
global-options:
    nsamps: 2000
    scenario: ssp585
    pyear_start: 2020
    pyear_end: 2150
    pyear_step: 10
    baseyear: 2005

climate_step:
    temperature:
        module_set: "fair"
        module: "temperature"
        generates_climate_output: true
        # input_data_file:
        #     - "emissions.csv"
        # options:
        #     rcmip_file: emissions.csv

sealevel_step:
    GrIS1f:
        module_set: "FittedISMIP"
        module: "GrIS"
        include_in_workflow:
            - "wf1f"
            - "wf2f"
            - "wf3f"
    deconto21:
        module_set: "deconto21"
        module: "AIS"
        include_in_workflow:
            - "wf3e"
            - "wf3f"
    bamber19:
        module_set: "bamber19"
        module: "icesheets"
        include_in_workflow:
            - "wf4"
    emuAIS:
        module_set: "emulandice"
        module: "AIS"
        include_in_workflow:
            - "wf1e"
        options:
            pyear_end: 2100
    emuGrIS:
        module_set: "emulandice"
        module: "GrIS"
        include_in_workflow:
            - "wf1e"
            - "wf2e"
            - "wf3e"
        options:
            pyear_end: 2100
    emuglaciers:
        module_set: "emulandice"
        module: "glaciers"
        include_in_workflow:
            - "wf1e"
            - "wf2e"
            - "wf3e"
        options:
            pyear_end: 2100
    larmip:
        module_set: "larmip"
        module: "AIS"
        include_in_workflow:
            - "wf2e"
            - "wf2f"
    ar5glaciers:
        module_set: "ipccar5"
        module: "glaciers"
        options:
            gmip: 2
        include_in_workflow:
            - "wf1f"
            - "wf2f"
            - "wf3f"
            - "wf4"
    ar5AIS:
        module_set: "ipccar5"
        module: "icesheets"
        pipeline_file: "pipeline.AIS.yml"
        include_in_workflow:
            - "wf1f"
    ocean:
        module_set: "tlm"
        module: "sterodynamics"
        include_in_workflow:
            - "wf1f"
            - "wf1e"
            - "wf2e"
            - "wf2f"
            - "wf3e"
            - "wf3f"
            - "wf4"
    k14vlm:
        module_set: "kopp14"
        module: "verticallandmotion"
        include_in_workflow:
            - "wf1f"
            - "wf1e"
            - "wf2e"
            - "wf2f"
            - "wf3e"
            - "wf3f"
            - "wf4"
    lws:
        module_set: "ssp"
        module: "landwaterstorage"
        options:
            scenario: "ssp5"
            dcrate_lo: -0.4
        include_in_workflow:
            - "wf1f"
            - "wf1e"
            - "wf2e"
            - "wf2f"
            - "wf3e"
            - "wf3f"
            - "wf4"

totaling_step:
    total:
        module_set: "facts"
        module: "total"
        loop_over_workflows: true
        loop_over_scales: true
        stages:
            - workflow

esl_step:
    extremesealevel:
        loop_over_workflows: true
        module_set: "extremesealevel"
        module: "pointsoverthreshold"
        options:
            target_years: 2050,2100
            total_localsl_file: "$SHARED/totaled/%EXPERIMENT_NAME%.total.workflow.%WORKFLOW_NAME%.local.nc"
4. Pipeline configuration
The top-level entries in the pipeline configuration identify the stages of the module. Most commonly, these are ‘preprocess’, ‘fit’, ‘project’, and ‘postprocess’, but any labels are permitted. If labels other than these four are used, the stages must be explicitly identified under the module's stages entry in the config.yml file.
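For example, if a module's pipeline defines a single stage labeled ‘workflow’ rather than the four defaults, the experiment's config.yml must name it explicitly:

```yaml
totaling_step:
    total:
        module_set: "facts"
        module: "total"
        stages:
            - workflow
```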
The next-level entry should be “task1:”.
The third-level entries define the task, including the executable to be run, parameters to be passed to it, and associated files. Entries include:
executable (required): The name of the executable to run (e.g., ‘python3’).
upload_input_data: Files to be uploaded prior to the run. If a path is not specified, FACTS looks in the module directory.
upload_and_extract_input_data: .tgz files to be uploaded and extracted prior to the run. If a path is not specified, FACTS looks in the modules-data directory.
cpu: An entry defining the task's computational requirements, with fields cpu_processes, cpu_process_type, cpu_threads, and cpu_thread_type.
python_dependencies: Python modules to be installed via pip prior to execution, specified as a single space-separated string. (Only needs to be included for the first stage in a module, as the installed modules persist to subsequent stages.)
script: The name of the script file to run with executable. Will be uploaded from the module directory.
script_noupload: The name of the script file to run with executable, but do not look in the module directory for the file to upload. (Useful if script is in a different location.)
arguments: Arguments to be passed to the script file.
options: A list of option names to be passed to the script file if their value is specified in config.yml.
pre_exec: Any commands, not otherwise specified, to be run before execution.
copy_input_data: A hierarchy of files to copy from previous stages (the next level being the stage name, then ‘task1’, then a list of file names).
copy_shared_data: A list of data to be copied from a shared storage area for this experiment.
climate_output_data: A list of climate output data to be copied to a shared (cross-module) directory for use in subsequent stages, and downloaded.
global_total_files: A list of global output files to be copied to a shared (cross-module) directory for use in the totaling stage and downloaded.
local_total_files: A list of local output files to be copied to a shared (cross-module) directory for use in the totaling stage and downloaded.
totaled_files: A list of total sea level files to be copied to a shared (cross-module) directory for use in the post-totaling stage, and downloaded.
copy_output_data: A listing of output files, not otherwise specified, to be copied to a shared (cross-module) directory for subsequent use.
download_output_data: A listing of additional output files to be downloaded.
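A sketch of a task declaring its computational requirements with a cpu entry (the script name and field values here are illustrative, using the field names listed above):

```yaml
project:
    task1:
        executable: "python3"
        script: "my_project_script.py"   # hypothetical script name
        cpu:
            cpu_processes: 1
            cpu_process_type: None
            cpu_threads: 2
            cpu_thread_type: None
```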
4.1. Example pipeline.yml file
preprocess:
    task1:
        executable: "python3"
        python_dependencies: "numpy scipy netCDF4 pyyaml matplotlib"
        script: "bamber19_preprocess_icesheets.py"
        options:
            - "pipeline_id"
            - "pyear_start"
            - "pyear_end"
            - "pyear_step"
            - "scenario"
            - "baseyear"
        upload_and_extract_input_data:
            - "bamber19_icesheets_preprocess_data.tgz"

fit:
    task1:
        executable: "python3"
        script: "bamber19_fit_icesheets.py"
        options:
            - "pipeline_id"

project:
    task1:
        executable: "python3"
        script: "bamber19_project_icesheets.py"
        options:
            - "nsamps"
            - "seed"
            - "replace"
            - "pipeline_id"
        copy_input_data:
            preprocess:
                task1:
                    - "%PIPELINE_ID%_data.pkl"
        global_total_files:
            - "%PIPELINE_ID%_GIS_globalsl.nc"
            - "%PIPELINE_ID%_AIS_globalsl.nc"
        download_output_data:
            - "%PIPELINE_ID%_EAIS_globalsl.nc"
            - "%PIPELINE_ID%_WAIS_globalsl.nc"

postprocess:
    task1:
        executable: "python3"
        script: "bamber19_postprocess_icesheets.py"
        options:
            - "locationfile"
            - "pipeline_id"
        upload_input_data:
            - 'read_locationfile.py'
            - 'AssignFP.py'
            - 'ReadFingerprint.py'
        upload_and_extract_input_data:
            - 'bamber19_icesheets_postprocess_data.tgz'
        copy_shared_data:
            - '$SHARED/location.lst'
        copy_input_data:
            project:
                task1:
                    - "%PIPELINE_ID%_projections.pkl"
        local_total_files:
            - "%PIPELINE_ID%_GIS_localsl.nc"
            - "%PIPELINE_ID%_AIS_localsl.nc"
        download_output_data:
            - "%PIPELINE_ID%_WAIS_localsl.nc"
            - "%PIPELINE_ID%_EAIS_localsl.nc"
5. Magic variables
The following magic variables can be used in options specified in the config.yml and pipeline.yml files.
Variable | Description
%WORKFLOW_NAME% | Workflow name
 | Name of scale (global, local)
 | Module set name
 | Module name
 | Module path
%CLIMATE_DATA_FILE% | Climate data file produced by the climate step, to be used as input (includes both GSAT and OHC)
 | GSAT data file produced by the climate step
 | OHC data file produced by the climate step
 | Experiment path
 | Experiment name
In addition, the variable $SHARED, used in a file path, directs scripts to a shared storage area in the sandbox.
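For example, a module's options in config.yml can combine $SHARED with magic variables to locate the totaled sea level file for the current experiment and workflow:

```yaml
options:
    total_localsl_file: "$SHARED/totaled/%EXPERIMENT_NAME%.total.workflow.%WORKFLOW_NAME%.local.nc"
```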