Setting Up Your Dataset
climpred relies on a consistent naming system for
xarray dimensions.
This consistency simplifies climpred's internal handling of the data.
PredictionEnsemble expects its data to contain at least the dimensions
init and lead.
init is the initialization dimension, which holds the time
steps at which the ensemble was initialized.
init is known as forecast_reference_time in the CF convention.
init must be of type pandas.DatetimeIndex or
xarray.CFTimeIndex.
If init is of type int, it is assumed to be annual data starting Jan 1st.
A UserWarning is issued when this assumption is made.
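To avoid relying on that assumption, integer years can be converted to explicit dates before the data reach climpred. The following is a minimal sketch, assuming a hypothetical toy dataset `ds` whose init coordinate holds integer years; the variable name `sst` and all values are illustrative only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy data: three initializations labelled by integer year.
ds = xr.Dataset(
    {"sst": (("init", "lead"), np.zeros((3, 2)))},
    coords={"init": [2000, 2001, 2002], "lead": [1, 2]},
)

# Make the "annual data starting Jan 1st" interpretation explicit.
init_dates = pd.to_datetime([f"{year}-01-01" for year in ds["init"].values])
ds = ds.assign_coords(init=init_dates)

print(ds.indexes["init"])
```

After the conversion, init is backed by a pandas.DatetimeIndex, so no assumption about the calendar start is needed.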
lead is the lead time of the forecasts from initialization.
lead is known as forecast_period in the CF convention.
lead must be int or float.
The units for the lead dimension must be specified as an attribute.
Valid options are ["years", "seasons", "months"] and
["weeks", "pentads", "days", "hours", "minutes", "seconds"].
If lead is provided as pandas.Timedelta in units up to "weeks", lead
is converted to int and lead.attrs["units"] is set accordingly.
For larger lead units ("months", "seasons" or "years"), no conversion
from pandas.Timedelta is possible.
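As an illustration of the two forms of lead, the sketch below attaches the units attribute to an integer lead and, separately, converts a pandas.Timedelta lead to integer days by hand. The manual conversion only mirrors the behaviour described above; it is not climpred's internal code, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Integer lead: the units must be attached as an attribute.
lead = xr.DataArray(np.arange(1, 4), dims="lead", name="lead")
lead.attrs["units"] = "months"

# Timedelta lead: fixed-length units (up to weeks) map cleanly onto integers.
lead_td = pd.to_timedelta([1, 2, 3], unit="D")
lead_days = xr.DataArray(
    (lead_td // pd.Timedelta(days=1)).to_numpy(),
    dims="lead",
    name="lead",
    attrs={"units": "days"},
)

print(lead_days.values, lead_days.attrs["units"])
```

Months, seasons and years do not have a fixed length, so they cannot be expressed as a pandas.Timedelta in the first place; for those units an integer or float lead with the units attribute is the way to go.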
valid_time=init+lead is calculated when a PredictionEnsemble is
instantiated.
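To make the relation valid_time=init+lead concrete, here is a small hand-computed sketch for monthly leads using pandas.DateOffset. It only illustrates the arithmetic and is not how climpred computes valid_time internally; the dates and leads are made up.

```python
import pandas as pd

init = pd.to_datetime(["2000-01-01", "2001-01-01"])
leads = [1, 2, 3]  # assumes lead.attrs["units"] == "months"

# One row per initialization, one entry per lead.
valid_time = [[t + pd.DateOffset(months=n) for n in leads] for t in init]

for t, row in zip(init, valid_time):
    print(t.date(), "->", [v.date() for v in row])
```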
Another crucial dimension is member, which holds the various ensemble members.
It is only required for probabilistic metrics. member is known as
realization in the CF convention.
Any additional dimensions, such as lat, lon or depth, will be broadcast.
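Putting the dimensions together, a prediction dataset might be laid out as in the sketch below. It uses only xarray, NumPy and pandas; the variable name `sst`, the grid and the random values are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

init = pd.date_range("2000-01-01", periods=3, freq="YS")
lead = np.arange(1, 4)
member = np.arange(1, 6)
lat = np.linspace(-60.0, 60.0, 4)
lon = np.linspace(0.0, 270.0, 4)

rng = np.random.default_rng(0)
data = rng.normal(size=(init.size, lead.size, member.size, lat.size, lon.size))

hind = xr.Dataset(
    {"sst": (("init", "lead", "member", "lat", "lon"), data)},
    coords={
        "init": init,
        # (dims, values, attrs): the units attribute travels with the coordinate.
        "lead": ("lead", lead, {"units": "years"}),
        "member": member,
        "lat": lat,
        "lon": lon,
    },
)

# Reducing over member leaves the additional dimensions untouched.
ensemble_mean = hind.mean("member")
print(dict(ensemble_mean.sizes))
```

A dataset shaped like this is the kind of object one would then hand to a PredictionEnsemble subclass such as climpred.HindcastEnsemble.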
If an expected dimension is not found but a coordinate carries the matching
CF convention standard_name attribute, that dimension is renamed to the
corresponding climpred ensemble dimension.
Check out the demo to set up a climpred-ready prediction ensemble
from your own data or via
intake-esm from CMIP DCPP.
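The renaming rule can be sketched with plain xarray as follows. The helper `rename_cf_dims` and the dimension names `S` and `L` are hypothetical and only illustrate the idea; climpred's own implementation may differ.

```python
import numpy as np
import xarray as xr

# CF standard_name -> climpred dimension name, as listed in the text above.
CF_TO_CLIMPRED = {
    "forecast_reference_time": "init",
    "forecast_period": "lead",
    "realization": "member",
}


def rename_cf_dims(ds: xr.Dataset) -> xr.Dataset:
    """Rename dimensions whose coordinate carries a known CF standard_name."""
    renames = {}
    for dim in ds.dims:
        if dim in ds.coords:
            standard_name = ds[dim].attrs.get("standard_name")
            target = CF_TO_CLIMPRED.get(standard_name)
            if target is not None and target != dim:
                renames[dim] = target
    return ds.rename(renames)


# Hypothetical dataset that uses non-climpred dimension names.
ds = xr.Dataset(
    {"tas": (("S", "L"), np.zeros((2, 3)))},
    coords={
        "S": ("S", [2000, 2001], {"standard_name": "forecast_reference_time"}),
        "L": ("L", [1, 2, 3], {"standard_name": "forecast_period"}),
    },
)

renamed = rename_cf_dims(ds)
print(dict(renamed.sizes))
```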
Verification products are expected to contain the time dimension at the minimum.
For best use of climpred, their time dimension should cover the full length of
init and be the same calendar type as the accompanying prediction ensemble.
The time dimension must be pandas.DatetimeIndex or
xarray.CFTimeIndex.
A time dimension of type int is assumed to be annual data starting Jan 1st.
A UserWarning is issued when this assumption is made.
These products can also include additional dimensions, such as lat, lon,
depth, etc.
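Before pairing a verification product with a prediction ensemble, it can help to check these expectations by hand. The sketch below builds a hypothetical observational dataset `obs` and tests whether its time axis spans the init dates; comparing the index types is used here as a rough stand-in for a calendar check, which is an assumption of this example rather than a climpred routine.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Initialization dates of a hypothetical prediction ensemble.
init = pd.date_range("2000-01-01", periods=3, freq="YS")

# Hypothetical verification product with an annual time axis.
time = pd.date_range("1999-01-01", periods=8, freq="YS")
obs = xr.Dataset(
    {"sst": (("time",), np.linspace(0.0, 1.0, time.size))},
    coords={"time": time},
)

time_index = obs.indexes["time"]
covers_init = time_index.min() <= init.min() and time_index.max() >= init.max()
same_index_type = type(time_index) is type(init)

print("covers init:", covers_init, "| same index type:", same_index_type)
```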
The table below summarizes the dimensions used in climpred and the data types
that climpred supports for them.
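| Short Name | Types | Long name | Attribute(s) |
|---|---|---|---|
| lead | int, float, or pandas.Timedelta up to "weeks" | lead timestep after initialization | units (str) ["years", "seasons", "months", "weeks", "pentads", "days", "hours", "minutes", "seconds"] |
| init | pandas.DatetimeIndex or xarray.CFTimeIndex | initialization as start date of experiment | None |
| member | | ensemble member | None |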
Probably the most challenging part is concatenating
(xarray.concat()) raw model output with dimension time from
multiple simulations into a multi-dimensional xarray.Dataset containing
dimensions init, (member) and lead, where time becomes
valid_time=init+lead. One way of doing this is
climpred.preprocessing.shared.load_hindcast().
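For small cases the same restructuring can be done by hand. The following is a minimal sketch with plain xarray, assuming annual output and one hypothetical simulation per start date; the helper `time_to_lead` is illustrative and is not part of climpred.

```python
import numpy as np
import pandas as pd
import xarray as xr


def time_to_lead(run: xr.Dataset, init_date: pd.Timestamp) -> xr.Dataset:
    """Swap the time axis of one simulation for an integer lead axis in years."""
    lead = run["time"].dt.year.values - init_date.year
    run = run.drop_vars("time").rename_dims(time="lead")
    return run.assign_coords(lead=("lead", lead, {"units": "years"}))


init_dates = pd.to_datetime(["2000-01-01", "2001-01-01"])

runs = []
for init_date in init_dates:
    # Hypothetical raw output: three annual values following the start date.
    time = pd.date_range(f"{init_date.year + 1}-01-01", periods=3, freq="YS")
    raw = xr.Dataset({"sst": ("time", np.arange(3.0))}, coords={"time": time})
    runs.append(time_to_lead(raw, init_date))

# Passing a named pandas index creates the new init dimension and coordinate.
hind = xr.concat(runs, dim=pd.Index(init_dates, name="init"))
print(dict(hind.sizes))
```

Each simulation loses its own time axis and gains a shared lead axis, so the runs line up and can be stacked along init; a member dimension could be added with a second concatenation in the same way.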