Setting Up Your Dataset
`climpred` relies on a consistent naming system for `xarray` dimensions. This lets `climpred` recognize the role of each dimension under the hood without extra configuration.
`PredictionEnsemble` expects, at a minimum, the dimensions `init` and `lead`.
`init` is the initialization dimension, which holds the time steps at which the ensemble was initialized. In the CF conventions, `init` is known as `forecast_reference_time`.
`init` must be of type `pandas.DatetimeIndex` or `xarray.CFTimeIndex`. If `init` is of type `int`, it is assumed to be annual data starting on January 1st, and a `UserWarning` is issued when this assumption is made.
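As a minimal sketch (assuming `numpy`, `pandas` and `xarray` are installed; the variable name `SST` and the zero-filled data are placeholders), integer initialization years can be converted explicitly to a `pandas.DatetimeIndex` on January 1st before the dataset is built, which makes the annual assumption explicit instead of relying on the warning. This only builds a dataset in the expected layout; it does not call `climpred`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Placeholder integer initialization years, as often stored in raw hindcast files.
init_years = [1990, 1991, 1992]

# Make the "annual data starting Jan 1st" interpretation explicit.
init = pd.to_datetime([f"{year}-01-01" for year in init_years])

# Placeholder annual lead times, carrying the required units attribute.
lead = np.arange(1, 4)

ds = xr.Dataset(
    {"SST": (("init", "lead"), np.zeros((init.size, lead.size)))},
    coords={
        "init": init,  # pandas.DatetimeIndex
        "lead": ("lead", lead, {"units": "years"}),
    },
)
```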
`lead` is the lead time of the forecasts from initialization. In the CF conventions, `lead` is known as `forecast_period`. `lead` must be of type `int` or `float`.
The units of the `lead` dimension must be specified as an attribute. Valid options are `["years", "seasons", "months"]` and `["weeks", "pentads", "days", "hours", "minutes", "seconds"]`.
If `lead` is provided as `pandas.Timedelta` in units up to `"weeks"`, it is converted to `int` and the corresponding `lead.attrs["units"]` is set. For the longer units `"months"`, `"seasons"` and `"years"`, no conversion from `pandas.Timedelta` is possible, because these units have no fixed length.
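The conversion described above can be illustrated with a hypothetical helper, `timedelta_lead_to_int`; it is not `climpred`'s internal implementation, and in this sketch the caller chooses the target unit. It divides each `pandas.Timedelta` by the fixed length of that unit, checks that the result is a whole number, and stores the unit in `attrs["units"]`. Only fixed-length units appear in the lookup table, which is why months, seasons and years are absent.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Fixed-length units only: months, seasons and years have no single Timedelta length.
FIXED_LENGTH_UNITS = {
    "weeks": pd.Timedelta(weeks=1),
    "pentads": pd.Timedelta(days=5),
    "days": pd.Timedelta(days=1),
    "hours": pd.Timedelta(hours=1),
    "minutes": pd.Timedelta(minutes=1),
    "seconds": pd.Timedelta(seconds=1),
}


def timedelta_lead_to_int(lead, units):
    """Hypothetical helper: express Timedelta leads as integers in `units`."""
    step = FIXED_LENGTH_UNITS[units]  # raises KeyError for months/seasons/years
    multiples = np.asarray(pd.TimedeltaIndex(lead) / step, dtype=float)
    if not np.allclose(multiples, np.round(multiples)):
        raise ValueError(f"lead is not a whole number of {units}")
    return xr.DataArray(
        np.round(multiples).astype(int),
        dims="lead",
        attrs={"units": units},
    )


# Usage: daily Timedelta leads become integer leads with units "days".
lead = timedelta_lead_to_int(pd.to_timedelta([1, 2, 3], unit="D"), "days")
```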
`valid_time = init + lead` is calculated by `PredictionEnsemble` upon instantiation.
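For fixed-length leads, the relation `valid_time = init + lead` is plain broadcasting arithmetic. The sketch below assumes daily leads and computes the two-dimensional `valid_time` array by hand with `xarray`, only to show its shape and values; `PredictionEnsemble` already does this on instantiation. Monthly, seasonal or yearly leads would need calendar-aware offsets instead of `pandas.Timedelta`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Placeholder daily initializations.
dates = pd.date_range("2000-01-01", periods=3, freq="D")
init = xr.DataArray(dates, dims="init", coords={"init": dates})

# Assumed daily leads 0, 1, 2, 3, expressed as timedeltas for the arithmetic.
lead_days = np.arange(4)
lead = xr.DataArray(
    pd.to_timedelta(lead_days, unit="D"), dims="lead", coords={"lead": lead_days}
)

# Broadcasting over the distinct dims "init" and "lead" yields a 2-D array.
valid_time = init + lead
```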
Another crucial dimension is `member`, which holds the various ensemble members and is only required for probabilistic metrics. In the CF conventions, `member` is known as `realization`.
Any additional dimensions, such as `lat`, `lon` or `depth`, will be broadcast.
If the expected dimensions are not found but a coordinate carries the matching CF convention `standard_name` attribute, that dimension is renamed to the corresponding `climpred` ensemble dimension.
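The renaming rule can be sketched as follows. `rename_cf_dims` and the placeholder dataset with dimensions `S`, `L` and `M` are hypothetical illustrations, not `climpred` code; only the mapping from CF names to `climpred` names comes from the text above.

```python
import numpy as np
import xarray as xr

# CF standard_name -> climpred dimension name, as listed in the text above.
CF_TO_CLIMPRED = {
    "forecast_reference_time": "init",
    "forecast_period": "lead",
    "realization": "member",
}


def rename_cf_dims(ds):
    """Hypothetical helper: rename dimensions whose coordinate has a known CF standard_name."""
    renames = {}
    for name, coord in ds.coords.items():
        target = CF_TO_CLIMPRED.get(coord.attrs.get("standard_name"))
        # Only rename actual dimensions, and never overwrite an existing one.
        if target is not None and name in ds.dims and target not in ds.dims:
            renames[name] = target
    return ds.rename(renames)


# Placeholder dataset with non-standard dimension names S, L and M.
raw = xr.Dataset(
    {"tos": (("S", "L", "M"), np.zeros((2, 3, 4)))},
    coords={
        "S": ("S", [2000, 2001], {"standard_name": "forecast_reference_time"}),
        "L": ("L", [1, 2, 3], {"standard_name": "forecast_period", "units": "years"}),
        "M": ("M", [0, 1, 2, 3], {"standard_name": "realization"}),
    },
)
ds = rename_cf_dims(raw)
```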
Check out the demo to set up a `climpred`-ready prediction ensemble from your own data or via `intake-esm` from CMIP DCPP.
Verification products are expected to contain, at a minimum, the `time` dimension. For best use of `climpred`, their `time` dimension should cover the full length of `init` and use the same calendar type as the accompanying prediction ensemble. The `time` dimension must be of type `pandas.DatetimeIndex` or `xarray.CFTimeIndex`.
A `time` dimension of type `int` is assumed to be annual data starting on January 1st, and a `UserWarning` is issued when this assumption is made.
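A rough pre-flight check for a verification product might look like the sketch below, assuming both `time` and `init` are `pandas.DatetimeIndex` objects. It only compares the index classes (a stand-in for "same calendar type"; for `xarray.CFTimeIndex` you would also want to compare calendars) and tests that `time` spans the whole `init` range. `check_verification` is a hypothetical helper, and the annual `SST` series is placeholder data.

```python
import numpy as np
import pandas as pd
import xarray as xr


def check_verification(obs, init):
    """Hypothetical check: same index class as init and time covering all of init."""
    time = obs.indexes["time"]
    same_type = type(time) is type(init)
    covers_init = time.min() <= init.min() and time.max() >= init.max()
    return same_type and covers_init


# Placeholder annual initializations and an annual verification series.
init = pd.date_range("1990-01-01", periods=5, freq="YS")
time = pd.date_range("1985-01-01", "2000-01-01", freq="YS")
obs = xr.Dataset(
    {"SST": ("time", np.arange(time.size, dtype=float))}, coords={"time": time}
)
```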
These products can also include additional dimensions, such as `lat`, `lon`, `depth`, etc.
See the table below for a summary of the dimensions used in `climpred` and the data types that `climpred` supports for them.
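| Short Name | Types | Long name | CF convention | Attribute(s) |
|---|---|---|---|---|
| `lead` | `int`, `float` | lead timestep after initialization | `forecast_period` | `units` (str) [`years`, `seasons`, `months`, `weeks`, `pentads`, `days`, `hours`, `minutes`, `seconds`] |
| `init` | `pandas.DatetimeIndex`, `xarray.CFTimeIndex` | initialization as start date of experiment | `forecast_reference_time` | None |
| `member` | | ensemble member | `realization` | None |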
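The table can be turned into a quick self-check before handing a dataset to `climpred`. The hypothetical `check_initialized` below only inspects the required dimensions, the `init` index type, the `lead` dtype and its `units` attribute; it is a sketch, not a replacement for the validation `climpred` performs itself. The `good` dataset is placeholder data.

```python
import numpy as np
import pandas as pd
import xarray as xr

VALID_LEAD_UNITS = {
    "years", "seasons", "months",
    "weeks", "pentads", "days", "hours", "minutes", "seconds",
}


def check_initialized(ds):
    """Hypothetical check against the table above; returns a list of problems."""
    problems = []
    for dim in ("init", "lead"):
        if dim not in ds.dims:
            problems.append(f"missing dimension {dim!r}")
    if "init" in ds.indexes and not isinstance(
        ds.indexes["init"], (pd.DatetimeIndex, xr.CFTimeIndex)
    ):
        problems.append("init is not datetime-like (int would be read as annual, Jan 1st)")
    if "lead" in ds.coords:
        dtype = ds["lead"].dtype
        if not (np.issubdtype(dtype, np.integer) or np.issubdtype(dtype, np.floating)):
            problems.append("lead should be int or float")
        if ds["lead"].attrs.get("units") not in VALID_LEAD_UNITS:
            problems.append("lead.attrs['units'] is missing or invalid")
    return problems


# Placeholder dataset that satisfies the table.
good = xr.Dataset(
    {"SST": (("init", "lead"), np.zeros((2, 3)))},
    coords={
        "init": pd.date_range("2000-01-01", periods=2, freq="YS"),
        "lead": ("lead", [1, 2, 3], {"units": "years"}),
    },
)
```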
Probably the most challenging part is concatenating (`xarray.concat()`) raw model output with dimension `time` from multiple simulations into a multi-dimensional `xarray.Dataset` with the dimensions `init`, (`member`) and `lead`, where `time` becomes `valid_time = init + lead`. One way of doing this is `climpred.preprocessing.shared.load_hindcast()`.
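The same reshaping can also be sketched by hand with `xarray.concat()`. This sketch assumes annual output, one run per initialization year, and that the first time step of each run is lead 0 (so `valid_time = init + lead` reproduces the original `time`); `make_run` and `time_to_lead` are hypothetical helpers, with `make_run` standing in for reading your own files. Ensemble members could be stacked the same way along a `member` dimension.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_run(start_year, n_years=3):
    """Hypothetical stand-in for reading one raw simulation with a 'time' dimension."""
    time = pd.date_range(f"{start_year}-01-01", periods=n_years, freq="YS")
    sst = np.arange(n_years, dtype=float) + start_year  # placeholder values
    return xr.Dataset({"SST": ("time", sst)}, coords={"time": time})


def time_to_lead(run, init_date):
    """Turn 'time' into an integer annual 'lead' (first step = lead 0) and add 'init'."""
    n_steps = run.sizes["time"]
    run = run.rename({"time": "lead"})
    run = run.assign_coords(lead=("lead", np.arange(n_steps), {"units": "years"}))
    return run.expand_dims(init=[init_date])


init_years = [2000, 2001, 2002]
hindcast = xr.concat(
    [time_to_lead(make_run(y), pd.Timestamp(f"{y}-01-01")) for y in init_years],
    dim="init",
)
```

Here `expand_dims` gives every run a length-one `init` dimension, so `xarray.concat()` only has to stack along `init` while the shared `lead` coordinate aligns automatically.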