***********************
Setting Up Your Dataset
***********************

``climpred`` relies on a consistent naming system for `xarray `_ dimensions.
This allows things to run more easily under-the-hood.

:py:class:`.PredictionEnsemble` expects at the minimum to contain dimensions
``init`` and ``lead``.

``init`` is the initialization dimension, which relays the time steps at which the
ensemble was initialized.
``init`` is known as ``forecast_reference_time`` in the `CF convention `_.
``init`` must be of type :py:class:`pandas.DatetimeIndex`, or
:py:class:`xarray.CFTimeIndex`.
If ``init`` is of type ``int``, it is assumed to be annual data starting Jan 1st.
A UserWarning is issued when this assumption is made.

``lead`` is the lead time of the forecasts from initialization.
``lead`` is known as ``forecast_period`` in the `CF convention `_.
``lead`` must be ``int`` or ``float``.
The units for the ``lead`` dimension must be specified as an attribute.
Valid options are ``["years", "seasons", "months"]`` and
``["weeks", "pentads", "days", "hours", "minutes", "seconds"]``.
If ``lead`` is provided as :py:class:`pandas.Timedelta` up to ``"weeks"``, it is
converted to ``int`` and a corresponding ``lead.attrs["units"]`` is set.
For larger ``lead`` units (``"months"``, ``"seasons"`` or ``"years"``), no conversion
from :py:class:`pandas.Timedelta` is possible.

``valid_time=init+lead`` will be calculated in :py:class:`.PredictionEnsemble` upon
instantiation.

Another crucial dimension is ``member``, which holds the various ensemble members
and is only required for probabilistic metrics.
``member`` is known as ``realization`` in the `CF convention `_.

Any additional dimensions will be broadcast: these could be dimensions like ``lat``,
``lon``, ``depth``, etc.

If the expected dimensions are not found, but a coordinate carries the matching
`CF convention `_ ``standard_name`` attribute, that dimension is renamed to the
corresponding ``climpred`` ensemble dimension.
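
As a rough illustration of the naming scheme described above, the following sketch
builds a small synthetic dataset with the dimensions ``init``, ``lead`` and
``member``. The variable name ``sst``, the array sizes and the random values are
assumptions made for this example only; they are not part of ``climpred``.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Five annual initializations on Jan 1st, stored as a pandas.DatetimeIndex.
init = pd.to_datetime([f"{year}-01-01" for year in range(2000, 2005)])
# Integer lead steps; their unit is given by the "units" attribute below.
lead = np.arange(1, 4)
# Two ensemble members (only needed for probabilistic metrics).
member = np.arange(1, 3)

# Synthetic values; "sst" is a hypothetical variable name.
rng = np.random.default_rng(0)
data = rng.normal(size=(init.size, lead.size, member.size))

hindcast = xr.Dataset(
    {"sst": (("init", "lead", "member"), data)},
    coords={
        "init": init,
        # Attach the lead units as a coordinate attribute.
        "lead": ("lead", lead, {"units": "years"}),
        "member": member,
    },
)
print(hindcast)
```

A dataset shaped like this should be a reasonable starting point for a
:py:class:`.PredictionEnsemble`; real data will usually carry further dimensions
such as ``lat`` and ``lon``.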

Check out the demo to set up a ``climpred``-ready prediction ensemble
`from your own data `_ or via `intake-esm `_ from `CMIP DCPP `_.

**Verification products** are expected to contain the ``time`` dimension at the
minimum.
For best use of ``climpred``, their ``time`` dimension should cover the full length of
``init`` and be the same calendar type as the accompanying prediction ensemble.
The ``time`` dimension must be :py:class:`pandas.DatetimeIndex`, or
:py:class:`xarray.CFTimeIndex`.
A ``time`` dimension of type ``int`` is assumed to be annual data starting Jan 1st.
A UserWarning is issued when this assumption is made.
These products can also include additional dimensions, such as ``lat``, ``lon``,
``depth``, etc.

See the table below for a summary of the dimensions used in ``climpred`` and the data
types that ``climpred`` supports for them.

.. list-table:: List of ``climpred`` dimensions and coordinates
   :widths: 25 25 25 25 25
   :header-rows: 1

   * - Short Name
     - Types
     - Long name
     - `CF convention `_
     - Attribute(s)
   * - ``lead``
     - ``int``, ``float`` or :py:class:`pandas.Timedelta` up to ``weeks``
     - lead timestep after initialization ``init``
     - ``forecast_period``
     - units (str) [``years``, ``seasons``, ``months``, ``weeks``, ``pentads``,
       ``days``, ``hours``, ``minutes``, ``seconds``] or
       :py:class:`pandas.Timedelta`
   * - ``init``
     - :py:class:`pandas.DatetimeIndex` or :py:class:`xarray.CFTimeIndex`
     - initialization as start date of experiment
     - ``forecast_reference_time``
     - None
   * - ``member``
     - ``int``, ``str``
     - ensemble member
     - ``realization``
     - None

Probably the most challenging part is concatenating (:py:func:`xarray.concat`) raw
model output with dimension ``time`` from multiple simulations into a
multi-dimensional :py:class:`xarray.Dataset` containing dimensions ``init``,
(``member``) and ``lead``, where ``time`` becomes ``valid_time=init+lead``.
One way of doing it is :py:func:`climpred.preprocessing.shared.load_hindcast`.
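
The concatenation step can also be sketched by hand with plain ``xarray``. The
example below is a minimal illustration, not the approach taken by
``load_hindcast``: the helper ``make_run``, the variable name ``sst`` and the
synthetic annual output are all hypothetical. Each run has its ``time`` axis
replaced by an integer ``lead`` (years since initialization), and the runs are then
stacked along a new ``init`` dimension.

```python
import numpy as np
import pandas as pd
import xarray as xr


def make_run(start_year, n_years=3):
    """Hypothetical stand-in for the raw output of one simulation.

    The run is initialized on Jan 1st of `start_year` and provides annual
    values labelled with the following `n_years` years.
    """
    time = pd.to_datetime(
        [f"{start_year + k}-01-01" for k in range(1, n_years + 1)]
    )
    values = np.full(n_years, float(start_year))
    return xr.Dataset({"sst": ("time", values)}, coords={"time": time})


start_years = [2000, 2001, 2002]
runs = []
for start_year in start_years:
    run = make_run(start_year)
    # Years elapsed since initialization: 1, 2, 3, ...
    lead = run["time"].dt.year.values - start_year
    # Swap the absolute time axis for the relative lead axis.
    run = run.rename({"time": "lead"}).assign_coords(lead=("lead", lead))
    runs.append(run)

# Stack all runs along a new "init" dimension holding the start dates.
init = pd.DatetimeIndex([f"{y}-01-01" for y in start_years], name="init")
hindcast = xr.concat(runs, dim=init)
hindcast["lead"].attrs["units"] = "years"
print(hindcast)
```

Real model output typically needs extra care, for example with calendars, member
dimensions and file loading, which this sketch leaves out.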