Forecasts have to be verified against some product to evaluate their performance. However, when verifying against a product, there are many different ways one can compare the ensemble of forecasts. Here we cover the comparison options for both hindcast and perfect model ensembles. See terminology for clarification on the differences between these two experimental setups.
Note that all compute functions (for example compute_hindcast(), compute_perfect_model(), and bootstrap_perfect_model()) take an optional
comparison='' keyword to select the comparison style. See below for a detailed
description of the differences between these comparisons.
In hindcast ensembles, the ensemble mean forecast (comparison='e2o') is expected to
perform better than individual ensemble members (comparison='m2o'), because averaging
suppresses the chaotic component of the forecasts while the memory of the system
persists [Boer2016].
keyword: 'e2o', 'e2r'
This is the default option. Compares the ensemble mean forecast to the verification data.
keyword: 'm2o', 'm2r'
Compares each ensemble member individually to the verification data.
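The expected advantage of the ensemble mean can be illustrated on synthetic data. The following is a minimal sketch, not climpred code: it assumes that observations and all members share one predictable signal and that each adds independent Gaussian noise. The names `signal`, `members`, and `rmse`, as well as the noise levels, are illustrative assumptions.

```python
import math
import random

rng = random.Random(42)
n_time, n_member = 200, 10

# Synthetic setup (assumption): observations and every member share one
# predictable signal; each adds its own independent noise.
signal = [math.sin(2 * math.pi * t / 50) for t in range(n_time)]
obs = [s + rng.gauss(0, 0.3) for s in signal]
members = [[s + rng.gauss(0, 1.0) for s in signal] for _ in range(n_member)]


def rmse(forecast, verification):
    """Root-mean-square error between two equally long sequences."""
    squared = [(f - v) ** 2 for f, v in zip(forecast, verification)]
    return math.sqrt(sum(squared) / len(squared))


# 'e2o'-style: average the members first, then verify the ensemble mean.
ensemble_mean = [sum(col) / n_member for col in zip(*members)]
rmse_e2o = rmse(ensemble_mean, obs)

# 'm2o'-style: verify each member on its own and pool the squared errors.
pooled_forecast = [x for member in members for x in member]
pooled_obs = obs * n_member
rmse_m2o = rmse(pooled_forecast, pooled_obs)

print(f"ensemble mean vs obs: {rmse_e2o:.2f}")
print(f"members vs obs:       {rmse_m2o:.2f}")
```

Because the mean squared error of an average can never exceed the average of the members' mean squared errors, the ensemble-mean value is the smaller of the two in this setup.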
Perfect Model Ensembles
In perfect-model frameworks, there are many more ways of verifying forecasts.
[Seferian2018] uses a comparison of all ensemble members against the
control run (comparison='m2c') and of all ensemble members against all other ensemble
members (comparison='m2m'). Furthermore, the ensemble mean forecast can be verified
against one control member (comparison='e2c') or against all members (comparison='m2e'),
as done in [Griffies1997].
keyword: 'm2e'
This is the default option. Compares all members to the ensemble mean while leaving the reference member out of the ensemble mean.
keyword: 'm2c'
Compares all other member forecasts to the control member verification.
keyword: 'm2m'
Compares all members to all others in turn while leaving out the verification member.
keyword: 'e2c'
Compares the ensemble mean forecast to the control member verification.
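How the four perfect-model comparisons pair forecasts with verification values can be sketched in plain Python. This is an illustration of the pairing logic only, not climpred's implementation. It assumes one scalar value per member and treats the first member (index 0) as the control member; that choice and the function signatures are hypothetical.

```python
from statistics import fmean


def m2e(members):
    """Each member verifies the mean of all *other* members."""
    return [
        (fmean(members[:i] + members[i + 1:]), verif)
        for i, verif in enumerate(members)
    ]


def m2c(members, control=0):
    """Every non-control member is a forecast of the control member."""
    return [(m, members[control]) for i, m in enumerate(members) if i != control]


def m2m(members):
    """Every member is verified against every other member in turn."""
    return [
        (fcst, verif)
        for i, verif in enumerate(members)
        for j, fcst in enumerate(members)
        if i != j
    ]


def e2c(members, control=0):
    """The mean of the non-control members forecasts the control member."""
    others = members[:control] + members[control + 1:]
    return [(fmean(others), members[control])]


# One value per member at a single initialization and lead (toy numbers).
members = [1.0, 2.0, 3.0, 6.0]
for compare in (m2e, m2c, m2m, e2c):
    pairs = compare(members)
    print(f"{compare.__name__}: {len(pairs)} (forecast, verification) pairs")
```

With four members this yields 4, 3, 12, and 1 pairs respectively, which shows how strongly the sample size behind a skill value depends on the comparison.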
The goal of a normalized distance metric is to get a constant or comparable value of typically 1 (or 0 for metrics defined as 1 - metric) when the metric saturates and the predictability horizon is reached (see metrics).
A factor is added in the normalized metric formula (see [Seferian2018]) to accommodate
different comparison styles. For example, nrmse gets smaller in comparison m2e than in
m2m by design, since the ensemble mean is always closer to individual members
than the ensemble members are to each other. In turn, the normalization factor is
2 for comparisons m2c, m2m, and m2o. It is 1 for m2e, e2c, and e2o.
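The need for a comparison-dependent factor can be checked numerically. The sketch below is not climpred's implementation. It assumes a saturated ensemble whose members are independent draws with standard deviation `sigma`, and it assumes the normalization takes the form RMSE / (sigma * sqrt(fac)), with fac = 2 for member-versus-member and fac = 1 for member-versus-ensemble-mean comparisons.

```python
import math
import random

rng = random.Random(0)
sigma, n_member, n_sample = 1.5, 10, 2000

sq_err_m2m = []  # member vs. every other member
sq_err_m2e = []  # member vs. mean of the other members
for _ in range(n_sample):
    # Saturated case (assumption): no shared signal is left, only spread.
    members = [rng.gauss(0, sigma) for _ in range(n_member)]
    total = sum(members)
    for i, verif in enumerate(members):
        mean_of_others = (total - verif) / (n_member - 1)
        sq_err_m2e.append((mean_of_others - verif) ** 2)
        sq_err_m2m.extend(
            (fcst - verif) ** 2 for j, fcst in enumerate(members) if j != i
        )


def nrmse(squared_errors, fac):
    """RMSE normalized by sigma * sqrt(fac) (assumed form of the formula)."""
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    return rmse / (sigma * math.sqrt(fac))


print(f"m2m, fac=1: {nrmse(sq_err_m2m, 1):.2f}")
print(f"m2m, fac=2: {nrmse(sq_err_m2m, 2):.2f}")
print(f"m2e, fac=1: {nrmse(sq_err_m2e, 1):.2f}")
```

Under these assumptions the difference of two independent members has variance 2 * sigma**2, so the m2m value sits near sqrt(2) without the factor and near 1 with it. The m2e value stays slightly above 1 for a finite ensemble, because the mean of the remaining nine members still carries some noise.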
Interpretation of Results
HindcastEnsemble skill is computed over all initializations (init) of the
hindcast, so the resulting skill is a mean forecast skill over all initializations.
PerfectModelEnsemble skill is computed over a supervector comprising all
initializations and members, which allows the computation of ACC-based skill
[Bushuk2018] and likewise returns a mean forecast skill over all initializations.
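The supervector idea can be sketched as follows: once a comparison has paired forecasts with verification values, both fields are flattened across initializations and members, and a single correlation is computed over the resulting vectors. This illustrates the concept only. The `[init][member]` layout, the toy numbers, and the helper `pearson` are assumptions, not climpred internals.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy values at one lead, indexed [init][member].
forecast = [[0.9, 1.2, 1.0], [2.1, 1.8, 2.2], [2.9, 3.3, 3.1]]
verification = [[1.0, 1.1, 0.8], [2.0, 2.3, 1.9], [3.2, 3.0, 2.8]]

# Flatten init and member into one supervector per field.
forecast_vec = [v for init in forecast for v in init]
verification_vec = [v for init in verification for v in init]

acc = pearson(forecast_vec, verification_vec)
print(f"ACC over a supervector of length {len(forecast_vec)}: {acc:.2f}")
```

Flattening gives the correlation nine samples here instead of three, which is what makes a correlation-based skill feasible when only a few initializations are available.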
Compute over dimension
The optional argument dim defines the dimension over which a metric is computed. We can
apply a metric over a dim from ['init', 'member', ['member', 'init']] in
compute_perfect_model() and from ['init', 'member'] in compute_hindcast(). The
resulting skill is then reduced by this dim. Therefore, applying a metric over
dim='member' creates a skill for each initialization individually, which can show how
skill depends on the initial conditions. Likewise, when computing skill over 'init', we
get a skill for each member. This dim argument is different from the comparison
argument, which just specifies how forecast and observations are defined.
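The effect of reducing over one dimension or the other can be shown with a small pure-Python sketch. It assumes forecast and verification values stored as nested lists indexed `[init][member]`; the helper `rmse` and the toy numbers are illustrative and unrelated to climpred's internals.

```python
import math

# Toy forecast and verification values indexed [init][member] at one lead.
forecast = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]
verification = [[1.0, 1.0, 1.0], [2.0, 4.0, 2.0]]


def rmse(pairs):
    """Root-mean-square error over an iterable of (forecast, verification)."""
    pairs = list(pairs)
    return math.sqrt(sum((f - v) ** 2 for f, v in pairs) / len(pairs))


# dim='member': reduce over members, leaving one skill value per init.
skill_per_init = [rmse(zip(f, v)) for f, v in zip(forecast, verification)]

# dim='init': reduce over inits, leaving one skill value per member.
skill_per_member = [
    rmse(zip(f_col, v_col))
    for f_col, v_col in zip(zip(*forecast), zip(*verification))
]

print(len(skill_per_init), len(skill_per_member))  # 2 3
```

The dimension that is reduced disappears from the result, so two initializations with three members give two values in the first case and three in the second.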
However, the logic above applies to deterministic metrics. Probabilistic metrics need
to be applied to the member dimension, with a comparison from ['m2c', 'm2m'] in
compute_perfect_model() and the 'm2o' comparison in
compute_hindcast(). Using a probabilistic metric
automatically switches internally to using dim='member'.
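Why a probabilistic metric has to consume the member dimension can be seen with a Brier score computed from ensemble members. The following is a minimal sketch under stated assumptions: the function `brier_score`, its signature, and the event threshold are hypothetical and only illustrate that the forecast carries a member axis while the observations do not.

```python
def brier_score(member_forecasts, observations, threshold):
    """Brier score of the event `value > threshold`.

    member_forecasts: one list of member values per initialization.
    observations: one observed value per initialization (no member axis).
    """
    total = 0.0
    for members, obs in zip(member_forecasts, observations):
        # The member axis is consumed here to form a forecast probability.
        probability = sum(m > threshold for m in members) / len(members)
        outcome = 1.0 if obs > threshold else 0.0
        total += (probability - outcome) ** 2
    return total / len(observations)


member_forecasts = [[0.2, 0.8, 0.9, 0.7], [0.1, 0.3, 0.2, 0.6]]
observations = [0.9, 0.4]
score = brier_score(member_forecasts, observations, threshold=0.5)
print(score)  # 0.0625
```

A comparison that averages the members before verification would leave no spread from which to derive the probability, which is why only member-preserving comparisons make sense here.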
You can also construct your own comparisons via the Comparison class, the master class for all comparisons.
First, write your own comparison function, similar to the existing ones. If a
comparison should also be used for probabilistic metrics, make sure that it returns
forecast with a member dimension and observations without one. For deterministic
metrics, return forecast and observations with identical dimensions, as in this
example, which has no identical existing comparison:
import numpy as np
import xarray as xr

from climpred.comparisons import Comparison, _drop_members


def _my_m2median_comparison(ds, metric=None):
    """Identical to m2e but median."""
    observations_list = []
    forecast_list = []
    supervector_dim = 'member'
    for m in ds.member.values:
        # Forecast: median of all members except the one held out.
        forecast = _drop_members(ds, rmd_member=[m]).median('member')
        # Verification: the held-out member itself.
        observations = ds.sel(member=m).squeeze()
        forecast_list.append(forecast)
        observations_list.append(observations)
    # Stack the per-member results along the supervector dimension.
    observations = xr.concat(observations_list, supervector_dim)
    forecast = xr.concat(forecast_list, supervector_dim)
    # Give both outputs the same integer coordinate so they align.
    forecast[supervector_dim] = np.arange(forecast[supervector_dim].size)
    observations[supervector_dim] = np.arange(observations[supervector_dim].size)
    return forecast, observations
Then initialize this comparison function with Comparison:
__my_m2median_comparison = Comparison(
    name='m2me',
    function=_my_m2median_comparison,
    probabilistic=False,
    hindcast=False,
)
Finally, compute skill based on your own comparison:
skill = compute_perfect_model(ds, control, metric='rmse', comparison=__my_m2median_comparison)
Once you come up with a useful comparison for your problem, consider contributing it to
climpred so that all users can benefit from it.
[Boer2016] Boer, G. J., D. M. Smith, C. Cassou, F. Doblas-Reyes, G. Danabasoglu, B. Kirtman, Y. Kushnir, et al. “The Decadal Climate Prediction Project (DCPP) Contribution to CMIP6.” Geosci. Model Dev. 9, no. 10 (October 25, 2016): 3751–77. https://doi.org/10/f89qdf.
[Bushuk2018] Mitchell Bushuk, Rym Msadek, Michael Winton, Gabriel Vecchi, Xiaosong Yang, Anthony Rosati, and Rich Gudgel. Regional Arctic sea–ice prediction: potential versus operational seasonal forecast skill. Climate Dynamics, June 2018. https://doi.org/10/gd7hfq.
[Griffies1997]
[Seferian2018] Roland Séférian, Sarah Berthet, and Matthieu Chevallier. Assessing the Decadal Predictability of Land and Ocean Carbon Uptake. Geophysical Research Letters, March 2018. https://doi.org/10/gdb424.