Metrics

All high-level functions like verify() and bootstrap() (for both HindcastEnsemble and PerfectModelEnsemble objects) have a metric argument that has to be called to determine which metric is used in computing predictability.

Note

We use the phrase ‘observations’ o here to refer to the ‘truth’ data to which we compare the forecast f. These metrics can also be applied relative to a control simulation, reconstruction, observations, etc. This would just change the resulting score from quantifying skill to quantifying potential predictability.

Internally, all metric functions require forecast and observations as inputs. The dimension dim has to be set to specify over which dimensions the metric is applied and are hence reduced. See Comparisons for more on the dim argument.

Deterministic

Deterministic metrics assess the forecast as a definite prediction of the future, rather than in terms of probabilities. Another way to look at deterministic metrics is that they are a special case of probabilistic metrics where a value of one is assigned to one category and zero to all others [Jolliffe2011].

Correlation Metrics

The below metrics rely fundamentally on correlations in their computation. In the literature, correlation metrics are typically referred to as the Anomaly Correlation Coefficient (ACC). This implies that anomalies in the forecast and observations are being correlated. Typically, this is computed using the linear Pearson Product-Moment Correlation. However, climpred also offers the Spearman’s Rank Correlation.

Note that the p value associated with these correlations is computed via a separate metric. Use pearson_r_p_value or spearman_r_p_value to compute p values assuming that all samples in the correlated time series are independent. Use pearson_r_eff_p_value or spearman_r_eff_p_value to account for autocorrelation in the time series by calculating the effective_sample_size.

Pearson Product-Moment Correlation Coefficient

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [1]: print(f"\n\nKeywords: {metric_aliases['pearson_r']}")


Keywords: ['pearson_r', 'pr', 'acc', 'pacc']
climpred.metrics._pearson_r(forecast, verif, dim=None, **metric_kwargs)[source]

Pearson product-moment correlation coefficient.

A measure of the linear association between the forecast and verification data that is independent of the mean and variance of the individual distributions. This is also known as the Anomaly Correlation Coefficient (ACC) when correlating anomalies.

corr = \frac{cov(f, o)}{\sigma_{f}\cdot\sigma_{o}},

where \sigma_{f} and \sigma_{o} represent the standard deviation of the forecast and verification data over the experimental period, respectively.

Note

Use metric pearson_r_p_value or pearson_r_eff_p_value to get the corresponding p value.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see pearson_r()

Details:

minimum

-1.0

maximum

1.0

perfect

1.0

orientation

positive

Example

>>> HindcastEnsemble.verify(metric='pearson_r', comparison='e2o',
...     alignment='same_verifs', dim=['init'])
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.9272 0.9145 0.9127 0.9319 ... 0.9315 0.9185 0.9112

Pearson Correlation p value

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [2]: print(f"\n\nKeywords: {metric_aliases['pearson_r_p_value']}")


Keywords: ['pearson_r_p_value', 'p_pval', 'pvalue', 'pval']
climpred.metrics._pearson_r_p_value(forecast, verif, dim=None, **metric_kwargs)[source]

Probability that forecast and verification data are linearly uncorrelated.

Two-tailed p value associated with the Pearson product-moment correlation coefficient (pearson_r), assuming that all samples are independent. Use pearson_r_eff_p_value to account for autocorrelation in the forecast and verification data.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see xskillscore.pearson_r_p_value

Details:

minimum

0.0

maximum

1.0

perfect

1.0

orientation

negative

Example

>>> HindcastEnsemble.verify(metric='pearson_r_p_value', comparison='e2o',
...     alignment='same_verifs', dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 5.779e-23 2.753e-21 4.477e-21 ... 8.7e-22 6.781e-21

Effective Sample Size

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [3]: print(f"\n\nKeywords: {metric_aliases['effective_sample_size']}")


Keywords: ['effective_sample_size', 'n_eff', 'eff_n']
climpred.metrics._effective_sample_size(forecast, verif, dim=None, **metric_kwargs)[source]

Effective sample size for temporally correlated data.

Note

Weights are not included here due to the dependence on temporal autocorrelation.

Note

This metric can only be used for hindcast-type simulations.

The effective sample size extracts the number of independent samples between two time series being correlated. This is derived by assessing the magnitude of the lag-1 autocorrelation coefficient in each of the time series being correlated. A higher autocorrelation induces a lower effective sample size which raises the correlation coefficient for a given p value.

The effective sample size is used in computing the effective p value. See pearson_r_eff_p_value and spearman_r_eff_p_value.

N_{eff} = N\left( \frac{1 -
           \rho_{f}\rho_{o}}{1 + \rho_{f}\rho_{o}} \right),

where \rho_{f} and \rho_{o} are the lag-1 autocorrelation coefficients for the forecast and verification data.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see effective_sample_size()

Details:

minimum

0.0

maximum

perfect

N/A

orientation

positive

Reference:
  • Bretherton, Christopher S., et al. “The effective number of spatial degrees of freedom of a time-varying field.” Journal of climate 12.7 (1999): 1990-2009.

Example

>>> HindcastEnsemble.verify(metric='effective_sample_size', comparison='e2o',
...     alignment='same_verifs', dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 5.0 4.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0

Pearson Correlation Effective p value

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [4]: print(f"\n\nKeywords: {metric_aliases['pearson_r_eff_p_value']}")


Keywords: ['pearson_r_eff_p_value', 'p_pval_eff', 'pvalue_eff', 'pval_eff']
climpred.metrics._pearson_r_eff_p_value(forecast, verif, dim=None, **metric_kwargs)[source]

Probability that forecast and verification data are linearly uncorrelated, accounting for autocorrelation.

Note

Weights are not included here due to the dependence on temporal autocorrelation.

Note

This metric can only be used for hindcast-type simulations.

The effective p value is computed by replacing the sample size N in the t-statistic with the effective sample size, N_{eff}. The same Pearson product-moment correlation coefficient r is used as when computing the standard p value.

t = r\sqrt{ \frac{N_{eff} - 2}{1 - r^{2}} },

where N_{eff} is computed via the autocorrelation in the forecast and verification data.

N_{eff} = N\left( \frac{1 -
           \rho_{f}\rho_{o}}{1 + \rho_{f}\rho_{o}} \right),

where \rho_{f} and \rho_{o} are the lag-1 autocorrelation coefficients for the forecast and verification data.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see pearson_r_eff_p_value()

Details:

minimum

0.0

maximum

1.0

perfect

1.0

orientation

negative

Example

>>> HindcastEnsemble.verify(metric='pearson_r_eff_p_value', comparison='e2o',
...     alignment='same_verifs', dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.02333 0.08552 0.2679 ... 0.2369 0.2588 0.2703
Reference:
  • Bretherton, Christopher S., et al. “The effective number of spatial degrees of freedom of a time-varying field.” Journal of climate 12.7 (1999): 1990-2009.

Spearman’s Rank Correlation Coefficient

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [5]: print(f"\n\nKeywords: {metric_aliases['spearman_r']}")


Keywords: ['spearman_r', 'sacc', 'sr']
climpred.metrics._spearman_r(forecast, verif, dim=None, **metric_kwargs)[source]

Spearman’s rank correlation coefficient.

corr = \mathrm{pearsonr}(ranked(f), ranked(o))

This correlation coefficient is nonparametric and assesses how well the relationship between the forecast and verification data can be described using a monotonic function. It is computed by first ranking the forecasts and verification data, and then correlating those ranks using the pearson_r correlation.

This is also known as the anomaly correlation coefficient (ACC) when comparing anomalies, although the Pearson product-moment correlation coefficient (pearson_r) is typically used when computing the ACC.

Note

Use metric spearman_r_p_value or spearman_r_eff_p_value to get the corresponding p value.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see spearman_r()

Details:

minimum

-1.0

maximum

1.0

perfect

1.0

orientation

positive

Example

>>> HindcastEnsemble.verify(metric='spearman_r', comparison='e2o',
...     alignment='same_verifs', dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.9336 0.9311 0.9293 0.9474 ... 0.9465 0.9346 0.9328

Spearman’s Rank Correlation Coefficient p value

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [6]: print(f"\n\nKeywords: {metric_aliases['spearman_r_p_value']}")


Keywords: ['spearman_r_p_value', 's_pval', 'spvalue', 'spval']
climpred.metrics._spearman_r_p_value(forecast, verif, dim=None, **metric_kwargs)[source]

Probability that forecast and verification data are monotonically uncorrelated.

Two-tailed p value associated with the Spearman’s rank correlation coefficient (spearman_r), assuming that all samples are independent. Use spearman_r_eff_p_value to account for autocorrelation in the forecast and verification data.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see spearman_r_p_value()

Details:

minimum

0.0

maximum

1.0

perfect

1.0

orientation

negative

Example

>>> HindcastEnsemble.verify(metric='spearman_r_p_value', comparison='e2o',
...     alignment='same_verifs', dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 6.248e-24 1.515e-23 ... 4.288e-24 8.254e-24

Spearman’s Rank Correlation Effective p value

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [7]: print(f"\n\nKeywords: {metric_aliases['spearman_r_eff_p_value']}")


Keywords: ['spearman_r_eff_p_value', 's_pval_eff', 'spvalue_eff', 'spval_eff']
climpred.metrics._spearman_r_eff_p_value(forecast, verif, dim=None, **metric_kwargs)[source]

Probability that forecast and verification data are monotonically uncorrelated, accounting for autocorrelation.

Note

Weights are not included here due to the dependence on temporal autocorrelation.

Note

This metric can only be used for hindcast-type simulations.

The effective p value is computed by replacing the sample size N in the t-statistic with the effective sample size, N_{eff}. The same Spearman’s rank correlation coefficient r is used as when computing the standard p value.

t = r\sqrt{ \frac{N_{eff} - 2}{1 - r^{2}} },

where N_{eff} is computed via the autocorrelation in the forecast and verification data.

N_{eff} = N\left( \frac{1 -
           \rho_{f}\rho_{o}}{1 + \rho_{f}\rho_{o}} \right),

where \rho_{f} and \rho_{o} are the lag-1 autocorrelation coefficients for the forecast and verification data.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see spearman_r_eff_p_value()

Details:

minimum

0.0

maximum

1.0

perfect

1.0

orientation

negative

Reference:
  • Bretherton, Christopher S., et al. “The effective number of spatial degrees of freedom of a time-varying field.” Journal of climate 12.7 (1999): 1990-2009.

Example

>>> HindcastEnsemble.verify(metric='spearman_r_eff_p_value', comparison='e2o',
...     alignment='same_verifs', dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.02034 0.0689 0.2408 ... 0.2092 0.2315 0.2347

Distance Metrics

This class of metrics simply measures the distance (or difference) between forecasted values and observed values.

Mean Squared Error (MSE)

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [8]: print(f"\n\nKeywords: {metric_aliases['mse']}")


Keywords: ['mse']
climpred.metrics._mse(forecast, verif, dim=None, **metric_kwargs)[source]

Mean Sqaure Error (MSE).

MSE = \overline{(f - o)^{2}}

The average of the squared difference between forecasts and verification data. This incorporates both the variance and bias of the estimator. Because the error is squared, it is more sensitive to large forecast errors than mae, and thus a more conservative metric. For example, a single error of 2°C counts the same as two 1°C errors when using mae. On the other hand, the 2°C error counts double for mse. See Jolliffe and Stephenson, 2011.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see mse()

Details:

minimum

0.0

maximum

perfect

0.0

orientation

negative

See also

Reference:
  • Ian T. Jolliffe and David B. Stephenson. Forecast Verification: A Practitioner’s Guide in Atmospheric Science. John Wiley & Sons, Ltd, Chichester, UK, December 2011. ISBN 978-1-119-96000-3 978-0-470-66071-3. URL: http://doi.wiley.com/10.1002/9781119960003.

Example

>>> HindcastEnsemble.verify(metric='mse', comparison='e2o', alignment='same_verifs',
...     dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.006202 0.006536 0.007771 ... 0.02417 0.02769

Root Mean Square Error (RMSE)

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [9]: print(f"\n\nKeywords: {metric_aliases['rmse']}")


Keywords: ['rmse']
climpred.metrics._rmse(forecast, verif, dim=None, **metric_kwargs)[source]

Root Mean Sqaure Error (RMSE).

RMSE = \sqrt{\overline{(f - o)^{2}}}

The square root of the average of the squared differences between forecasts and verification data.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see rmse()

Details:

minimum

0.0

maximum

perfect

0.0

orientation

negative

See also

Example

>>> HindcastEnsemble.verify(metric='rmse', comparison='e2o', alignment='same_verifs',
...     dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.07875 0.08085 0.08815 ... 0.1371 0.1555 0.1664

Mean Absolute Error (MAE)

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [10]: print(f"\n\nKeywords: {metric_aliases['mae']}")


Keywords: ['mae']
climpred.metrics._mae(forecast, verif, dim=None, **metric_kwargs)[source]

Mean Absolute Error (MAE).

MAE = \overline{|f - o|}

The average of the absolute differences between forecasts and verification data. A more robust measure of forecast accuracy than mse which is sensitive to large outlier forecast errors.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see mae()

Details:

minimum

0.0

maximum

perfect

0.0

orientation

negative

See also

Reference:
  • Ian T. Jolliffe and David B. Stephenson. Forecast Verification: A Practitioner’s Guide in Atmospheric Science. John Wiley & Sons, Ltd, Chichester, UK, December 2011. ISBN 978-1-119-96000-3 978-0-470-66071-3. URL: http://doi.wiley.com/10.1002/9781119960003.

Example

>>> HindcastEnsemble.verify(metric='mae', comparison='e2o', alignment='same_verifs',
...     dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.06484 0.06684 0.07407 ... 0.1193 0.1361 0.1462

Median Absolute Error

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [11]: print(f"\n\nKeywords: {metric_aliases['median_absolute_error']}")


Keywords: ['median_absolute_error']
climpred.metrics._median_absolute_error(forecast, verif, dim=None, **metric_kwargs)[source]

Median Absolute Error.

median(|f - o|)

The median of the absolute differences between forecasts and verification data. Applying the median function to absolute error makes it more robust to outliers.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see median_absolute_error()

Details:

minimum

0.0

maximum

perfect

0.0

orientation

negative

Example

>>> HindcastEnsemble.verify(metric='median_absolute_error', comparison='e2o',
...     alignment='same_verifs', dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.06077 0.06556 0.06368 ... 0.1131 0.142 0.1466

Spread

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [12]: print(f"\n\nKeywords: {metric_aliases['spread']}")


Keywords: ['spread']
climpred.metrics._spread(forecast, verif, dim=None, **metric_kwargs)[source]

Ensemble spread taking the standard deviation over the member dimension.

System Message: WARNING/2 (spread = \std{f} )

latex exited with error [stdout] This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017/Debian) (preloaded format=latex) restricted \write18 enabled. entering extended mode (./math.tex LaTeX2e <2017-04-15> Babel <3.18> and hyphenation patterns for 84 language(s) loaded. (/usr/share/texlive/texmf-dist/tex/latex/base/article.cls Document Class: article 2014/09/29 v1.4h Standard LaTeX document class (/usr/share/texlive/texmf-dist/tex/latex/base/size12.clo)) (/usr/share/texlive/texmf-dist/tex/latex/base/inputenc.sty (/usr/share/texlive/texmf-dist/tex/latex/base/utf8.def (/usr/share/texlive/texmf-dist/tex/latex/base/t1enc.dfu) (/usr/share/texlive/texmf-dist/tex/latex/base/ot1enc.dfu) (/usr/share/texlive/texmf-dist/tex/latex/base/omsenc.dfu))) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsmath.sty For additional information on amsmath, use the `?' option. (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amstext.sty (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsgen.sty)) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsbsy.sty) (/usr/share/texlive/texmf-dist/tex/latex/amsmath/amsopn.sty)) (/usr/share/texlive/texmf-dist/tex/latex/amscls/amsthm.sty) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amssymb.sty (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/amsfonts.sty)) (/usr/share/texlive/texmf-dist/tex/latex/anyfontsize/anyfontsize.sty) (/usr/share/texlive/texmf-dist/tex/latex/tools/bm.sty) (./math.aux) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsa.fd) (/usr/share/texlive/texmf-dist/tex/latex/amsfonts/umsb.fd) ! Undefined control sequence. <argument> \split@tag \begin {split}spread = \std {f}\end {split} l.14 \begin{split}spread = \std{f}\end{split} ! Undefined control sequence. <argument> \split@tag \begin {split}spread = \std {f}\end {split} l.14 \begin{split}spread = \std{f}\end{split} [1] (./math.aux) ) (see the transcript file for additional information) Output written on math.dvi (1 page, 296 bytes). Transcript written on math.log.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data (not used).

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see std()

Details:

minimum

0.0

maximum

perfect

obs.std()

orientation

negative

Example

>>> HindcastEnsemble.verify(metric='spread', comparison='m2o', alignment='same_verifs',
...     dim=['member','init'])
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.1468 0.1738 0.1922 0.2096 ... 0.2142 0.2178 0.2098

Multiplicative bias

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [13]: print(f"\n\nKeywords: {metric_aliases['mul_bias']}")


Keywords: ['mul_bias', 'm_b', 'multiplicative_bias']
climpred.metrics._mul_bias(forecast, verif, dim=None, **metric_kwargs)[source]

Multiplicative bias.

multiplicative bias = f / o

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over

  • metric_kwargs (dict) – see xarray.mean

Details:

minimum

-∞

maximum

perfect

1.0

orientation

None

Example

>>> HindcastEnsemble.verify(metric='multiplicative_bias', comparison='e2o',
...     alignment='same_verifs', dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.719 0.9991 1.072 1.434 ... 1.854 2.128 2.325 2.467

Normalized Distance Metrics

Distance metrics like mse can be normalized to 1. The normalization factor depends on the comparison type choosen. For example, the distance between an ensemble member and the ensemble mean is half the distance of an ensemble member with other ensemble members. See _get_norm_factor().

Normalized Mean Square Error (NMSE)

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [14]: print(f"\n\nKeywords: {metric_aliases['nmse']}")


Keywords: ['nmse', 'nev']
climpred.metrics._nmse(forecast, verif, dim=None, **metric_kwargs)[source]

Normalized MSE (NMSE), also known as Normalized Ensemble Variance (NEV).

Mean Square Error (mse) normalized by the variance of the verification data.

NMSE = NEV = \frac{MSE}{\sigma^2_{o}\cdot fac}
     = \frac{\overline{(f - o)^{2}}}{\sigma^2_{o} \cdot fac},

where fac is 1 when using comparisons involving the ensemble mean (m2e, e2c, e2o) and 2 when using comparisons involving individual ensemble members (m2c, m2m, m2o). See _get_norm_factor().

Note

climpred uses a single-valued internal reference forecast for the NMSE, in the terminology of Murphy 1988. I.e., we use a single climatological variance of the verification data within the experimental window for normalizing MSE.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • comparison (str) – Name comparison needed for normalization factor fac, see _get_norm_factor() (Handled internally by the compute functions)

  • metric_kwargs (dict) – see mse()

Details:

minimum

0.0

maximum

perfect

0.0

orientation

negative

better than climatology

0.0 - 1.0

worse than climatology

> 1.0

Reference:
  • Griffies, S. M., and K. Bryan. “A Predictability Study of Simulated North Atlantic Multidecadal Variability.” Climate Dynamics 13, no. 7–8 (August 1, 1997): 459–87. https://doi.org/10/ch4kc4.

  • Murphy, Allan H. “Skill Scores Based on the Mean Square Error and Their Relationships to the Correlation Coefficient.” Monthly Weather Review 116, no. 12 (December 1, 1988): 2417–24. https://doi.org/10/fc7mxd.

Example

>>> HindcastEnsemble.verify(metric='nmse', comparison='e2o', alignment='same_verifs',
...     dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.1732 0.1825 0.217 0.2309 ... 0.5247 0.6749 0.7732

Normalized Mean Absolute Error (NMAE)

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [15]: print(f"\n\nKeywords: {metric_aliases['nmae']}")


Keywords: ['nmae']
climpred.metrics._nmae(forecast, verif, dim=None, **metric_kwargs)[source]

Normalized Mean Absolute Error (NMAE).

Mean Absolute Error (mae) normalized by the standard deviation of the verification data.

NMAE = \frac{MAE}{\sigma_{o} \cdot fac}
     = \frac{\overline{|f - o|}}{\sigma_{o} \cdot fac},

where fac is 1 when using comparisons involving the ensemble mean (m2e, e2c, e2o) and 2 when using comparisons involving individual ensemble members (m2c, m2m, m2o). See _get_norm_factor().

Note

climpred uses a single-valued internal reference forecast for the NMAE, in the terminology of Murphy 1988. I.e., we use a single climatological standard deviation of the verification data within the experimental window for normalizing MAE.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • comparison (str) – Name comparison needed for normalization factor fac, see _get_norm_factor() (Handled internally by the compute functions)

  • metric_kwargs (dict) – see mae()

Details:

minimum

0.0

maximum

perfect

0.0

orientation

negative

better than climatology

0.0 - 1.0

worse than climatology

> 1.0

Reference:
  • Griffies, S. M., and K. Bryan. “A Predictability Study of Simulated North Atlantic Multidecadal Variability.” Climate Dynamics 13, no. 7–8 (August 1, 1997): 459–87. https://doi.org/10/ch4kc4.

  • Murphy, Allan H. “Skill Scores Based on the Mean Square Error and Their Relationships to the Correlation Coefficient.” Monthly Weather Review 116, no. 12 (December 1, 1988): 2417–24. https://doi.org/10/fc7mxd.

Example

>>> HindcastEnsemble.verify(metric='nmae', comparison='e2o', alignment='same_verifs',
...     dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.3426 0.3532 0.3914 0.3898 ... 0.6303 0.7194 0.7726

Normalized Root Mean Square Error (NRMSE)

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [16]: print(f"\n\nKeywords: {metric_aliases['nrmse']}")


Keywords: ['nrmse']
climpred.metrics._nrmse(forecast, verif, dim=None, **metric_kwargs)[source]

Normalized Root Mean Square Error (NRMSE).

Root Mean Square Error (rmse) normalized by the standard deviation of the verification data.

NRMSE = \frac{RMSE}{\sigma_{o}\cdot\sqrt{fac}}
      = \sqrt{\frac{MSE}{\sigma^{2}_{o}\cdot fac}}
      = \sqrt{ \frac{\overline{(f - o)^{2}}}{ \sigma^2_{o}\cdot fac}},

where fac is 1 when using comparisons involving the ensemble mean (m2e, e2c, e2o) and 2 when using comparisons involving individual ensemble members (m2c, m2m, m2o). See _get_norm_factor().

Note

climpred uses a single-valued internal reference forecast for the NRMSE, in the terminology of Murphy 1988. I.e., we use a single climatological variance of the verification data within the experimental window for normalizing RMSE.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • comparison (str) – Name comparison needed for normalization factor fac, see _get_norm_factor() (Handled internally by the compute functions)

  • metric_kwargs (dict) – see rmse()

Details:

minimum

0.0

maximum

perfect

0.0

orientation

negative

better than climatology

0.0 - 1.0

worse than climatology

> 1.0

Reference:
  • Bushuk, Mitchell, Rym Msadek, Michael Winton, Gabriel Vecchi, Xiaosong Yang, Anthony Rosati, and Rich Gudgel. “Regional Arctic Sea–Ice Prediction: Potential versus Operational Seasonal Forecast Skill.” Climate Dynamics, June 9, 2018. https://doi.org/10/gd7hfq.

  • Hawkins, Ed, Steffen Tietsche, Jonathan J. Day, Nathanael Melia, Keith Haines, and Sarah Keeley. “Aspects of Designing and Evaluating Seasonal-to-Interannual Arctic Sea-Ice Prediction Systems.” Quarterly Journal of the Royal Meteorological Society 142, no. 695 (January 1, 2016): 672–83. https://doi.org/10/gfb3pn.

  • Murphy, Allan H. “Skill Scores Based on the Mean Square Error and Their Relationships to the Correlation Coefficient.” Monthly Weather Review 116, no. 12 (December 1, 1988): 2417–24. https://doi.org/10/fc7mxd.

Example

>>> HindcastEnsemble.verify(metric='nrmse', comparison='e2o', alignment='same_verifs',
...     dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.4161 0.4272 0.4658 0.4806 ... 0.7244 0.8215 0.8793

Mean Square Error Skill Score (MSESS)

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [17]: print(f"\n\nKeywords: {metric_aliases['msess']}")


Keywords: ['msess', 'ppp', 'msss']
climpred.metrics._msess(forecast, verif, dim=None, **metric_kwargs)[source]

Mean Squared Error Skill Score (MSESS).

MSESS = 1 - \frac{MSE}{\sigma^2_{ref} \cdot fac} =
       1 - \frac{\overline{(f - o)^{2}}}{\sigma^2_{ref} \cdot fac},

where fac is 1 when using comparisons involving the ensemble mean (m2e, e2c, e2o) and 2 when using comparisons involving individual ensemble members (m2c, m2m, m2o). See _get_norm_factor().

This skill score can be intepreted as a percentage improvement in accuracy. I.e., it can be multiplied by 100%.

Note

climpred uses a single-valued internal reference forecast for the MSSS, in the terminology of Murphy 1988. I.e., we use a single climatological variance of the verification data within the experimental window for normalizing MSE.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • comparison (str) – Name comparison needed for normalization factor fac, see _get_norm_factor() (Handled internally by the compute functions)

  • metric_kwargs (dict) – see mse()

Details:

minimum

-∞

maximum

1.0

perfect

1.0

orientation

positive

better than climatology

> 0.0

equal to climatology

0.0

worse than climatology

< 0.0

Reference:
  • Griffies, S. M., and K. Bryan. “A Predictability Study of Simulated North Atlantic Multidecadal Variability.” Climate Dynamics 13, no. 7–8 (August 1, 1997): 459–87. https://doi.org/10/ch4kc4.

  • Murphy, Allan H. “Skill Scores Based on the Mean Square Error and Their Relationships to the Correlation Coefficient.” Monthly Weather Review 116, no. 12 (December 1, 1988): 2417–24. https://doi.org/10/fc7mxd.

  • Pohlmann, Holger, Michael Botzet, Mojib Latif, Andreas Roesch, Martin Wild, and Peter Tschuck. “Estimating the Decadal Predictability of a Coupled AOGCM.” Journal of Climate 17, no. 22 (November 1, 2004): 4463–72. https://doi.org/10/d2qf62.

  • Bushuk, Mitchell, Rym Msadek, Michael Winton, Gabriel Vecchi, Xiaosong Yang, Anthony Rosati, and Rich Gudgel. “Regional Arctic Sea–Ice Prediction: Potential versus Operational Seasonal Forecast Skill. Climate Dynamics, June 9, 2018. https://doi.org/10/gd7hfq.

Example

>>> HindcastEnsemble.verify(metric='msess', comparison='e2o', alignment='same_verifs',
...     dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.8268 0.8175 0.783 0.7691 ... 0.4753 0.3251 0.2268

Mean Absolute Percentage Error (MAPE)

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [18]: print(f"\n\nKeywords: {metric_aliases['mape']}")


Keywords: ['mape']
climpred.metrics._mape(forecast, verif, dim=None, **metric_kwargs)[source]

Mean Absolute Percentage Error (MAPE).

Mean absolute error (mae) expressed as the fractional error relative to the verification data.

MAPE = \frac{1}{n} \sum \frac{|f-o|}{|o|}

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see mape()

Details:

minimum

0.0

maximum

perfect

0.0

orientation

negative

See also

Example

>>> HindcastEnsemble.verify(metric='mape', comparison='e2o', alignment='same_verifs',
...     dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 1.536 1.21 1.421 1.149 ... 1.078 1.369 1.833 1.245

Symmetric Mean Absolute Percentage Error (sMAPE)

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [19]: print(f"\n\nKeywords: {metric_aliases['smape']}")


Keywords: ['smape']
climpred.metrics._smape(forecast, verif, dim=None, **metric_kwargs)[source]

Symmetric Mean Absolute Percentage Error (sMAPE).

Similar to the Mean Absolute Percentage Error (mape), but sums the forecast and observation mean in the denominator.

sMAPE = \frac{1}{n} \sum \frac{|f-o|}{|f|+|o|}

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see smape()

Details:

minimum

0.0

maximum

1.0

perfect

0.0

orientation

negative

See also

Example

>>> HindcastEnsemble.verify(metric='smape', comparison='e2o', alignment='same_verifs',
...     dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.3801 0.3906 0.4044 0.3819 ... 0.4822 0.5054 0.5295

Unbiased Anomaly Correlation Coefficient (uACC)

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [20]: print(f"\n\nKeywords: {metric_aliases['uacc']}")


Keywords: ['uacc']
climpred.metrics._uacc(forecast, verif, dim=None, **metric_kwargs)[source]

Bushuk’s unbiased Anomaly Correlation Coefficient (uACC).

This is typically used in perfect model studies. Because the perfect model Anomaly Correlation Coefficient (ACC) is strongly state dependent, a standard ACC (e.g. one computed using pearson_r) will be highly sensitive to the set of start dates chosen for the perfect model study. The Mean Square Skill Score (MESSS) can be related directly to the ACC as MESSS = ACC^(2) (see Murphy 1988 and Bushuk et al. 2019), so the unbiased ACC can be derived as uACC = sqrt(MESSS).

uACC = \sqrt{MSESS}
     = \sqrt{1 - \frac{\overline{(f - o)^{2}}}{\sigma^2_{ref} \cdot fac}},

where fac is 1 when using comparisons involving the ensemble mean (m2e, e2c, e2o) and 2 when using comparisons involving individual ensemble members (m2c, m2m, m2o). See _get_norm_factor().

Note

Because of the square root involved, any negative MSESS values are automatically converted to NaNs.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • comparison (str) – Name comparison needed for normalization factor fac, see _get_norm_factor() (Handled internally by the compute functions)

  • metric_kwargs (dict) – see mse()

Details:

minimum

0.0

maximum

1.0

perfect

1.0

orientation

positive

better than climatology

> 0.0

equal to climatology

0.0

Reference:
  • Bushuk, Mitchell, Rym Msadek, Michael Winton, Gabriel Vecchi, Xiaosong Yang, Anthony Rosati, and Rich Gudgel. “Regional Arctic Sea–Ice Prediction: Potential versus Operational Seasonal Forecast Skill.” Climate Dynamics, June 9, 2018. https://doi.org/10/gd7hfq.

  • Allan H. Murphy. Skill Scores Based on the Mean Square Error and Their Relationships to the Correlation Coefficient. Monthly Weather Review, 116(12):2417–2424, December 1988. https://doi.org/10/fc7mxd.

Example

>>> HindcastEnsemble.verify(metric='uacc', comparison='e2o', alignment='same_verifs',
...     dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.9093 0.9041 0.8849 0.877 ... 0.6894 0.5702 0.4763

Murphy Decomposition Metrics

Metrics derived in [Murphy1988] which decompose the MSESS into a correlation term, a conditional bias term, and an unconditional bias term. See https://www-miklip.dkrz.de/about/murcss/ for a walk through of the decomposition.

Standard Ratio

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [21]: print(f"\n\nKeywords: {metric_aliases['std_ratio']}")


Keywords: ['std_ratio']
climpred.metrics._std_ratio(forecast, verif, dim=None, **metric_kwargs)[source]

Ratio of standard deviations of the forecast over the verification data.

\text{std ratio} = \frac{\sigma_f}{\sigma_o},

where \sigma_{f} and \sigma_{o} are the standard deviations of the forecast and the verification data over the experimental period, respectively.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see xarray.std

Details:

minimum

0.0

maximum

perfect

1.0

orientation

N/A

Reference:

Example

>>> HindcastEnsemble.verify(metric='std_ratio', comparison='e2o',
...     alignment='same_verifs', dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.7567 0.8801 0.9726 1.055 ... 1.075 1.094 1.055

Conditional Bias

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [22]: print(f"\n\nKeywords: {metric_aliases['conditional_bias']}")


Keywords: ['conditional_bias', 'c_b', 'cond_bias']
climpred.metrics._conditional_bias(forecast, verif, dim=None, **metric_kwargs)[source]

Conditional bias between forecast and verification data.

\text{conditional bias} = r_{fo} - \frac{\sigma_f}{\sigma_o},

where \sigma_{f} and \sigma_{o} are the standard deviations of the forecast and verification data over the experimental period, respectively.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see pearson_r()

:param and Datasetstd():

Details:

minimum

-∞

maximum

1.0

perfect

0.0

orientation

negative

Reference:

Example

>>> HindcastEnsemble.verify(metric='conditional_bias', comparison='e2o',
...     alignment='same_verifs', dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.1705 0.03435 -0.05988 ... -0.1436 -0.175 -0.1434

Unconditional Bias

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [23]: print(f"\n\nKeywords: {metric_aliases['unconditional_bias']}")


Keywords: ['unconditional_bias', 'u_b', 'a_b', 'bias', 'additive_bias']

Simple bias of the forecast minus the observations.

climpred.metrics._unconditional_bias(forecast, verif, dim=None, **metric_kwargs)[source]

Unconditional additive bias.

bias = f - o

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over

  • metric_kwargs (dict) – see xarray.mean

Details:

minimum

-∞

maximum

perfect

0.0

orientation

negative

Reference:

Example

>>> HindcastEnsemble.verify(metric='unconditional_bias', comparison='e2o',
...     alignment='same_verifs', dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 -0.01158 -0.02512 -0.0408 ... -0.1322 -0.1445

Conditional bias is removed by remove_bias().

>>> HindcastEnsemble = HindcastEnsemble.remove_bias(alignment='same_verifs')
>>> HindcastEnsemble.verify(metric='unconditional_bias', comparison='e2o',
...     alignment='same_verifs', dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 3.203e-18 -1.068e-18 ... 2.882e-17 -2.776e-17

Bias Slope

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [24]: print(f"\n\nKeywords: {metric_aliases['bias_slope']}")


Keywords: ['bias_slope']
climpred.metrics._bias_slope(forecast, verif, dim=None, **metric_kwargs)[source]

Bias slope between verification data and forecast standard deviations.

\text{bias slope} = \frac{s_{o}}{s_{f}} \cdot r_{fo},

where r_{fo} is the Pearson product-moment correlation between the forecast and the verification data and s_{o} and s_{f} are the standard deviations of the verification data and forecast over the experimental period, respectively.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see pearson_r() and

:param std():

Details:

minimum

0.0

maximum

perfect

1.0

orientation

negative

Reference:

Example

>>> HindcastEnsemble.verify(metric='bias_slope', comparison='e2o',
...     alignment='same_verifs', dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.7016 0.8049 0.8877 0.9836 ... 1.002 1.004 0.961

Murphy’s Mean Square Error Skill Score

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [25]: print(f"\n\nKeywords: {metric_aliases['msess_murphy']}")


Keywords: ['msess_murphy', 'msss_murphy']
climpred.metrics._msess_murphy(forecast, verif, dim=None, **metric_kwargs)[source]

Murphy’s Mean Square Error Skill Score (MSESS).

MSESS_{Murphy} = r_{fo}^2 - [\text{conditional bias}]^2 -         [\frac{\text{(unconditional) bias}}{\sigma_o}]^2,

where r_{fo}^{2} represents the Pearson product-moment correlation coefficient between the forecast and verification data and \sigma_{o} represents the standard deviation of the verification data over the experimental period. See conditional_bias and unconditional_bias for their respective formulations.

Parameters
  • forecast (xarray object) – Forecast.

  • verif (xarray object) – Verification data.

  • dim (str) – Dimension(s) to perform metric over.

  • metric_kwargs (dict) – see pearson_r(),

:param mean() and std():

Details:

minimum

-∞

maximum

1.0

perfect

1.0

orientation

positive

Reference:

Example

>>> HindcastEnsemble = HindcastEnsemble.remove_bias(alignment='same_verifs')
>>> HindcastEnsemble.verify(metric='msess_murphy', comparison='e2o',
...     alignment='same_verifs', dim='init')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.8306 0.8351 0.8295 0.8532 ... 0.8471 0.813 0.8097

Probabilistic

Probabilistic metrics include the spread of the ensemble simulations in their calculations and assign a probability value between 0 and 1 to their forecasts [Jolliffe2011].

Continuous Ranked Probability Score (CRPS)

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [26]: print(f"\n\nKeywords: {metric_aliases['crps']}")


Keywords: ['crps']
climpred.metrics._crps(forecast, verif, dim=None, **metric_kwargs)[source]

Continuous Ranked Probability Score (CRPS).

The CRPS can also be considered as the probabilistic Mean Absolute Error (mae). It compares the empirical distribution of an ensemble forecast to a scalar observation. Smaller scores indicate better skill.

CRPS = \int_{-\infty}^{\infty} (F(f) - H(f - o))^{2} df,

where F(f) is the cumulative distribution function (CDF) of the forecast (since the verification data are not assigned a probability), and H() is the Heaviside step function where the value is 1 if the argument is positive (i.e., the forecast overestimates verification data) or zero (i.e., the forecast equals verification data) and is 0 otherwise (i.e., the forecast is less than verification data).

Note

The CRPS is expressed in the same unit as the observed variable. It generalizes the Mean Absolute Error (MAE), and reduces to the MAE if the forecast is determinstic.

Parameters
  • forecast (xr.object) – Forecast with member dim.

  • verif (xr.object) – Verification data without member dim.

  • dim (list of str) – Dimension to apply metric over. Expects at least member. Other dimensions are passed to xskillscore and averaged.

  • metric_kwargs (dict) – optional, see crps_ensemble()

Details:

minimum

0.0

maximum

perfect

0.0

orientation

negative

Reference:

See also

Example

>>> HindcastEnsemble.verify(metric='crps', comparison='m2o', dim='member',
...     alignment='same_inits')
<xarray.Dataset>
Dimensions:     (lead: 10, init: 52)
Coordinates:
  * init        (init) object 1954-01-01 00:00:00 ... 2005-01-01 00:00:00
  * lead        (lead) int32 1 2 3 4 5 6 7 8 9 10
    valid_time  (lead, init) object 1955-01-01 00:00:00 ... 2015-01-01 00:00:00
    skill       <U11 'initialized'
Data variables:
    SST         (lead, init) float64 0.1722 0.1202 0.01764 ... 0.05428 0.1638

Continuous Ranked Probability Skill Score (CRPSS)

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [27]: print(f"\n\nKeywords: {metric_aliases['crpss']}")


Keywords: ['crpss']
climpred.metrics._crpss(forecast, verif, dim=None, **metric_kwargs)[source]

Continuous Ranked Probability Skill Score.

This can be used to assess whether the ensemble spread is a useful measure for the forecast uncertainty by comparing the CRPS of the ensemble forecast to that of a reference forecast with the desired spread.

CRPSS = 1 - \frac{CRPS_{initialized}}{CRPS_{clim}}

Note

When assuming a Gaussian distribution of forecasts, use default gaussian=True. If not gaussian, you may specify the distribution type, xmin/xmax/tolerance for integration (see crps_quadrature()).

Parameters
  • forecast (xr.object) – Forecast with member dim.

  • verif (xr.object) – Verification data without member dim.

  • dim (list of str) – Dimension to apply metric over. Expects at least member. Other dimensions are passed to xskillscore and averaged.

  • metric_kwargs (dict) –

    optional gaussian (bool, optional): If True, assume Gaussian distribution for

    baseline skill. Defaults to True.

    see crps_ensemble(), crps_gaussian() and crps_quadrature()

Details:

minimum

-∞

maximum

1.0

perfect

1.0

orientation

positive

better than climatology

> 0.0

worse than climatology

< 0.0

Reference:
  • Matheson, James E., and Robert L. Winkler. “Scoring Rules for Continuous Probability Distributions.” Management Science 22, no. 10 (June 1, 1976): 1087–96. https://doi.org/10/cwwt4g.

  • Gneiting, Tilmann, and Adrian E Raftery. “Strictly Proper Scoring Rules, Prediction, and Estimation.” Journal of the American Statistical Association 102, no. 477 (March 1, 2007): 359–78. https://doi.org/10/c6758w.

Example

>>> HindcastEnsemble.verify(metric='crpss', comparison='m2o',
...     alignment='same_inits', dim='member')
<xarray.Dataset>
Dimensions:     (init: 52, lead: 10)
Coordinates:
  * init        (init) object 1954-01-01 00:00:00 ... 2005-01-01 00:00:00
  * lead        (lead) int32 1 2 3 4 5 6 7 8 9 10
    valid_time  (lead, init) object 1955-01-01 00:00:00 ... 2015-01-01 00:00:00
    skill       <U11 'initialized'
Data variables:
    SST         (lead, init) float64 0.2644 0.3636 0.7376 ... 0.7702 0.5126
>>> import scipy
>>> PerfectModelEnsemble..isel(lead=[0, 1]).verify(metric='crpss', comparison='m2m',
...     dim='member', gaussian=False, cdf_or_dist=scipy.stats.norm, xmin=-10,
...     xmax=10, tol=1e-6)  
<xarray.Dataset>
Dimensions:  (init: 12, lead: 2, member: 9)
Coordinates:
  * init     (init) object 3014-01-01 00:00:00 ... 3257-01-01 00:00:00
  * lead     (lead) int64 1 2
  * member   (member) int64 1 2 3 4 5 6 7 8 9
Data variables:
    tos      (lead, init, member) float64 0.9931 0.9932 0.9932 ... 0.9947 0.9947

See also

Continuous Ranked Probability Skill Score Ensemble Spread

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [28]: print(f"\n\nKeywords: {metric_aliases['crpss_es']}")


Keywords: ['crpss_es']
climpred.metrics._crpss_es(forecast, verif, dim=None, **metric_kwargs)[source]

Continuous Ranked Probability Skill Score Ensemble Spread.

If the ensemble variance is smaller than the observed mse, the ensemble is said to be under-dispersive (or overconfident). An ensemble with variance larger than the verification data indicates one that is over-dispersive (underconfident).

CRPSS = 1 - \frac{CRPS(\sigma^2_f)}{CRPS(\sigma^2_o)}

Parameters
  • forecast (xr.object) – Forecast with member dim.

  • verif (xr.object) – Verification data without member dim.

  • dim (list of str) – Dimension to apply metric over. Expects at least member. Other dimensions are passed to xskillscore and averaged.

  • metric_kwargs (dict) – see crps_ensemble()

:param and mse():

Details:

minimum

-∞

maximum

0.0

perfect

0.0

orientation

positive

under-dispersive

> 0.0

over-dispersive

< 0.0

Reference:
  • Kadow, Christopher, Sebastian Illing, Oliver Kunst, Henning W. Rust, Holger Pohlmann, Wolfgang A. Müller, and Ulrich Cubasch. “Evaluation of Forecasts by Accuracy and Spread in the MiKlip Decadal Climate Prediction System.” Meteorologische Zeitschrift, December 21, 2016, 631–43. https://doi.org/10/f9jrhw.

Example

>>> HindcastEnsemble.verify(metric='crpss_es', comparison='m2o',
...     alignment='same_verifs', dim='member')
<xarray.Dataset>
Dimensions:     (init: 52, lead: 10)
Coordinates:
  * init        (init) object 1964-01-01 00:00:00 ... 2015-01-01 00:00:00
  * lead        (lead) int32 1 2 3 4 5 6 7 8 9 10
    valid_time  (init) object 1964-01-01 00:00:00 ... 2015-01-01 00:00:00
    skill       <U11 'initialized'
Data variables:
    SST         (lead, init) float64 -0.01121 -0.05575 ... -0.1263 -0.007483

Brier Score

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [29]: print(f"\n\nKeywords: {metric_aliases['brier_score']}")


Keywords: ['brier_score', 'brier', 'bs']
climpred.metrics._brier_score(forecast, verif, dim=None, **metric_kwargs)[source]

Brier Score for binary events.

The Mean Square Error (mse) of probabilistic two-category forecasts where the verification data are either 0 (no occurrence) or 1 (occurrence) and forecast probability may be arbitrarily distributed between occurrence and non-occurrence. The Brier Score equals zero for perfect (single-valued) forecasts and one for forecasts that are always incorrect.

BS(f, o) = (f_1 - o)^2,

where f_1 is the forecast probability of o=1.

Note

The Brier Score requires that the observation is binary, i.e., can be described as one (a “hit”) or zero (a “miss”). So either provide a function with with binary outcomes logical in metric_kwargs or create binary verifs and probability forecasts by hindcast.map(logical).mean(‘member’). This Brier Score is not the original formula given in Brier’s 1950 paper.

Parameters
  • forecast (xr.object) – Raw forecasts with member dimension if logical provided in metric_kwargs. Probability forecasts in [0,1] if logical is not provided.

  • verif (xr.object) – Verification data without member dim. Raw verification if logical provided, else binary verification.

  • dim (list or str) – Dimensions to aggregate. Requires member if logical provided in metric_kwargs to create probability forecasts. If logical not provided in metric_kwargs, should not include member.

  • metric_kwargs (dict) –

    optional logical (callable): Function with bool result to be applied to verification

    data and forecasts and then mean('member') to get forecasts and verification data in interval [0,1].

    see brier_score()

Details:

minimum

0.0

maximum

1.0

perfect

0.0

orientation

negative

Reference:

See also

Example

Define a boolean/logical function for binary scoring:

>>> def pos(x): return x > 0  # checking binary outcomes

Option 1. Pass with keyword logical: (specifically designed for PerfectModelEnsemble, where binary verification can only be created after comparison)

>>> HindcastEnsemble.verify(metric='brier_score', comparison='m2o',
...     dim=['member', 'init'], alignment='same_verifs', logical=pos)
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.115 0.1121 0.1363 0.125 ... 0.1654 0.1675 0.1873

Option 2. Pre-process to generate a binary multi-member forecast and binary verification product:

>>> HindcastEnsemble.map(pos).verify(metric='brier_score',
...     comparison='m2o', dim=['member', 'init'], alignment='same_verifs')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.115 0.1121 0.1363 0.125 ... 0.1654 0.1675 0.1873

Option 3. Pre-process to generate a probability forecast and binary verification product. because member not present in hindcast anymore, use comparison='e2o' and dim='init':

>>> HindcastEnsemble.map(pos).mean('member').verify(metric='brier_score',
...     comparison='e2o', dim='init', alignment='same_verifs')
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'
Data variables:
    SST      (lead) float64 0.115 0.1121 0.1363 0.125 ... 0.1654 0.1675 0.1873

Threshold Brier Score

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [30]: print(f"\n\nKeywords: {metric_aliases['threshold_brier_score']}")


Keywords: ['threshold_brier_score', 'tbs']
climpred.metrics._threshold_brier_score(forecast, verif, dim=None, **metric_kwargs)[source]

Brier score of an ensemble for exceeding given thresholds.

CRPS = \int_f BS(F(f), H(f - o)) df

where F(o) = \int_{f \leq o} p(f) df is the cumulative distribution function (CDF) of the forecast distribution F, o is a point estimate of the true observation (observational error is neglected), BS denotes the Brier score and H(x) denotes the Heaviside step function, which we define here as equal to 1 for x \geq 0 and 0 otherwise.

Parameters
  • forecast (xr.object) – Forecast with member dim.

  • verif (xr.object) – Verification data without member dim.

  • dim (list of str) – Dimension to apply metric over. Expects at least member. Other dimensions are passed to xskillscore and averaged.

  • threshold (int, float, xr.object) – Threshold to check exceedance, see properscoring.threshold_brier_score.

  • metric_kwargs (dict) – optional, see threshold_brier_score()

Details:

minimum

0.0

maximum

1.0

perfect

0.0

orientation

negative

Reference:

See also

Example

>>> # get threshold brier score for each init
>>> HindcastEnsemble.verify(metric='threshold_brier_score', comparison='m2o',
...     dim='member', threshold=.2, alignment='same_inits')
<xarray.Dataset>
Dimensions:     (lead: 10, init: 52)
Coordinates:
  * init        (init) object 1954-01-01 00:00:00 ... 2005-01-01 00:00:00
  * lead        (lead) int32 1 2 3 4 5 6 7 8 9 10
    valid_time  (lead, init) object 1955-01-01 00:00:00 ... 2015-01-01 00:00:00
    threshold   float64 0.2
    skill       <U11 'initialized'
Data variables:
    SST         (lead, init) float64 0.0 0.0 0.0 0.0 0.0 ... 0.25 0.36 0.09 0.01
>>> # multiple thresholds averaging over init dimension
>>> HindcastEnsemble.verify(metric='threshold_brier_score', comparison='m2o',
...     dim=['member', 'init'], threshold=[.2, .3], alignment='same_verifs')
<xarray.Dataset>
Dimensions:    (lead: 10, threshold: 2)
Coordinates:
  * lead       (lead) int32 1 2 3 4 5 6 7 8 9 10
  * threshold  (threshold) float64 0.2 0.3
    skill      <U11 'initialized'
Data variables:
    SST        (lead, threshold) float64 0.08712 0.005769 ... 0.1312 0.01923

Ranked Probability Score

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [31]: print(f"\n\nKeywords: {metric_aliases['rps']}")


Keywords: ['rps']
climpred.metrics._rps(forecast, verif, dim=None, **metric_kwargs)[source]

Ranked Probability Score.

RPS(p, k) = \sum_{m=1}^{M} [(\sum_{k=1}^{m} p_k) - (\sum_{k=1}^{m}             o_k)]^{2}

Parameters
  • forecast (xr.object) – Forecasts.

  • verif (xr.object) – Verification.

  • dim (list or str) – Dimensions to aggregate.

  • xs.rps (**metric_kwargs, see) –

Note

If category_edges is xr.Dataset or tuple of xr.Datasets, climpred will broadcast the grouped dimensions season, month, weekofyear, dayfofyear onto the dimensions init for forecast and time for observations. see climpred.utils.broadcast_time_grouped_to_time.

Details:

minimum

0.0

maximum

perfect

0.0

orientation

negative

See also

Example

>>> category_edges = np.array([-.5, 0., .5, 1.])
>>> HindcastEnsemble.verify(metric='rps', comparison='m2o', dim=['member', 'init'],
...     alignment='same_verifs', category_edges=category_edges)
<xarray.Dataset>
Dimensions:                     (lead: 10)
Coordinates:
  * lead                        (lead) int32 1 2 3 4 5 6 7 8 9 10
    observations_category_edge  <U67 '[-np.inf, -0.5), [-0.5, 0.0), [0.0, 0.5...
    forecasts_category_edge     <U67 '[-np.inf, -0.5), [-0.5, 0.0), [0.0, 0.5...
    skill                       <U11 'initialized'
Data variables:
    SST                         (lead) float64 0.115 0.1123 ... 0.1687 0.1875

Provide category_edges as xr.Dataset for category_edges varying along dimensions.

>>> category_edges = xr.DataArray([9.5, 10., 10.5, 11.], dims='category_edge').assign_coords(category_edge=[9.5, 10., 10.5, 11.]).to_dataset(name='tos')
>>> # category_edges = np.array([9.5, 10., 10.5, 11.]) # identical
>>> PerfectModelEnsemble.verify(metric='rps', comparison='m2c',
...     dim=['member','init'], category_edges=category_edges)
<xarray.Dataset>
Dimensions:                     (lead: 20)
Coordinates:
  * lead                        (lead) int64 1 2 3 4 5 6 7 ... 15 16 17 18 19 20
    observations_category_edge  <U71 '[-np.inf, 9.5), [9.5, 10.0), [10.0, 10....
    forecasts_category_edge     <U71 '[-np.inf, 9.5), [9.5, 10.0), [10.0, 10....
Data variables:
    tos                         (lead) float64 0.08951 0.1615 ... 0.1399 0.2274

Provide category_edges as tuple for different category_edges to categorize forecasts and observations.

>>> q = [1 / 3, 2 / 3]
>>> forecast_edges = HindcastEnsemble.get_initialized().groupby('init.month').quantile(q=q, dim=['init','member']).rename({'quantile':'category_edge'})
>>> obs_edges = HindcastEnsemble.get_observations().groupby('time.month').quantile(q=q, dim='time').rename({'quantile':'category_edge'})
>>> category_edges = (obs_edges, forecast_edges)
>>> HindcastEnsemble.verify(metric='rps', comparison='m2o',
...     dim=['member', 'init'], alignment='same_verifs',
...     category_edges=category_edges)
<xarray.Dataset>
Dimensions:                     (lead: 10)
Coordinates:
  * lead                        (lead) int32 1 2 3 4 5 6 7 8 9 10
    observations_category_edge  <U101 '[-np.inf, 0.3333333333333333), [0.3333...
    forecasts_category_edge     <U101 '[-np.inf, 0.3333333333333333), [0.3333...
    skill                       <U11 'initialized'
Data variables:
    SST                         (lead) float64 0.1248 0.1756 ... 0.3081 0.3413

Reliability

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [32]: print(f"\n\nKeywords: {metric_aliases['reliability']}")


Keywords: ['reliability']
climpred.metrics._reliability(forecast, verif, dim=None, **metric_kwargs)[source]

Returns the data required to construct the reliability diagram for an event. The the relative frequencies of occurrence of an event for a range of forecast probability bins.

Parameters
  • forecast (xr.object) – Raw forecasts with member dimension if logical provided in metric_kwargs. Probability forecasts in [0,1] if logical is not provided.

  • verif (xr.object) – Verification data without member dim. Raw verification if logical provided, else binary verification.

  • dim (list or str) – Dimensions to aggregate. Requires member if logical provided in metric_kwargs to create probability forecasts. If logical not provided in metric_kwargs, should not include member.

  • logical (callable, optional) – Function with bool result to be applied to verification data and forecasts and then mean('member') to get forecasts and verification data in interval [0,1]. Passed via metric_kwargs.

  • probability_bin_edges (array_like, optional) – Probability bin edges used to compute the reliability. Bins include the left most edge, but not the right. Passed via metric_kwargs. Defaults to 6 equally spaced edges between 0 and 1+1e-8.

Returns

The relative frequency of occurrence for each

probability bin

Return type

reliability (xr.object)

Details:

perfect

flat distribution

See also

Example

Define a boolean/logical function for binary scoring:

>>> def pos(x): return x > 0  # checking binary outcomes

Option 1. Pass with keyword logical: (especially designed for PerfectModelEnsemble, where binary verification can only be created after comparison))

>>> HindcastEnsemble.verify(metric='reliability', comparison='m2o',
...     dim=['member','init'], alignment='same_verifs', logical=pos)
<xarray.Dataset>
Dimensions:               (lead: 10, forecast_probability: 5)
Coordinates:
  * lead                  (lead) int32 1 2 3 4 5 6 7 8 9 10
  * forecast_probability  (forecast_probability) float64 0.1 0.3 0.5 0.7 0.9
    SST_samples           (lead, forecast_probability) float64 22.0 5.0 ... 13.0
    skill                 <U11 'initialized'
Data variables:
    SST                   (lead, forecast_probability) float64 0.09091 ... 1.0

Option 2. Pre-process to generate a binary forecast and verification product:

>>> HindcastEnsemble.map(pos).verify(metric='reliability',
...     comparison='m2o', dim=['init', 'member'], alignment='same_verifs')
<xarray.Dataset>
Dimensions:               (lead: 10, forecast_probability: 5)
Coordinates:
  * lead                  (lead) int32 1 2 3 4 5 6 7 8 9 10
  * forecast_probability  (forecast_probability) float64 0.1 0.3 0.5 0.7 0.9
    SST_samples           (lead, forecast_probability) float64 22.0 5.0 ... 13.0
    skill                 <U11 'initialized'
Data variables:
    SST                   (lead, forecast_probability) float64 0.09091 ... 1.0

Option 3. Pre-process to generate a probability forecast and binary verification product. because member not present in hindcast, use comparison='e2o' and dim='init':

>>> HindcastEnsemble.map(pos).mean('member').verify(metric='reliability',
...     comparison='e2o', dim='init', alignment='same_verifs')
<xarray.Dataset>
Dimensions:               (lead: 10, forecast_probability: 5)
Coordinates:
  * lead                  (lead) int32 1 2 3 4 5 6 7 8 9 10
  * forecast_probability  (forecast_probability) float64 0.1 0.3 0.5 0.7 0.9
    SST_samples           (lead, forecast_probability) float64 22.0 5.0 ... 13.0
    skill                 <U11 'initialized'
Data variables:
    SST                   (lead, forecast_probability) float64 0.09091 ... 1.0

Discrimination

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [33]: print(f"\n\nKeywords: {metric_aliases['discrimination']}")


Keywords: ['discrimination']
climpred.metrics._discrimination(forecast, verif, dim=None, **metric_kwargs)[source]

Returns the data required to construct the discrimination diagram for an event. The histogram of forecasts likelihood when observations indicate an event has occurred and has not occurred.

Parameters
  • forecast (xr.object) – Raw forecasts with member dimension if logical provided in metric_kwargs. Probability forecasts in [0,1] if logical is not provided.

  • verif (xr.object) – Verification data without member dim. Raw verification if logical provided, else binary verification.

  • dim (list or str) – Dimensions to aggregate. Requires member if logical provided in metric_kwargs to create probability forecasts. If logical not provided in metric_kwargs, should not include member. At least one dimension other than member is required.

  • logical (callable, optional) – Function with bool result to be applied to verification data and forecasts and then mean('member') to get forecasts and verification data in interval [0,1]. Passed via metric_kwargs.

  • probability_bin_edges (array_like, optional) – Probability bin edges used to compute the histograms. Bins include the left most edge, but not the right. Passed via metric_kwargs. Defaults to 6 equally spaced edges between 0 and 1+1e-8.

Returns

Discrimination (xr.object) with added dimension “event” containing the histograms of forecast probabilities when the event was observed and not observed

Details:

perfect

distinct distributions

See also

Example

Define a boolean/logical function for binary scoring:

>>> def pos(x): return x > 0  # checking binary outcomes

Option 1. Pass with keyword logical: (especially designed for PerfectModelEnsemble, where binary verification can only be created after comparison)

>>> HindcastEnsemble.verify(metric='discrimination', comparison='m2o',
...     dim=['member', 'init'], alignment='same_verifs', logical=pos)
<xarray.Dataset>
Dimensions:               (lead: 10, forecast_probability: 5, event: 2)
Coordinates:
  * lead                  (lead) int32 1 2 3 4 5 6 7 8 9 10
  * forecast_probability  (forecast_probability) float64 0.1 0.3 0.5 0.7 0.9
  * event                 (event) bool True False
    skill                 <U11 'initialized'
Data variables:
    SST                   (lead, event, forecast_probability) float64 0.07407...

Option 2. Pre-process to generate a binary forecast and verification product:

>>> HindcastEnsemble.map(pos).verify(metric='discrimination',
...     comparison='m2o', dim=['member','init'], alignment='same_verifs')
<xarray.Dataset>
Dimensions:               (lead: 10, forecast_probability: 5, event: 2)
Coordinates:
  * lead                  (lead) int32 1 2 3 4 5 6 7 8 9 10
  * forecast_probability  (forecast_probability) float64 0.1 0.3 0.5 0.7 0.9
  * event                 (event) bool True False
    skill                 <U11 'initialized'
Data variables:
    SST                   (lead, event, forecast_probability) float64 0.07407...

Option 3. Pre-process to generate a probability forecast and binary verification product. because member not present in hindcast, use comparison='e2o' and dim='init':

>>> HindcastEnsemble.map(pos).mean('member').verify(metric='discrimination',
...     comparison='e2o', dim='init', alignment='same_verifs')
<xarray.Dataset>
Dimensions:               (lead: 10, forecast_probability: 5, event: 2)
Coordinates:
  * lead                  (lead) int32 1 2 3 4 5 6 7 8 9 10
  * forecast_probability  (forecast_probability) float64 0.1 0.3 0.5 0.7 0.9
  * event                 (event) bool True False
    skill                 <U11 'initialized'
Data variables:
    SST                   (lead, event, forecast_probability) float64 0.07407...

Rank Histogram

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [34]: print(f"\n\nKeywords: {metric_aliases['rank_histogram']}")


Keywords: ['rank_histogram']
climpred.metrics._rank_histogram(forecast, verif, dim=None, **metric_kwargs)[source]

Rank histogram or Talagrand diagram.

Parameters
  • forecast (xr.object) – Raw forecasts with member dimension.

  • verif (xr.object) – Verification data without member dim.

  • dim (list or str) – Dimensions to aggregate. Requires to contain member and at least one additional dimension.

Details:

flat

perfect

slope

biased

u-shaped

overconfident/underdisperive

dome-shaped

underconfident/overdisperive

See also

Example

>>> HindcastEnsemble.verify(metric='rank_histogram', comparison='m2o',
...     dim=['member', 'init'], alignment='same_verifs')
<xarray.Dataset>
Dimensions:  (lead: 10, rank: 11)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
  * rank     (rank) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0
    skill    <U11 'initialized'
Data variables:
    SST      (lead, rank) int64 12 3 2 1 1 3 1 2 6 5 16 ... 0 1 0 0 3 0 2 6 6 34
>>> PerfectModelEnsemble.verify(metric='rank_histogram', comparison='m2c',
...     dim=['member', 'init'])
<xarray.Dataset>
Dimensions:  (lead: 20, rank: 10)
Coordinates:
  * lead     (lead) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
  * rank     (rank) float64 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0
Data variables:
    tos      (lead, rank) int64 1 4 2 1 2 1 0 0 0 1 2 ... 0 2 0 1 2 1 0 3 1 2 0

Logarithmic Ensemble Spread Score

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [35]: print(f"\n\nKeywords: {metric_aliases['less']}")


Keywords: ['less']
climpred.metrics._less(forecast, verif, dim=None, **metric_kwargs)[source]

Logarithmic Ensemble Spread Score.

LESS = ln(\frac{variance}{MSE})= ln(\frac{\sigma^2_f}{\sigma^2_o})

Parameters
  • forecast (xr.object) – Forecasts.

  • verif (xr.object) – Verification.

  • dim (str, list of str) – The dimension(s) over which to aggregate. Defaults to None, meaning aggregation over all dims other than lead.

Returns

reduced by dimensions dim

Return type

less (xr.object)

Details:

maximum

positive

overdisperive / underconfident

perfect

0

negative

underdisperive / overconfident

minimum

-∞

orientation

None

Example

>>> # better detrend before
>>> from climpred.stats import rm_poly
>>> HindcastEnsemble.map(rm_poly, dim="init_or_time", deg=2).verify(
...     metric='less', comparison='m2o', dim=['member', 'init'],
...     alignment='same_verifs').SST
<xarray.DataArray 'SST' (lead: 10)>
array([ 0.12633664, -0.12707636, -0.26143181, -0.25096537, -0.29267366,
       -0.2905725 , -0.43579508, -0.33774947, -0.46008438, -0.61010386])
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'

References

  • Kadow, Christopher, Sebastian Illing, Oliver Kunst, Henning W. Rust, Holger Pohlmann, Wolfgang A. Müller, and Ulrich Cubasch. “Evaluation of Forecasts by Accuracy and Spread in the MiKlip Decadal Climate Prediction System.” Meteorologische Zeitschrift, December 21, 2016, 631–43. https://doi.org/10/f9jrhw.

Contingency-based metrics

A number of metrics can be derived from a contingency table. To use this in climpred, run .verify(metric='contingency', score=...) where score can be chosen from xskillscore.

climpred.metrics._contingency(forecast, verif, score='table', dim=None, **metric_kwargs)[source]

Contingency table.

Parameters
  • forecast (xr.object) – Raw forecasts.

  • verif (xr.object) – Verification data.

  • dim (list or str) – Dimensions to aggregate.

  • score (str) – Score derived from contingency table. Attribute from Contingency. Use score=table to return a contingency table or any other contingency score, e.g. score=hit_rate.

  • observation_category_edges (array_like) – Category bin edges used to compute the observations CDFs. Bins include the left most edge, but not the right. Passed via metric_kwargs.

  • forecast_category_edges (array_like) – Category bin edges used to compute the forecast CDFs. Bins include the left most edge, but not the right. Passed via metric_kwargs

Example

>>> category_edges = np.array([-0.5, 0.0, 0.5, 1.0])
>>> HindcastEnsemble.verify(metric='contingency', score='table', comparison='m2o',
...     dim=['member', 'init'], alignment='same_verifs',
...     observation_category_edges=category_edges,
...     forecast_category_edges=category_edges).isel(lead=[0, 1]).SST
<xarray.DataArray 'SST' (lead: 2, observations_category: 3, forecasts_category: 3)>
array([[[221,  29,   0],
        [ 53, 217,   0],
        [  0,   0,   0]],

       [[234,  16,   0],
        [ 75, 194,   1],
        [  0,   0,   0]]])
Coordinates:
  * lead                          (lead) int32 1 2
    observations_category_bounds  (observations_category) <U11 '[-0.5, 0.0)' ...
    forecasts_category_bounds     (forecasts_category) <U11 '[-0.5, 0.0)' ......
  * observations_category         (observations_category) int64 1 2 3
  * forecasts_category            (forecasts_category) int64 1 2 3
    skill                         <U11 'initialized'
>>> # contingency-based dichotomous accuracy score
>>> category_edges = np.array([9.5, 10.0, 10.5])
>>> PerfectModelEnsemble.verify(metric='contingency', score='hit_rate',
...     comparison='m2c', dim=['member','init'],
...     observation_category_edges=category_edges,
...     forecast_category_edges=category_edges)
<xarray.Dataset>
Dimensions:  (lead: 20)
Coordinates:
  * lead     (lead) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Data variables:
    tos      (lead) float64 1.0 1.0 1.0 1.0 0.9091 ... 1.0 1.0 1.0 nan 1.0

Receiver Operating Characteristic

# Enter any of the below keywords in ``metric=...`` for the compute functions.
In [36]: print(f"\n\nKeywords: {metric_aliases['roc']}")


Keywords: ['roc']
climpred.metrics._roc(forecast, verif, dim=None, **metric_kwargs)[source]

Receiver Operating Characteristic.

Parameters
  • observations (xarray.object) – Labeled array(s) over which to apply the function. If bin_edges=='continuous', observations are binary.

  • forecasts (xarray.object) – Labeled array(s) over which to apply the function. If bin_edges=='continuous', forecasts are probabilities.

  • dim (str, list of str) – The dimension(s) over which to aggregate. Defaults to None, meaning aggregation over all dims other than lead.

  • logical (callable, optional) – Function with bool result to be applied to verification data and forecasts and then mean('member') to get forecasts and verification data in interval [0,1]. Passed via metric_kwargs.

  • bin_edges (array_like, str) – Bin edges for categorising observations and forecasts. Similar to np.histogram, all but the last (righthand-most) bin include the left edge and exclude the right edge. The last bin includes both edges. bin_edges will be sorted in ascending order. If bin_edges=='continuous', calculate bin_edges from forecasts, equal to sklearn.metrics.roc_curve(f_boolean, o_prob). Passed via metric_kwargs. Defaults to ‘continuous’.

  • drop_intermediate (bool) – Whether to drop some suboptimal thresholds which would not appear on a plotted ROC curve. This is useful in order to create lighter ROC curves. Defaults to False. Defaults to True in sklearn.metrics.roc_curve. Passed via metric_kwargs.

  • return_results (str) –

    Passed via metric_kwargs. Defaults to ‘area’. Specify how return is structed:

    • ’area’: return only the area under curve of ROC

    • ’all_as_tuple’: return true positive rate and false positive rate at each bin and area under the curve of ROC as tuple

    • ’all_as_metric_dim’: return true positive rate and false positive rate at each bin and area under curve of ROC concatinated into new metric dimension

Returns

reduced by dimensions dim, see return_results

parameter. true positive rate and false positive rate contain probability_bin dimension with ascending bin_edges as coordinates.

Return type

roc (xr.object)

Details:

minimum

0.0

maximum

1.0

perfect

1.0

orientation

positive

Example

>>> bin_edges = np.array([-0.5, 0.0, 0.5, 1.0])
>>> HindcastEnsemble.verify(metric='roc', comparison='m2o',
...     dim=['member', 'init'], alignment='same_verifs',
...     bin_edges=bin_edges,
...     ).SST
<xarray.DataArray 'SST' (lead: 10)>
array([0.84385185, 0.82841667, 0.81358547, 0.8393463 , 0.82551752,
       0.81987778, 0.80719573, 0.80081909, 0.79046553, 0.78037564])
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'initialized'

Get area under the curve, false positive rate and true positive rate as metric dimension by specifying return_results='all_as_metric_dim':

>>> def f(ds): return ds > 0
>>> HindcastEnsemble.map(f).verify(metric='roc', comparison='m2o',
...     dim=['member', 'init'], alignment='same_verifs',
...     bin_edges='continuous', return_results='all_as_metric_dim'
...     ).SST.isel(lead=[0, 1])
<xarray.DataArray 'SST' (lead: 2, metric: 3, probability_bin: 3)>
array([[[0.        , 0.116     , 1.        ],
        [0.        , 0.8037037 , 1.        ],
        [0.84385185, 0.84385185, 0.84385185]],

       [[0.        , 0.064     , 1.        ],
        [0.        , 0.72222222, 1.        ],
        [0.82911111, 0.82911111, 0.82911111]]])
Coordinates:
  * probability_bin  (probability_bin) float64 2.0 1.0 0.0
  * lead             (lead) int32 1 2
  * metric           (metric) <U19 'false positive rate' ... 'area under curve'
    skill            <U11 'initialized'

User-defined metrics

You can also construct your own metrics via the climpred.metrics.Metric class.

Metric(name, function, positive, ...[, ...])

Master class for all metrics.

First, write your own metric function, similar to the existing ones with required arguments forecast, observations, dim=None, and **metric_kwargs:

from climpred.metrics import Metric

def _my_msle(forecast, observations, dim=None, **metric_kwargs):
    """Mean squared logarithmic error (MSLE).
    https://peltarion.com/knowledge-center/documentation/modeling-view/build-an-ai-model/loss-functions/mean-squared-logarithmic-error."""
    # function
    return ( (np.log(forecast + 1) + np.log(observations + 1) ) ** 2).mean(dim)

Then initialize this metric function with climpred.metrics.Metric:

_my_msle = Metric(
    name='my_msle',
    function=_my_msle,
    probabilistic=False,
    positive=False,
    unit_power=0,
    )

Finally, compute skill based on your own metric:

skill = hindcast.verify(metric=_my_msle, comparison='e2o', alignment='same_verif', dim='init')

Once you come up with an useful metric for your problem, consider contributing this metric to climpred, so all users can benefit from your metric, see Contributing to xarray.

References

Jolliffe2011(1,2)

Ian T. Jolliffe and David B. Stephenson. Forecast Verification: A Practitioner’s Guide in Atmospheric Science. John Wiley & Sons, Ltd, Chichester, UK, December 2011. ISBN 978-1-119-96000-3 978-0-470-66071-3. URL: http://doi.wiley.com/10.1002/9781119960003.

Murphy1988

Allan H. Murphy. Skill Scores Based on the Mean Square Error and Their Relationships to the Correlation Coefficient. Monthly Weather Review, 116(12):2417–2424, December 1988. https://doi.org/10/fc7mxd.