PerfectModelEnsemble.bootstrap(metric=None, comparison=None, dim=None, reference=None, iterations=None, sig=95, pers_sig=None, **metric_kwargs)

Bootstrap with replacement according to Goddard et al. 2013.

Parameters

  • metric (str, Metric) – Metric to verify bootstrapped skill, see metrics.

  • comparison (str, Comparison) – Comparison passed to verify, see comparisons.

  • dim (str, list of str) – Dimension(s) over which to apply metric. dim is passed on to xskillscore.{metric} and includes xskillscore’s member_dim. dim should contain member when comparison is probabilistic but should not contain member when comparison=e2c. Defaults to None, meaning that all dimensions other than lead are reduced.

  • reference (str, list of str) – Type of reference forecasts with which to verify. One or more of [‘uninitialized’, ‘persistence’, ‘climatology’]. If None or empty, returns no p value.

  • iterations (int) – Number of resampling iterations for bootstrapping with replacement. Recommended >= 500.

  • sig (int, default 95) – Significance level in percent for deciding whether uninitialized and persistence beat initialized skill.

  • pers_sig (int) – If not None, a separate significance level for persistence. Defaults to None, in which case sig is used.

  • **metric_kwargs (optional) – arguments passed to metric.


Returns

Dataset with dimensions results (holding verify skill, p, low_ci and high_ci) and skill (holding initialized and any requested references: persistence, climatology and/or uninitialized):

  • results=’verify skill’, skill=’initialized’:

    mean initialized skill

  • results=’high_ci’, skill=’initialized’:

    high confidence interval boundary for initialized skill

  • results=’p’, skill=’uninitialized’:

    p value of the hypothesis that the difference of skill between the initialized and uninitialized simulations is smaller than or equal to zero, based on bootstrapping with replacement.

  • results=’p’, skill=’persistence’:

    p value of the hypothesis that the difference of skill between the initialized and persistence simulations is smaller than or equal to zero, based on bootstrapping with replacement.
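The p, low_ci and high_ci entries described above can be illustrated with a minimal sketch. It assumes a positively oriented metric and paired lists of bootstrapped skill values, one per iteration. The helper names `quantile` and `summarize_bootstrap` are hypothetical and are not part of climpred. The mapping from sig to quantile levels is likewise an assumption, chosen to agree with a 95 percent interval spanning the 0.025 and 0.975 quantiles.

```python
import random


def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def summarize_bootstrap(init_skill, ref_skill, sig=95):
    """Return (p, low_ci, high_ci) from paired bootstrap samples.

    Hypothetical helper, not climpred's implementation.
    """
    if not init_skill or len(init_skill) != len(ref_skill):
        raise ValueError("need two non-empty samples of equal length")
    # p: fraction of iterations in which initialized minus reference skill <= 0
    p = sum(i - r <= 0 for i, r in zip(init_skill, ref_skill)) / len(init_skill)
    # Assumed mapping: sig=95 gives the 0.025 and 0.975 quantiles
    alpha = (100 - sig) / 100
    ordered = sorted(init_skill)
    low_ci = quantile(ordered, alpha / 2)
    high_ci = quantile(ordered, 1 - alpha / 2)
    return p, low_ci, high_ci


# Toy data: 500 bootstrapped skill values for each forecast type
rng = random.Random(42)
init = [0.6 + rng.gauss(0, 0.05) for _ in range(500)]
ref = [0.3 + rng.gauss(0, 0.05) for _ in range(500)]
p, low_ci, high_ci = summarize_bootstrap(init, ref)
```

A small p here means the reference rarely matched or beat the initialized forecast across the resamples.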

Return type

xarray.Dataset


References

  • Goddard, L., A. Kumar, A. Solomon, D. Smith, G. Boer, P. Gonzalez, V. Kharin, et al. “A Verification Framework for Interannual-to-Decadal Predictions Experiments.” Climate Dynamics 40, no. 1–2 (January 1, 2013): 245–72.


Example

Calculate Pearson’s anomaly correlation (‘acc’) comparing every member to every other member (m2m), reducing the dimensions member and init, over 50 iterations that each resample the member dimension with replacement. Also calculate reference skill for the persistence, climatology and uninitialized forecasts and test whether initialized skill is better than reference skill. Returns verify skill, the probability that the reference forecast performs better than the initialized forecast, and the lower and upper bounds of the resampled skill.

>>> PerfectModelEnsemble.bootstrap(metric='acc', comparison='m2m',
...     dim=['init', 'member'], iterations=50, resample_dim='member',
...     reference=['persistence', 'climatology', 'uninitialized'])
<xarray.Dataset>
Dimensions:  (skill: 4, results: 4, lead: 20)
Coordinates:
  * lead     (lead) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
  * results  (results) <U12 'verify skill' 'p' 'low_ci' 'high_ci'
  * skill    (skill) <U13 'initialized' 'persistence' ... 'uninitialized'
Data variables:
    tos      (skill, results, lead) float64 0.7941 0.7489 ... 0.1494 0.1466
Attributes:
    prediction_skill:            calculated by climpred
    number_of_initializations:   12
    number_of_members:           10
    alignment:                   same_verifs
    metric:                      pearson_r
    comparison:                  m2m
    dim:                         ['init', 'member']
    units:                       None
    confidence_interval_levels:  0.975-0.025
    bootstrap_iterations:        50
    p:                           probability that reference performs better t...
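
The resampling step named in the example (drawing from the member dimension with replacement, once per iteration) can be sketched in plain Python. This is only an illustration of the idea, not how climpred implements it. The function `resample_with_replacement` and the integer member labels are hypothetical.

```python
import random


def resample_with_replacement(members, iterations, seed=None):
    """Yield `iterations` resampled member lists, each as long as the original."""
    rng = random.Random(seed)
    for _ in range(iterations):
        # rng.choices draws with replacement, so a member may appear more than once
        yield rng.choices(members, k=len(members))


members = list(range(10))  # stand-in labels for 10 ensemble members
resamples = list(resample_with_replacement(members, iterations=50, seed=0))
```

Skill would then be recomputed on each resampled ensemble, and the spread of those values gives the confidence bounds and p values.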