You can run this notebook in a live session on Binder or view it on GitHub.

Bias Removal#

Climate models can have biases relative to a given verification dataset. Commonly, these biases are removed by postprocessing before forecast skill is verified [Hawkins et al., 2016]. climpred provides convenience functions to do so.


Select one of the following methods by specifying how in HindcastEnsemble.remove_bias(); a toy sketch of the mean- and spread-based corrections follows the lists below:

  • how="additive_mean": correcting the mean forecast additively

  • how="multiplicative_mean": correcting the mean forecast multiplicatively

  • how="multiplicative_std": correcting the standard deviation multiplicatively

Wrapped from the GitHub package bias_correction:

  • how="modified_quantile"

  • how="basic_quantile"

  • how="gamma_mapping"

  • how="normal_mapping"

Wrapped from xclim.sdba:

  • how="DetrendedQuantileMapping"

  • how="EmpiricalQuantileMapping"

  • how="LOCI"

  • how="PrincipalComponents"

  • how="QuantileDeltaMapping"

  • how="Scaling"
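
As intuition for the first three methods, here is a minimal NumPy sketch of what the mean- and spread-based corrections compute on a toy forecast/observation pair. This is an illustration only, not climpred's internals: climpred estimates and applies the correction per lead (and per seasonality group), which the toy example omits.

import numpy as np

rng = np.random.default_rng(0)
obs_toy = rng.normal(20.0, 1.0, size=500)  # toy observations
fc = obs_toy + 1.5 + rng.normal(0.0, 0.5, size=500)  # toy forecast with a +1.5 warm bias

# how="additive_mean": subtract the mean forecast-minus-observation difference
additive = fc - (fc.mean() - obs_toy.mean())

# how="multiplicative_mean": scale by the ratio of means
# (suited to positive-definite fields such as precipitation)
multiplicative = fc * (obs_toy.mean() / fc.mean())

# how="multiplicative_std": rescale the spread by the ratio of standard deviations
std_corrected = (fc - fc.mean()) * (obs_toy.std() / fc.std()) + fc.mean()

print(additive.mean() - obs_toy.mean())  # ~0: mean bias removed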

# linting
%load_ext nb_black
%load_ext lab_black
import climpred
import xarray as xr
import matplotlib.pyplot as plt
from climpred import HindcastEnsemble
# NMME hindcasts and OIv2 observations of Nino3.4 SST from the climpred tutorial data
initialized = climpred.tutorial.load_dataset("NMME_hindcast_Nino34_sst")
obs = climpred.tutorial.load_dataset("NMME_OIv2_Nino34_sst")
v = "sst"
# build a HindcastEnsemble for a single NMME model and attach the observations
hindcast = HindcastEnsemble(
    initialized.sel(model="GFDL-CM2p5-FLOR-A06")
).add_observations(obs)
hindcast.plot()
/Users/aaron.spring/Coding/climpred/climpred/checks.py:202: UserWarning: Did not find dimension "init", but renamed dimension S with CF-complying standard_name "forecast_reference_time" to init.
  warnings.warn(
/Users/aaron.spring/Coding/climpred/climpred/checks.py:202: UserWarning: Did not find dimension "member", but renamed dimension M with CF-complying standard_name "realization" to member.
  warnings.warn(
/Users/aaron.spring/Coding/climpred/climpred/checks.py:202: UserWarning: Did not find dimension "lead", but renamed dimension L with CF-complying standard_name "forecast_period" to lead.
  warnings.warn(
<AxesSubplot:xlabel='validity time', ylabel='Sea Surface Temperature\n[Celsius_scale]'>
[Figure: initialized hindcast members and observations of Nino3.4 SST against validity time]

Additive mean bias removal#

Typically, bias depends on lead time and should therefore also be removed per lead.

# additive bias of the ensemble mean against observations, kept per init and lead (dim=[])
bias = hindcast.verify(
    metric="additive_bias", comparison="e2o", dim=[], alignment="same_verifs"
)

bias[v].plot()
<matplotlib.collections.QuadMesh at 0x143700d60>
[Figure: additive bias of Nino3.4 SST as a function of init and lead]
# group bias by seasonality
seasonality = climpred.options.OPTIONS["seasonality"]
seasonality
'month'
bias.groupby(f"init.{seasonality}").mean()[v].plot()
<matplotlib.collections.QuadMesh at 0x1436fbd90>
[Figure: additive bias of Nino3.4 SST grouped by month, as a function of month and lead]

An initial warm bias develops into a cold bias, especially in winter.
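
Because the bias varies with season, remove_bias estimates and removes it per group given by the seasonality option. A hedged sketch of switching the grouping to meteorological seasons, assuming climpred.set_options accepts seasonality (one of "dayofyear", "weekofyear", "month", "season") as in recent climpred versions:

# assumption: climpred.set_options accepts "seasonality" and acts as a
# context manager, analogous to xarray.set_options
with climpred.set_options(seasonality="season"):
    hindcast.remove_bias(
        how="additive_mean", alignment="same_verifs", train_test_split="unfair"
    ).plot()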

train_test_split#

Risbey et al. [2021] demonstrate how important a clean separation of train and test periods is for bias reduction.

train_test_split options implemented in climpred (a toy sketch of the three splits follows this list):

  • unfair: completely overlapping train and test (climpred default)

  • unfair-cv: overlapping train and test except for current init, which is left out (set cv="LOO")

  • fair: no overlap between train and test (recommended)
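
As a toy NumPy illustration (not climpred's internals) of how the mean-bias estimate used to correct each init differs between the three splits:

import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1982, 2021)
err = rng.normal(1.0, 0.3, size=years.size)  # toy forecast-minus-observation error per init

# unfair: bias estimated on all inits, including the one being corrected
b_unfair = np.full(years.size, err.mean())

# unfair-cv (cv="LOO"): leave the init being corrected out of the estimate
b_unfair_cv = np.array([np.delete(err, i).mean() for i in range(years.size)])

# fair: bias estimated on the training period only and applied to the remaining
# (test) inits, which is why remove_bias drops the training period from the hindcast
train = years <= 1998
b_fair = err[train].mean()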

metric_kwargs = dict(
    metric="rmse", alignment="same_verifs", dim="init", comparison="e2o", skipna=True
)
# fair calculates bias for train_time/train_init and drops these data from hindcast
hindcast.remove_bias(
    how="additive_mean",
    alignment=metric_kwargs["alignment"],
    train_test_split="fair",
    train_time=slice("1982", "1998"),
).plot()
<AxesSubplot:xlabel='validity time', ylabel='Sea Surface Temperature\n[Celsius_scale]'>
[Figure: bias-corrected hindcast and observations of Nino3.4 SST against validity time]
hindcast.verify(**metric_kwargs)[v].plot(label="no bias correction")

hindcast.remove_bias(
    how="additive_mean", alignment=metric_kwargs["alignment"], train_test_split="unfair"
).verify(**metric_kwargs)[v].plot(label="unfair")

hindcast.remove_bias(
    how="additive_mean",
    alignment=metric_kwargs["alignment"],
    train_test_split="unfair-cv",
    cv="LOO",
).verify(**metric_kwargs)[v].plot(label="unfair-cv")

hindcast.remove_bias(
    how="additive_mean",
    alignment=metric_kwargs["alignment"],
    train_test_split="fair",
    train_time=slice("1982", "1998"),
).verify(**metric_kwargs)[v].plot(label="fair")


plt.legend()
plt.title(
    f"NMME {hindcast.coords['model'].values} Nino3.4 SST {metric_kwargs['metric'].upper()}"
)
Text(0.5, 1.0, 'NMME GFDL-CM2p5-FLOR-A06 Nino3.4 SST RMSE')
[Figure: NMME GFDL-CM2p5-FLOR-A06 Nino3.4 SST RMSE against lead for no bias correction, unfair, unfair-cv, and fair]

Comparison of methods (how)#

methods = [
    "additive_mean",
    # "multiplicative_std",
    "DetrendedQuantileMapping",
    "EmpiricalQuantileMapping",
    # "PrincipalComponents",
    # "LOCI",
    "QuantileDeltaMapping",
    "Scaling",
    "modified_quantile",
    "basic_quantile",
    # "gamma_mapping",
    # "normal_mapping",
]
import warnings

warnings.simplefilter("ignore")
# xclim.sdba requires pint units
hindcast._datasets["initialized"][v].attrs["units"] = "C"
hindcast._datasets["observations"][v].attrs["units"] = "C"
hindcast.sel(init=slice("1983", "2018")).verify(**metric_kwargs)[v].plot(label="raw")
for method in methods:
    hindcast.remove_bias(
        how=method, alignment=metric_kwargs["alignment"], train_test_split="unfair"
    ).verify(**metric_kwargs)[v].plot(label=method)
plt.legend()
<matplotlib.legend.Legend at 0x142093d60>
[Figure: Nino3.4 SST RMSE against lead for the raw forecast and each bias removal method]

References#

[1] Ed Hawkins, Steffen Tietsche, Jonathan J. Day, Nathanael Melia, Keith Haines, and Sarah Keeley. Aspects of designing and evaluating seasonal-to-interannual Arctic sea-ice prediction systems. Quarterly Journal of the Royal Meteorological Society, 142(695):672–683, January 2016. doi:10/gfb3pn.

[2] James S. Risbey, Dougal T. Squire, Amanda S. Black, Timothy DelSole, Chiara Lepore, Richard J. Matear, Didier P. Monselesan, Thomas S. Moore, Doug Richardson, Andrew Schepen, Michael K. Tippett, and Carly R. Tozer. Standard assessments of climate forecast skill can be misleading. Nature Communications, 12(1):4346, July 2021. doi:10/gk8k7k.