Hindcast Predictions of Equatorial Pacific SSTs

You can run this notebook in a live session binder badge or view it on Github.

Hindcast Predictions of Equatorial Pacific SSTs#

In this example, we evaluate hindcasts (retrospective forecasts) with HindcastEnsemble of sea surface temperatures in the eastern equatorial Pacific from CESM-DPLE. These hindcasts are evaluated against a forced ocean-sea ice simulation that initializes the model.

See the quick start for an analysis of time series (rather than maps) from HindcastEnsemble.

# linting
%load_ext nb_black
%load_ext lab_black
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

import climpred
from climpred import HindcastEnsemble
import xarray as xr

xr.set_options(display_style="text")

# silence warnings if annoying
# import warnings
# warnings.filterwarnings("ignore")
<xarray.core.options.set_options at 0x10d603d30>

We’ll load in a small region of the eastern equatorial Pacific for this analysis example.

initialized = climpred.tutorial.load_dataset("CESM-DP-SST-3D")["SST"]
recon = climpred.tutorial.load_dataset("FOSI-SST-3D")["SST"]
initialized
<xarray.DataArray 'SST' (init: 64, lead: 10, nlat: 37, nlon: 26)>
[615680 values with dtype=float32]
Coordinates:
    TLAT     (nlat, nlon) float64 ...
    TLONG    (nlat, nlon) float64 ...
  * init     (init) float32 1.954e+03 1.955e+03 ... 2.016e+03 2.017e+03
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    TAREA    (nlat, nlon) float64 ...
Dimensions without coordinates: nlat, nlon

These two example products cover a small portion of the eastern equatorial Pacific.

It is generally advisable to do all bias correction before instantiating a HindcastEnsemble. However, HindcastEnsemble can also be modified with HindcastEnsemble.remove_bias().

climpred requires that lead dimension has an attribute called units indicating what time units the lead is assocated with. Options are: years, seasons, months, weeks, pentads, days, hours, minutes, seconds. For the this data, the lead units are years.

initialized["lead"].attrs["units"] = "years"
hindcast = HindcastEnsemble(initialized).add_observations(recon)
hindcast
/Users/aaron.spring/Coding/climpred/climpred/utils.py:191: UserWarning: Assuming annual resolution starting Jan 1st due to numeric inits. Please change ``init`` to a datetime if it is another resolution. We recommend using xr.CFTimeIndex as ``init``, see https://climpred.readthedocs.io/en/stable/setting-up-data.html.
  warnings.warn(
/Users/aaron.spring/Coding/climpred/climpred/utils.py:191: UserWarning: Assuming annual resolution starting Jan 1st due to numeric inits. Please change ``init`` to a datetime if it is another resolution. We recommend using xr.CFTimeIndex as ``init``, see https://climpred.readthedocs.io/en/stable/setting-up-data.html.
  warnings.warn(
<climpred.HindcastEnsemble>
Initialized Ensemble:
    SST      (init, lead, nlat, nlon) float32 -0.3323 -0.3238 ... 1.179 1.123
Observations:
    SST      (time, nlat, nlon) float32 25.53 25.43 25.35 ... 27.03 27.1 27.05
Uninitialized:
    None
# Quick look at the HindEnsemble timeseries: only works for 1-dimensional data, therefore take spatial mean
hindcast.mean(["nlat", "nlon"]).plot()
<AxesSubplot:xlabel='validity time', ylabel='SST'>
../../_images/3a8b26fdb579cb778e08542af4b35f7908c78c9ce8d27c82df77abc3f50b37fa.png

We first need to remove the same climatology that was used to drift-correct the CESM-DPLE.

recon = recon - recon.sel(time=slice(1964, 2014)).mean("time")
hindcast = hindcast.add_observations(recon)
hindcast.mean(["nlat", "nlon"]).plot()
/Users/aaron.spring/Coding/climpred/climpred/utils.py:191: UserWarning: Assuming annual resolution starting Jan 1st due to numeric inits. Please change ``init`` to a datetime if it is another resolution. We recommend using xr.CFTimeIndex as ``init``, see https://climpred.readthedocs.io/en/stable/setting-up-data.html.
  warnings.warn(
<AxesSubplot:xlabel='validity time', ylabel='SST'>
../../_images/1188ef09c924d61e0764bb68b208e5af98ac235bad0b7effce1bceee4ec3fade.png

We’ll also detrend climpred.stats.rm_trend() the reconstruction over its time dimension and initialized forecast ensemble over lead.

from climpred.stats import rm_trend

hindcast_detrend = hindcast.map(rm_trend, dim="lead_or_time")
hindcast_detrend.mean(["nlat", "nlon"]).plot()
<AxesSubplot:xlabel='validity time', ylabel='SST'>
../../_images/389206c363388b296e36d08d4bd0f981cd7c81b4918db279b677958fc5a09ad2.png

Anomaly Correlation Coefficient of SSTs#

We can now compute the ACC climpred.metrics._pearson_r() over all leads and all grid cells with HindcastEnsemble.verify.

predictability = hindcast_detrend.verify(
    metric="acc", comparison="e2o", dim="init", alignment="same_verif"
)
predictability
<xarray.Dataset>
Dimensions:  (lead: 10, nlat: 37, nlon: 26)
Coordinates:
  * nlat     (nlat) int64 0 1 2 3 4 5 6 7 8 9 ... 27 28 29 30 31 32 33 34 35 36
  * nlon     (nlon) int64 0 1 2 3 4 5 6 7 8 9 ... 16 17 18 19 20 21 22 23 24 25
    TLONG    (nlat, nlon) float64 250.8 251.9 253.1 254.2 ... 276.7 277.8 278.9
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    TAREA    (nlat, nlon) float64 3.661e+13 3.661e+13 ... 3.714e+13 3.714e+13
    TLAT     (nlat, nlon) float64 -9.75 -9.75 -9.75 ... -0.1336 -0.1336 -0.1336
    skill    <U11 'initialized'
Data variables:
    SST      (lead, nlat, nlon) float64 0.6272 0.6248 ... -0.05417 -0.05513
Attributes:
    prediction_skill_software:     climpred https://climpred.readthedocs.io/
    skill_calculated_by_function:  HindcastEnsemble.verify()
    number_of_initializations:     64
    alignment:                     same_verif
    metric:                        pearson_r
    comparison:                    e2o
    dim:                           init
    reference:                     []

We use the pval keyword to get associated p-values for our ACCs. We can then mask our final maps based on \alpha = 0.05.

significance = hindcast_detrend.verify(
    metric="p_pval", comparison="e2o", dim="init", alignment="same_verif"
)

# Mask latitude and longitude by significance for stippling.
siglat = significance.TLAT.where(significance.SST <= 0.05)
siglon = significance.TLONG.where(significance.SST <= 0.05)
p = predictability.SST.plot.pcolormesh(
    x="TLONG",
    y="TLAT",
    col="lead",
    col_wrap=5,
    cbar_kwargs={"label": "Anomaly Correlation Coefficient"},
    vmin=-0.7,
    vmax=0.7,
    cmap="RdYlBu_r",
)
for i, ax in enumerate(p.axes.flat):
    # Add significance stippling
    ax.scatter(
        siglon.isel(lead=i), siglat.isel(lead=i), color="k", marker=".", s=1.5,
    )
../../_images/3a822407d89a921d6b76709f778263528f894e67fcd21f5dc51431aed92d46b9.png

Root Mean Square Error of SSTs#

We can also check error in our forecasts, just by changing the metric keyword to _rmse() or _mae().

rmse = hindcast_detrend.verify(
    metric="rmse", comparison="e2o", dim="init", alignment="same_verif"
)
p = rmse.SST.plot.pcolormesh(
    x="TLONG",
    y="TLAT",
    col="lead",
    col_wrap=5,
    cbar_kwargs={"label": "Root Mean Square Error (degC)"},
    cmap="Purples",
)
../../_images/881fa21a1d948f648d5cc6f2a5358c4f41faac72fb93c55569b66153d7d339e0.png