You can run this notebook in a live session or view it on Github.
Hindcast Predictions of Equatorial Pacific SSTs¶
In this example, we evaluate hindcasts (retrospective forecasts) with HindcastEnsemble
of sea surface temperatures in the eastern equatorial Pacific from CESM-DPLE. These hindcasts are evaluated against a forced ocean-sea ice simulation that initializes the model.
See the quick start for an analysis of time series (rather than maps) from HindcastEnsemble
.
# linting
%load_ext nb_black
%load_ext lab_black
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import climpred
from climpred import HindcastEnsemble
import xarray as xr
xr.set_options(display_style="text")
# silence warnings if annoying
# import warnings
# warnings.filterwarnings("ignore")
<xarray.core.options.set_options at 0x10d603d30>
We’ll load in a small region of the eastern equatorial Pacific for this analysis example.
initialized = climpred.tutorial.load_dataset("CESM-DP-SST-3D")["SST"]
recon = climpred.tutorial.load_dataset("FOSI-SST-3D")["SST"]
initialized
<xarray.DataArray 'SST' (init: 64, lead: 10, nlat: 37, nlon: 26)> [615680 values with dtype=float32] Coordinates: TLAT (nlat, nlon) float64 ... TLONG (nlat, nlon) float64 ... * init (init) float32 1.954e+03 1.955e+03 ... 2.016e+03 2.017e+03 * lead (lead) int32 1 2 3 4 5 6 7 8 9 10 TAREA (nlat, nlon) float64 ... Dimensions without coordinates: nlat, nlon
These two example products cover a small portion of the eastern equatorial Pacific.
It is generally advisable to do all bias correction before instantiating a HindcastEnsemble
. However, HindcastEnsemble
can also be modified with HindcastEnsemble.remove_bias()
.
climpred
requires that lead
dimension has an attribute called units
indicating what time units the lead
is assocated with. Options are: years, seasons, months, weeks, pentads, days
, hours
, minutes
, seconds
. For the this data, the lead
units
are years
.
initialized["lead"].attrs["units"] = "years"
hindcast = HindcastEnsemble(initialized).add_observations(recon)
hindcast
/Users/aaron.spring/Coding/climpred/climpred/utils.py:191: UserWarning: Assuming annual resolution starting Jan 1st due to numeric inits. Please change ``init`` to a datetime if it is another resolution. We recommend using xr.CFTimeIndex as ``init``, see https://climpred.readthedocs.io/en/stable/setting-up-data.html.
warnings.warn(
/Users/aaron.spring/Coding/climpred/climpred/utils.py:191: UserWarning: Assuming annual resolution starting Jan 1st due to numeric inits. Please change ``init`` to a datetime if it is another resolution. We recommend using xr.CFTimeIndex as ``init``, see https://climpred.readthedocs.io/en/stable/setting-up-data.html.
warnings.warn(
<climpred.HindcastEnsemble>
Initialized Ensemble:
SST (init, lead, nlat, nlon) float32 -0.3323 -0.3238 ... 1.179 1.123
Observations:
SST (time, nlat, nlon) float32 25.53 25.43 25.35 ... 27.03 27.1 27.05
Uninitialized:
None
# Quick look at the HindEnsemble timeseries: only works for 1-dimensional data, therefore take spatial mean
hindcast.mean(["nlat", "nlon"]).plot()
<AxesSubplot:xlabel='validity time', ylabel='SST'>
We first need to remove the same climatology that was used to drift-correct the CESM-DPLE.
recon = recon - recon.sel(time=slice(1964, 2014)).mean("time")
hindcast = hindcast.add_observations(recon)
hindcast.mean(["nlat", "nlon"]).plot()
/Users/aaron.spring/Coding/climpred/climpred/utils.py:191: UserWarning: Assuming annual resolution starting Jan 1st due to numeric inits. Please change ``init`` to a datetime if it is another resolution. We recommend using xr.CFTimeIndex as ``init``, see https://climpred.readthedocs.io/en/stable/setting-up-data.html.
warnings.warn(
<AxesSubplot:xlabel='validity time', ylabel='SST'>
We’ll also detrend climpred.stats.rm_trend()
the reconstruction over its time
dimension and initialized forecast ensemble over lead
.
from climpred.stats import rm_trend
hindcast_detrend = hindcast.map(rm_trend, dim="lead_or_time")
hindcast_detrend.mean(["nlat", "nlon"]).plot()
<AxesSubplot:xlabel='validity time', ylabel='SST'>
Anomaly Correlation Coefficient of SSTs¶
We can now compute the ACC climpred.metrics._pearson_r()
over all leads and all grid cells with HindcastEnsemble.verify
.
predictability = hindcast_detrend.verify(
metric="acc", comparison="e2o", dim="init", alignment="same_verif"
)
predictability
<xarray.Dataset> Dimensions: (lead: 10, nlat: 37, nlon: 26) Coordinates: * nlat (nlat) int64 0 1 2 3 4 5 6 7 8 9 ... 27 28 29 30 31 32 33 34 35 36 * nlon (nlon) int64 0 1 2 3 4 5 6 7 8 9 ... 16 17 18 19 20 21 22 23 24 25 TLONG (nlat, nlon) float64 250.8 251.9 253.1 254.2 ... 276.7 277.8 278.9 * lead (lead) int32 1 2 3 4 5 6 7 8 9 10 TAREA (nlat, nlon) float64 3.661e+13 3.661e+13 ... 3.714e+13 3.714e+13 TLAT (nlat, nlon) float64 -9.75 -9.75 -9.75 ... -0.1336 -0.1336 -0.1336 skill <U11 'initialized' Data variables: SST (lead, nlat, nlon) float64 0.6272 0.6248 ... -0.05417 -0.05513 Attributes: prediction_skill_software: climpred https://climpred.readthedocs.io/ skill_calculated_by_function: HindcastEnsemble.verify() number_of_initializations: 64 alignment: same_verif metric: pearson_r comparison: e2o dim: init reference: []
We use the pval
keyword to get associated p-values for our ACCs. We can then mask our final maps based on $\alpha = 0.05$.
significance = hindcast_detrend.verify(
metric="p_pval", comparison="e2o", dim="init", alignment="same_verif"
)
# Mask latitude and longitude by significance for stippling.
siglat = significance.TLAT.where(significance.SST <= 0.05)
siglon = significance.TLONG.where(significance.SST <= 0.05)
p = predictability.SST.plot.pcolormesh(
x="TLONG",
y="TLAT",
col="lead",
col_wrap=5,
cbar_kwargs={"label": "Anomaly Correlation Coefficient"},
vmin=-0.7,
vmax=0.7,
cmap="RdYlBu_r",
)
for i, ax in enumerate(p.axes.flat):
# Add significance stippling
ax.scatter(
siglon.isel(lead=i), siglat.isel(lead=i), color="k", marker=".", s=1.5,
)
Root Mean Square Error of SSTs¶
We can also check error in our forecasts, just by changing the metric keyword to _rmse()
or _mae()
.
rmse = hindcast_detrend.verify(
metric="rmse", comparison="e2o", dim="init", alignment="same_verif"
)
p = rmse.SST.plot.pcolormesh(
x="TLONG",
y="TLAT",
col="lead",
col_wrap=5,
cbar_kwargs={"label": "Root Mean Square Error (degC)"},
cmap="Purples",
)