Hindcast Predictions of Equatorial Pacific SSTs¶
In this example, we evaluate hindcasts (retrospective forecasts) of sea surface temperatures in the eastern equatorial Pacific from CESM-DPLE. These hindcasts are evaluated against a forced ocean–sea ice simulation that initializes the model.
See the quick start for an analysis of time series (rather than maps) from a hindcast prediction ensemble.
[1]:
# linting
%load_ext nb_black
%load_ext lab_black
[2]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import climpred
from climpred import HindcastEnsemble
import xarray as xr
xr.set_options(display_style="text")
# silence warnings if annoying
# import warnings
# warnings.filterwarnings("ignore")
[2]:
<xarray.core.options.set_options at 0x7f896ab8d8e0>
We’ll load in a small region of the eastern equatorial Pacific for this analysis example.
[3]:
hind = climpred.tutorial.load_dataset("CESM-DP-SST-3D")["SST"]
recon = climpred.tutorial.load_dataset("FOSI-SST-3D")["SST"]
print(hind)
<xarray.DataArray 'SST' (init: 64, lead: 10, nlat: 37, nlon: 26)>
[615680 values with dtype=float32]
Coordinates:
TLAT (nlat, nlon) float64 ...
TLONG (nlat, nlon) float64 ...
* init (init) float32 1.954e+03 1.955e+03 ... 2.016e+03 2.017e+03
* lead (lead) int32 1 2 3 4 5 6 7 8 9 10
TAREA (nlat, nlon) float64 ...
Dimensions without coordinates: nlat, nlon
These two example products cover a small portion of the eastern equatorial Pacific.
It is generally advisable to do all bias correction before instantiating a HindcastEnsemble
. However, HindcastEnsemble
can also be modified.
climpred
requires that lead
dimension has an attribute called units
indicating what time units the lead
is assocated with. Options are: years, seasons, months, weeks, pentads, days
. For the this data, the lead
units
are years
.
[4]:
hind["lead"].attrs["units"] = "years"
[5]:
hindcast = HindcastEnsemble(hind)
hindcast = hindcast.add_observations(recon)
print(hindcast)
<climpred.HindcastEnsemble>
Initialized Ensemble:
SST (init, lead, nlat, nlon) float32 ...
Observations:
SST (time, nlat, nlon) float32 ...
Uninitialized:
None
/Users/aaron.spring/Coding/climpred/climpred/utils.py:122: UserWarning: Assuming annual resolution due to numeric inits. Change init to a datetime if it is another resolution.
warnings.warn(
[6]:
# Quick look at the HindEnsemble timeseries: only works for 1-dimensional data, therefore take spatial mean
hindcast.mean(["nlat", "nlon"]).plot()
[6]:
<AxesSubplot:xlabel='time', ylabel='SST'>
We first need to remove the same climatology that was used to drift-correct the CESM-DPLE.
[7]:
recon = recon - recon.sel(time=slice(1964, 2014)).mean("time")
hindcast = hindcast.add_observations(recon)
hindcast.mean(["nlat", "nlon"]).plot()
/Users/aaron.spring/Coding/climpred/climpred/utils.py:122: UserWarning: Assuming annual resolution due to numeric inits. Change init to a datetime if it is another resolution.
warnings.warn(
[7]:
<AxesSubplot:xlabel='time', ylabel='SST'>
We’ll also detrend the reconstruction over its time
dimension and initialized forecast ensemble over lead
.
[8]:
from climpred.stats import rm_trend
hindcast_detrend = hindcast.map(rm_trend, dim="lead_time")
/Users/aaron.spring/Coding/climpred/climpred/classes.py:375: UserWarning: Error due to initialized: rm_trend({'dim': 'lead_time'}) failed
KeyError: 'lead_time'
warnings.warn(f"Error due to initialized: {msg}")
/Users/aaron.spring/Coding/climpred/climpred/classes.py:380: UserWarning: Error due to verification/control/uninitialized: rm_trend({'dim': 'lead_time'}) failed
KeyError: 'lead_time'
warnings.warn(
[9]:
hindcast_detrend.mean(["nlat", "nlon"]).plot()
[9]:
<AxesSubplot:xlabel='time', ylabel='SST'>
Although functions can be called directly in climpred
, we suggest that you use our classes (HindcastEnsemble
and PerfectModelEnsemble
) to make analysis code cleaner.
Anomaly Correlation Coefficient of SSTs¶
We can now compute the ACC over all leads and all grid cells.
[10]:
predictability = hindcast_detrend.verify(
metric="acc", comparison="e2o", dim="init", alignment="same_verif"
)
print(predictability)
<xarray.Dataset>
Dimensions: (lead: 10, nlat: 37, nlon: 26)
Coordinates:
* lead (lead) int32 1 2 3 4 5 6 7 8 9 10
TLONG (nlat, nlon) float64 250.8 251.9 253.1 254.2 ... 276.7 277.8 278.9
TAREA (nlat, nlon) float64 3.661e+13 3.661e+13 ... 3.714e+13 3.714e+13
TLAT (nlat, nlon) float64 -9.75 -9.75 -9.75 ... -0.1336 -0.1336 -0.1336
skill <U11 'initialized'
Dimensions without coordinates: nlat, nlon
Data variables:
SST (lead, nlat, nlon) float32 0.504 0.5012 0.4989 ... 0.09155 0.1641
We use the pval
keyword to get associated p-values for our ACCs. We can then mask our final maps based on .
[11]:
significance = hindcast_detrend.verify(
metric="p_pval", comparison="e2o", dim="init", alignment="same_verif"
)
# Mask latitude and longitude by significance for stippling.
siglat = significance.TLAT.where(significance.SST <= 0.05)
siglon = significance.TLONG.where(significance.SST <= 0.05)
[12]:
p = predictability.SST.plot.pcolormesh(
x="TLONG",
y="TLAT",
col="lead",
col_wrap=5,
cbar_kwargs={"label": "Anomaly Correlation Coefficient"},
vmin=-0.7,
vmax=0.7,
cmap="RdYlBu_r",
)
for i, ax in enumerate(p.axes.flat):
# Add significance stippling
ax.scatter(
siglon.isel(lead=i), siglat.isel(lead=i), color="k", marker=".", s=1.5,
)
Root Mean Square Error of SSTs¶
We can also check error in our forecasts, just by changing the metric keyword.
[13]:
rmse = hindcast_detrend.verify(
metric="rmse", comparison="e2o", dim="init", alignment="same_verif"
)
[14]:
p = rmse.SST.plot.pcolormesh(
x="TLONG",
y="TLAT",
col="lead",
col_wrap=5,
cbar_kwargs={"label": "Root Mean Square Error (degC)"},
cmap="Purples",
)