Quick Start

The easiest way to get up and running is to load in one of our example datasets (or load in some data of your own) and to convert them to either a HindcastEnsemble or PerfectModelEnsemble object.

climpred provides example datasets from the MPI-ESM-LR decadal prediction ensemble and the CESM decadal prediction ensemble. See our examples to see some analysis cases.

[1]:
%matplotlib inline
import matplotlib.pyplot as plt
import xarray as xr

from climpred import HindcastEnsemble
from climpred.tutorial import load_dataset
import climpred

xr.set_options(display_style='text')
[1]:
<xarray.core.options.set_options at 0x7fdcb0da9ba8>

You can view the datasets available to be loaded with the load_datasets() command without passing any arguments:

[2]:
load_dataset()
'MPI-control-1D': area averages for the MPI control run of SST/SSS.
'MPI-control-3D': lat/lon/time for the MPI control run of SST/SSS.
'MPI-PM-DP-1D': perfect model decadal prediction ensemble area averages of SST/SSS/AMO.
'MPI-PM-DP-3D': perfect model decadal prediction ensemble lat/lon/time of SST/SSS/AMO.
'CESM-DP-SST': hindcast decadal prediction ensemble of global mean SSTs.
'CESM-DP-SSS': hindcast decadal prediction ensemble of global mean SSS.
'CESM-DP-SST-3D': hindcast decadal prediction ensemble of eastern Pacific SSTs.
'CESM-LE': uninitialized ensemble of global mean SSTs.
'MPIESM_miklip_baseline1-hind-SST-global': hindcast initialized ensemble of global mean SSTs
'MPIESM_miklip_baseline1-hist-SST-global': uninitialized ensemble of global mean SSTs
'MPIESM_miklip_baseline1-assim-SST-global': assimilation in MPI-ESM of global mean SSTs
'ERSST': observations of global mean SSTs.
'FOSI-SST': reconstruction of global mean SSTs.
'FOSI-SSS': reconstruction of global mean SSS.
'FOSI-SST-3D': reconstruction of eastern Pacific SSTs
'GMAO-GEOS-RMM1': daily RMM1 from the GMAO-GEOS-V2p1 model for SubX
'RMM-INTERANN-OBS': observed RMM with interannual variablity included

From here, loading a dataset is easy. Note that you need to be connected to the internet for this to work – the datasets are being pulled from the climpred-data repository. Once loaded, it is cached on your computer so you can reload extremely quickly. These datasets are very small (< 1MB each) so they won’t take up much space.

[3]:
hind = climpred.tutorial.load_dataset('CESM-DP-SST')
# Add lead attribute units.
hind["lead"].attrs["units"] = "years"
obs = climpred.tutorial.load_dataset('ERSST')

Make sure your prediction ensemble’s dimension labeling conforms to climpred’s standards. In other words, you need an init, lead, and (optional) member dimension. Make sure that your init and lead dimensions align. E.g., a November 1st, 1954 initialization should be labeled as init=1954 so that the lead=1 forecast is 1955.

[4]:
print(hind.coords)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
  * member   (member) int32 1 2 3 4 5 6 7 8 9 10
  * init     (init) float32 1954.0 1955.0 1956.0 1957.0 ... 2015.0 2016.0 2017.0

We’ll quickly process the data to create anomalies. CESM-DPLE’s drift-correction occurs over 1964-2014, so we’ll remove that from the observations.

[5]:
obs = obs - obs.sel(time=slice(1964,2014)).mean('time')

We can create a HindcastEnsemble object and add our observations.

[6]:
hindcast = HindcastEnsemble(hind)
hindcast = hindcast.add_observations(obs)
print(hindcast)
<climpred.HindcastEnsemble>
Initialized Ensemble:
    SST      (init, lead, member) float64 ...
Observations:
    SST      (time) float32 -0.40146065 -0.35238647 ... 0.34601402 0.45021248
Uninitialized:
    None
/Users/rileybrady/Desktop/dev/climpred/climpred/utils.py:141: UserWarning: Assuming annual resolution due to numeric inits. Change init to a datetime if it is another resolution.
  "Assuming annual resolution due to numeric inits. "

PredictionEnsemble.plot() shows all associated datasets.

[7]:
hindcast.plot()
[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fdcb81b07f0>
_images/quick-start_13_1.png

We’ll also remove a quadratic trend so that it doesn’t artificially boost our predictability. PredictionEnsemble.map(func) tries to apply/map func to all associated datasets.

[8]:
from esmtools.stats import rm_poly

hindcast = hindcast.map(rm_poly, dim='init', order=2).map(rm_poly, dim='time', order=2)
hindcast.plot()
[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fdcb8521908>
_images/quick-start_15_1.png

Now we’ll quickly calculate skill against persistence. We require users to define metric, comparison, dim, and alignment. This ensures that climpred isn’t treated like a black box – there are no “defaults” to the prediction analysis framework. You can choose from a variety of possible metrics by entering their associated strings. Comparison strategies vary for hindcast and perfect model systems. Here we chose to compare the ensemble mean to observations ('e2o'). We reduce this operation over the initialization dimension. Lastly, we choose the 'same_verif' alignment, which uses the same set of verification dates across all leads (see alignment strategies here).

An optional keyword used here is reference. Here, we ask to compute the 'acc' metric with a persistence forecast, so that we can establish skill over some baseline forecast.

[9]:
result = hindcast.verify(metric='acc', comparison='e2o', dim='init', alignment='same_verif', reference='persistence')
skill = result.sel(skill='initialized')
persistence = result.sel(skill='persistence')
print(skill)
<xarray.Dataset>
Dimensions:  (lead: 10)
Coordinates:
  * lead     (lead) int32 1 2 3 4 5 6 7 8 9 10
    skill    <U11 'init'
Data variables:
    SST      (lead) float64 0.6387 0.4999 0.401 0.4648 ... 0.404 0.222 0.1151
[10]:
plt.style.use('fivethirtyeight')
f, ax = plt.subplots(figsize=(8, 3))
skill.SST.plot(marker='o', markersize=10, label='skill')
persistence.SST.plot(marker='o', markersize=10, label='persistence',
                     color='#a9a9a9')
plt.legend()
ax.set(title='Global Mean SST Predictability',
       ylabel='Anomaly \n Correlation Coefficient',
       xlabel='Lead Year')
plt.show()
_images/quick-start_18_0.png

We can also check accuracy (error) of our forecasts.

[11]:
result = hindcast.verify(metric='rmse', comparison='e2o', dim='init', alignment='same_verif', reference='persistence')
skill = result.sel(skill='initialized')
persistence = result.sel(skill='persistence')
[12]:
plt.style.use('fivethirtyeight')
f, ax = plt.subplots(figsize=(8, 3))
skill.SST.plot(marker='o', markersize=10, label='initialized forecast')
persistence.SST.plot(marker='o', markersize=10, label='persistence',
                     color='#a9a9a9')
plt.legend()
ax.set(title='Global Mean SST Forecast Error',
       ylabel='RMSE',
       xlabel='Lead Year')
plt.show()
_images/quick-start_21_0.png