You can run this notebook in a live session binder badge or view it on Github.

Calculate skill of a MJO Index of SubX model GEOS_V2p1 as function of daily lead time

In this example, we demonstrate:

  1. How to remotely access data from the Subseasonal Experiment (SubX) hindcast database and set it up to be used in climpred.

  2. How to calculate the Anomaly Correlation Coefficient (ACC) using daily data with climpred

  3. How to calculate and plot historical forecast skill of the real-time multivariate MJO (RMM) indices as function of lead time.

The Subseasonal Experiment (SubX)

Further information on SubX is available from Pegion et al. [2019] and the SubX project website.

The SubX public database is hosted on the International Research Institute for Climate and Society (IRI) data server http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.

Since the SubX data server is accessed via this notebook, the time for the notebook to run may is several minutes and will vary depending on the speed that data can be downloaded. This is a large dataset, so please be patient. If you prefer to download SubX data locally, scripts are available from https://github.com/kpegion/SubX.

Definitions

RMM

Two indices (RMM1 and RMM2) are used to represent the MJO. Together they define the MJO based on 8 phases and can represent both the phase and amplitude of the MJO [Wheeler and Hendon, 2004]. This example uses the observed RMM1 provided by Matthew Wheeler at the Center for Australian Weather and Climate Research. It is the version of the indices in which interannual variability has not been removed.

Skill of RMM

Traditionally, the skill of the RMM is calculated as a bivariate correlation encompassing the skill of the two indices together [Gottschalck et al., 2010, Rashid et al., 2011]. Currently, climpred does not have the functionality to calculate the bivariate correlation, thus the anomaly correlation coefficient for RMM1 index is calculated here as a demonstration. The bivariate correlation metric will be added in a future version of climpred.

# linting
%load_ext nb_black
%load_ext lab_black
import warnings

import matplotlib.pyplot as plt

import xarray as xr
import pandas as pd
import numpy as np

from climpred import HindcastEnsemble
import climpred

warnings.filterwarnings("ignore")

Read the observed RMM Indices.

obsds = climpred.tutorial.load_dataset("RMM-INTERANN-OBS")[["rmm1"]].dropna(
    "time"
)  # Get rid of missing times.
obsds["rmm1"].plot()
[<matplotlib.lines.Line2D at 0x7f46198f7550>]
../../_images/daily-subx-example_9_1.png

Read the SubX RMM1 data for the GMAO-GEOS_V2p1 model from the SubX data server. It is important to note that the SubX data contains weekly initialized forecasts where the init day varies by model. SubX data may have all NaNs for initial dates in which a model does not make a forecast, thus we apply dropna over the S=init dimension when how=all data for a given S=init is missing. This can be slow, but allows the rest of the calculations to go more quickly.

Note that we ran the dropna operation offline and then uploaded the post-processed SubX dataset to the climpred-data repo for the purposes of this demo. This is how you can do this manually:

url = 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.GMAO/.GEOS_V2p1/.hindcast/.RMM/.RMM1/dods/'
fcstds = xr.open_dataset(url, decode_times=False, chunks={'S': 1, 'L': 45}).dropna(dim='S',how='all')
fcstds = climpred.tutorial.load_dataset("GMAO-GEOS-RMM1", decode_times=False)
fcstds
<xarray.Dataset>
Dimensions:  (S: 510, M: 4, L: 45)
Coordinates:
  * S        (S) float32 1.424e+04 1.425e+04 1.426e+04 ... 2.044e+04 2.045e+04
  * M        (M) float32 1.0 2.0 3.0 4.0
  * L        (L) float32 0.5 1.5 2.5 3.5 4.5 5.5 ... 40.5 41.5 42.5 43.5 44.5
Data variables:
    RMM1     (S, M, L) float32 -0.03237 0.1051 0.1946 ... -0.1499 0.08326 0.1389
Attributes:
    Conventions:  IRIDL

The SubX data dimensions correspond to the following climpred dimension definitions: X=lon,L=lead,Y=lat,M=member, S=init. We will rename the dimensions to their climpred names.

fcstds = fcstds.rename({"S": "init", "L": "lead", "M": "member", "RMM1": "rmm1"})

Let’s make sure that the lead dimension is set properly for climpred. SubX data stores leads as 0.5, 1.5, 2.5, etc, which correspond to 0, 1, 2, … days since initialization. We will change the lead to be integers starting with zero.

fcstds["lead"] = (fcstds["lead"] - 0.5).astype("int")

Now we need to make sure that the init dimension is set properly for climpred. We use the xarray convenience function decode_cf to convert the numeric time into datetimes. It recognizes that this is on a 360 day calendar.

fcstds = xr.decode_cf(fcstds, decode_times=True)

climpred also requires that lead dimension has an attribute called units indicating what time units the lead is assocated with. Options are: years, seasons, months, weeks, pentads, days, hours, minutes, or seconds. For the SubX data, the lead units are days.

fcstds["lead"].attrs = {"units": "days"}

Create the HindcastEnsemble object and HindcastEnsemble.add_observations().

hindcast = HindcastEnsemble(fcstds).add_observations(obsds)
hindcast

climpred.HindcastEnsemble

<Initialized Ensemble>
Dimensions:     (init: 510, member: 4, lead: 45)
Coordinates:
  * init        (init) object 1999-01-01 00:00:00 ... 2015-12-27 00:00:00
  * member      (member) float32 1.0 2.0 3.0 4.0
  * lead        (lead) int64 0 1 2 3 4 5 6 7 8 9 ... 36 37 38 39 40 41 42 43 44
    valid_time  (lead, init) object 1999-01-01 00:00:00 ... 2016-02-09 00:00:00
Data variables:
    rmm1        (init, member, lead) float32 ...
Attributes:
    Conventions:  IRIDL
<Observations>
Dimensions:  (time: 15468)
Coordinates:
  * time     (time) object 1974-06-03 00:00:00 ... 2017-07-24 00:00:00
Data variables:
    rmm1     (time) float64 1.867 1.868 1.924 1.574 ... 0.8918 0.9023 1.028

Calculate the Anomaly Correlation Coefficient (ACC) climpred.metrics._pearson_r()

skill = hindcast.verify(
    metric="acc", comparison="e2o", dim="init", alignment="maximize"
)
skill
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
/tmp/ipykernel_2944/62867984.py in <module>
----> 1 skill = hindcast.verify(
      2     metric="acc", comparison="e2o", dim="init", alignment="maximize"
      3 )
      4 skill

~/checkouts/readthedocs.org/user_builds/climpred/checkouts/v2.2.0/climpred/classes.py in verify(self, metric, comparison, dim, alignment, reference, groupby, **metric_kwargs)
   2176             hist = None
   2177 
-> 2178         res = self._apply_climpred_function(
   2179             _verify,
   2180             init=True,

~/checkouts/readthedocs.org/user_builds/climpred/checkouts/v2.2.0/climpred/classes.py in _apply_climpred_function(self, func, init, **kwargs)
   1680         verif = self._datasets["observations"]
   1681         drop_init, drop_obs = self._vars_to_drop(init=init)
-> 1682         return func(hind.drop_vars(drop_init), verif.drop_vars(drop_obs), **kwargs)
   1683 
   1684     def _vars_to_drop(self, init: bool = True) -> Tuple[List[str], List[str]]:

~/checkouts/readthedocs.org/user_builds/climpred/checkouts/v2.2.0/climpred/classes.py in _verify(hind, verif, hist, reference, metric, comparison, alignment, dim, **metric_kwargs)
   2092                 )
   2093             forecast = forecast.rename({"init": "time"})
-> 2094             inits, verif_dates = return_inits_and_verif_dates(
   2095                 forecast,
   2096                 verif,

~/checkouts/readthedocs.org/user_builds/climpred/checkouts/v2.2.0/climpred/alignment.py in return_inits_and_verif_dates(forecast, verif, alignment, reference, hist)
     92         )
     93     elif alignment == "maximize":
---> 94         return _maximize_alignment(init_lead_matrix, all_verifs, leads)
     95     else:
     96         raise ValueError

~/checkouts/readthedocs.org/user_builds/climpred/checkouts/v2.2.0/climpred/alignment.py in _maximize_alignment(init_lead_matrix, all_verifs, leads)
    106     # Move row-wise and find all forecasted times that align with verification dates at
    107     # the given lead.
--> 108     verify_with_observations = init_lead_matrix.isin(all_verifs)
    109     lead_dependent_verif_dates = init_lead_matrix.where(verify_with_observations)
    110     # Probably a way to do this more efficiently since we're doing essentially

~/checkouts/readthedocs.org/user_builds/climpred/conda/v2.2.0/lib/python3.9/site-packages/xarray/core/common.py in isin(self, test_elements)
   1428             test_elements = test_elements.data
   1429 
-> 1430         return apply_ufunc(
   1431             duck_array_ops.isin,
   1432             self,

~/checkouts/readthedocs.org/user_builds/climpred/conda/v2.2.0/lib/python3.9/site-packages/xarray/core/computation.py in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, *args)
   1163     # feed DataArray apply_variable_ufunc through apply_dataarray_vfunc
   1164     elif any(isinstance(a, DataArray) for a in args):
-> 1165         return apply_dataarray_vfunc(
   1166             variables_vfunc,
   1167             *args,

~/checkouts/readthedocs.org/user_builds/climpred/conda/v2.2.0/lib/python3.9/site-packages/xarray/core/computation.py in apply_dataarray_vfunc(func, signature, join, exclude_dims, keep_attrs, *args)
    288 
    289     data_vars = [getattr(a, "variable", a) for a in args]
--> 290     result_var = func(*data_vars)
    291 
    292     if signature.num_outputs > 1:

~/checkouts/readthedocs.org/user_builds/climpred/conda/v2.2.0/lib/python3.9/site-packages/xarray/core/computation.py in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, *args)
    731             )
    732 
--> 733     result_data = func(*input_data)
    734 
    735     if signature.num_outputs == 1:

<__array_function__ internals> in isin(*args, **kwargs)

~/checkouts/readthedocs.org/user_builds/climpred/conda/v2.2.0/lib/python3.9/site-packages/numpy/lib/arraysetops.py in isin(element, test_elements, assume_unique, invert)
    708     """
    709     element = np.asarray(element)
--> 710     return in1d(element, test_elements, assume_unique=assume_unique,
    711                 invert=invert).reshape(element.shape)
    712 

<__array_function__ internals> in in1d(*args, **kwargs)

~/checkouts/readthedocs.org/user_builds/climpred/conda/v2.2.0/lib/python3.9/site-packages/numpy/lib/arraysetops.py in in1d(ar1, ar2, assume_unique, invert)
    581             mask = np.zeros(len(ar1), dtype=bool)
    582             for a in ar2:
--> 583                 mask |= (ar1 == a)
    584         return mask
    585 

KeyboardInterrupt: 

Plot the skill as a function of lead time

skill["rmm1"].plot()
plt.title("GMAO-GEOS_V2p1 RMM1 Skill")
plt.xlabel("Lead Time (Days)")
plt.ylabel("ACC")
Text(0, 0.5, 'ACC')
../../_images/daily-subx-example_25_1.png

References

1

J. Gottschalck, M. Wheeler, K. Weickmann, F. Vitart, N. Savage, H. Lin, H. Hendon, D. Waliser, K. Sperber, M. Nakagawa, C. Prestrelo, M. Flatau, and W. Higgins. A Framework for Assessing Operational Madden–Julian Oscillation Forecasts: A CLIVAR MJO Working Group Project. Bulletin of the American Meteorological Society, 91(9):1247–1258, September 2010. doi:10/fcftn7.

2

Kathy Pegion, Ben P. Kirtman, Emily Becker, Dan C. Collins, Emerson LaJoie, Robert Burgman, Ray Bell, Timothy DelSole, Dughong Min, Yuejian Zhu, Wei Li, Eric Sinsky, Hong Guan, Jon Gottschalck, E. Joseph Metzger, Neil P Barton, Deepthi Achuthavarier, Jelena Marshak, Randal D. Koster, Hai Lin, Normand Gagnon, Michael Bell, Michael K. Tippett, Andrew W. Robertson, Shan Sun, Stanley G. Benjamin, Benjamin W. Green, Rainer Bleck, and Hyemi Kim. The Subseasonal Experiment (SubX): A Multimodel Subseasonal Prediction Experiment. Bulletin of the American Meteorological Society, 100(10):2043–2060, July 2019. doi:10/ggkt9s.

3

Harun A. Rashid, Harry H. Hendon, Matthew C. Wheeler, and Oscar Alves. Prediction of the Madden–Julian oscillation with the POAMA dynamical prediction system. Climate Dynamics, 36(3):649–661, February 2011. doi:10/b992k4.

4

Matthew C. Wheeler and Harry H. Hendon. An All-Season Real-Time Multivariate MJO Index: Development of an Index for Monitoring and Prediction. Monthly Weather Review, 132(8):1917–1932, August 2004. doi:10/d489vd.