Calculate skill of a MJO Index of SubX model GEOS_V2p1 as function of weekly lead time¶
In this example, we demonstrate:
How to remotely access data from the Subseasonal Experiment (SubX) hindcast database and set it up to be used in
climpred
.How to calculate the Anomaly Correlation Coefficient (ACC) using weekly data with
climpred
How to calculate and plot historical forecast skill of the real-time multivariate MJO (RMM) indices as function of lead time.
The Subseasonal Experiment (SubX)¶
Further information on SubX is available from Pegion et al. 2019 and the SubX project website
The SubX public database is hosted on the International Research Institute for Climate and Society (IRI) data server http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/
Since the SubX data server is accessed via this notebook, the time for the notebook to run may is several minutes and will vary depending on the speed that data can be downloaded. This is a large dataset, so please be patient. If you prefer to download SubX data locally, scripts are available from https://github.com/kpegion/SubX.
Definitions¶
- RMM
Two indices (RMM1 and RMM2) are used to represent the MJO. Together they define the MJO based on 8 phases and can represent both the phase and amplitude of the MJO (Wheeler and Hendon 2004). This example uses the observed RMM1 provided by Matthew Wheeler at the Center for Australian Weather and Climate Research. It is the version of the indices in which interannual variability has not been removed.
- Skill of RMM
Traditionally, the skill of the RMM is calculated as a bivariate correlation encompassing the skill of the two indices together (Rashid et al. 2010; Gottschalck et al 2010). Currently,
climpred
does not have the functionality to calculate the bivariate correlation, thus the anomaly correlation coefficient for RMM1 index is calculated here as a demonstration. The bivariate correlation metric will be added in a future version ofclimpred
[1]:
import warnings
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.style.use('seaborn-talk')
import xarray as xr
import pandas as pd
import numpy as np
from climpred import HindcastEnsemble
import climpred
[2]:
warnings.filterwarnings("ignore")
Read the observed RMM Indices
[3]:
obsds = climpred.tutorial.load_dataset('RMM-INTERANN-OBS')['rmm1'].to_dataset()
obsds
[3]:
<xarray.Dataset> Dimensions: (time: 15613) Coordinates: * time (time) datetime64[ns] 1974-06-03 1974-06-04 ... 2017-07-24 Data variables: rmm1 (time) float64 1.867 1.868 1.924 1.574 ... 0.8918 0.9023 1.028
- time: 15613
- time(time)datetime64[ns]1974-06-03 ... 2017-07-24
array(['1974-06-03T00:00:00.000000000', '1974-06-04T00:00:00.000000000', '1974-06-05T00:00:00.000000000', ..., '2017-07-22T00:00:00.000000000', '2017-07-23T00:00:00.000000000', '2017-07-24T00:00:00.000000000'], dtype='datetime64[ns]')
- rmm1(time)float64...
array([1.86732 , 1.86838 , 1.92386 , ..., 0.891819, 0.902347, 1.027518])
Read the SubX RMM1 data for the GMAO-GEOS_V2p1 model from the SubX data server. It is important to note that the SubX data contains weekly initialized forecasts where the init
day varies by model. SubX data may have all NaNs for initial dates in which a model does not make a forecast, thus we apply dropna
over the S=init
dimension when how=all
data for a given S=init
is missing. This can be slow, but allows the rest of the calculations to go more quickly.
Note that we ran the dropna
operation offline and then uploaded the post-processed SubX dataset to the climpred-data
repo for the purposes of this demo. This is how you can do this manually:
url = 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.GMAO/.GEOS_V2p1/.hindcast/.RMM/.RMM1/dods/'
fcstds = xr.open_dataset(url, decode_times=False, chunks={'S': 1, 'L': 45}).dropna(dim='S',how='all')
[4]:
fcstds = climpred.tutorial.load_dataset('GMAO-GEOS-RMM1', decode_times=False)
fcstds
[4]:
<xarray.Dataset> Dimensions: (L: 45, M: 4, S: 510) Coordinates: * S (S) float32 14245.0 14250.0 14255.0 ... 20439.0 20444.0 20449.0 * M (M) float32 1.0 2.0 3.0 4.0 * L (L) float32 0.5 1.5 2.5 3.5 4.5 5.5 ... 40.5 41.5 42.5 43.5 44.5 Data variables: RMM1 (S, M, L) float32 -0.032366 0.10512561 ... 0.08325761 0.13890672 Attributes: Conventions: IRIDL
- L: 45
- M: 4
- S: 510
- S(S)float3214245.0 14250.0 ... 20444.0 20449.0
- long_name :
- Start Time
- calendar :
- standard
- standard_name :
- forecast_reference_time
- pointwidth :
- 0.0
- gridtype :
- 0
- units :
- days since 1960-01-01
array([14245., 14250., 14255., ..., 20439., 20444., 20449.], dtype=float32)
- M(M)float321.0 2.0 3.0 4.0
- long_name :
- Ensemble Member
- standard_name :
- realization
- pointwidth :
- 1.0
- gridtype :
- 0
- units :
- ids
array([1., 2., 3., 4.], dtype=float32)
- L(L)float320.5 1.5 2.5 3.5 ... 42.5 43.5 44.5
- long_name :
- Lead
- standard_name :
- forecast_period
- pointwidth :
- 1.0
- gridtype :
- 0
- units :
- days
array([ 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5, 21.5, 22.5, 23.5, 24.5, 25.5, 26.5, 27.5, 28.5, 29.5, 30.5, 31.5, 32.5, 33.5, 34.5, 35.5, 36.5, 37.5, 38.5, 39.5, 40.5, 41.5, 42.5, 43.5, 44.5], dtype=float32)
- RMM1(S, M, L)float32...
- pointwidth :
- 0.0
- long_name :
- RMM1
- units :
- unitless
array([[[-0.032366, 0.105126, ..., 1.778099, 1.803143], [ 0.006499, 0.158598, ..., 0.826527, 0.743905], [ 0.014169, 0.147751, ..., 1.170505, 1.230083], [-0.014136, 0.142899, ..., 1.239202, 1.26709 ]], [[ 0.222802, 0.70556 , ..., 2.06537 , 1.935836], [ 0.192239, 0.610177, ..., 1.658914, 1.51989 ], [ 0.200765, 0.662475, ..., 1.159152, 1.093163], [ 0.203409, 0.631355, ..., -0.66206 , -0.785497]], ..., [[ 0.167977, 0.093171, ..., 0.455726, 0.463756], [ 0.300361, 0.193978, ..., -0.853684, -0.762082], [ 0.223478, 0.095381, ..., -0.784702, -0.75755 ], [ 0.228956, 0.144475, ..., -0.866805, -0.977645]], [[-0.648854, -1.236488, ..., 0.467468, 0.470097], [-0.566065, -1.12568 , ..., 0.261335, 0.253228], [-0.638935, -1.209603, ..., -0.003442, -0.231822], [-0.603829, -1.186343, ..., 0.083258, 0.138907]]], dtype=float32)
- Conventions :
- IRIDL
The SubX data dimensions correspond to the following climpred
dimension definitions: X=lon
,L=lead
,Y=lat
,M=member
, S=init
. We will rename the dimensions to their climpred
names.
[5]:
fcstds=fcstds.rename({'S': 'init','L': 'lead','M': 'member', 'RMM1' : 'rmm1'})
Let’s make sure that the lead
dimension is set properly for climpred
. SubX data stores leads
as 0.5, 1.5, 2.5, etc, which correspond to 0, 1, 2, … days since initialization. We will change the lead
to be integers starting with zero.
[6]:
fcstds['lead'] = (fcstds['lead'] - 0.5).astype('int')
Now we need to make sure that the init
dimension is set properly for climpred
. We use an xarray
convenience function to decode it into the proper CFTime calendar. It can detect that this is on a 360 day calendar.
[7]:
fcstds = xr.decode_cf(fcstds, decode_times=True)
[8]:
fcstds
[8]:
<xarray.Dataset> Dimensions: (init: 510, lead: 45, member: 4) Coordinates: * init (init) datetime64[ns] 1999-01-01 1999-01-06 ... 2015-12-27 * member (member) float32 1.0 2.0 3.0 4.0 * lead (lead) int64 0 1 2 3 4 5 6 7 8 9 ... 35 36 37 38 39 40 41 42 43 44 Data variables: rmm1 (init, member, lead) float32 ... Attributes: Conventions: IRIDL
- init: 510
- lead: 45
- member: 4
- init(init)datetime64[ns]1999-01-01 ... 2015-12-27
- long_name :
- Start Time
- standard_name :
- forecast_reference_time
- pointwidth :
- 0.0
- gridtype :
- 0
array(['1999-01-01T00:00:00.000000000', '1999-01-06T00:00:00.000000000', '1999-01-11T00:00:00.000000000', ..., '2015-12-17T00:00:00.000000000', '2015-12-22T00:00:00.000000000', '2015-12-27T00:00:00.000000000'], dtype='datetime64[ns]')
- member(member)float321.0 2.0 3.0 4.0
- long_name :
- Ensemble Member
- standard_name :
- realization
- pointwidth :
- 1.0
- gridtype :
- 0
- units :
- ids
array([1., 2., 3., 4.], dtype=float32)
- lead(lead)int640 1 2 3 4 5 6 ... 39 40 41 42 43 44
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44])
- rmm1(init, member, lead)float32...
- pointwidth :
- 0.0
- long_name :
- RMM1
- units :
- unitless
array([[[-0.032366, 0.105126, ..., 1.778099, 1.803143], [ 0.006499, 0.158598, ..., 0.826527, 0.743905], [ 0.014169, 0.147751, ..., 1.170505, 1.230083], [-0.014136, 0.142899, ..., 1.239202, 1.26709 ]], [[ 0.222802, 0.70556 , ..., 2.06537 , 1.935836], [ 0.192239, 0.610177, ..., 1.658914, 1.51989 ], [ 0.200765, 0.662475, ..., 1.159152, 1.093163], [ 0.203409, 0.631355, ..., -0.66206 , -0.785497]], ..., [[ 0.167977, 0.093171, ..., 0.455726, 0.463756], [ 0.300361, 0.193978, ..., -0.853684, -0.762082], [ 0.223478, 0.095381, ..., -0.784702, -0.75755 ], [ 0.228956, 0.144475, ..., -0.866805, -0.977645]], [[-0.648854, -1.236488, ..., 0.467468, 0.470097], [-0.566065, -1.12568 , ..., 0.261335, 0.253228], [-0.638935, -1.209603, ..., -0.003442, -0.231822], [-0.603829, -1.186343, ..., 0.083258, 0.138907]]], dtype=float32)
- Conventions :
- IRIDL
Make Weekly Averages
[9]:
fcstweekly = fcstds.rolling(lead=7, center=False).mean().dropna(dim='lead')
obsweekly = obsds.rolling(time=7, center=False).mean().dropna(dim='time')
print(fcstweekly)
print(obsweekly)
<xarray.Dataset>
Dimensions: (init: 510, lead: 39, member: 4)
Coordinates:
* init (init) datetime64[ns] 1999-01-01 1999-01-06 ... 2015-12-27
* member (member) float32 1.0 2.0 3.0 4.0
* lead (lead) int64 6 7 8 9 10 11 12 13 14 ... 36 37 38 39 40 41 42 43 44
Data variables:
rmm1 (init, member, lead) float32 0.13466184 0.10946906 ... 0.076061934
<xarray.Dataset>
Dimensions: (time: 15456)
Coordinates:
* time (time) datetime64[ns] 1974-06-09 1974-06-10 ... 2017-07-24
Data variables:
rmm1 (time) float64 1.336 1.107 0.9046 0.695 ... 0.6265 0.673 0.7352
Create a new xr.DataArray
for the weekly fcst data
[10]:
nleads = fcstweekly['lead'][::7].size
fcstweeklyda = xr.DataArray(fcstweekly['rmm1'][:,:,::7],
coords={'init' : fcstweekly['init'],
'member': fcstweekly['member'],
'lead': np.arange(1,nleads+1),
},
dims=['init', 'member','lead'])
fcstweeklyda = fcstweeklyda.rename('rmm1')
climpred
requires that lead
dimension has an attribute called units
indicating what time units the lead
is assocated with. Options are: years,seasons,months,weeks,pentads,days
. The lead
units
are weeks
.
[11]:
fcstweeklyda['lead'].attrs = {'units': 'weeks'}
Create the climpred HindcastEnsemble
object and add the observations.
[12]:
hindcast = HindcastEnsemble(fcstweeklyda)
hindcast = hindcast.add_observations(obsweekly)
Calculate the Anomaly Correlation Coefficient (ACC)
[13]:
skill = hindcast.verify(metric='acc', comparison='e2o', alignment='maximize', dim='init')
Plot the skill as a function of lead time
[14]:
x = np.arange(fcstweeklyda['lead'].size)
plt.bar(x, skill['rmm1'])
plt.title('GMAO_GOES_V2p1 RMM1 Skill')
plt.xlabel('Lead Time (Weeks)')
plt.ylabel('ACC')
plt.ylim(0.0, 1.0)
[14]:
(0.0, 1.0)
References¶
Pegion, K., B.P. Kirtman, E. Becker, D.C. Collins, E. LaJoie, R. Burgman, R. Bell, T. DelSole, D. Min, Y. Zhu, W. Li, E. Sinsky, H. Guan, J. Gottschalck, E.J. Metzger, N.P. Barton, D. Achuthavarier, J. Marshak, R.D. Koster, H. Lin, N. Gagnon, M. Bell, M.K. Tippett, A.W. Robertson, S. Sun, S.G. Benjamin, B.W. Green, R. Bleck, and H. Kim, 2019: The Subseasonal Experiment (SubX): A Multimodel Subseasonal Prediction Experiment. Bull. Amer. Meteor. Soc., 100, 2043–2060, https://doi.org/10.1175/BAMS-D-18-0270.1
Kirtman, B. P., Pegion, K., DelSole, T., Tippett, M., Robertson, A. W., Bell, M., … Green, B. W. (2017). The Subseasonal Experiment (SubX) [Data set]. IRI Data Library. https://doi.org/10.7916/D8PG249H
Wheeler, M. C., & Hendon, H. H. (2004). An all-season real-time multivariate MJO index: Development of an index for monitoring and prediction. Monthly Weather Review, 132(8), 1917–1932. http://doi.org/10.1175/1520-0493(2004)132<1917:AARMMI>2.0.CO;2
Rashid, H. A., Hendon, H. H., Wheeler, M. C., & Alves, O. (2010). Prediction of the Madden–Julian oscillation with the POAMA dynamical prediction system. Climate Dynamics, 36(3-4), 649–661. http://doi.org/10.1007/s00382-010-0754-x
Gottschalck, J., Wheeler, M., Weickmann, K., Vitart, F., Savage, N., Lin, H., et al. (2010). A Framework for Assessing Operational Madden–Julian Oscillation Forecasts: A CLIVAR MJO Working Group Project. Bulletin of the American Meteorological Society, 91(9), 1247–1258. http://doi.org/10.1175/2010BAMS2816.1