{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setting up your own output\n",
"\n",
"This demo demonstrates how you can setup your raw model output with ``climpred.preprocessing`` to match `climpred`'s expectations."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2019-04-29T20:09:41.074645Z",
"start_time": "2019-04-29T20:09:40.250487Z"
}
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import xarray as xr\n",
"\n",
"import climpred"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from climpred.preprocessing.shared import load_hindcast, set_integer_time_axis\n",
"from climpred.preprocessing.mpi import get_path"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assuming your raw model output is stored in multiple files per member and initialization, {py:func}`.climpred.preprocessing.shared.load_hindcast` is a nice wrapper function based on {py:func}`.climpred.preprocessing.mpi.get_path` designed for the output format of `MPI-ESM` to aggregated all hindcast output into one file as expected by `climpred`.\n",
"\n",
"The basic idea is to look over the output of all members and concatinate, then loop over all initializations and concatinate. Before concatination, it is important to make the `time` dimension identical in all input datasets for concatination.\n",
"\n",
"To reduce the data size, use the `preprocess` function provided to `xr.open_mfdataset` wisely in combination with {py:func}`.climpred.preprocessing.shared.set_integer_axis`, e.g. additionally extracting only a certain region, time-step, time-aggregation or only few variables for a multi-variable input file as in `MPI-ESM` standard output."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"v = \"global_primary_production\"\n",
"\n",
"def preprocess_1var(ds, v=v):\n",
" \"\"\"Only leave one variable `v` in dataset \"\"\"\n",
" return ds[v].to_dataset(name=v).squeeze()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Processing init 1961 ...\n",
"Processing init 1962 ...\n",
"Processing init 1963 ...\n",
"Processing init 1964 ...\n",
"CPU times: user 5.07 s, sys: 2.06 s, total: 7.13 s\n",
"Wall time: 5.19 s\n"
]
}
],
"source": [
"# lead_offset because yearmean output\n",
"%time ds = load_hindcast(inits=range(1961, 1965), members=range(1, 3), preprocess=preprocess_1var, get_path=get_path)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Coordinates:\n",
" depth float64 0.0\n",
" lat float64 0.0\n",
" lon float64 0.0\n",
" * lead (lead) int64 1 2 3 4 5 6 7 8 9 10\n",
" * member (member) int64 1 2\n",
" * init (init) int64 1961 1962 1963 1964"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# what we need for climpred\n",
"ds.coords"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"
\n",
"
\n",
"
\n",
" \n",
"
Array
Chunk
\n",
" \n",
" \n",
"
Bytes
320 B
4 B
\n",
"
Shape
(4, 2, 10)
(1, 1, 1)
\n",
"
Count
720 Tasks
80 Chunks
\n",
"
Type
float32
numpy.ndarray
\n",
" \n",
"
\n",
"
\n",
"
\n",
"\n",
"
\n",
"
\n",
"
"
],
"text/plain": [
"dask.array"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ds[v].data"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 216 ms, sys: 44 ms, total: 260 ms\n",
"Wall time: 220 ms\n"
]
}
],
"source": [
"# loading the data into memory\n",
"# if not rechunk\n",
"%time ds = ds.load()"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"# go on with creation of PredictionEnsemble\n",
"# climpred.HindcastEnsemble(ds).add_observations(obs).verify(metric='acc', comparison='e2o', dim='init', alignment='maximize')\n",
"# climpred.PerfectModelEnsemble(ds).add_control(control).verify(metric='acc', comparison='m2e', dim=['init','member'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## `intake-esm` for cmorized output"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In case you have access to cmorized output of CMIP experiments, consider using `intake-esm `_. With {py:func}`.climpred.preprocessing.shared.set_integer_time_axis` you can align the `time` dimension of all input files. Finally, {py:func}`.climpred.preprocessing.shared.rename_to_climpred_dims` only renames."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"from climpred.preprocessing.shared import rename_to_climpred_dims, set_integer_time_axis"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# make to have to install intake-esm installed, which is not included in climpred-dev\n",
"import intake # this is enough for intake-esm to work"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"col_url = \"/home/mpim/m300524/intake-esm-datastore/catalogs/mistral-cmip6.json\"\n",
"col_url = \"https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json\"\n",
"col = intake.open_esm_datastore(col_url)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Index(['activity_id', 'institution_id', 'source_id', 'experiment_id',\n",
" 'member_id', 'table_id', 'variable_id', 'grid_label', 'dcpp_init_year',\n",
" 'version', 'time_range', 'path'],\n",
" dtype='object')"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"col.df.columns"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"# load 2 members for 2 inits for one variable from one model\n",
"query = dict(experiment_id=[\n",
" 'dcppA-hindcast'], table_id='Amon', member_id=['r1i1p1f1', 'r2i1p1f1'], dcpp_init_year=[1970, 1971],\n",
" variable_id='tas', source_id='MPI-ESM1-2-HR')\n",
"cat = col.search(**query)\n",
"cdf_kwargs = {'chunks': {'time': 12}, 'decode_times': False}"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"