{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Setting up your own output\n", "\n", "This demo shows how to set up your raw model output with ``climpred.preprocessing`` to match `climpred`'s expectations." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2019-04-29T20:09:41.074645Z", "start_time": "2019-04-29T20:09:40.250487Z" } }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import xarray as xr\n", "\n", "import climpred" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from climpred.preprocessing.shared import load_hindcast, set_integer_time_axis\n", "from climpred.preprocessing.mpi import get_path" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assuming your raw model output is stored in multiple files per member and initialization, {py:func}`.climpred.preprocessing.shared.load_hindcast` is a convenient wrapper function based on {py:func}`.climpred.preprocessing.mpi.get_path`, which is designed for the output format of `MPI-ESM`. It aggregates all hindcast output into one dataset as expected by `climpred`.\n", "\n", "The basic idea is to loop over the output of all members and concatenate, then loop over all initializations and concatenate. Before concatenation, it is important to make the `time` dimension identical in all input datasets.\n", "\n", "To reduce the data size, use the `preprocess` function provided to `xr.open_mfdataset` wisely in combination with {py:func}`.climpred.preprocessing.shared.set_integer_time_axis`, for example to extract only a certain region, time step, time aggregation, or a few variables from a multi-variable input file as in the `MPI-ESM` standard output."
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "v = \"global_primary_production\"\n", "\n", "def preprocess_1var(ds, v=v):\n", "    \"\"\"Keep only the variable `v` in the dataset.\"\"\"\n", "    return ds[v].to_dataset(name=v).squeeze()" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Processing init 1961 ...\n", "Processing init 1962 ...\n", "Processing init 1963 ...\n", "Processing init 1964 ...\n", "CPU times: user 5.07 s, sys: 2.06 s, total: 7.13 s\n", "Wall time: 5.19 s\n" ] } ], "source": [ "# lead_offset is left at its default because the output is yearly means\n", "%time ds = load_hindcast(inits=range(1961, 1965), members=range(1, 3), preprocess=preprocess_1var, get_path=get_path)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Coordinates:\n", " depth float64 0.0\n", " lat float64 0.0\n", " lon float64 0.0\n", " * lead (lead) int64 1 2 3 4 5 6 7 8 9 10\n", " * member (member) int64 1 2\n", " * init (init) int64 1961 1962 1963 1964" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# what we need for climpred\n", "ds.coords" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<table>\n", "<thead><tr><th></th><th>Array</th><th>Chunk</th></tr></thead>\n", "<tbody>\n", "<tr><th>Bytes</th><td>320 B</td><td>4 B</td></tr>\n", "<tr><th>Shape</th><td>(4, 2, 10)</td><td>(1, 1, 1)</td></tr>\n", "<tr><th>Count</th><td>720 Tasks</td><td>80 Chunks</td></tr>\n", "<tr><th>Type</th><td>float32</td><td>numpy.ndarray</td></tr>\n", "</tbody>\n", "</table>\n" ], "text/plain": [ "dask.array" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds[v].data" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 216 ms, sys: 44 ms, total: 260 ms\n", "Wall time: 220 ms\n" ] } ], "source": [ "# load the data into memory\n", "# if you keep it lazy instead, rechunk first\n", "%time ds = ds.load()" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# continue by creating a PredictionEnsemble\n", "# climpred.HindcastEnsemble(ds).add_observations(obs).verify(metric='acc', comparison='e2o', dim='init', alignment='maximize')\n", "# climpred.PerfectModelEnsemble(ds).add_control(control).verify(metric='acc', comparison='m2e', dim=['init','member'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## `intake-esm` for cmorized output" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you have access to cmorized output of CMIP experiments, consider using `intake-esm`. With {py:func}`.climpred.preprocessing.shared.set_integer_time_axis` you can align the `time` dimension of all input files. Finally, {py:func}`.climpred.preprocessing.shared.rename_to_climpred_dims` only renames the dimensions to the names `climpred` expects."
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "from climpred.preprocessing.shared import rename_to_climpred_dims, set_integer_time_axis" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# make sure intake-esm is installed; it is not included in climpred-dev\n", "import intake # this is enough for intake-esm to work" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "# the second assignment overrides the first; keep the catalog you can reach\n", "col_url = \"/home/mpim/m300524/intake-esm-datastore/catalogs/mistral-cmip6.json\"\n", "col_url = \"https://raw.githubusercontent.com/NCAR/intake-esm-datastore/master/catalogs/pangeo-cmip6.json\"\n", "col = intake.open_esm_datastore(col_url)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['activity_id', 'institution_id', 'source_id', 'experiment_id',\n", " 'member_id', 'table_id', 'variable_id', 'grid_label', 'dcpp_init_year',\n", " 'version', 'time_range', 'path'],\n", " dtype='object')" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "col.df.columns" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "# load 2 members for 2 inits for one variable from one model\n", "query = dict(experiment_id=[\n", " 'dcppA-hindcast'], table_id='Amon', member_id=['r1i1p1f1', 'r2i1p1f1'], dcpp_init_year=[1970, 1971],\n", " variable_id='tas', source_id='MPI-ESM1-2-HR')\n", "cat = col.search(**query)\n", "cdf_kwargs = {'chunks': {'time': 12}, 'decode_times': False}" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<table>\n", "<thead>\n", "<tr><th></th><th>activity_id</th><th>institution_id</th><th>source_id</th><th>experiment_id</th><th>member_id</th><th>table_id</th><th>variable_id</th><th>grid_label</th><th>dcpp_init_year</th><th>version</th><th>time_range</th><th>path</th></tr>\n", "</thead>\n", "<tbody>\n", "<tr><th>0</th><td>DCPP</td><td>MPI-M</td><td>MPI-ESM1-2-HR</td><td>dcppA-hindcast</td><td>r1i1p1f1</td><td>Amon</td><td>tas</td><td>gn</td><td>1971.0</td><td>v20190906</td><td>197111-198112</td><td>/work/ik1017/CMIP6/data/CMIP6/DCPP/MPI-M/MPI-E...</td></tr>\n", "<tr><th>1</th><td>DCPP</td><td>MPI-M</td><td>MPI-ESM1-2-HR</td><td>dcppA-hindcast</td><td>r1i1p1f1</td><td>Amon</td><td>tas</td><td>gn</td><td>1970.0</td><td>v20190906</td><td>197011-198012</td><td>/work/ik1017/CMIP6/data/CMIP6/DCPP/MPI-M/MPI-E...</td></tr>\n", "<tr><th>2</th><td>DCPP</td><td>MPI-M</td><td>MPI-ESM1-2-HR</td><td>dcppA-hindcast</td><td>r2i1p1f1</td><td>Amon</td><td>tas</td><td>gn</td><td>1971.0</td><td>v20190906</td><td>197111-198112</td><td>/work/ik1017/CMIP6/data/CMIP6/DCPP/MPI-M/MPI-E...</td></tr>\n", "<tr><th>3</th><td>DCPP</td><td>MPI-M</td><td>MPI-ESM1-2-HR</td><td>dcppA-hindcast</td><td>r2i1p1f1</td><td>Amon</td><td>tas</td><td>gn</td><td>1970.0</td><td>v20190906</td><td>197011-198012</td><td>/work/ik1017/CMIP6/data/CMIP6/DCPP/MPI-M/MPI-E...</td></tr>\n", "</tbody>\n", "</table>\n
" ], "text/plain": [ " activity_id institution_id source_id experiment_id member_id \\\n", "0 DCPP MPI-M MPI-ESM1-2-HR dcppA-hindcast r1i1p1f1 \n", "1 DCPP MPI-M MPI-ESM1-2-HR dcppA-hindcast r1i1p1f1 \n", "2 DCPP MPI-M MPI-ESM1-2-HR dcppA-hindcast r2i1p1f1 \n", "3 DCPP MPI-M MPI-ESM1-2-HR dcppA-hindcast r2i1p1f1 \n", "\n", " table_id variable_id grid_label dcpp_init_year version time_range \\\n", "0 Amon tas gn 1971.0 v20190906 197111-198112 \n", "1 Amon tas gn 1970.0 v20190906 197011-198012 \n", "2 Amon tas gn 1971.0 v20190906 197111-198112 \n", "3 Amon tas gn 1970.0 v20190906 197011-198012 \n", "\n", " path \n", "0 /work/ik1017/CMIP6/data/CMIP6/DCPP/MPI-M/MPI-E... \n", "1 /work/ik1017/CMIP6/data/CMIP6/DCPP/MPI-M/MPI-E... \n", "2 /work/ik1017/CMIP6/data/CMIP6/DCPP/MPI-M/MPI-E... \n", "3 /work/ik1017/CMIP6/data/CMIP6/DCPP/MPI-M/MPI-E... " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cat.df.head()" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "def preprocess(ds):\n", " # extract tiny spatial and temporal subset to make this fast\n", " ds = ds.isel(lon=[50, 51, 52], lat=[50, 51, 52],\n", " time=np.arange(12 * 2))\n", " # make time dim identical\n", " ds = set_integer_time_axis(ds,time_dim='time')\n", " return ds" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Progress: |███████████████████████████████████████████████████████████████████████████████| 100.0% \n", "\n", "--> The keys in the returned dictionary of datasets are constructed as follows:\n", "\t'activity_id.institution_id.source_id.experiment_id.table_id.grid_label'\n", " \n", "--> There are 1 group(s)\n" ] } ], "source": [ "dset_dict = cat.to_dataset_dict(\n", " cdf_kwargs=cdf_kwargs, preprocess=preprocess)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ 
"Coordinates:\n", " height float64 ...\n", " * dcpp_init_year (dcpp_init_year) float64 1.97e+03 1.971e+03\n", " * lat (lat) float64 -42.55 -41.61 -40.68\n", " * time (time) int64 1 2 3 4 5 6 7 8 9 ... 17 18 19 20 21 22 23 24\n", " * lon (lon) float64 46.88 47.81 48.75\n", " * member_id (member_id) \n", "\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
<table>\n", "<thead><tr><th></th><th>Array</th><th>Chunk</th></tr></thead>\n", "<tbody>\n", "<tr><th>Bytes</th><td>3.46 kB</td><td>432 B</td></tr>\n", "<tr><th>Shape</th><td>(2, 2, 24, 3, 3)</td><td>(1, 1, 12, 3, 3)</td></tr>\n", "<tr><th>Count</th><td>176 Tasks</td><td>8 Chunks</td></tr>\n", "<tr><th>Type</th><td>float32</td><td>numpy.ndarray</td></tr>\n", "</tbody>\n", "</table>\n" ], "text/plain": [ "dask.array" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds['tas'].data" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 218 ms, sys: 74 ms, total: 292 ms\n", "Wall time: 237 ms\n" ] } ], "source": [ "# load the data into memory\n", "# if you keep it lazy instead, rechunk first\n", "# this is quite fast here because we only selected 9 grid cells\n", "%time ds = ds.load()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# continue by creating a PredictionEnsemble\n", "# climpred.HindcastEnsemble(ds).add_observations(obs).verify(metric='acc', comparison='e2o', dim='init', alignment='maximize')\n", "# climpred.PerfectModelEnsemble(ds).add_control(control).verify(metric='acc', comparison='m2e', dim=['init','member'])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.10" }, "latex_envs": { "LaTeX_envs_menu_present": false, "autoclose": true, "autocomplete": true, "bibliofile": "large.bib", "cite_by": "apalike", "current_citInitial": 1, "eqLabelWithNumbers": false, "eqNumInitial": 1, "hotkeys": { "equation": "Ctrl-E", "itemize": "Ctrl-I" }, "labels_anchors": false,
"latex_user_defs": false, "report_style_numbering": false, "user_envs_cfg": false }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }