During the course of the CCRN project, the project team downloaded, processed, and created several datasets for running MESH. This page documents the locally available datasets and the scripts developed to extract data for a sub-basin, provided it lies within the domain of a given dataset. This is a top-level document; each dataset is described further in README files and papers at the storage locations mentioned.
- To run MESH, seven variables are needed at a sub-daily time step (see Meteorological Input).
- Bug reports and questions can be directed to Mohamed Elshamy.
- The scripts provided process one variable at a time.
- You can easily parallelize by invoking several instances of MATLAB and changing the variable being processed in each.
- Alternatively, build a loop into the scripts to process all variables and use the MATLAB parallel pool.
- The scripts print the input and output file names and the time stamps; please check that time progresses as expected given the dataset's time step (hourly, 3-hourly, etc.) and time span.
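As an illustration of that time-stamp check (the project scripts are MATLAB; this is a hypothetical Python helper, not one of the provided scripts), one can verify that consecutive timestamps advance by exactly the expected step:

```python
from datetime import datetime, timedelta

def check_time_progression(timestamps, expected_step_hours):
    """Return the index of the first interval that deviates from the
    expected step, or -1 if all intervals are as expected."""
    step = timedelta(hours=expected_step_hours)
    for i in range(1, len(timestamps)):
        if timestamps[i] - timestamps[i - 1] != step:
            return i
    return -1

# Example: a 3-hourly record with one duplicated timestamp at position 5
times = [datetime(2000, 1, 1) + timedelta(hours=3 * i) for i in range(8)]
ok_index = check_time_progression(times, 3)    # all intervals OK -> -1
times[5] = times[4]                            # simulate a stuck/repeated stamp
bad_index = check_time_progression(times, 3)   # first bad interval -> 5
```

The same idea applies to any of the datasets below; only the expected step (1, 3, or 6 hours) changes.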
This presentation sheds some light on the datasets: Bias Corrected Climate Forcing Datasets for Land Surface Modeling.pdf
The datasets can be accessed by one of the following methods:
- If you have access to Graham, you can download the data and use the MATLAB scripts (they were written and tested on a Windows machine). Alternatively, if you have a MATLAB license on Graham, you can transfer the scripts there and adjust them (normally all that is needed is switching the path slashes from '\' to '/'), but it is your responsibility to test them.
- If you are at GIWS and have access to GLOBALWATER share on datastore, you can access the files there.
- If you do not have access to either location, you need to download the data from their original sources if they are published already. Some links are provided below for the publicly-available Federated Research Data Repository (FRDR) and Global Water Futures Cuizinart database.
- Otherwise, contact Mohamed Elshamy or the data custodian to see if you can get access.
Some of the forcing datasets, such as CanRCM4-WFDEI-GEM-CaPA, WFDEI-GEM-CaPA, etc., can be acquired directly via the Global Water Futures (GWF) Cuizinart. The Cuizinart is a cloud-based platform that provides an interactive portal for researchers to "slice and dice" large NetCDF datasets across the GWF program and beyond. To access datasets, users should create an account on Cuizinart, then obtain a Globus ID and link it with the Cuizinart account.
Historical Datasets
WFD: Watch Forcing Data
Resolution and Domain: Global surface data; 0.5° spatial resolution
Timestep: 3 or 6 hourly
Period: 1901-2001
Source: ftp://rfdata:forceDATA@ftp.iiasa.ac.at/WATCH_Forcing_Data/ (password: forceDATA)
Paper: journals.ametsoc.org/view/journals/hydr/12/5/2011jhm1369_1.xml
Storage:
Server: \\datastore\GLOBALWATER\giws_research_water_share\JeffersonW\0_Raw_Data\ClimateForcing\ClimateForcing_WATCH
Script: Interpolate_WFD_NC_Seq_SingleVAR.m
Instructions:
- You need the drainage database of the basin, or at least its header and RANK field.
- Please read the comments at the top of the script, and within it, carefully. This script is prepared to run from MATLAB on Windows and reads the data from \\datastore\GLOBALWATER, which is mapped to drive V:\. It needs to be configured to write data as described in the header comments. It is recommended that data be written to a local HDD for speed.
- The script uses bilinear interpolation from a regular (LAT-LON) grid to another regular (LAT-LON) grid.
- Processed data is saved in SEQ format.
- Snowfall and Rainfall are supplied as separate components that need to be summed for use in MESH.
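To show what the bilinear step amounts to, here is a minimal Python/NumPy sketch of regridding a field from one regular LAT-LON grid to another. It is a hypothetical stand-in for the MATLAB implementation, not code extracted from the script:

```python
import numpy as np

def bilinear_regrid(src_lats, src_lons, field, dst_lats, dst_lons):
    """Bilinearly interpolate `field` (shape: len(src_lats) x len(src_lons))
    from one regular lat-lon grid to another. Coordinate arrays must be
    ascending; destination points outside the source grid are extrapolated
    from the edge cells."""
    src_lats = np.asarray(src_lats, dtype=float)
    src_lons = np.asarray(src_lons, dtype=float)
    field = np.asarray(field, dtype=float)
    out = np.empty((len(dst_lats), len(dst_lons)))
    for i, lat in enumerate(dst_lats):
        for j, lon in enumerate(dst_lons):
            # index of the lower-left corner of the enclosing source cell
            yi = int(np.clip(np.searchsorted(src_lats, lat) - 1, 0, len(src_lats) - 2))
            xi = int(np.clip(np.searchsorted(src_lons, lon) - 1, 0, len(src_lons) - 2))
            ty = (lat - src_lats[yi]) / (src_lats[yi + 1] - src_lats[yi])
            tx = (lon - src_lons[xi]) / (src_lons[xi + 1] - src_lons[xi])
            out[i, j] = ((1 - ty) * (1 - tx) * field[yi, xi]
                         + (1 - ty) * tx * field[yi, xi + 1]
                         + ty * (1 - tx) * field[yi + 1, xi]
                         + ty * tx * field[yi + 1, xi + 1])
    return out
```

Because bilinear interpolation reproduces any field that is linear in latitude and longitude exactly, a linear test field is a convenient way to sanity-check an implementation like this.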
Script: Interpolate_WFD_Rain_Snow_NC_Seq.m
Instructions:
- Use the above script to interpolate the rain and snow components and then write their sum to a single file for precipitation.
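The summing itself is straightforward; as a Python sketch of the idea (the variable names Rainf/Snowf and the kg m-2 s-1 units follow the usual WATCH convention, but treat them as assumptions and check the files themselves):

```python
import numpy as np

def total_precipitation(rainf, snowf):
    """Sum rainfall and snowfall rates (kg m-2 s-1) into the single total
    precipitation field that MESH expects. The two grids must match."""
    rainf = np.asarray(rainf)
    snowf = np.asarray(snowf)
    if rainf.shape != snowf.shape:
        raise ValueError("rain and snow grids must have the same shape")
    return rainf + snowf
```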
WFDEI: WATCH-Forcing-Data-ERA-Interim
Resolution and Domain: Global surface data, 0.5° spatial resolution
Timestep: 3 hourly
Period: 1979-2018
Source: ftp://rfdata:forceDATA@ftp.iiasa.ac.at/WATCH_Forcing_Data/WFDEI (password: forceDATA)
Paper: https://agupubs.onlinelibrary.wiley.com/doi/10.1002/2014WR015638
Storage - whole dataset:
Server: \\datastore\GLOBALWATER\giws_research_water_share\JeffersonW\0_Raw_Data\ClimateForcing\ClimateForcing_WFDEI
Storage - clipped dataset:
Various versions of this dataset, clipped for the CCRN domain and a larger North American domain, exist on the server, but only through 2016.
Server: \\datastore\GLOBALWATER\Model_Output\81_WFDEI_1979_2016
Graham: /project/6008034/Model_Output/CCRN/WFDEI_1979_2016
Script: Read_WFDEI_SingleVAR.m
Instructions:
- You need the drainage database of the basin, or at least its header and RANK field.
- Please read the comments at the top of the script, and within it, carefully.
- This script is prepared to run from MATLAB on Windows and reads the data from \\datastore\GLOBALWATER, which is mapped to drive V:\. It needs to be configured to write data as described in the header comments. It is recommended that data be written to a local HDD for speed.
- The script assumes no interpolation is needed, i.e. the basin grid has the same resolution as the data and is aligned with it; the script only clips the data for the basin and formats it using the RANK field to write the space-saving SEQ format.
- This script processes only the North American version of the data, which is already interpolated to 0.125° resolution, not the original data.
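As a sketch of what clipping via the RANK field means (a hypothetical Python illustration, not the MATLAB script; the authoritative SEQ record layout is documented in the script header, the assumption here being only that SEQ files are Fortran unformatted sequential records):

```python
import io
import struct
import numpy as np

def clip_to_rank(field2d, rank):
    """Gather the active cells of a 2-D field into a 1-D vector ordered by
    RANK. `rank` is an integer array of the same shape where 0 marks
    inactive cells and active cells are numbered 1..n."""
    field2d = np.asarray(field2d)
    rank = np.asarray(rank)
    out = np.empty(rank.max(), dtype=np.float32)
    out[rank[rank > 0] - 1] = field2d[rank > 0]
    return out

def write_fortran_record(fh, values):
    """Append one Fortran unformatted sequential record (little-endian):
    a 4-byte length marker, the float32 payload, then the marker again."""
    payload = np.asarray(values, dtype="<f4").tobytes()
    marker = struct.pack("<i", len(payload))
    fh.write(marker + payload + marker)

# Example: a 2x2 grid with three active cells numbered by RANK
v = clip_to_rank([[9.0, 10.0], [11.0, 12.0]], [[0, 1], [2, 3]])
buf = io.BytesIO()
write_fortran_record(buf, v)
record = buf.getvalue()  # marker + 3 x 4 payload bytes + marker
```

Because only active cells are stored, the resulting file is much smaller than the full grid, which is the point of the SEQ format.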
Princeton Dataset v2
Resolution and Domain: Global surface data, 0.5° spatial resolution
Timestep: 3 hourly
Period: 1901-2012
Source: http://hydrology.princeton.edu/data.pgf.php
Storage:
Server: \\datastore\GLOBALWATER\giws_research_water_share\JeffersonW\0_Raw_Data\ClimateForcing\ClimateForcing_Princeton
GEM-CaPA
- Combines hourly forecasts from the Global Environmental Multiscale (GEM) atmospheric model at 40 m with the 6-hourly Canadian Precipitation Analysis (CaPA), resampled to hourly resolution.
Resolution and Domain: GEM has global coverage but CaPA is restricted to North America. Spatial resolution increased over time from 25 km to 15 km to 10 km.
Timestep: 1 or 3 hourly
Period: 2002 onwards
Source: ECCC
Storage - NetCDF, hourly, Sep 1, 2004 - Aug 31, 2017
Processed by Dan Princz. The dataset is originally 4D (lat, lon, level, time).
Server: \\datastore\GLOBALWATER\Model_Output\101_GEM-CaPA_interpolated_0.125
Graham: /project/6008034/Model_Output/101_GEM-CaPA_interpolated_0.125
The original forcing dataset has four dimensions. Post-processing was applied by Mohamed Elshamy to remove the 'level' dimension (there is only one level in the dataset). The post-processed GEM-CaPA forcing has three dimensions: lat, lon, and time.
Graham: /project/6008034/Model_Output/104_GEM-CaPA_interpolated_0.125_3D
Storage - NetCDF, 3 hourly, 2005-2016
Post-processed by Elvis Asong from the above hourly dataset (the level dimension is not included, and Feb 29 is removed in leap years)
Server: \\datastore\GLOBALWATER\Model_Output\103_GEM-CaPA-3h
Graham: /project/6008034/Model_Output/CCRN/GEM_CaPA_2005_2016
The 3-hourly version is the one used to produce WFDEI-GEM-CaPA and has been published as part of that dataset - see the section below.
WFDEI-GEM-CaPA
- This dataset combines the strength of GEM-CaPA (better agreement with observations over Canada) with the length of WFDEI to construct a longer time series that resembles GEM-CaPA, for use in climate analysis and model calibration/validation.
- Variables are assumed to be at 40 m because the reference dataset is at 40 m.
- The two datasets were combined by Elvis Asong.
Resolution and Domain: This dataset is restricted to a large domain covering most of North America (intersection of the WFDEI and GEM-CaPA domains)
Timestep: 3 hourly
Period: 1979-2016
Source:
- This dataset has been published and is available at: http://dx.doi.org/10.20383/101.0111 (Note: GEM-CaPA 2005-2016 and WFDEI 1979-2016 were also part of the published dataset)
- A paper describing it was published in ESSD and is available at: https://essd.copernicus.org/articles/12/629/2020/
Storage:
Server: \\datastore\GLOBALWATER\Model_Output\181_WFDEI-GEM_1979_2016
Graham: /project/6008034/Model_Output/CCRN/WFDEI-GEM_1979_2016
FRDR (public): https://www.frdr.ca/repo/handle/doi:10.20383/101.0111
Cuizinart (public): https://tuna.cs.uwaterloo.ca/ (select "wfdei-gem-capa" from drop-down list)
Script: Read_WFDEI_SingleVAR.m
Instructions:
- This is the same script used to process data for a sub-basin as mentioned above for the NA version of the WFDEI data.
WRF-CTL
- This dataset is a dynamical downscaling of ERA-Interim by the Weather Research and Forecasting (WRF) model
- It was produced by the Yanping Li group - see https://hess.copernicus.org/articles/23/4635/2019/.
Resolution and Domain: 4 km covering the CCRN domain (Western Canada)
Timestep: hourly
Period: Oct 2000 - Sep 2015
Storage:
Server: \\datastore\GLOBALWATER\Model_Output\WRF\WRF4KM_CA\CTRL_Post_Processed
Graham: /project/6008034/Model_Output/WRF/WRF4KM_CA/CTRL_Post_Processed
- Surface variables important for hydrology are stored at: /project/6008034/Model_Output/WRF/WRF4KM_CA/HYDRO/CTRL
Instructions and Scripts:
To process WRF data for a particular basin, there are a few steps.
- The process is done partly on Graham, using a set of shell and NCL scripts (written by Sopan Kurkute and edited by Zhenhua Li and Mohamed Elshamy), and then completed on Windows using a MATLAB script.
- The scripts use the NCL language to interpolate the WRF data from its rotated-pole coordinate system to a regular LAT-LON grid using "conserve" interpolation, which maintains the totals.
- This is especially important when interpolating from a fine grid (WRF is 4 km) to a coarser grid (usually 10 km for MESH).
- Processing will need to be done on Graham (or other servers that you may move the data to)
- Note that you will need to create a job script (SLURM) on Graham that calls the shell script and submit the job.
- Processing time depends on the size of the basin but is usually several hours. The job parallelizes automatically, so it is a good idea to request 8 cores - one per variable.
The scripts are included here and steps are as follows:
- Configure and run the script convert_to_r2c.wgt.sh which will process a single month to get the weighting matrix for your basin. Configuration means setting some information about the dimensions and location of your basin and the location of input and outputs. Change the output and source (where the scripts reside) folders only. This script will produce a file called "weights_file.nc" in the output and you need to copy/move it to the source folder.
- Configure the script convert_to_r2c.CTL.sh to your grid specifications and run it. You can also configure which years and months to process. The first year covers months 10-12 and has to be processed separately. These scripts produce monthly R2C files. The script uses NCL scripts to process all variables together, with files arranged per variable in annual folders (each year in a separate folder). If you wish to process only some variables, comment out the code blocks for the unwanted ones.
- Collect all files from the annual folders into one folder per variable (keeping the variable folder names) and transfer them to a Windows machine.
- Run the MATLAB script multi_r2c_2_seq_SingleVAR.m, which collates all R2C files for a variable into a single SEQ file for that variable. You can parallelize the operation by invoking several instances of MATLAB and processing a different variable in each. The script is currently configured to process the whole period from Oct 2000 - Sep 2015. There is one unit conversion within the script: precipitation is originally in mm/hr and is required in mm/sec (the same as kg/m2/sec given a density of 1000 kg/m3 for water).
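The precipitation conversion in that last step is just a division by 3600; as a sketch:

```python
def mm_per_hr_to_mm_per_s(rate_mm_per_hr):
    """Convert a precipitation rate from mm/hr to mm/s. Numerically this
    also equals kg m-2 s-1, since 1 mm of water over 1 m2 weighs 1 kg at a
    density of 1000 kg/m3."""
    return rate_mm_per_hr / 3600.0
```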
WRF-CTL-GEM-CaPA
- WRF_CTL bias-corrected against GEM-CaPA over the overlap period (Sep 2004 - Sep 2015) using Alex Cannon's multivariate bias correction method (MBCn) with 200 iterations.
Resolution and Domain: This dataset covers the CCRN (Western Canada) domain at 0.125° resolution and reflects variables at 40 m height (as in the reference dataset)
Timestep: hourly
Period: Oct 2000 - Sep 2015
Storage:
- Processed by Zhenhua Li during the summer of 2018
Server: \\datastore\GLOBALWATER\Model_Output\WRF\WRF4KM_CA\124_Corrected_to_GEM-CaPA_monthly_MBCn_it200
Graham: /project/6008034/Model_Output/WRF/WRF4KM_CA/124_WRF-CTL_Corrected_to_GEM-CaPA_monthly_MBCn_it200
Script: Read_WRF_Cor_GEM_CaPA_SingleVAR.m
Instructions:
- You need the drainage database of the basin, or at least its header and RANK field.
- Please read the comments at the top of the script, and within it, carefully.
- This script is prepared to run from MATLAB on Windows and reads the data from \\datastore\GLOBALWATER, which is mapped to drive V:\. You need to configure the output data folder; it is recommended that the processed data be written to a local HDD (not a network share) to improve speed and avoid cluttering the GLOBALWATER\Model_Output network share.
- The script assumes no interpolation is needed, i.e. the basin grid has the same resolution as the data and is aligned with it; the script only clips the data for the basin and formats it using the RANK field to write the space-saving SEQ format.
Script: Interpolate_WRF_CTL_Cor_GEM_CaPA_SingleVAR.m
Instructions:
- Use this script if you need to interpolate the data (i.e. if the basin grid has a different resolution or is not aligned with the data grid)
- The script has a very similar structure to the clipping routine and will interpolate the data using bilinear interpolation.
Datasets covering both Historical and Future Periods
CanRCM4
- These model runs are a dynamical downscaling of the CanESM2 earth system model
- A total of 21 variables were downloaded, including surface and lowest-model-level (above the surface) variables and water and energy balance components
- Forced by historical emissions until 2005 and by RCP 8.5 from 2006 to 2100
Resolution and Domain: This dataset covers the CORDEX North American Domain, and there is one high-resolution member at 0.22° resolution and 15 members at 0.44° resolution
Timestep: hourly
Period: 1950-2100
Storage:
Server:
- \\datastore\GLOBALWATER\Model_Output\110_CanRCM4-44-1h1
- \\datastore\GLOBALWATER\Model_Output\110_CanRCM4-22-1h2
Graham:
- /project/6008034/Model_Output/110_CanRCM4-44-1h1
- /project/6008034/Model_Output/110_CanRCM4-22-1h2
Script: CanRCM4_NC_44_Seq_fullperiod.m
1) The first folder contains CanRCM4 forcing files for fifteen ensemble members, arranged by variable, at 0.44° (~50 km) spatial resolution and hourly temporal resolution.
2) The second folder contains CanRCM4 forcing files for one member, arranged by variable, at 0.22° (~25 km) spatial resolution and hourly temporal resolution.
Instructions:
- Use this MATLAB script to process the data for a sub-basin
- You need to have the drainage database of the basin or at least the header and RANK field of it
- Please read the comments at the top of the script and within carefully.
- This script is prepared to run from MATLAB on Windows and reads the data from \\datastore\GLOBALWATER, which is mapped to drive V:\. You need to configure the output data folder; it is recommended that the processed data be written to a local HDD (not a network share) to improve speed and avoid cluttering the GLOBALWATER\Model_Output network share.
- This script interpolates the data from the CanRCM4 rotated-pole grid to a regular LAT-LON grid, as defined by the shed file, using linear interpolation of the scattered source points.
- Processed data is saved in SEQ format.
- The above script can be used for all variables.
- Wind speed at the surface is provided as sfcWind, but at the lowest model level the meridional and zonal components need to be combined to get the resultant wind speed.
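Combining the components is a vector-magnitude calculation; as a Python sketch (the component variable names in the files are not stated here, so check the dataset; commonly the zonal component is named with a u and the meridional with a v):

```python
import math

def resultant_wind_speed(u, v):
    """Resultant wind speed from zonal (u) and meridional (v) components,
    both in m/s."""
    return math.hypot(u, v)

# Example: u = 3 m/s, v = 4 m/s gives a resultant speed of 5 m/s
speed = resultant_wind_speed(3.0, 4.0)
```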
A processed version of this dataset, prepared for MESH users and as input for bias correction, is available.
Resolution and Domain: Most of North America, interpolated to 0.125° spatial resolution.
Timestep: 3-hourly
Period: 1951-2100
Storage:
Graham:
/project/6008034/Model_Output/CCRN/CanRCM4/CanRCM4_Raw
Instructions:
- Use the same script as for the CanRCM4-WFDEI-GEM-CaPA dataset, modified for the file names.
CanRCM4-WFDEI-GEM-CaPA
- This dataset is the high resolution (0.125°) 15 ensemble members of CanRCM4, bias corrected against WFDEI-GEM-CaPA
- Only the 7 variables required by MESH were bias-corrected, for the period 1951-2100
- Elvis Asong performed the bias correction using Alex Cannon's Multi-variate bias correction method (MBCn)
- The resulting dataset reflects variables at 40 m height (same as the reference dataset)
Resolution and Domain: Most of North America, 0.125° resolution
Timestep: 3 hourly
Period: 1951-2100
Source: https://essd.copernicus.org/articles/12/629/2020/
Storage:
Server: \\datastore\GLOBALWATER\Model_Output\280_CanRCM4_Cor_WFDEI-GEM-CaPA
Graham: /project/6008034/Model_Output/CCRN/CanRCM4/WFDEI-GEM-CaPA
FRDR (public): https://www.frdr.ca/repo/handle/doi:10.20383/101.0162 (clipped extents to Mackenzie River Basin)
Cuizinart (public): https://tuna.cs.uwaterloo.ca/ (select "canrcm4-wfdei-gem-capa" from drop-down list)
Script: Read_BIAS_Corrected_CanRCM4_WFDEI_GEM_SingleVAR.m
Instructions:
- You need the drainage database of the basin, or at least its header and RANK field.
- Please read the comments at the top of the script, and within it, carefully.
- This script is prepared to run from MATLAB on Windows and reads the data from \\datastore\GLOBALWATER, which is mapped to drive V:\. It needs to be configured to write data as described in the header comments. It is recommended that data be written to a local HDD for speed.
- The script assumes no interpolation is needed, i.e. the basin grid has the same resolution as the data and is aligned with it; the script only clips the data for the basin and formats it using the RANK field to write the space-saving SEQ format.
Script: Interpolate_BIAS_Corrected_CanRCM4_WFDEI_GEM_SingleVAR.m
Instructions:
- Use this script if you need to interpolate the data (i.e. if the basin grid has a different resolution or is not aligned with the data grid)
- The script uses bilinear interpolation from a regular (LAT-LON) grid to another regular grid (LAT-LON). Processed data is saved in SEQ format.
CanRCM4-WFDEI
- This dataset is the medium-resolution (0.44°) 15 ensemble members of CanRCM4, bias-corrected against WFDEI
- Only the 7 variables required by MESH were bias-corrected, for the period 1951-2100
- Elvis Asong performed the bias correction using Alex Cannon's Multi-variate bias correction method (MBCn)
- The resulting dataset reflects variables at surface height.
Resolution and Domain: Most of North America, 0.5° resolution
Timestep: 3 hourly
Period: 1951-2100
Storage:
Server: \\datastore\GLOBALWATER\Model_Output\200_CanRCM4_cor_WFDEI
Graham: /project/6008034/Model_Output/CCRN/CanRCM4_0.5/CanRCM4/CanRCM4_CORR_WFDEI
FRDR (public): https://www.frdr-dfdr.ca/repo/handle/doi:10.20383/101.0230
Script: see scripts for CanRCM4-WFDEI-GEM-CaPA above.
Future (only) Datasets
WRF_PGW
- This dataset is a dynamical downscaling of ERA-Interim perturbed by 30-yr multi-model mean changes over the period 2070-2100 under RCP 8.5 (pseudo global warming method - PGW).
- WRF simulations were performed and post-processed by the Yanping Li group - see https://hess.copernicus.org/articles/23/4635/2019/.
Resolution and Domain: Covers the CCRN domain (Western Canada) at 4 km resolution
Timestep: hourly
Period: Oct 2000 - Sep 2015
Storage:
Server: \\datastore\GLOBALWATER\Model_Output\WRF\WRF4KM_CA\PGW_Post_Processed
Graham: /project/6008034/Model_Output/WRF/WRF4KM_CA/PGW_Post_Processed
- Surface variables important for hydrology are stored at: /project/6008034/Model_Output/WRF/WRF4KM_CA/HYDRO/PGW
Scripts: See "WRF_CTL" above
Instructions:
- To process WRF data for a particular basin, there are a few steps:
- The process is done partly on Graham, using a set of shell and NCL scripts (written by Sopan Kurkute and edited by Zhenhua Li and Mohamed Elshamy), and then completed on Windows using a MATLAB script (refer to WRF_CTL, above, for the process).
- You do not need to repeat step 1 if you have done the CTL set.
- Use convert_to_r2c.PGW.sh in step 2.
- Adjust your forcing directory and file names in the MATLAB script in step 3/4.
WRF-PGW-GEM-CaPA
- WRF_PGW bias-corrected against GEM-CaPA over the overlap period with WRF_CTL (Sep 2004 - Sep 2015) using Alex Cannon's multivariate bias correction method (MBCn) with 200 iterations.
- Bias correction was done by Zhenhua Li during the summer of 2018
Resolution and Domain: This dataset covers the CCRN (Western Canada) domain at 0.125° resolution and reflects variables at 40 m height (as in the reference dataset)
Timestep: hourly
Period: Oct 2000 - Sep 2015 (the pseudo-dates correspond to a roughly 15-year period within 2070-2100)
Storage:
Server: \\datastore\GLOBALWATER\Model_Output\WRF\WRF4KM_CA\134_WRF-PGW_Corrected_to_GEM-CaPA_monthly_MBCn_it200
Graham: /project/6008034/Model_Output/WRF/WRF4KM_CA/134_WRF-PGW_Corrected_to_GEM-CaPA_monthly_MBCn_it200
Instructions:
- To process the data for a sub-basin, use the same scripts provided above for WRF-CTL-GEM-CaPA. The scripts can be easily set to process PGW instead of CTL data.