During the course of the CCRN project, we (the CCRN project team) downloaded, processed, and created several datasets for running MESH. MESH requires 7 variables at a sub-daily time step (see Meteorological Input). This page documents the locally available datasets and the scripts developed to extract the data for a sub-basin (provided it lies within the domain of the dataset). This is a top-level document; each dataset is described further in README files (and papers) at the storage locations mentioned. It is the responsibility of the user to make sure that the data processing is correct and that the scripts work correctly for the basin provided. Any bugs or questions can be directed to Mohamed Elshamy.

The scripts provided process one variable at a time. You can parallelize either by invoking several instances of MATLAB and changing the variable being processed in each, or by building a loop into the scripts and using the MATLAB parallel pool (see the sketch below). The scripts print the input and output file names and the time stamps; please check that time progresses as expected given the time step of the dataset (hourly, 3-hourly, etc.) and its time span. This presentation sheds some light on the datasets: Bias Corrected Climate Forcing Datasets for Land Surface Modeling.pdf
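A minimal sketch of the loop-based approach (the variable list and the process_variable wrapper are hypothetical placeholders for calls into the dataset-specific scripts described below; requires the Parallel Computing Toolbox):

    % Process all 7 MESH forcing variables in parallel with a parfor loop.
    % process_variable is a hypothetical wrapper around one extraction script.
    vars = {'FB', 'FI', 'PR', 'TT', 'UV', 'P0', 'HU'};  % assumed MESH variable names
    if isempty(gcp('nocreate')), parpool; end           % start a pool if none exists
    parfor i = 1:numel(vars)
        process_variable(vars{i});                      % one variable per worker
    end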


If you have access to Graham, you can download the data from there to use with the MATLAB scripts (they were written and tested on a Windows machine), or, if you have a MATLAB license on Graham, you can transfer the scripts there and adjust them (normally all that is needed is to switch the path separators from '\' to '/'), but it is your responsibility to test them.


If you are at GIWS and have access to the GLOBALWATER share on datastore, you can access the files there.


If you do not have access to either location, you need to download the data from their original sources if they are published already. Otherwise, contact me or the data custodian to see if you can get access.


Historical Datasets

  • WFD: WATCH Forcing Data - Global surface data at a 3- or 6-hourly time step and 0.5 deg spatial resolution for the period 1901-2001 - obtained from http://www.eu-watch.org/ and stored at:
    \\datastore\GLOBALWATER\giws_research_water_share\JeffersonW\0_Raw_Data\ClimateForcing\ClimateForcing_WATCH

    To process the data for a sub-basin (you need the drainage database of the basin, or at least its header and RANK field), use the following MATLAB script: Interpolate_WFD_NC_Seq_SingleVAR.m. Please read the comments at the top of the script and within it carefully. This script is prepared to run from MATLAB on Windows and reads the data from \\datastore\GLOBALWATER, which is mapped to drive V:\. It needs to be configured to write data as described in the header comments; it is recommended that data be written to a local HDD for speed. The script uses bilinear interpolation from a regular (LAT-LON) grid to another regular (LAT-LON) grid. Processed data is saved in SEQ format. Snowfall and rainfall are supplied as separate components that need to be summed for use in MESH; use the script Interpolate_WFD_Rain_Snow_NC_Seq.m to interpolate both components and write their sum to a single precipitation file, as sketched below.
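    In essence, the rain/snow handling amounts to the following (a minimal sketch; the coordinate vectors and file/variable names are illustrative and should be checked against the actual WFD files):

        % Sum rainfall and snowfall (both kg/m2/s) into total precipitation,
        % then bilinearly interpolate each time step to the basin grid.
        rain = ncread('Rainf_WFD_190101.nc', 'Rainf');   % typically [lon x lat x time]
        snow = ncread('Snowf_WFD_190101.nc', 'Snowf');
        precip = permute(rain + snow, [2 1 3]);          % -> [lat x lon x time]
        [LON,  LAT ] = meshgrid(lonSrc, latSrc);         % source 0.5 deg grid
        [LONq, LATq] = meshgrid(lonDst, latDst);         % basin (target) grid
        pOut = zeros(numel(latDst), numel(lonDst), size(precip, 3));
        for t = 1:size(precip, 3)
            pOut(:, :, t) = interp2(LON, LAT, precip(:, :, t), LONq, LATq, 'linear');
        end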

  • WFDEI: WATCH-Forcing-Data-ERA-Interim - Global surface data at a 3-hourly time step and 0.5 deg spatial resolution for the period 1979-2016 - obtained from http://www.eu-watch.org/ and stored at:
    \\datastore\GLOBALWATER\giws_research_water_share\JeffersonW\0_Raw_Data\ClimateForcing\ClimateForcing_WFDEI
    Various versions of this dataset, clipped for the CCRN domain and for a bigger North American domain, exist on the server. The dataset is maintained by Graham Weedon of the UK Met Office and is updated regularly to include recent data.
    A version for the NA domain can be found at: \\datastore\GLOBALWATER\Model_Output\81_WFDEI_1979_2016
    and on Graham at: /project/6008034/Model_Output/CCRN/WFDEI_1979_2016

    To process the data for a sub-basin (you need the drainage database of the basin, or at least its header and RANK field), use the following MATLAB script: Read_WFDEI_SingleVAR.m. Please read the comments at the top of the script and within it carefully. This script is prepared to run from MATLAB on Windows and reads the data from \\datastore\GLOBALWATER, which is mapped to drive V:\. It needs to be configured to write data as described in the header comments; it is recommended that data be written to a local HDD for speed. The script assumes no interpolation is needed, i.e. the basin grid has the same resolution as the data and is aligned with it: the script only clips the data for the basin and formats it using the RANK field to write the space-saving SEQ format (see the packing sketch below). Note that this script processes only the NA version of the data, which is already interpolated to 0.125-degree resolution, not the original data.
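    The RANK-based packing at the heart of the clipping scripts looks roughly like this (a hedged sketch: the record layout assumes MESH's Fortran sequential convention of 4-byte length markers around each record, any per-frame header records required by the SEQ specification are omitted, and all names are illustrative):

        % Pack a 2-D field into a 1-D vector ordered by RANK and append it
        % as one Fortran-style sequential record. rank2d is the RANK field
        % from the drainage database; cells with RANK 0 lie outside the basin.
        active = rank2d > 0;
        vec = zeros(double(max(rank2d(:))), 1, 'single');
        vec(rank2d(active)) = single(field2d(active));  % order cells by RANK
        nbytes = int32(4 * numel(vec));                 % record length in bytes
        fid = fopen('basin_var.seq', 'a', 'ieee-le');
        fwrite(fid, nbytes, 'int32');                   % leading record marker
        fwrite(fid, vec, 'single');                     % packed data
        fwrite(fid, nbytes, 'int32');                   % trailing record marker
        fclose(fid);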

  • Princeton: Princeton dataset v2 - Global surface data at a 3-hourly time step and 0.5 deg spatial resolution for the period 1901-2012 - obtained from http://hydrology.princeton.edu/data.pgf.php and stored at:
    \\datastore\GLOBALWATER\giws_research_water_share\JeffersonW\0_Raw_Data\ClimateForcing\ClimateForcing_Princeton

  • GEM-CaPA: Combines hourly forecasts from the Global Environmental Multiscale (GEM) atmospheric model at 40 m with the 6-hourly Canadian Precipitation Analysis (CaPA) - GEM has global coverage, but CaPA is restricted to North America. Available from 2002 onwards; the spatial resolution increased over time from 25 to 15 to 10 km. A NetCDF version of the data, processed by Dan Princz and covering the period Sep 2004 - Aug 2017, is stored at:
    \\datastore\GLOBALWATER\Model_Output\101_GEM-CaPA_interpolated_0.125
    and on Graham at: /project/6008034/Model_Output/101_GEM-CaPA_interpolated_0.125

    An aggregated 3-hourly version of the data covering the period 2005-2016 is stored at: \\datastore\GLOBALWATER\Model_Output\103_GEM-CaPA-3h (see the aggregation sketch below)
    and on Graham at: /project/6008034/Model_Output/CCRN/GEM_CaPA_2005_2016
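    The published 3-hourly files were produced upstream, but if you ever need to aggregate hourly fields yourself, the operation reduces to block-averaging (a minimal sketch; the array name is hypothetical, averaging is appropriate for rate variables such as precipitation in kg/m2/s, and the actual dataset's aggregation choices may differ for instantaneous variables):

        % Aggregate an hourly series [cells x hours] to 3-hourly by averaging
        % each block of 3 consecutive time steps.
        nh = size(hourly, 2);
        n3 = floor(nh / 3);                  % number of complete 3-hour blocks
        threeHourly = squeeze(mean(reshape(hourly(:, 1:3*n3), [], 3, n3), 2));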

  • WFDEI-GEM-CaPA: This dataset combines the strength of GEM-CaPA (better agreement with observations over Canada) with the length of WFDEI to construct a longer time series that resembles GEM-CaPA, for climate analysis and model calibration/validation purposes. The dataset is restricted to a large domain covering most of North America (the intersection of the WFDEI and GEM-CaPA domains) and has a 3-hourly resolution. Variables are taken to be at 40 m because the reference dataset is at 40 m. Combining the two datasets was done by Elvis Asong.

    This dataset has been published (http://dx.doi.org/10.20383/101.0111); the GEM-CaPA 2005-2016 and WFDEI 1979-2016 datasets are part of the same publication. A paper about it was submitted to ESSD-Discussions (but later withdrawn) and can be reached at https://www.earth-syst-sci-data-discuss.net/essd-2018-128/

    Locally the dataset is stored at: \\datastore\GLOBALWATER\Model_Output\181_WFDEI-GEM_1979_2016

    and on Graham at: /project/6008034/Model_Output/CCRN/WFDEI-GEM_1979_2016

    To process the data for a sub-basin, use the same script used for the NA version of the WFDEI data mentioned above: Read_WFDEI_SingleVAR.m


  • WRF-CTL: Covers the CCRN domain (Western Canada) at 4 km over the period Oct 2000 - Sep 2015. This dataset is a dynamical downscaling of ERA-Interim by the Weather Research and Forecasting (WRF) model, done by Yanping Li's group. It is stored locally at:
    \\datastore\GLOBALWATER\Model_Output\WRF\WRF4KM_CA\CTRL_Post_Processed
    On Graham it can be found at: /project/6008034/Model_Output/WRF/WRF4KM_CA/CTRL_Post_Processed
    but surface variables important for hydrology are stored at: /project/6008034/Model_Output/WRF/WRF4KM_CA/HYDRO/CTRL

    To process WRF data for a particular basin, there are a few steps. The process is done partly on Graham, using a set of shell and NCL scripts (written by Sopan Kurkute and edited by Zhenhua Li and Mohamed Elshamy), and then completed on Windows using a MATLAB script. The scripts use the NCL language to interpolate the WRF data from its rotated-pole coordinate system to a regular LAT-LON grid using "conserve" interpolation, which maintains the totals. This is especially important when interpolating from a fine grid (WRF is at 4 km) to a coarser one (usually 10 km for MESH). The steps are as follows. First, on Graham (or another server that you may move the data to):

    1.  Configure and run the script convert_to_r2c.wgt.sh, which processes a single month to obtain the weighting matrix for your basin. Configuration means setting some information about the dimensions and location of your basin and the locations of the inputs and outputs; change only the output and source (where the scripts reside) folders. This script produces a file called "weights_file.nc" in the output folder, which you need to copy/move to the source folder.

    2.  Configure and run the script convert_to_r2c.CTL.sh to match your grid specifications. You can also configure which years and months to process; the first year covers months 10-12 and has to be processed separately. These scripts produce monthly R2C files, arranged per variable in annual folders (each year in a separate folder). The shell script calls NCL scripts that process all variables together; if you wish to process only some variables, comment out the code blocks for the unwanted ones.

    Note that you will need to create a SLURM job script on Graham that calls the shell script, and submit the job. Processing time depends on the size of the basin but is usually several hours. The processing parallelizes automatically, so it is worth requesting 8 cores - one per variable. The scripts are included in this zip file: WRF.zip

    3.  Collect all files from the annual folders into one folder per variable (keeping the variable folder names) and transfer them to a Windows machine to run the MATLAB script multi_r2c_2_seq_SingleVAR.m, which collates all R2C files for a variable into a single SEQ file for that variable. You can parallelize the operation by invoking several instances of MATLAB and processing one variable in each instance. The script is currently configured to process the whole period Oct 2000 - Sep 2015. One unit conversion is made within the script: precipitation is originally in mm/hr and is required in mm/s, which is the same as kg/m2/s given a density of 1000 kg/m3 for water (see the sketch below).
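    The precipitation conversion is a single division (sketch; the array name is illustrative):

        % mm/hr -> mm/s. For water at 1000 kg/m3, 1 mm over 1 m2 weighs 1 kg,
        % so mm/s is numerically identical to kg/m2/s.
        precip_mm_per_s = precip_mm_per_hr / 3600;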


  • WRF-CTL-GEM-CaPA: WRF-CTL bias corrected against GEM-CaPA over the overlap period (Sep 2004 - Sep 2015) using Alex Cannon's multivariate bias correction method (MBCn) with 200 iterations. This dataset covers the CCRN (Western Canada) domain at 0.125 deg resolution and reflects variables at 40 m height (as in the reference dataset) for the period Oct 2000 - Sep 2015. The bias correction was done by Zhenhua Li during the summer of 2018, and the dataset is stored locally at: \\datastore\GLOBALWATER\Model_Output\WRF\WRF4KM_CA\124_Corrected_to_GEM-CaPA_monthly_MBCn_it200
    and on Graham at: /project/6008034/Model_Output/WRF/WRF4KM_CA/124_WRF-CTL_Corrected_to_GEM-CaPA_monthly_MBCn_it200

    To process the data for a sub-basin (you need the drainage database of the basin, or at least its header and RANK field), use the following MATLAB script: Read_WRF_Cor_GEM_CaPA_SingleVAR.m. Please read the comments at the top of the script and within it carefully. This script is prepared to run from MATLAB on Windows and reads the data from \\datastore\GLOBALWATER, which is mapped to drive V:\. You need to configure the output data folder; it is recommended that the processed data be written to a local HDD (not a network share), which improves speed and avoids cluttering the GLOBALWATER\Model_Output network share. The script assumes no interpolation is needed, i.e. the basin grid has the same resolution as the data and is aligned with it: the script only clips the data for the basin and formats it using the RANK field to write the space-saving SEQ format. If you need to interpolate the data (because the basin grid has a different resolution or is not aligned with the data grid), use the script Interpolate_WRF_CTL_Cor_GEM_CaPA_SingleVAR.m, which has a very similar structure to the clipping routine and interpolates using bilinear interpolation. A minimal check for deciding between the two is sketched below.
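    One way to decide whether clipping suffices or interpolation is required (a hedged sketch; the header field names follow the r2c convention of xOrigin/yOrigin/xDelta/yDelta, and the data-grid origin shown is a placeholder):

        % Decide between clipping and bilinear interpolation by testing whether
        % the basin grid matches the 0.125 deg data grid in spacing and offset.
        dataDelta  = 0.125;           % data grid spacing (deg)
        dataOrigin = [-140.0, 40.0];  % placeholder data-grid origin [lon, lat]
        tol = 1e-6;
        sameRes = abs(basin.xDelta - dataDelta) < tol && ...
                  abs(basin.yDelta - dataDelta) < tol;
        mx = mod(basin.xOrigin - dataOrigin(1), dataDelta);
        my = mod(basin.yOrigin - dataOrigin(2), dataDelta);
        aligned = min(mx, dataDelta - mx) < tol && min(my, dataDelta - my) < tol;
        if sameRes && aligned
            % use Read_WRF_Cor_GEM_CaPA_SingleVAR.m (clip only)
        else
            % use Interpolate_WRF_CTL_Cor_GEM_CaPA_SingleVAR.m (bilinear)
        end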

Future Datasets

  • WRF-PGW: Covers the CCRN domain (Western Canada) at 4 km over the period Oct 2000 - Sep 2015. This dataset is a dynamical downscaling of ERA-Interim perturbed by 30-year multi-model mean changes over the period 2070-2100 under RCP 8.5 (the pseudo-global-warming method, PGW). The WRF simulations were performed and post-processed by Yanping Li's group. It is stored locally at:
    \\datastore\GLOBALWATER\Model_Output\WRF\WRF4KM_CA\PGW_Post_Processed
    On Graham it can be found at: /project/6008034/Model_Output/WRF/WRF4KM_CA/PGW_Post_Processed
    but surface variables important for hydrology are stored at: /project/6008034/Model_Output/WRF/WRF4KM_CA/HYDRO/PGW

    To process WRF data for a particular basin, there are a few steps. The process is done partly on Graham, using a set of shell and NCL scripts (written by Sopan Kurkute and edited by Zhenhua Li and Mohamed Elshamy), and then completed on Windows using a MATLAB script. Refer to WRF-CTL above for the procedure. You do not need to repeat step 1 if you have already done the CTL set. Use convert_to_r2c.PGW.sh in step 2, and adjust the forcing directory and file names in the MATLAB script in step 3.

  • WRF-PGW-GEM-CaPA: WRF-PGW bias corrected against GEM-CaPA over the overlap period with WRF-CTL (Sep 2004 - Sep 2015) using Alex Cannon's multivariate bias correction method (MBCn) with 200 iterations. This dataset covers the CCRN (Western Canada) domain at 0.125 deg resolution and reflects variables at 40 m height (as in the reference dataset) for the period Oct 2000 - Sep 2015 (corresponding to some 15-year period within 2070-2100). The bias correction was done by Zhenhua Li during the summer of 2018, and the dataset is stored locally at: \\datastore\GLOBALWATER\Model_Output\WRF\WRF4KM_CA\134_WRF-PGW_Corrected_to_GEM-CaPA_monthly_MBCn_it200
    and on Graham at: /project/6008034/Model_Output/WRF/WRF4KM_CA/134_WRF-PGW_Corrected_to_GEM-CaPA_monthly_MBCn_it200

    To process the data for a sub-basin, use the same scripts provided above for WRF-CTL-GEM-CaPA. The scripts can be easily set to process PGW instead of CTL data.

  • CanRCM4: This dataset covers the CORDEX North American domain at an hourly time step. There is one high-resolution member at 0.22 deg resolution and 15 members at 0.44 deg resolution. The data run from 1950 to 2100, forced by historical emissions until 2005 and by RCP 8.5 from 2006 to 2100. These model runs are a dynamical downscaling of the CanESM2 earth system model. I downloaded a total of 21 variables, including surface and lowest-model-level variables and water and energy balance components. The dataset is stored locally at:
    \\datastore\GLOBALWATER\Model_Output\110_CanRCM4-44
    \\datastore\GLOBALWATER\Model_Output\110_CanRCM4-22
    On Graham, it can be found at:
    /project/6008034/Model_Output/110_CanRCM4-44
    /project/6008034/Model_Output/110_CanRCM4-22

    To process the data for a sub-basin (you need the drainage database of the basin, or at least its header and RANK field), use the following MATLAB script: CanRCM4_NC_44_Seq_fullperiod.m. Please read the comments at the top of the script and within it carefully. This script is prepared to run from MATLAB on Windows and reads the data from \\datastore\GLOBALWATER, which is mapped to drive V:\. You need to configure the output data folder; it is recommended that the processed data be written to a local HDD (not a network share), which improves speed and avoids cluttering the GLOBALWATER\Model_Output network share. This script interpolates the data from the CanRCM4 rotated-pole grid to the regular LAT-LON grid defined by the shed file, using linear interpolation on the (sparsely spaced) source points. Processed data is saved in SEQ format.

    The above script can be used for all variables. Windspeed at the surface is provided directly as sfcWind, but at the lowest model level the meridional and zonal components need to be combined to get the resultant windspeed, as sketched below.
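    Combining the components is a vector magnitude (sketch; the names ua/va follow the usual CORDEX convention for the zonal/meridional components, and the file names are illustrative - verify both against the actual files):

        % Resultant windspeed from zonal (ua) and meridional (va) components,
        % both in m/s and of identical array shape:
        ua = ncread('ua_file.nc', 'ua');   % zonal component
        va = ncread('va_file.nc', 'va');   % meridional component
        wspd = sqrt(ua.^2 + va.^2);        % wind vector magnitude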

  • CanRCM4-WFDEI-GEM-CaPA: The medium-resolution (0.44 deg) 15 ensemble members of CanRCM4 were bias corrected against WFDEI-GEM-CaPA. Only the 7 variables required by MESH were bias corrected, for the period 1951-2100. Elvis Asong performed the bias correction using Alex Cannon's multivariate bias correction method (MBCn) over a domain covering most of North America, at a 3-hourly time step and 0.125 deg resolution. The resulting dataset reflects variables at 40 m height, as in the reference dataset. This dataset is stored locally at:
    \\datastore\GLOBALWATER\Model_Output\280_CanRCM4_Cor_WFDEI-GEM-CaPA
    and on Graham at: /project/6008034/Model_Output/CCRN/CanRCM4/WFDEI-GEM-CaPA

    To process the data for a sub-basin (you need the drainage database of the basin, or at least its header and RANK field), use the following MATLAB script: Read_BIAS_Corrected_CanRCM4_WFDEI_GEM_SingleVAR.m. Please read the comments at the top of the script and within it carefully. This script is prepared to run from MATLAB on Windows and reads the data from \\datastore\GLOBALWATER, which is mapped to drive V:\. It needs to be configured to write data as described in the header comments; it is recommended that data be written to a local HDD for speed. The script assumes no interpolation is needed, i.e. the basin grid has the same resolution as the data and is aligned with it: the script only clips the data for the basin and formats it using the RANK field to write the space-saving SEQ format. If you need to interpolate the data (because the grid has a different resolution or is not aligned with the data grid), use the script Interpolate_BIAS_Corrected_CanRCM4_WFDEI_GEM_SingleVAR.m, which uses bilinear interpolation from a regular (LAT-LON) grid to another regular (LAT-LON) grid. Processed data is saved in SEQ format.