2. The Team
The GWF Data Management Team consists of 4 faculty leads, 4 data managers, and 2 support staff spread across the 4 major partner universities of the GWF program. The faculty leads provide advice and direction to the data managers. The data managers are tasked with developing and implementing the data management strategy. We assist researchers and projects with the management of their data throughout the research lifecycle. The data managers meet every 2 weeks to discuss topics of interest. Our members liaise with experts from library and technology organizations (Portage & ICT) and with research ethics boards.
|University||University of Saskatchewan||University of Waterloo||McMaster University||Wilfrid Laurier University|
|Faculty Lead||Jimmy Lin||Mike Waddington|
|Data Manager||Bhaleka Persaud||Krysha Dukacz||Gopal Saha|
|Support Staff||Juliane Mai3|
1 Research Analyst; 2 IT Coordinator and Specialist and Data Management Team Lead; 3 Assistant Professor, Big Data Dissemination
3a. Management of Data throughout the Research Lifecycle
As a broad consortium of investigators from allied research fields, as well as government, industry, Indigenous, and community partners, GWF has diverse data management needs. An overarching goal of the Data Management Team is to develop data management plans, workflows, methods, and strategies that standardize practices while leaving the different subject domains and stakeholder groups the flexibility to operate. Data management needs within GWF can be articulated in terms of planning, description / markup (metadata), storage / backup, security and access, training, and preservation and discovery activities.
Planning. As a large, interdisciplinary research program, GWF requires a significant amount of data management planning, both prior to and after project inception. The success of GWF depends on identifying, categorizing, and addressing storage needs, security concerns, architecture and metadata requirements, and access and transfer needs, and on selecting preservation and discovery platforms.
Description / markup (metadata). Given the size and scope of the GWF program, developing workflows for metadata creation, description, and enrichment is required. Project-level, data-level, and administrative-level metadata should be collected, and whenever possible, standardized to existing practices and standards.
Storage / backup. Securing centralized backup facilities to house the original as well as the processed and derived research data collections is of the utmost importance for the GWF program. Given the high number of co-Is and HQP, the different methods of data transfer, and the geographic distance from field to lab, facilities with efficient data recovery procedures would create a reliable work environment for the GWF research community.
Security and Access. GWF collects a large amount of sensitive data, including traditional knowledge and data collected with Indigenous partners, data from government and industry partners, and data collected from human participants. There is a stark need for security provisions and embargoes, as well as an understanding of the need for controlled access for researchers, HQP, and community stakeholders working with the data.
Training. HQP and Investigator data management training has been identified as a key Data Management requirement.
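The three metadata levels named under Description / markup (project-level, data-level, and administrative-level) could be captured in a single record per dataset. The following is only a minimal sketch: every class and field name is hypothetical, chosen for illustration, and does not reflect an actual GWF schema or any particular metadata standard.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional
import json


@dataclass
class ProjectMetadata:
    # Project-level: describes the research project as a whole (hypothetical fields).
    project_title: str
    principal_investigator: str


@dataclass
class DataMetadata:
    # Data-level: describes one dataset within the project (hypothetical fields).
    dataset_title: str
    variables: List[str]
    temporal_coverage: str


@dataclass
class AdminMetadata:
    # Administrative-level: access and embargo details (hypothetical fields).
    access_level: str = "restricted"
    embargo_until: Optional[str] = None


@dataclass
class DatasetRecord:
    project: ProjectMetadata
    data: DataMetadata
    admin: AdminMetadata = field(default_factory=AdminMetadata)

    def missing_fields(self) -> List[str]:
        """Return dotted names of fields left as empty strings or empty lists."""
        missing = []
        for level, values in asdict(self).items():
            for name, value in values.items():
                if value == "" or value == []:
                    missing.append(f"{level}.{name}")
        return missing

    def to_json(self) -> str:
        """Serialize the whole record so it can travel alongside the data files."""
        return json.dumps(asdict(self), indent=2)
```

A record with gaps can then be checked before it is submitted: calling `missing_fields()` lists what still needs to be filled in, and `to_json()` produces a plain-text copy of the record.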
3b. Data Governance
In order to assist researchers in managing their data and to facilitate the consistency of data from across the program, structures will be developed to serve as guidance tools. These include the following:
Define Best Practices and Processes. Working with the SMC and subject matter experts, develop best practices documents to guide researchers.
Formalize a structure of roles and responsibilities. Define rights and obligations of participants (data embargoes, expectations for movement of data into the repository, etc.).
Communications. Develop means of communicating data management information to researchers.
3c. Data Stewardship
To assist projects in meeting the goals of the Data Management Strategic Plan, the Data Management Team will work to equip researchers with the techniques and tools to properly steward their data throughout their research program. Such initiatives will include:
Managing Data Assets. The GWF Data Management Team will develop a framework to manage and oversee the data assets in order to provide high quality data in an accessible and consistent manner.
Preservation and Discovery. Providing the means and mode for the preservation and discovery of research data collected by GWF is a key requirement of the Data Management program.
Data and Metadata Enrichment. Developing practices and workflows to enable data and metadata enrichment prior to ingest, QA/QC patterns, and solutions for hosting, indexing, and enabling access and sharing of data are needs of the program.
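One simple QA/QC pattern that could run prior to ingest is a plausibility check that flags, rather than deletes, questionable observations. The sketch below is illustrative only: the variable names, the ranges, and the flag vocabulary are assumptions made for the example, not GWF-defined values.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical plausible ranges; real limits would come from subject matter experts.
PLAUSIBLE_RANGES: Dict[str, Tuple[float, float]] = {
    "water_temp_c": (-5.0, 40.0),
    "discharge_m3s": (0.0, 50000.0),
}


def qc_flag(variable: str, value: Optional[float]) -> str:
    """Return 'missing', 'unchecked', 'suspect', or 'ok' for one observation."""
    if value is None:
        return "missing"
    if variable not in PLAUSIBLE_RANGES:
        # No range is defined for this variable, so it cannot be assessed here.
        return "unchecked"
    low, high = PLAUSIBLE_RANGES[variable]
    return "ok" if low <= value <= high else "suspect"


def flag_records(records: List[dict]) -> List[dict]:
    """Return copies of the records with a 'qc_flag' key added.

    The input records are left unmodified so the original data is preserved.
    """
    return [
        {**rec, "qc_flag": qc_flag(rec["variable"], rec.get("value"))}
        for rec in records
    ]
```

Flagging instead of filtering keeps the original values intact, so a researcher can review each suspect observation and decide how to treat it.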
3d. Provision of Input and Assistance to Data Related Initiatives
The GWF Data Management Team will work with the Computer Science Team on the following initiatives:
Improvement of delivery of data and knowledge. Assistance in delivery of large datasets and complex model inputs and outputs to and from diverse stakeholders. Support in testing and delivering the specialized information queries as well as the visualization and interaction modules.
Improvement of management of sensor data. Help in developing an efficient approach to integrating spatially and temporally dense environmental data streams from terrestrial and remote sensing monitoring platforms in order to support science-based decision making for end-users from diverse geographic settings.
Data Management toolset and framework. Help in development of the framework and tools to support an at-scale system that is responsible for storing, managing, processing and analyzing data collected throughout the life of the project.
GWF Cloud Components. Help in building an understanding of the existing and future data to inform the development of the PaaS (Platform-as-a-Service), which provides a data management platform for collecting, storing, integrating, sharing and managing GWF data. Support for the SaaS (Software-as-a-Service), a platform for analyzing, visualizing and supporting scientific workflows for GWF, and data related support for mobile end-user and citizen science applications. Carrying out all activities necessary to ensure the data is appropriately integrated into the GWF cloud.
The GWF Data Management Team will work with the Core Modelling Team on the following initiatives:
Input / Output / Format. Obtaining and managing the inputs and outputs of the environmental models; assisting as needed with the data extraction and archival processes; assisting as needed in providing standardized input and output formats for the most common models.
Data Access Support. Providing support to create accounts and access various storage and processing resources on the Compute Canada Graham allocation designated for GWF use; providing support to access other processing and archival space as well as services available through the University of Saskatchewan, McMaster University, Wilfrid Laurier University and University of Waterloo networks.
In summary, the Data Management Team will work with Core Teams and Individual Projects to identify or develop procedures and processes to assist researchers in managing their data. The team, in collaboration with other personnel, will also work to support Reproducible Research through workshops on managing and reusing the scripts for various levels of data processing. Additional workshops will be offered to help researchers move toward a tidy data (or similar) approach that allows data to be more easily ingested and shared.
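To illustrate what a tidy data approach means in practice, the sketch below reshapes a "wide" table, with one column per measured variable, into a "long" layout with one row per observation. The function name and the column names are hypothetical, and the example uses only the Python standard library so that it stays self-contained.

```python
from typing import Dict, List


def wide_to_tidy(rows: List[Dict[str, object]], id_columns: List[str]) -> List[Dict[str, object]]:
    """Reshape wide rows into one row per (identifiers, variable, value)."""
    tidy = []
    for row in rows:
        # Identifier columns (e.g. site and date) are repeated on every output row.
        ids = {col: row[col] for col in id_columns}
        for column, value in row.items():
            if column in id_columns:
                continue
            tidy.append({**ids, "variable": column, "value": value})
    return tidy


# Example: one wide row with two measured variables becomes two tidy rows.
wide = [{"site": "A", "date": "2019-06-01", "water_temp_c": 12.5, "ph": 7.1}]
for observation in wide_to_tidy(wide, id_columns=["site", "date"]):
    print(observation)
```

In the long layout, adding a new measured variable adds rows rather than columns, which tends to make ingest scripts and shared datasets easier to keep consistent.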
4. Contact Us
Please direct questions, ideas, and/or concerns to your local GWF data manager from the list below. We look forward to assisting you with implementing data management processes in your research.