Enabling Data Sharing Through the Gulf of Mexico Research Initiative Information and Data Cooperative (GRIIDC)




Gibeaut, James


Our ability to improve society’s understanding of the Gulf of Mexico ecosystem, including humans, and to ensure the Gulf’s long-term environmental and public health requires access to a wide array of data. Understanding the impacts of petroleum pollution and related stressors on marine and coastal ecosystems and human populations calls for the ability to integrate and analyze data from diverse sources, across disciplines, and from varied spatial and temporal scales. One of the obstacles most frequently cited in determining the ecosystem impacts of the Deepwater Horizon spill was the lack of baseline data from many disciplines. Data are required to make informed decisions about the management of complex systems, particularly relating to impacts, future response, mitigation, and restoration following spills and natural disasters. Changes in the ways scientists gather, manage, and analyze data are driven, in some cases, by the availability of innovative new data gathering tools and new low-cost computing capabilities. Other changes are driven by how and what data, particularly public health data, are collected and accessed. Society, however, is also demanding change (McNutt et al., 2016). The public wants increased transparency. Decision-makers from all sectors are calling for reproducibility and validation. As public and environmental health become increasingly interconnected, health professionals and policymakers require timely access to reliable and robust monitoring data that provide a baseline for informed decision making to promote the health and well-being of ecosystems and the people who live and work in these systems. The science community is beginning to recognize and address this need for large, accessible, integrated data sets.
Recently, the National Oceanic and Atmospheric Administration (NOAA) announced it will be partnering with five Web organizations (Amazon Web Services, Microsoft Azure, IBM, Google, and the Open Cloud Consortium) through a Cooperative Research and Development Agreement (CRADA) to organize and make NOAA’s data more easily accessible and usable (https://www.commerce.gov/news/press-releases/2015/04/ussecretary-commerce-penny-pritzkerannounces-new-collaboration-unleash). Access to data generated by the Gulf of Mexico Research Initiative (GoMRI) can make a direct difference to understanding, responding to, and mitigating future oil spills. GoMRI recognized this early on in the program’s development, and the Gulf of Mexico Research Initiative Information and Data Cooperative (GRIIDC; https://data.gulfresearchinitiative.org) serves as an excellent example of how data integration and consistency can provide exponential added value to the research community as a whole. Research investigations funded by GoMRI have resulted in a large pulse of scientific data produced by studies ranging across the program’s five research themes (Shepherd et al., 2016, in this issue). Data sets from laboratory, field, and modeling activities describe phenomena ranging from microscopic fluid dynamics to large-scale ocean currents, from bacteria to marine mammals, and from detailed field observations to synoptic mapping. One of GoMRI’s central tenets is to ensure that all data are preserved and made publicly available, and GRIIDC ensures a data and information legacy that promotes continual scientific discovery and public awareness of the Gulf of Mexico ecosystem. Open data requirements are increasing in number and enforcement.
There are many reasons for the effective curation and sharing of data, including (1) providing environmental baselines for gauging the effects of episodic events such as storms or oil spills, (2) increasing the efficiency of the scientific process through reuse of data and providing direction for future data acquisitions, (3) increasing public trust by making data available that are used in applying and developing public policy, and (4) enabling new discoveries through data mining. GoMRI became a leader in the move toward open scientific data in 2011 when BP and the Gulf of Mexico Alliance established in their Master Research Agreement (MRA) a research database from which all data are to be made “fully accessible” with “minimum time delay.” The MRA also charges the GoMRI Research Board with developing data policies and the GoMRI Administrative Unit with administering the research database. The Research Board established that “fully accessible” meant publicly available with documentation (metadata) to make data sets understandable and reusable. Further, the phrase “minimum time delay” was defined as within one year of data acquisition or before publications appear that use the data. This “one-year or before publication” requirement is ambitious and at the forefront of data-sharing policies of research funding organizations. It has caused the program to focus on data management throughout the data life cycle and requires a commitment of time and resources by researchers. It has also created the need for GRIIDC to develop processes and resources for data planning, tracking, and archiving as well as training for researchers. This article describes the structure of GRIIDC and its approach to meeting a stringent open data requirement.






