Archived February 20, 2019
Unlocking Oceans of Model Data via Web Services
Numerical modeling systems are increasingly being used to forecast and understand U.S. coastal and continental-shelf seas, addressing such issues as contaminated sediments, harmful algal blooms, oil spills, and coastal erosion. As computing power grows, so does our ability to represent finer scales and larger domains, thus increasing the amount of model output. For example, the Coupled Ocean-Atmosphere-Wave-Sediment Transport (COAWST) model developed by U.S. Geological Survey (USGS) scientists at the USGS Woods Hole Coastal and Marine Science Center in Woods Hole, Massachusetts, produces 8 gigabytes of model data every day. Hindcast simulations (which test mathematical models by using observational data from past events) at the same center are typically in the 10- to 30-gigabyte range. These models also have large appetites for input: they ingest output files from other models and live data streams from river gauges, weather stations, oceanographic instruments, and satellites. The quantity of the digital data produced and consumed by these models requires special approaches to allow efficient access, especially if the data are to be shared effectively with collaborators and the rest of the international research community.
At the USGS Woods Hole Coastal and Marine Science Center, we have been working with the National Oceanic and Atmospheric Administration (NOAA) Integrated Ocean Observing System (IOOS), the National Science Foundation (NSF) Ocean Observatories Initiative (OOI), and the international climate community to adopt a common approach to handling model-data output. Data from modeling teams or instruments are served in their original format by providers or local data-access centers and augmented with metadata (information about the data, such as what systems were used for data collection, what map projections are used for geospatial data, and so on) to allow standardized representation. These datasets are then made available via Web services, allowing the creation of user toolsets and applications that can access the different models used in the community without specialized software for each model.
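The value of the standardized metadata is that a generic tool can locate a quantity of interest by its standard description rather than by each model's native variable name. As a minimal sketch (the variable names and attributes below are invented for illustration, loosely following the CF-conventions pattern of a `standard_name` attribute):

```python
# Hypothetical sketch: CF-style metadata lets a generic tool find a variable
# regardless of the model's native naming. Variable names and attributes
# here are made up for illustration.

# Two models expose the same quantity under different native names, but
# both carry a "standard_name" attribute in their metadata.
coawst_vars = {
    "temp": {"standard_name": "sea_water_temperature", "units": "Celsius"},
    "zeta": {"standard_name": "sea_surface_height", "units": "meter"},
}
other_model_vars = {
    "water_temp": {"standard_name": "sea_water_temperature", "units": "Celsius"},
}

def find_by_standard_name(variables, standard_name):
    """Return native variable names whose metadata matches standard_name."""
    return [name for name, attrs in variables.items()
            if attrs.get("standard_name") == standard_name]

print(find_by_standard_name(coawst_vars, "sea_water_temperature"))       # ['temp']
print(find_by_standard_name(other_model_vars, "sea_water_temperature"))  # ['water_temp']
```

Because the lookup depends only on the shared metadata vocabulary, the same client code works against any compliant model without per-model adapters.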
This standardized-Web-services approach is being used by several USGS Woods Hole Coastal and Marine Science Center projects. For example, John Warner and Brandy Armstrong, who developed the COAWST modeling system, deliver daily forecasts of wind and current velocities, sea-surface temperatures and heights, suspended-sediment transport, and wave heights for the U.S. east and gulf coasts at 5-km resolution. Each new daily forecast consists of an 8-gigabyte file and becomes available to users as part of a growing, cumulative simulation archive (currently 2.6 terabytes). The Web service allows users to extract just the variables they need in specified time, longitude, latitude, and water-depth ranges of interest. Brad Butman and "Soupy" Dalyander (see "Patricia Dalyander Is New Mendenhall Postdoctoral Research Fellow in Woods Hole," this issue) are taking advantage of this Web service to calculate bottom stress from COAWST wave-height and current-velocity forecasts over a 1-year period and using the results to examine the distribution of mean bottom stress on the east and gulf coasts. (The higher the bottom stress, the greater the likelihood that bottom sediment, and any associated pollutants, will be transported by the water.) Neil Ganju is using the approach for his collaborative circulation and sediment-transport studies near the Martha's Vineyard Coastal Observatory. Chris Sherwood and Rich Signell are using the approach to share model input and output with academic researchers working on an NSF Rapid Response Research (RAPID) project simulating the three-dimensional dispersal of aging oil—a five-institution study relevant to the Deepwater Horizon oil spill in the Gulf of Mexico. In total, more than 13 terabytes of model data are available and being used by USGS researchers and their U.S. and international collaborators.
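The article does not give the bottom-stress formula used in the Butman and Dalyander analysis; as an illustrative sketch only, a simple quadratic drag law for current-induced bottom stress, τ = ρ·Cd·|u|², shows the kind of calculation involved (the actual work combines wave and current effects and is more involved):

```python
# Illustrative sketch only: quadratic drag law for current-induced bottom
# stress, tau = rho * Cd * |u|^2. The real combined wave-current stress
# calculation is more complex; this shows the basic idea.

RHO_SEAWATER = 1025.0  # seawater density, kg/m^3
DRAG_COEFF = 0.0025    # dimensionless drag coefficient (typical shelf value)

def bottom_stress(u, v, rho=RHO_SEAWATER, cd=DRAG_COEFF):
    """Bottom stress magnitude (Pa) from near-bottom velocity components (m/s)."""
    speed_squared = u * u + v * v
    return rho * cd * speed_squared

# A 0.2 m/s near-bottom current:
print(round(bottom_stress(0.2, 0.0), 4))  # 0.1025 (Pa)
```

Run over a year of forecast velocities pulled from the Web service, a function like this yields the time-averaged stress maps described above.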
Simulations that produce these large datasets run remotely on high-performance computing clusters, where resources can be pooled and shared. Most simulations are performed on clusters at the Woods Hole Oceanographic Institution (WHOI). Instead of bringing massive model-output files back to local desktops or to a local server at the USGS Woods Hole office, the model data are left on large Fibre Channel disk arrays and made available via the standardized Web services. Processing, analysis, and visualization procedures written in a high-level language such as Matlab access the remote model data directly and copy only the relevant portions across the network. In this way, the same procedures used internally within the USGS research group can be used by external collaborators or other scientists without modification, thus allowing others to mine the model data to yield insights and understanding beyond the resources of the USGS. It also makes our research more transparent and accountable, in the spirit of "Open Notebook Science" (a term coined by Jean-Claude Bradley of Drexel University for the online sharing of "raw experimental data along with the researcher's interpretation in a format that anyone can easily re-analyze, re-interpret and re-purpose"; http://drexel-coas-elearning.blogspot.com/2006/09/open-notebook-science.html).
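The "copy only the relevant portions" pattern works because protocols such as OPeNDAP encode the requested variable and index ranges directly in the request, so the server returns just that subset. A minimal sketch of how such a request might be constructed (the base URL, variable name, and index ranges below are hypothetical):

```python
# Hypothetical sketch of an OPeNDAP-style subset request: variable names and
# index ranges go into a constraint expression appended to the dataset URL,
# so only the requested slice crosses the network. The server address and
# variable names here are invented for illustration.

def opendap_subset_url(base_url, variable, **index_ranges):
    """Build a constraint expression like var[0:23][100:200][300:400]."""
    constraint = variable + "".join(
        f"[{start}:{stop}]" for start, stop in index_ranges.values()
    )
    return f"{base_url}?{constraint}"

url = opendap_subset_url(
    "http://example.gov/thredds/dodsC/coawst/forecast.nc",
    "Hwave",                       # hypothetical wave-height variable
    time=(0, 23),                  # first 24 hourly records
    lat=(100, 200), lon=(300, 400),
)
print(url)
# http://example.gov/thredds/dodsC/coawst/forecast.nc?Hwave[0:23][100:200][300:400]
```

A kilobyte-scale slice of an 8-gigabyte file can thus be fetched and analyzed on a desktop, which is what lets the same Matlab procedures run identically inside and outside the USGS network.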
Rich Signell spent much of 2009 on detail to NOAA helping to implement the standardized-Web-services approach across all 11 regions of the U.S. IOOS. This approach is currently being implemented to provide unified access to gridded data across NOAA (http://geo-ide.noaa.gov/) and is part of the USGS Council for Data Integration plan for Fiscal Year 2011 (which began October 1, 2010). Implementing the approach across NOAA and the USGS will allow anyone on the Internet to interactively browse, download, and analyze data from hundreds of terabytes of oceanographic (and atmospheric) model output using efficient, standard tools—effectively "unlocking" their scientific content.
Research and experimentation on how best to conduct our science, exchange observational and model data, and disseminate results by using Internet technology is an increasingly important component of USGS research, especially as data-intensive scientific discovery begins to challenge traditional approaches. Scientists and information-technology (IT) professionals will need to work together in order to continue to promote advanced computing capabilities that help researchers share, manipulate, and explore massive datasets while providing appropriate security measures.