A new publicly available database will catalog metadata associated with biological samples, making it easier for researchers to share and reuse genetic data for environmental and ecological analyses.
The resource, called the Genomic Observatories Metadatabase (GeOMe), was developed by researchers at the Smithsonian's National Museum of Natural History and eight other museums and research institutions. It links publicly available genetic data to records of where and when samples were collected, providing contextual information that has so far been missing from widely shared databases. Such information is critical for comparing biodiversity in different locations worldwide and tracking it across time. But despite calls for more data sharing within the research community, researchers have lacked the tools to make this information readily available.
The developers of the database, which is described Aug. 3 in the journal PLOS Biology, said standardizing and preserving this metadata will greatly enhance the value of the genetic sequence data that researchers are already collecting. With GeOMe, researchers will be able to find and access genetic data collected at specific times and places anywhere in the world, enabling them to ask big questions about the structure and sustainability of life on the planet. For example, they might investigate how the organisms living at a specific altitude throughout the world have shifted as the planet's climate has changed.
"Tracking biodiversity through global change is a collaborative effort," said Christopher Meyer, a research zoologist at the National Museum of Natural History who helped lead GeOMe's development. "We can't do it on our own. GeOMe will advance big data and discovery for the future, allowing the sum of scientific endeavors to far exceed individual research products."
Scientists who analyze ecological samples, whether plants, animals or entire communities of microbes gathered from the oceans, freshwater or land, have their own systems for keeping track of when and where those samples were collected. But for the broader research community, such information has been difficult to obtain and impossible to search comprehensively. GeOMe provides a solution by permanently linking information about samples' temporal, environmental, geospatial and scholarly context to genetic sequence data stored by the National Center for Biotechnology Information.
Meyer said he and his colleagues devoted the time and resources to developing GeOMe because they knew it would be a powerful tool to accelerate discovery. As museum scientists, they recognize the value of tracking and preserving information. And because the National Museum of Natural History is a leader in acquiring and disseminating knowledge about biodiversity, Meyer said, it was important for the museum to play a key role.
GeOMe's developers, including Eric Crandall at California State University, Monterey Bay, Michelle Gaither at the Hawai'i Institute of Marine Biology and John Deck at the Berkeley Natural History Museums, have worked to ensure that the resource is easy to use and adaptable for a wide range of needs. With the database and toolkit freely available to the research community, scientific journals can now mandate that authors make their metadata available in a searchable and standardized format, just as they have long done for genetic sequence data, they said.
Importantly, the team notes, data in GeOMe will conform to standards developed by the Genomic Standards Consortium and the Biodiversity Information Standards organization, ensuring that submitters capture and record the same essential information about every sample. This consistency will allow researchers to conduct analyses across datasets in the future.
"[Our knowledge of] biodiversity is being written through genomic data, but the data doesn't mean much unless you can put it in context," Meyer said. "If we don't start implementing a tool like this now, our data will be less useful in perpetuity."
GeOMe's development was a collaboration between researchers and computer scientists at the following institutions: the Smithsonian's National Museum of Natural History; Berkeley Natural History Museums at the University of California, Berkeley; the Hawai'i Institute of Marine Biology at the University of Hawai'i; Biocode; Texas A&M University-Corpus Christi; the University of California's Gump South Pacific Research Station, in Moorea, French Polynesia; Berkeley Institute for Data Science at the University of California; the University of Queensland in Australia; and California State University, Monterey Bay.
Funding for this study was provided by the National Science Foundation, the Gordon and Betty Moore Foundation and the National Oceanic and Atmospheric Administration.