image: Early Cambrian trilobites (Bradyfallotaspis nicolascagei, KUMIP 320706) from the Northwest Territories, Canada.
Credit: Division of Invertebrate Paleontology, University of Kansas Biodiversity Institute
LAWRENCE — A conclave of about 20 prominent paleontologists, data scientists and editors from academic journals will gather Aug. 4-5 at the University of Kansas Biodiversity Institute and Natural History Museum to improve how data is shared among professionals in the field — and beyond.
The event, supported by the National Science Foundation, includes scientists, students and experts from around the world representing key institutions in paleontology, such as the American Museum of Natural History, Harvard University, Yale University, the University of California-Berkeley and the University of Florida.
“The workshop aims to bring together people — mostly from the United States, but also a few from abroad — who are involved with paleontology journals and publish basic paleontological research,” said Bruce Lieberman, director of the KU Paleontological Institute, which will host the gathering.
The participants will focus on making the data that paleontologists publish more accessible to the scientific community and the general public, according to Lieberman.
“Part of the way we've addressed this is by making publications open access,” he said. “But even then, an article is just a PDF usually, which kind of stands alone. We want to make it easier to take those data and get them out to relevant parties. It’s really about taking this data and allowing it to be repurposed in the way that best suits analysis.”
Lieberman said the gathering of paleontologists, data scientists and journal editors is seeking buy-in from the larger academic field on the best ways to align paleontological data with “FAIR” practices.
“FAIR means ‘Findable, Accessible, Interoperable and Reusable,’” said Lieberman, who also serves as Dean’s Professor of Ecology & Evolutionary Biology and senior curator with the KU Biodiversity Institute and Natural History Museum. “That’s kind of the big buzzword in data science now.”
The scientists and editorial personnel will consider nuts-and-bolts changes to how data is disseminated as well as deeper questions on how findings and data are shared in the field and beyond.
For Lieberman and Natalia López Carranza, collection manager with the Invertebrate Paleontology Division at the KU Biodiversity Institute, questions about publishing and crediting data are more than theoretical. Via the KU Paleontological Institute, both are involved in publishing the “Treatise on Invertebrate Paleontology,” a multivolume reference that documents every known example of fossil invertebrate life.
“I think part of the problem is the traditional way of sharing science through publications,” López Carranza said. “Maybe that’s a bit of a controversial statement, because it’s what has been done for centuries — people publish papers, books, etcetera — but we are in a moment in time where there is high demand for large quantities of data.”
López Carranza said data needs to be easier to access and use in the digital age.
“People are analyzing huge datasets relatively easily with their personal computers, and there’s a strong demand from scientists to get data — and to get it easily,” she said.
In addition to improving data accessibility and reusability, participants will focus on the philosophy underpinning access to data in their field. They come from organizations that share “heritage” systematics literature online, and most are practitioners in systematics, collections digitization, paleoecology and paleontological education.
When needed, they’ll borrow from other scholarly disciplines known for cutting-edge approaches to data.
“The challenge is how to transition from a PDF that presents information in a narrative format to data that’s easy for scientists to use,” López Carranza said. “For example, in genetics and molecular biology, there are large databases like GenBank that serve as providers for massive datasets. In organismal biology, we have some resources that share data as well. There are aggregators that show specimens kept in museum collections — examples include iDigBio or GBIF.”
Another facet of the workshop will address how work is credited among scientists. In a field where collaboration is common and research assistants, postdocs and students routinely take part in research and fieldwork, how should credit be given? Can credit for original data be preserved in a “chain of custody,” especially when new research hinges on work performed by previous generations of scientists, often decades earlier?
“We’re trying to capitalize on preexisting data, but we’re also recognizing that the landscape in science and publishing is changing,” Lieberman said. “We want to make sure that the primary work — collecting and creating data — is still incentivized, and publications that disseminate data are still rewarded for doing so. It’s not just about reanalyzing things; it’s also about crediting both past and future generations who do the foundational work. It’s getting harder and harder for people doing fieldwork today to receive proper credit.”