Bringing 'dark data' into the light: Best practices for digitizing herbarium collections

New workflow modules will facilitate imaging and data transcription for thousands of plant specimens

Botanical Society of America

Imagine the scientific discoveries that would result from a searchable online database containing millions of plant, algae, and fungi specimen records. Thanks to a new set of workflow modules for digitizing the specimen collections currently preserved in herbaria, such a database may be within reach. The modules are provided by the National Science Foundation's (NSF) Integrated Digitized Biocollections (iDigBio), which is facilitating a collective effort to unify digitization projects across the nation.

"North America's herbaria curate approximately 74 million specimens and only a fraction have made it online," says iDigBio's digitization specialist Dr. Gil Nelson. "Having these data available at one's fingertips will enable advanced queries and new discoveries while ensuring inclusion of the so-called 'dark data' that reside in a significant percentage of the United States' more than 600 active herbaria."

According to recent estimates, approximately half of U.S. herbaria and universities have yet to begin mobilizing data. Nelson coordinated the development of the workflows, working alongside 28 other contributing authors, to provide guidance to institutions just beginning digitization programs as well as those seeking to streamline and tweak their current digitization configuration.

The 14 modules, each organized in seven to 36 easy-to-follow and customizable tasks, cover everything from setting up an imaging station to georeferencing. They also include methods to organize outreach events for public participation in imaging and data transcription. They are downloadable as Portable Document Format (PDF) and editable word processing files on GitHub and as PDF files at iDigBio. A full description of the workflows and their development, along with editable word processing files of the workflow modules, is available in the September issue of Applications in Plant Sciences.

iDigBio first launched working groups in 2012 to address a deficit in online biodiversity data. Six initial modules sparked an increase in digitization, and evolving digitization and curatorial practices have since made more comprehensive task lists possible. The latest set of modules is the result of continued collaborations, virtual meetings, visits to many herbaria, iDigBio workshops involving over 50 researchers, and contributions from 15 NSF-funded digitization projects.

"The greatest challenge in producing generic, broadly applicable workflows was determining and presenting a consensus statement of agreed-upon components while preserving maximum flexibility for institutional implementation over a broad array of herbaria," says Nelson.

For Nelson, digitization is the starting point for new avenues of biological and ecological research. He envisions huge multi-organismal data sets that will enable researchers to study yet-to-be-recognized ecological, biological, and cultural relationships. The work at iDigBio is laying the foundation for a very powerful online resource.

iDigBio provides digitization education and resources to institutions across the United States and is funded by the NSF's Advancing Digitization of Biodiversity Collections program (ADBC).


Nelson, G., P. Sweeney, L. E. Wallace, R. K. Rabeler, D. Allard, H. Brown, J. R. Carter, et al. Digitization workflows for flat sheets and packets of plants, algae, and fungi. Applications in Plant Sciences 3(9): 1500065. doi:10.3732/apps.1500065

Applications in Plant Sciences (APPS) is a monthly, peer-reviewed, open access journal focusing on new tools, technologies, and protocols in all areas of the plant sciences. It is published by the Botanical Society of America, a nonprofit membership society with a mission to promote botany, the field of basic science dealing with the study and inquiry into the form, function, development, diversity, reproduction, evolution, and uses of plants and their interactions within the biosphere. APPS is available as part of BioOne's Open Access collection.

For further information, please contact the APPS staff at
