
Bringing 'dark data' into the light: Best practices for digitizing herbarium collections

New workflow modules will facilitate imaging and data transcription for thousands of plant specimens

September 10, 2015
Botanical Society of America
North American herbaria curate approximately 74 million specimens, but only a fraction have been digitized. Imaging specimens and transcribing the related data into online databases can vastly increase available biodiversity data, allowing new discoveries. The National Science Foundation's Integrated Digitized Biocollections is facilitating an effort to unify digitization projects across the country through the development of digitization workflows. The workflows, along with details on their development, are available in a newly published article.

Imagine the scientific discoveries that would result from a searchable online database containing millions of plant, algae, and fungi specimen records. Thanks to a new set of workflow modules for digitizing the specimen collections currently preserved in herbaria, such a database might be within reach. The modules are provided by the National Science Foundation's (NSF) Integrated Digitized Biocollections (iDigBio), which is facilitating a collective effort to unify digitization projects across the nation.

"North America's herbaria curate approximately 74 million specimens and only a fraction have made it online," says iDigBio's digitization specialist Dr. Gil Nelson. "Having these data available at one's fingertips will enable advanced queries and new discoveries while ensuring inclusion of the so-called 'dark data' that reside in a significant percentage of the United States' more than 600 active herbaria."

According to recent estimates, approximately half of U.S. herbaria and universities have yet to begin mobilizing data. Nelson coordinated the development of the workflows, working alongside 28 other contributing authors, to provide guidance to institutions just beginning digitization programs as well as those seeking to streamline and tweak their current digitization configuration.

The 14 modules, each organized in seven to 36 easy-to-follow and customizable tasks, cover everything from setting up an imaging station to georeferencing. They also include methods to organize outreach events for public participation in imaging and data transcription. They are downloadable as Portable Document Format (PDF) and editable word processing files on GitHub and as PDF files at iDigBio. A full description of the workflows and their development, along with editable word processing files of the workflow modules, is available in the September issue of Applications in Plant Sciences.

iDigBio first launched working groups in 2012 to address a deficit in online biodiversity data. Six initial modules sparked an increase in digitization, and as digitization and curatorial practices evolved, more comprehensive task lists became possible. The latest set of modules is the result of continued collaborations, virtual meetings, visits to many herbaria, iDigBio workshops involving over 50 researchers, and contributions from 15 NSF-funded digitization projects.

"The greatest challenge in producing generic, broadly applicable workflows was determining and presenting a consensus statement of agreed-upon components while preserving maximum flexibility for institutional implementation over a broad array of herbaria," says Nelson.

For Nelson, digitization is the starting point of new avenues to guide biological and ecological research. He envisions huge multi-organismal data sets that will enable researchers to study yet-to-be recognized ecological, biological, and cultural relationships. The work at iDigBio is laying the foundation for a very powerful online resource.

iDigBio provides digitization education and resources to institutions across the United States and is funded by the NSF's Advancing Digitization of Biodiversity Collections (ADBC) program.

Story Source:

Materials provided by Botanical Society of America. Note: Content may be edited for style and length.

Journal Reference:

  1. Gil Nelson, Patrick Sweeney, Lisa E. Wallace, Richard K. Rabeler, Dorothy Allard, Herrick Brown, J. Richard Carter, Michael W. Denslow, Elizabeth R. Ellwood, Charlotte C. Germain-Aubrey, Ed Gilbert, Emily Gillespie, Leslie R. Goertzen, Ben Legler, D. Blaine Marchant, Travis D. Marsico, Ashley B. Morris, Zack Murrell, Mare Nazaire, Chris Neefus, Shanna Oberreiter, Deborah Paul, Brad R. Ruhfel, Thomas Sasek, Joey Shaw, Pamela S. Soltis, Kimberly Watson, Andrea Weeks, Austin R. Mast. Digitization Workflows for Flat Sheets and Packets of Plants, Algae, and Fungi. Applications in Plant Sciences, 2015; 3 (9): 1500065 DOI: 10.3732/apps.1500065

Cite This Page:

Botanical Society of America. "Bringing 'dark data' into the light: Best practices for digitizing herbarium collections." ScienceDaily. ScienceDaily, 10 September 2015.
