The history and impact of digitization and digital data mobilization on biodiversity research
Abstract
The first two decades of the twenty-first century have seen a rapid rise in the mobilization of digital biodiversity data. This has thrust natural history museums into the forefront of biodiversity research, underscoring their central role in the modern scientific enterprise. The advent of mobilization initiatives such as the United States National Science Foundation's Advancing Digitization of Biodiversity Collections (ADBC), Australia's Atlas of Living Australia (ALA), Mexico's National Commission for the Knowledge and Use of Biodiversity (CONABIO), Brazil's Centro de Referência em Informação (CRIA) and China's National Specimen Information Infrastructure (NSII) has led to a rapid rise in data aggregators and an exponential increase in digital data for scientific research, data that arguably provide the best evidence of where species live. The international Global Biodiversity Information Facility (GBIF) now serves about 131 million museum specimen records, and Integrated Digitized Biocollections (iDigBio) in the USA has amassed more than 115 million. These resources expose collections to a wider audience of researchers, provide the best biodiversity data in the modern era outside of nature itself and ensure the primacy of specimen-based research. Here, we provide a brief history of worldwide data mobilization, its impact on biodiversity research, challenges for ensuring data quality, its contribution to scientific publications and evidence of the rising profiles of natural history collections.
This article is part of the theme issue ‘Biological collections for understanding biodiversity in the Anthropocene’.
1. Introduction
Rapid technological advances during the Anthropocene have precipitated massive impacts on biodiversity as well as how biodiversity science is conducted. Because major shifts in the Earth's stratigraphy are primarily geological rather than cultural features, the Anthropocene Epoch can only be properly defined by how its stratigraphic signature differs from that of its immediate predecessor, the Late Holocene. Nevertheless, unlike earlier epochs, the Anthropocene is often characterized by non-geological but usually parallel human impacts, including the impacts on biodiversity that have resulted from massive increases in human population and the reciprocal impacts on humans themselves. Potentially deleterious environmental impacts coupled with the rise and influence of digital technologies brought on by the Anthropocene have increased both the urgency of, and the tools available for, using museum specimens to enhance our understanding of biodiverse systems.
The previous two decades have seen exponential growth in the aggregation and availability of digital biodiversity data for use in research, conservation, outreach and integrated studies across all domains of the biodiversity sciences [1–8]. This has thrust natural history museums and academic collections, especially the biodiversity specimens they curate, into the forefront of biodiversity research in systematics [9], ecology and conservation, underscoring their central role in the modern scientific enterprise and making them more visible, accessible and transparent to citizen scientists and the general public. The advent of such digitization and data mobilization initiatives as the United States (US) National Science Foundation's Advancing Digitization of Biodiversity Collections (ADBC) programme, Australia's Atlas of Living Australia (ALA), Mexico's Comisión Nacional Para el Conocimiento y Uso de la Biodiversidad (CONABIO), Brazil's Centro de Referência em Informação (CRIA), Europe's emerging Distributed System of Scientific Collections (DiSSCo) and China's National Specimen Information Infrastructure (NSII) has led to a rapid rise in regional, national and international digital data aggregators, and precipitated an exponential increase in the availability of digital data for scientific research. These digital resources raise the profiles of museums, expose collections to a wider audience of systematics and conservation researchers, provide the best biodiversity data outside of nature itself [10], ensure that natural history museums remain at the forefront of biodiversity science and open opportunities for addressing a litany of grand challenge questions [11].
Here, we provide a brief accounting of worldwide digital data generation and mobilization initiatives, the impact of these data on biodiversity research, challenges for improving and ensuring the quality of these data, new data underscoring the impact of worldwide digitization initiatives on scientific publications and evidence of the roles these activities play in raising the public and scientific profiles of natural history collections. Our primary focus is digitized museum specimens with only brief mention of ecological research data deposited in research repositories, expertly vetted range maps, satellite vegetation data or non-vouchered species observational data.
2. A history of digitization
Beginning with a 1999 recommendation of the Biodiversity Informatics Subgroup of the Organization for Economic Cooperation and Development's Megascience Forum, the Global Biodiversity Information Facility (GBIF) was founded to enable access to the vast quantities of biodiversity information to advance scientific research and increase knowledge of the natural world [12]. By mid-2018, GBIF was serving more than one billion biodiversity occurrence records, nearly 150 million (or 15%) of which were based on preserved specimens held in natural history collections. Concomitant with the establishment of GBIF and based on a recommendation of the Council of Heads of Australasian Herbaria (CHAH), the Australian Virtual Herbarium was created in 2001, the success of which led to funding for the ALA, a much broader initiative with the mission of transforming Australia's biodiversity knowledge into digital format for enabling collaboration in biodiversity research [13]. In the last decade, the ALA database has grown to over 73 million occurrence records, about 12.6 million (17.3%) of which represent preserved specimens.
Several countries in South America are also aggregating biodiversity data. Beginning in 2002, Brazil's CRIA launched the speciesLink [14] network with the goal of integrating species and specimen data available in natural history museums, herbaria and culture collections, and making these data openly and freely available on the Internet, along with tools to promote interoperability, integration, visualization and data cleaning [15,16]. As of January 2018, speciesLink served nearly 9 million records, about half of which are georeferenced, and at least some of which are also being served by GBIF as well as leading aggregators in North America. In 2010, the Brazilian government also launched ReFlora with the purpose of making available information on Brazilian plant specimens held in overseas herbaria. These data sources have become important contributors to Brazilian conservation [17,18].
In November 2017, Mexico celebrated 25 years of its CONABIO, established in 1992 to promote, coordinate, support and carry out activities aimed at biodiversity knowledge, conservation and sustainability [19]. CONABIO is now serving nearly 6 million records through its World Biodiversity Information Network (REMIB), a large proportion of which are specimen records from natural history collections [20].
Asia, too, has moved forward with digitization and data mobilization activities. China's NSII is one of 28 initiatives funded by the country's Ministry of Science and Technology within the National Science and Technology infrastructure. NSII is designed to marshal data for use in conservation and the protection of China's biodiversity and serves as the GBIF node for China [21,22].
In Europe, the recently submitted proposal DiSSCo involves 21 European countries and 114 natural history museums with the stated mission of mobilizing, unifying and delivering ‘bio- and geo-diversity information at the scale, form and precision required by scientific communities; transforming a fragmented landscape into a coherent and responsive research infrastructure’ [23]. The project is centred at Naturalis Biodiversity Center, Leiden, The Netherlands and active work on the project is underway. If fully funded, the DiSSCo implementation timeline calls for deployment by 2024 [23].
Biodiversity specimen data digitization, mobilization and aggregation in the USA have been encouraged largely by the launch in 2011 of the US National Science Foundation's ADBC programme, its national resource, Integrated Digitized Biocollections (iDigBio) [24,25] and the several associated Thematic Collections Networks (TCN) [26], whose roles include generating and aggregating to iDigBio a wealth of digitized collections data to address grand challenge questions. To date, ADBC involves 708 collections in nearly 500 institutions representing all 50 of the US states and a majority of collection types [27]. Together, these institutions have contributed over 115 million text records and more than 26 million media records to the iDigBio portal [28]. Given that specimen object records often represent aggregated specimens stored in lots, trays, matrix or by collecting event, the number of individual physical specimens represented in these 115 million records is conservatively estimated at 300–400 million.
Worldwide and in parallel with or in some cases leading up to these national and international efforts, various larger museums with sufficient resources have been digitizing collections for at least two decades, serving data through institutional websites, with many now contributing data to leading aggregators. Examples from Europe include the Paris Herbarium, currently with about 5.4 million specimens digitized [29]; Natural History Museum, London, currently serving about 8.9 million specimen records [30–32]; Naturalis Biodiversity Center, The Netherlands, curating about 37 million objects, of which about 4 million have been digitized [33]; and Museum für Naturkunde in Berlin, with a major focus on whole-drawer digitization of insect trays [34]. The Global Plants Initiative [35,36], focused on making available type specimens of plants, served as an important global leader in encouraging digitization. In the USA, the New York Botanical Garden (NYBG) [37], Harvard's Museum of Comparative Zoology (MCZ), the Harvard University Herbaria, the Yale Peabody Museum, Sam Noble Museum at the University of Oklahoma and the Museum of Vertebrate Zoology (MVZ) at University of California, Berkeley, the latter of which computerized its specimen data in the late 1970s to early 1980s and made them available online in 1997, were among the earliest digitizing institutions. MVZ was also a leader in the establishment of VertNet [38,39], a combination of several discipline-specific sub-projects and an early leader in the development of workflows, data quality protocols and label digitization standards. FishNet, now FishNet 2, was an early collaborator with the consortium that launched VertNet and has been a leader in the development of standards and protocols for georeferencing as well as an important aggregator for fish specimen data.
Despite the worldwide increase in digitization activities, there remain important regions that are poorly represented. Perhaps chief among these is Russia, which has large quantities of biodiversity data stored mostly in local databases inaccessible to the Internet [40]. Nevertheless, a number of Russia-based digitization projects have been launched, with the expectation that more will follow [40]. The continent of Africa is also moving forward with digitization under the auspices and encouragement of the South African National Biodiversity Institute (SANBI) and GBIF. Beginning with the development of a mobilization strategy in 2013–2015, SANBI has recently launched The African Biodiversity Challenge to facilitate data mobilization in Rwanda, Ghana, Malawi and Namibia [41]. Biodiversity information for India is being tracked in several national databases [42], but there are still large gaps in availability, especially of specimen-based digital data generated from Indian collections. As of 13 June 2018, the iDigBio portal contained approximately 361 000 records of Indian specimens, nearly all of which are from US and UK institutions. In 2008, India established the India Biodiversity Data Portal, which serves a variety of species, maps and related data, including over 1 million observation records. Given India's history as an important collecting destination for at least three centuries, there is growing interest in digitizing Indian specimens that are held in museums outside as well as inside India [43].
3. Digitization definition and approaches
We define digitization as the conversion of specimen data from analogue to digital signals. This includes transcribing text data from specimen labels and other specimen-related documents into digital records of those labels and documents regardless of input mode (e.g. voice, keyboard, scanning/optical character recognition (OCR)); the translation of physical specimens to digital images of those specimens, including two-dimensional, three-dimensional (3D), computed tomography (CT) and other digital image types that visually represent the physical specimen; the conversion of analogue audio and video recordings to digital recordings; the conversion of textual location descriptions into digital georeferences within an accepted geographical coordinate system and the conversion of other specimen-related data into digital format with technologies that are or might become available. Although in common parlance, some observers use ‘digitize’ to mean imaging and ‘databasing’ to mean text transcription, here we use digitization to encompass both.
Approaches to digitization and the workflows that follow from them vary by institution based on institutional goals, resources, personnel, curator preferences and collection types [44]. Nelson et al. [2] outline five digitization task clusters in common use. These clusters have provided guidance for the development of several workflow documents [45–47] that encompass numerous discipline-specific approaches to digitization protocols.
Embedded within virtually all approaches to digitization is the adherence to data standards that govern the elements to be included in text transcription and multimedia resources metadata. Essentially all biodiversity databases and major aggregators are designed around or provide methods for translation to Darwin Core [48], the most common and complete vocabulary for biodiversity data. Likewise, Audubon Core [49] provides standards for multimedia resources associated with specimens. These standards provide translation to a common language, making possible comparisons across data stores and disciplines.
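As a minimal illustration of what translation to Darwin Core involves, the following Python sketch renames the fields of a locally structured specimen record to Darwin Core terms. The local field names (cat_no, sci_name and so on), the example record and the helper to_darwin_core are hypothetical and chosen only for illustration; the target names (catalogNumber, scientificName, recordedBy, eventDate, decimalLatitude, decimalLongitude, basisOfRecord) are terms from the Darwin Core vocabulary.

```python
# Hypothetical mapping from one collection's local field names to Darwin Core terms.
LOCAL_TO_DWC = {
    "cat_no": "catalogNumber",
    "sci_name": "scientificName",
    "collector": "recordedBy",
    "coll_date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
}


def to_darwin_core(record, mapping=LOCAL_TO_DWC):
    """Return (translated, unmapped) dictionaries for one local record.

    Fields with no known Darwin Core equivalent are returned separately so
    they can be reviewed rather than silently dropped.
    """
    translated, unmapped = {}, {}
    for key, value in record.items():
        if key in mapping:
            translated[mapping[key]] = value
        else:
            unmapped[key] = value
    # Assumption for this sketch: every record describes a preserved specimen.
    translated.setdefault("basisOfRecord", "PreservedSpecimen")
    return translated, unmapped


# Hypothetical local record, used only to show the translation.
local_record = {
    "cat_no": "FLAS 12345",
    "sci_name": "Quercus alba",
    "collector": "A. Collector",
    "coll_date": "1998-05-14",
    "lat": 29.65,
    "lon": -82.32,
    "drawer": "B-12",
}

dwc_record, leftovers = to_darwin_core(local_record)
```

Once two collections have been translated in this way, their records share field names and can be compared or aggregated directly, which is the practical benefit of a common vocabulary.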
4. Growth of digitization
The rapid increase in the generation and mobilization of digital data and the attractiveness of these data to biodiversity scientists have been paralleled by an equally rapid and upward trending number of publications using and referencing the output of numerous digitization projects. For example, since the inception of ADBC in 2011, there has been a steady rise in the number of publications that cite use of data and other resources (e.g. geographical coordinates) from the iDigBio aggregation portal, TCN portals or other portals that aggregate TCN data (figure 1). Moreover, while the number of publications authored by those funded by ADBC has been relatively constant, the number of publications by authors external to the ADBC community has shown a dramatic increase (figure 2). We take these increasing numbers as evidence of the value that biodiversity scientists and researchers attribute to the growing accumulation of digital data.
5. Research with digitized specimen data
Expanding availability of digital data is enhancing avenues for current and future research that stretches across the various domains represented in the neo- and palaeobiological sciences. For example, Soltis & Soltis [4] outline several emerging big data tools for analysing the increasingly large biodiversity datasets that are rapidly coming online, and suggest novel research questions these data might address. Research emphases include assessing phylogenetic diversity for conservation [50], large platform tools for integrated geospatial analyses using specimen locality data, advances in ecological niche and species distribution modelling [51–53], and the potential development of new workflows [4]. Losos et al. [11] have suggested how the burgeoning supply of digitized data might be used to address important human issues, including evolutionary medicine, food security, biodiversity sustainability, computation and design, evolution and justice, and the development of new types of biodiversity theories that accommodate newly emerging data streams. Others have addressed emerging research angles, including the supplementation of existing datasets with related digital layers to enhance niche and species distribution modelling [54]; the use of 3D data for generating and testing new hypotheses; the implementation of convolutional neural networks (CNN) and deep learning in the analysis of image data for taxonomic determination [55–57] and specimen curation [58], the delineation of traits in specimen images and the determination and identification to genus or species of sediment-deposited pollen grains [59].
The delineation of traits in specimen images can be especially useful for detecting and relating phenological shifts in the fruiting and flowering times of vascular plants to the dynamics of climate change and the synchronicity of fruit production to wildlife migration (see Deacy et al. [60] for an example of where this could be applied). Phenology has also become an important exemplar for the study and tracking of global change [61–66], especially in the use of digital herbarium records for the study and tracking of phenological shifts in vascular plants [65,66] and fungi [67,68]. New tools and protocols are being advanced for rapid digitization [69] and automated scoring of herbarium sheets [70], and improved crowdsourcing platforms are being developed [71–74] that can be used to engage public participation in scoring specimen images for phenological stage.
Building on the rich history of using plant specimens to study phenology [66], the Phenology Working Group [75], hosted by iDigBio, has so far conducted one workshop resulting in two papers [66,70], is producing a special issue of Applications in Plant Sciences devoted to phenology and herbarium data, hosted a symposium on phenology and digital data at Botany 2018 and is currently researching the use of CNN in deep learning for mass scoring of specimen images using computer vision techniques. Part of the working group's interest lies in the synchronization of plant phenological stages with food availability to wildlife, an issue that has been demonstrated to influence wildlife behaviour and adaptation [60,76]. The potential for CNN impact on agriculture and food security for humans is also being demonstrated [77] and presents another avenue for promising research in the face of global change.
Within the last 3–5 years, the use of CT scanning has advanced from being applied mostly to fossils to a much wider range of specimens. The openVertebrate (oVert) TCN [78], recently funded by the US National Science Foundation (NSF), is an example. oVert is using CT technology to scan 20 000 fluid-preserved vertebrate specimens, representing approximately 80% of living vertebrate genera. These specimens include fluid-preserved birds, reptiles, amphibians, caecilians, fishes and mammals. This collaboration of 18 institutions is the first TCN to provide the international research community with freely accessible digital 3D data for internal anatomy across vertebrate diversity. When applied to research, these types of data facilitate the study of patterns of relationships among living and extinct vertebrates, allow testing of hypotheses related to morphological evolution and adaptation, and promote the exploration of relationships between brain and nervous system anatomy as well as sensory and musculoskeletal function, all of which have the potential for significantly improving the human condition.
CT technology has also been used with echinoids to explain the strength of lightweight skeletal structures [79]. Discoveries from these studies have provided ‘the potential to improve technical multi-plated, lightweight and load-bearing structures for civil engineering, which make them valuable role models for structural analyses’ [79, p. 6]. Such extrapolations from the study of biodiversity to other domains suggest implications of specimen-based research for the development of low-cost housing, food security [77] and medicine.
Other emerging opportunities include the layering of various environmental, ecological, behavioural, audio, visual and well-vetted observational datasets (such as those of the Cornell Laboratory of Ornithology's eBird project [80]) with digital specimen data to facilitate triangulation of multiple data sources as well as richer research methodologies and outcomes. Recent research [81], for example, has combined historical precipitation data with digitized museum records to correlate the well-documented periodic emergences of cicada populations with rain patterns to predict future emergences. Emergence events are clearly documented in specimen collection records, making them an excellent subject for combining these types of datasets.
Vertebrate zoologists are also finding ways to leverage multiple digital datasets. In 2013, NSF funded the Developing a Centralized Archive of Vouchered Animal Communication Signals TCN [82]. This collaboration of seven institutions led by researchers at The Cornell Laboratory of Ornithology is a first step in expanding the scope of specimen-based research as well as broadening the definition of specimen. Macaulay Library [83], an international resource of nearly 6 million photo, audio and video objects (which also include arthropods), promotes the linking of physical vouchers with media records to provide foundational and coordinated datasets for studying the tempo and mode of animal signal evolution. As part of their research, the team is exploring a re-definition of specimen to include an extended suite of data, sometimes in the absence of physical objects [84]. Geospatial, temporal and phylogenetic analyses of digital specimen data have also been used for testing and reconciling controversial tenets and predictions of mimicry theory between coral snakes and other red-black banded snakes [85].
A recently established working group, also led by researchers at Cornell, is exploring methods for efficiently scoring, standardizing, analysing and presenting behavioural and movement data, such as those generated from camera traps, audio recording devices and extensive video studies of phenotypic behaviour. To date, two workshops have been held that combined behavioural scientists with data storage, analysis and aggregation experts (G. Nelson 2016, personal observation). Publications from these workshops are in progress. Behavioural data, such as those used by Brainerd, are resulting in increased understanding of the relationships between anatomy, morphology and biomechanics, including novel applications for assessing the biomechanics of birds in flight [86]. For analysing these types of data, Brainerd et al. [87] at Brown University have developed X-ray Reconstruction of Moving Morphology (XROMM), a 3D imaging technology for visualizing rapid skeletal movement in vivo.
For some museum scientists, the use of non-verifiable observational data is anathema to research that is dependent on physical vouchers, reliable and reproducible species identifications, and procedural replication. However, when well-vetted observations are used to supplement specimen data or foster the collection of new physical or media vouchers to test hypotheses, arguments against augmenting or enriching physical specimen datasets with observational datasets become less compelling. This is especially true in vertebrate zoology, where image or audio data are nearly as good as a specimen in hand for some types of research and is one of the underpinning themes of Webster and colleagues [84,88]. In addition, Peterson and his team [89–91] have combined carefully cleaned specimen and observational data from GBIF, VertNet, REMIB, Unidad de Informática para la Biodiversidad and eBird as well as other vetted sources to study extinctions, range shifts, phenological shifts and breakdown of interactions in ecological communities in the USA and Mexico over several decades.
6. Caveats
Digital data proliferation has revealed challenges as well as opportunities, especially with ensuring that aggregated data reflect the basic definition of quality, meaning that the data are complete, consistent, accurate, fit for use, free of bias [92–95] and adhere to community-embraced standards (e.g. the Darwin Core Standard [48]) [96–98]. The critical need for enhancing data quality has led to procedures, research methods and best practices for improving and confirming accuracy and fitness [97,99], including combining GBIF and GenBank data to identify potential identification anomalies in mycology [100], addressing pressing data quality challenges in entomology [96,101], mining and analysing palaeobiology data [102], discovering research uses for vertebrate trait data [103], reviewing and critiquing the efficacy and potential bias in species distribution models using natural history museum specimen data [52], combining El Niño–Southern Oscillation and 100 years of museum specimen data for the prediction of cicada emergence in Western North America [81] and using images to detect new ant host species for a common parasite [104]. Issues with data completeness have been documented in several studies (e.g. [105]), especially where gaps in distribution do not reflect expectations, suggesting under-collecting or, equally likely, a dearth of mobilized records from one or more significant biodiversity collections.
Two major areas of improvement in the quality of digital data include the resolution and correction of taxon names as reported in electronic records of specimen label data [98,99] and the accuracy, resolution and fitness for use of reported geospatial coordinates [97,98]. Chapman [99] highlights three main types of taxon name errors: those of identification, spelling and format. Zermoglio et al. [106] add to this list errors that arise from misunderstanding, misapplication or failure to follow the Darwin Core Standard, and highlight the use of out-of-date synonyms as problematic. Several projects have tackled the taxonomy and synonymy issue, but comprehensive solutions are few, with the possible exceptions of ornithology where worldwide recommendations of common names have a long history, and ichthyology, where the Catalog of Fishes [107] serves as the standard for nomenclature and taxonomy. In the long run, successful integration across the universe of digitized specimens, with the ultimate goal of linking specimen records to all of their derivatives (e.g. tissues, traits, genetic sequences and field notes) and commonalities across the Internet, including locality and taxonomic descriptions, temporally and spatially related specimens, directly and indirectly related literature, associated media records (e.g. audio and video recordings as well as still images of a specimen and its collecting site) and a potential host of other related information, is likely to be as dependent on well-ordered and fully documented digital systems for resolving taxonomy and nomenclature as it will be on the effective assignment of globally unique identifiers and semantic tags to specimen records. However, whether there will ever be widely accepted and incontrovertible taxonomies is somewhat conjectural. Taxonomy as a hierarchy of hypotheses is central to biodiversity science and to the scientific enterprise. Varying interpretations are to be expected.
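Of the taxon name error types noted above, format errors, spelling errors and out-of-date synonyms lend themselves to simple automated screening, whereas identification errors require expert examination of the specimen. The Python sketch below is one possible illustration, not a description of any aggregator's actual method. The reference list ACCEPTED_NAMES, the SYNONYMS table and the function names are hypothetical; a working system would draw on an authoritative checklist, and the sketch assumes names are supplied without authorship strings.

```python
import difflib

# Hypothetical reference data, for illustration only.
ACCEPTED_NAMES = ["Quercus alba", "Quercus rubra", "Acer rubrum"]
SYNONYMS = {"Quercus borealis": "Quercus rubra"}


def normalize_format(name):
    """Collapse extra whitespace and apply 'Genus epithet' capitalization."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def resolve_name(name):
    """Return (suggested_name, status) for a reported taxon name."""
    cleaned = normalize_format(name)
    if cleaned in ACCEPTED_NAMES:
        return cleaned, "accepted"
    if cleaned in SYNONYMS:
        return SYNONYMS[cleaned], "synonym"
    # Fuzzy match to catch likely spelling errors; the 0.9 cutoff is an
    # arbitrary choice for this sketch and would need tuning on real data.
    close = difflib.get_close_matches(cleaned, ACCEPTED_NAMES, n=1, cutoff=0.9)
    if close:
        return close[0], "possible_misspelling"
    return cleaned, "unresolved"
```

For example, resolve_name("  quercus   ALBA ") is reported as an accepted name after format cleaning, while resolve_name("Quercus albba") is flagged as a possible misspelling of Quercus alba. Suggested corrections of this kind are best treated as prompts for curatorial review rather than as automatic replacements.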
For typical errors with geospatial data, Hill et al. [97] emphasize incomplete coordinates, strings inserted into numeric fields, incorrect coordinate system references, latitude values incorrectly reported for longitude and vice versa, incorrect or omitted numerical signs, misplaced decimals and coordinate values beyond a valid range. Aggregators have implemented tools to filter and correct, or at least suggest corrections for, some of these errors. However, errors in precision based on the quality of the global positioning system device used, georeferencing protocols, transcription errors, rounding and conversions from United States Public Land Survey System references to geographical coordinates can be much more troublesome, especially in studies where highly resolved coordinates are required. Append to these the assignment of coordinates to legacy records post collection, where georeferencers often make assignments from sparse descriptions on labels, and the opportunity for error is apparent.
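Several of the error types listed by Hill et al. [97] can be screened with straightforward checks. The Python sketch below is a hedged illustration under stated assumptions: the function name check_coordinates, the flag names and the optional bounding box argument are all hypothetical, and the checks do not reproduce the tools any particular aggregator has implemented. The sketch tests for non-numeric or missing values, values beyond the valid range, likely latitude and longitude transposition and, when an expected region is supplied, likely sign errors. It cannot detect the precision problems described above.

```python
import math


def check_coordinates(lat, lon, bbox=None):
    """Return a list of issue flags for a raw latitude/longitude pair.

    bbox is an optional (min_lat, max_lat, min_lon, max_lon) tuple giving the
    region in which the record is expected to fall (hypothetical parameter).
    """
    try:
        lat_f, lon_f = float(lat), float(lon)
    except (TypeError, ValueError):
        # Missing values, or strings inserted into numeric fields.
        return ["missing_or_non_numeric"]
    if not (math.isfinite(lat_f) and math.isfinite(lon_f)):
        return ["missing_or_non_numeric"]

    issues = []
    if abs(lat_f) > 90 or abs(lon_f) > 180:
        issues.append("out_of_range")
        # If the pair becomes valid when exchanged, a transposition is likely.
        if abs(lat_f) <= 180 and abs(lon_f) <= 90:
            issues.append("possible_lat_lon_swap")
        return issues

    if bbox is not None:
        min_lat, max_lat, min_lon, max_lon = bbox

        def inside(la, lo):
            return min_lat <= la <= max_lat and min_lon <= lo <= max_lon

        if not inside(lat_f, lon_f):
            flipped = [(-lat_f, lon_f), (lat_f, -lon_f), (-lat_f, -lon_f)]
            if any(inside(la, lo) for la, lo in flipped):
                issues.append("possible_sign_error")
            else:
                issues.append("outside_expected_region")
    return issues
```

As a usage example, check_coordinates(29.65, 82.32, bbox=(24.0, 50.0, -125.0, -66.0)) returns a possible sign error, because negating the longitude places the point inside the supplied region. A transposed pair in which both values fall within valid ranges passes the range test, so it is caught only if an expected region is supplied, and even then under a generic outside-region flag.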
7. Raising the public profiles of natural history museums and academic collections
Evidence suggests that the broad access to digital data over the last decade has contributed significantly to the public profiles of natural history museums and academic collections, at least in the USA. Reflecting a worldwide trend [7], one of the US National Science Foundation's underpinning goals for establishing the ADBC programme has been to raise the visibility of natural history museums by making them more accessible to school-age children, natural history enthusiasts and the public at large, educating these audiences that museums are not only interpretive organizations with exhibits and displays, but also significant research institutions that foster important discoveries and advances in our understanding and conservation of biodiversity [107]. Using current technologies to make natural history collections remotely accessible to a far wider audience has served to enhance research diversity [108] and elevate collections in ways that have fostered their increasing presence in the popular press and made their contributions explicit and transparent. Although we admit that the conclusions we draw regarding raising museum profiles are anecdotal and not founded on extensive surveys or comprehensive and comparative scoring of popular press articles over time, we believe such quantitative investigations to be worthy of future research efforts.
A 2015 story in the New York Times [34] underscored the importance of getting museum data online and a companion article offered a guide to five digital resources that offer access to natural history collections [109]. In February 2017, the Washington Post published a video [110] entitled ‘These three people, and one conveyor belt, are digitizing millions of plant specimens’, highlighting the work being done at the herbarium of the Smithsonian's National Museum of Natural History (NMNH) in Washington, D.C., which houses a collection of about 5 million dried and mounted plant specimens. In September 2017, the Chicago Tribune highlighted the importance of collections in a video entitled ‘Endangered Insects at the Field Museum’ [111]. NMNH and the Field are two of the United States' largest and best-known museums.
In 2016, Voice of America News featured digitization efforts underway at the Natural History Museum of Los Angeles County [112], reinforcing the notion that digitizing the huge numbers of specimens in natural history collections will facilitate discovery by making specimen searches and comparisons more efficient and timely. The Canadian Museum of Nature was highlighted in a 2014 CBCNews feature [113] for their programme to digitize 3 million of their 10 million specimens, in what was presumably the museum's first round of digitization activities.
In some instances, the popular press includes citations or links to the original scientific papers that the popular article intends to interpret, such as the paper by ter Steege et al. [114] which was reported on in the Science section of the New York Times on 13 July 2016 [115]. The article highlighted the use of digital records to construct an inventory of Amazonian trees [116]. In times when budgets and support for collections seem to be declining, provocative titles like ‘What can you do with 300,000 dead bees?’, which appeared in the Toronto Star, 25 January 2016 [117] heading an article regarding the importance of the bee collection at the Royal Ontario Museum, make visible and lend transparency to the important science achieved through the maintenance of natural history museums and their specimens.
The elevated profile of natural history museums as holders of biodiversity specimens and the digital data that represent them, in addition to interpretive kiosks and displays, has not been lost on undergraduate students, who themselves become outreach agents [118]. As museums reach out even more aggressively, exposing undergraduates to collections-based research and the incorporation of digital data in biodiversity science, the potential for downstream impacts, including recruitment of a more diverse constituency and a broader range of skill sets, will grow [119,120].
8. Conclusion
The increasing pace of digital specimen data mobilization, coupled with the rapid development of tools and protocols for the novel use of these data, has placed natural history museums and herbaria at the forefront of biodiversity research, increasing their visibility and undergirding their value to scientists and the general public. Enhanced opportunities for research and data analysis are leading to discoveries across all biodiversity domains as well as informing research in engineering, design, architecture, food security and the medical sciences. The recent expansion of digital data has placed biodiversity collections on the cusp of big data science, opening multiple pathways for natural history museums to make positive contributions to our understanding of and responses to impending global change.
Data accessibility
The data supporting figures 1 and 2 can be accessed at: https://www.idigbio.org/sites/default/files/internal-docs/Supporting%20references%20for%20Nelson%20%26%20Ellis%20%282018%29%20updated.pdf.
Authors' contributions
G.N. made substantial contributions to the design of the manuscript, drafted the original manuscript, revised it as needed and approved the final version. S.E. compiled, formatted and analysed data, critically reviewed and improved the manuscript, and approved the final version. G.N. has collaborated with the guest editor Meineke on symposia and has invited her to contribute a paper for a special edition of Applications in Plant Sciences.
Competing interests
We declare we have no competing interests.
Funding
The authors are wholly or partially funded by US National Science Foundation award DBI 1547229.