Scaling up discovery of hidden diversity in fungi: impacts of barcoding approaches

The fungal kingdom is a hyperdiverse group of multicellular eukaryotes with profound impacts on human society and ecosystem function. The challenge of documenting and describing fungal diversity is exacerbated by their typically cryptic nature, their ability to produce seemingly unrelated morphologies from a single individual and their similarity in appearance to distantly related taxa. This multiplicity of hurdles resulted in the early adoption of DNA-based comparisons to study fungal diversity, including linking curated DNA sequence data to expertly identified voucher specimens. DNA-barcoding approaches in fungi were first applied in specimen-based studies for identification and discovery of taxonomic diversity, but are now widely deployed for community characterization based on sequencing of environmental samples. Collectively, fungal barcoding approaches have yielded important advances across biological scales and research applications, from taxonomic, ecological, industrial and health perspectives. A major outstanding issue is the growing problem of ‘sequences without names’ that are somewhat uncoupled from the traditional framework of fungal classification based on morphology and preserved specimens. This review summarizes some of the most significant impacts of fungal barcoding, its limitations, and progress towards the challenge of effective utilization of the exponentially growing volume of data gathered from high-throughput sequencing technologies. This article is part of the themed issue ‘From DNA barcodes to biomes’.


Fungal Barcoding Resources Supplement
Extraction methods for improving yields and efficiency have been suggested (Dentinger et al. 2009; Osmundson et al. 2013a). To ensure high quality and representativeness, methodological improvements have also been proposed, such as using proofreading polymerase and testing primers for bias against certain taxonomic groups (Bellemain et al. 2010; De Beeck et al. 2014; Tedersoo et al. 2015). User's guides to 96-well specimen-based sequencing (Eberhardt 2012) and to high-throughput fungal amplicon sequencing have even been published, with step-by-step suggestions and cautions covering everything from sampling and lab methods to analysis and interpretation (Lindahl et al. 2013). Perhaps the most attention has been paid to the choice of primers, since so-called universal primers for rDNA have known mismatches for several groups of Fungi, some of which are abundant and presumably ecologically important (e.g. for ITS; Rosling et al. 2011).
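Primer mismatches of the kind discussed above can be screened in silico before any laboratory work. The following is a minimal sketch, not a published tool: it counts mismatches between a primer written with IUPAC degenerate bases and each candidate site on a template, and reports the best-matching site. The function names and the short sequences in the usage note are hypothetical and chosen only for illustration; real primer evaluation would also consider mismatch position relative to the 3' end, reverse complements and annealing conditions.

```python
# Bases allowed by each IUPAC nucleotide code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer code."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(
        1 for p, s in zip(primer.upper(), site.upper()) if s not in IUPAC[p]
    )


def best_binding_site(primer, template):
    """Slide the primer along the template (same strand only).

    Returns (position, mismatches) for the first site with the fewest
    mismatches, or None if the template is shorter than the primer.
    """
    best = None
    for i in range(len(template) - len(primer) + 1):
        mismatches = count_mismatches(primer, template[i:i + len(primer)])
        if best is None or mismatches < best[1]:
            best = (i, mismatches)
    return best
```

For example, with the hypothetical degenerate primer `GTGARTCATCG`, `best_binding_site` finds a perfect match on the template `AAAGTGAGTCATCGTTT` (the `R` accepts `G`), but one mismatch on `AAAGTGACTCATCGTTT`, where the template carries `C` at the degenerate position. Running such a check across reference sequences from different lineages gives a rough first indication of which groups a primer may amplify poorly.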
Fungal metabarcoding is almost always carried out using part or all of the ITS region, but there is some controversy about which portion and which primers offer the best resolution with the best representativeness (Blaalid et al. 2013). The choice has important implications for species identification and for any downstream application of those names. Some workers have suggested coamplification with SSU in order to allow for phylogenetic studies and to anchor so-called orphan ITS sequences with no close analogues in databases (O'Brien et al. 2005; Richards et al. 2012), but as metabarcodes are necessarily limited in length by current sequencing technologies, most arguments are focussed on which part of the ITS should be targeted. If sequence read lengths continue to increase with new technologies, ITS2 would benefit from the higher resolving power of the downstream LSU, compared with ITS1 and its highly conserved downstream 5.8S. On the other hand, ITS1 is reported to be more variable than ITS2 for a majority of the basidiomycetes tested from dried collections (Osmundson et al. 2013b) and offers slightly better resolving power across a wide range of ascomycetes from sequence databases (Wang et al. 2015). The latter study and others also reported similar species identification success for ITS1 and ITS2 across a wide range of Basidiomycota (Blaalid et al. 2013; Wang et al. 2015). In tests in the ectomycorrhizal (EM) fungal genus Inocybe, and across lichen fungi (Kelly et al. 2011), ITS1 and ITS2 performed more or less equally well at species discrimination, which is not surprising, since ITS1 and ITS2 variation tend to be correlated (Blaalid et al. 2013).
The choice of PCR primers has important consequences for which sequences are recovered. Some groups are severely underrepresented (e.g. Bellemain et al. 2010; Schadt & Rosling 2015), and in some cases species-level resolution is remarkably low, with only 45% (Pitkäranta et al. 2008) or even fewer than 25% of OTUs identified (Korpelainen et al. 2015) for indoor air fungi. Group I introns in the SSU can also result in non-amplification or overly long amplicons for some lichens (Kelly et al. 2011). Several new sets of primers have been proposed and tested (Toju et al. 2012; De Beeck et al. 2014; Tedersoo et al. 2015). Although they eliminate some taxon bias, newly designed primers have disadvantages, notably the loss of comparability with other studies and, in particular, the difficulty of relative quantification of OTUs.
However, in a test of primer bias in soil fungi, a comparison of shotgun sequencing with amplicon sequencing revealed little to no bias (Tedersoo et al. 2015). Similar tests should be carried out in other fungal target groups and habitats.
Several fungal-specific bioinformatics pipelines have also been developed, the best-known of which is UNITE, which includes the PlutoF workbench (Abarenkov et al. 2010) and modules for ITS extraction, chimera checking (including UCHIME; Edgar et al. 2011) and identification, the last by matching query sequences with species hypotheses (at varying similarity cut-offs) and reference sequences determined by expert users. The integrated pipeline PIPITS takes advantage of many of the features of UNITE (Gweon et al. 2015), and another pipeline, built expressly for Illumina data, was created to be both flexible and straightforward, having been used successfully by inexperienced students with only a few hours' tuition (Seifert et al. 2007). The bioinformatics tools outlined here are based on MOTU discrimination and similarity thresholds, whereas evolutionary-aware approaches such as phylogenetic and coalescent-based criteria remain marginal and largely restricted to fungal taxonomists. The available fungal metabarcoding workflows are also designed for amplicon sequencing studies, and to our knowledge there are no optimised approaches for phylogenetic profiling of fungi in shotgun metagenomics datasets like there are for prokaryotes (Segata et al. 2012). Figure 2.
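To make concrete what MOTU discrimination by similarity threshold means, the following is a minimal sketch of greedy centroid clustering. It is not the algorithm of UNITE, PIPITS or any other pipeline named above. It assumes, for simplicity, that sequences are already aligned and of equal length, so that identity is the fraction of matching positions; the 97% default is a commonly used convention adopted here as an assumption, and the function names are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold.

    Sequences are processed in input order; a sequence that matches no
    existing centroid founds a new cluster and becomes its centroid
    (the first member of each returned list).
    """
    clusters = []
    for seq in seqs:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            # No centroid was similar enough: start a new MOTU.
            clusters.append([seq])
    return clusters
```

With a 100-base reference, a copy differing at one position (99% identity) joins the reference's cluster at the 97% threshold, whereas a copy differing at five positions (95% identity) founds a second MOTU; lowering the threshold to 90% merges all three. The result depends on input order and on the chosen cut-off, which is one reason a single fixed threshold is an imperfect proxy for species boundaries.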