1 Introduction
A major task for any plant systematist, field ecologist, evolutionary biologist, conservationist, or applied forensic specialist is to determine the correct identification of a plant sample in a rapid, repeatable, and reliable fashion. “DNA barcodes,” i.e., standardized short sequences of DNA between 400 and 800 base pairs long that in theory can be easily isolated and characterized for all species of plant on the planet, were originally conceived to facilitate this task (Hebert et al., 2003). By combining the strengths of molecular genetics, sequencing technologies, and bioinformatics, DNA barcodes offer a quick and accurate means to recognize previously known, described, and named species and to retrieve information about them. This tool also has the potential to speed the discovery of the thousands of plant species yet to be named, especially in tropical biomes (Cowan et al., 2006).
2 The Beginnings of Plant DNA Barcoding
DNA barcodes, as universally recoverable segments of DNA for the identification of species, were initially designed and applied for animals in the early years of the present century (Hebert et al., 2004b). In contrast, a standard DNA barcode for plants was neither immediately successful nor accepted by the botanical community until several years later (see Kress, 2011). After an extensive inventory of gene regions in the mitochondrial, plastid, and nuclear genomes (e.g., Chase et al., 2005; Kress et al., 2005; Kress & Erickson, 2007; Lahaye et al., 2008; Newmaster et al., 2008), four primary gene regions (rbcL, matK, trnH-psbA, and ITS) have generally been agreed upon as the standard DNA barcodes of choice in most applications for plants (CBOL Plant Working Group, 2009; China Plant BOL Group, 2011; Li et al., 2015).
The primary use of DNA barcodes is for species identification across the tree of life (Kress & Erickson, 2012). By expanding the ability to diagnose a species of plant during all stages of its life history (i.e., fruits, seeds, seedlings, mature individuals both fertile and sterile) as well as in damaged specimens, and in gut contents and in fecal samples of animals, DNA barcoding has become a universal means of identification. The potential also exists to quantify the consistency of species definitions across lineages of plants with a measure of genetic variability based on the DNA barcode sequence data. As a biodiversity discovery tool, DNA barcoding helps to flag species that are potentially new to science, especially cryptic species (e.g., Hebert et al., 2004a). For the applied users of taxonomy, DNA barcoding serves as a means to identify regulated species, invasive species, and endangered species, and to test the identity and purity of botanical products, such as commercial herbal medicines and dietary supplements. DNA barcodes are now also being used to address ecological, evolutionary, and conservation issues, such as the ecological rules controlling the assembly of species in plant communities (e.g., Kress et al., 2009), the degree of ecological specialization found in plant-animal networks (e.g., Jurado-Rivera et al., 2009), and the identification of the most evolutionarily diverse habitats for protection (Shapcott et al., 2015).
The process of generating and applying plant DNA barcodes for the purpose of identification entails two basic steps: 1) building the DNA barcode library of known species, and 2) matching the DNA barcode sequence of an unknown sample against the DNA barcode library (Fig. 1). The first step requires taxonomists to select one to several individuals per species to serve as reference samples in the DNA barcode library. Tissue can be obtained from specimens already housed in herbaria or can be taken directly from live specimens in the field with appropriately pressed, labeled, and mounted voucher specimens. These vouchers serve as a critical permanent record that connects the DNA barcode to a particular species of plant. Once the DNA barcode library is complete for the organisms under study, whether they comprise a geographic region, a taxonomic group, or a target assemblage (e.g., medicinal plants, timber trees, etc.), then the DNA barcodes generated for the unidentified samples are compared to the known DNA barcodes using some type of matching algorithm.
Figure 1
Workflow indicating steps involved in plant DNA barcoding. In this example trees are sampled in a tropical forest inventory plot. The workflow starts with tissue samples and vouchered herbarium specimens, and proceeds through generating DNA barcode sequences to build the barcode library for use in taxonomic identification, species discovery, and ecological applications. (from Kress et al., 2012).
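The two-step process described above (build a reference library, then match an unknown sample against it) can be illustrated with a minimal Python sketch. It is not the matching algorithm of any particular study: it assumes the sequences are already aligned to a common length, scores each comparison by simple percent identity, and uses hypothetical species labels, toy sequences, and an arbitrary 99% threshold. Production workflows typically rely on alignment-based search tools instead.

```python
from typing import Dict, Optional, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":  # skip gaps and ambiguous bases
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def identify(
    query: str,
    library: Dict[str, str],
    min_identity: float = 99.0,  # assumed threshold, for illustration only
) -> Tuple[Optional[str], float]:
    """Return the best-matching library species, or None if below threshold."""
    best_species: Optional[str] = None
    best_score = 0.0
    for species, reference in library.items():
        score = percent_identity(query, reference)
        if score > best_score:  # on ties, the first library entry is kept
            best_species, best_score = species, score
    if best_score < min_identity:
        return None, best_score
    return best_species, best_score


# Hypothetical two-species reference library with toy 10-bp "barcodes".
library = {
    "species_A": "ATGCATGCAT",
    "species_B": "ATGCATGGTT",
}

known = identify("ATGCATGCAT", library)
unknown = identify("ATGCTTGGAA", library)
```

Returning `None` when no reference exceeds the threshold mirrors the situation in which a sample has no counterpart in the library, which is why the completeness of the reference library matters as much as the choice of marker.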
Since its initiation in 2003, DNA barcoding as a locus-based endeavor has developed in concert with genomics-based investigations (Kress & Erickson, 2008a). DNA barcoding and the field of genomics share an emphasis on the acquisition of large-scale genetic data that offer new answers to questions previously beyond the reach of more data-limited disciplines. DNA barcodes aim to utilize the information in one or a few gene regions to discriminate among all species of life, whereas genomics, the inverse of DNA barcoding, describes in fewer species the function and interactions of many if not all genes. It is expected that eventually, probably sooner rather than later, these ends of the genetic spectrum will merge in methodologies and applications (Li et al., 2015; Coissac et al., 2016).
Over the last decade, the application of plant DNA barcodes has accelerated, especially in the fields of ecology, evolution, and conservation. Here I review some of the major breakthroughs and advances in using plant DNA barcodes to investigate specific biological questions. I then conclude with the prospects for building a global plant DNA barcode library and applying new markers and sequencing technologies to construct a better tool for botanical research.
It took nearly five years from the time of publication of the first papers suggesting candidates for plant DNA barcode markers (e.g., Kress et al., 2005) for the botanical community to reach some consensus on the regions that showed the highest promise of success (Lahaye et al., 2008; CBOL Plant Working Group, 2009; Chen et al., 2010; China Plant BOL Group, 2011; Hollingsworth, 2011). It is still not uncommon to see publications testing various markers in specific groups of plants (Wang et al., 2017). Yet, even before universal plant markers were accepted, systematists, ecologists, evolutionary biologists, and conservationists were already speculating on and providing initial tests of the application of plant DNA barcodes to address critical questions in organismal biology (e.g., Kress & Erickson, 2008b; Valentini et al., 2009). In the last five years, the use of plant DNA barcodes has skyrocketed, with several reviews of these applications already published (e.g., Hollingsworth et al., 2011; Erickson & Kress, 2012; Pecnikar & Buzan, 2013; Joly et al., 2014; Kress et al., 2014). Categories of use include species level taxonomy, biodiversity inventories, phylogenetic evaluation, biosecurity and public health, conservation assessment and environmental preservation, species interactions and ecological networks, cryptic diversity information, DNA barcoding metadata, ecological forensics, community assembly, traffic in endangered species, and monitoring of commercial products. In some cases, the methodologies are now advanced, while others remain in their infancy.
In this section the many uses of plant DNA barcodes will be summarized in the broad areas of ecology, evolution, and conservation, with a special emphasis on community phylogeny, functional traits and species assembly, species interactions, species boundaries and discovery, DNA barcode forensics, and conservation.
3.1 Community phylogeny and species assembly
DNA barcodes, as a tool, have greatly expanded the collaboration between systematists, who focus on species identification and evolutionary relationships, and ecologists, who investigate species interactions and patterns of associations (Baker et al., 2017). Plant DNA barcoding has been a boon to community ecologists seeking to understand the factors, such as species diversity pools and functional traits, that control the assembly of species into ecological communities (Swenson, 2012). Estimating the third component controlling species assembly, namely evolutionary history, has always been hampered by the lack of well-resolved phylogenetic hypotheses on species relationships in communities: Is there an underlying phylogenetic structure among species in a community? Do closely related species prefer similar habitats and co-occur more or less frequently than expected at random? Phylomatic (Webb & Donoghue, 2005), a tool for estimating phylogenetic trees for plant communities, was a giant step forward for ecologists. However, the publication of the first community phylogeny based on DNA barcode sequence data for the trees in the forest dynamics plot on Barro Colorado Island in Panama (Kress et al., 2009; Fig. 2) set off a storm of new investigations that were able to add a well-supported evolutionary component to understanding species diversity and assembly (e.g., Gonzalez et al., 2010; Kress et al., 2010; Pei et al., 2011; Swenson et al., 2012a; Whitfeld et al., 2012; Kaye M, unpublished data).
Figure 2
A community phylogeny constructed with plant DNA barcode sequence data. Maximum parsimony tree of 281 species of woody plants in the Forest Dynamics Plot on Barro Colorado Island based on a supermatrix analysis of rbcL, matK, and trnH-psbA sequence data. Color highlights indicate orders represented on BCI. The small tree at the bottom of the central column shows just the ordinal relationships among the species in the BCI flora. (from Kress et al., 2009).
Whether species in a community are more closely related than expected by chance (phylogenetic clustering), more distantly related than expected by chance (phylogenetic overdispersion), or randomly distributed across the plant tree of life can now be ascertained by building a DNA barcode library of these species assemblages and generating a phylogenetic tree based on the sequence data. The assumption follows that species in a community that are phylogenetically clustered are more likely to have similar ecological niches (i.e., phylogenetic niche conservation) and have been assembled via abiotic filtering. The contrasting assumption is that phylogenetic overdispersion in a community is the result of biotic interactions among sympatric species. Based on these assumptions, the impact of evolutionary history on community structure has been investigated across stages of forest succession (Whitfeld et al., 2012), among habitats within a forest type (Oliveira et al., 2014), among forests across habitat gradients (Swenson et al., 2012a; Mi et al., 2012), and among communities across an entire country (Muscarella et al., 2014) or across the globe (Erickson et al., 2014; Wills et al., 2016). Suddenly ecologists are evolutionary biologists!
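The test for clustering versus overdispersion can be sketched in Python as follows. The sketch uses one commonly used metric that the text above does not name: the standardized effect size of the mean pairwise phylogenetic distance (MPD) among co-occurring species, compared against random draws from the species pool (the sign-reversed value is often reported as the Net Relatedness Index). The species labels, the distance values, and the simple "draw species at random from the pool" null model are all assumptions made for illustration; in a real analysis the distances would come from the branch lengths of the barcode-based community phylogeny.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def mean_pairwise_distance(species, dist):
    """Mean phylogenetic distance over all pairs of species in a sample."""
    return mean(dist[frozenset(pair)] for pair in combinations(species, 2))


def ses_mpd(community, pool, dist, n_null=999, seed=0):
    """Standardized effect size of MPD against a random-draw null model.

    Negative values suggest phylogenetic clustering; positive values
    suggest phylogenetic overdispersion.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(community, dist)
    null = [
        mean_pairwise_distance(rng.sample(pool, len(community)), dist)
        for _ in range(n_null)
    ]
    spread = pstdev(null)
    if spread == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - mean(null)) / spread


# Toy species pool: two clades of three species each (hypothetical labels).
clade = {"A1": "A", "A2": "A", "A3": "A", "B1": "B", "B2": "B", "B3": "B"}
pool = sorted(clade)

# Invented distances: short within a clade, long between clades.
dist = {
    frozenset((s, t)): (0.1 if clade[s] == clade[t] else 1.0)
    for s, t in combinations(pool, 2)
}

clustered = ses_mpd(["A1", "A2", "A3"], pool, dist)   # one clade only
overdispersed = ses_mpd(["A1", "B1"], pool, dist)     # one species per clade
```

The choice of null model strongly affects the outcome in practice, so the random-draw model used here should be read as the simplest possible baseline rather than a recommendation.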
The conclusions of these multiple studies in forest communities based on DNA barcode phylogenies have been varied. Phylogenetic signal can suggest the dominance of abiotic filtering in a particular forest habitat (Kaye M, unpublished data) or it can vary across micro-habitats within a given forest (Kress et al., 2009; Pei et al., 2011), during succession (Whitfeld et al., 2012), or across forests at the landscape level (Muscarella et al., 2014) depending primarily on environmental factors (Muscarella et al., 2016). The generation of community phylogenies using DNA barcode data across multiple plots in varied habitats and environments has great promise for further testing the basic assumptions and rules governing species assemblies in plant communities (see Erickson et al., 2014). And it is clear that this approach has yet to reach its full potential (Swenson, 2013).
3.2 Functional traits and species assembly
As described above for investigations of community phylogenetic histories, ecologists have long been interested in quantifying critical plant traits that allow species to function in specific environments, and hence assemble into communities. Measuring the degree of similarity of traits in an assemblage provides insights into those features that allow these species to coexist or not. Quantitative information on functional traits, together with well-resolved evolutionary histories, gives ecologists a powerful tool for understanding the processes of community assembly (Swenson, 2012).
DNA barcodes alone do not provide specific new insights into the role of functional traits in determining plant species assemblages. However, the DNA sequence data provide sufficient signal to derive phylogenetic hypotheses on the role of evolutionary signal in assembling species. It was hoped that the relationship of traits and phylogeny would allow the latter to be a strong predictor in measuring trait similarity across species. Unfortunately, the relationship between phylogeny and functional traits is not always a direct correlation, which prevents phylogenetic signal from serving as a proxy for ecological similarity (Swenson et al., 2012b; Swenson, 2013).
Nonetheless, since the publication of the first DNA barcode-based community phylogeny of tree species (Kress et al., 2009), a host of investigations have combined data from functional traits with community phylogenies that together have allowed ecologists to explore the processes determining community assembly in temperate, subtropical, and tropical forests. In one of the largest investigations in tropical forests, Baraloto et al. (2012) measured and compared 17 functional traits in 668 species across nine forest plots in the northern Amazon region. Using two DNA barcode markers (rbcL and matK) they found that functional trait similarity was greater than phylogenetic similarity in co-occurring species, and that both factors were significant in determining niche overlap. They concluded that environmental filtering had the strongest impact on determining how tree species are assembled in these tropical communities. Uriarte et al. (2010) reached similar conclusions in an earlier study of eight traits measured across a small cohort of 19 tree species in a forest plot in Puerto Rico. Using a DNA barcode-based community phylogeny they found that at least three traits had significant impact on neighborhood structure even though a somewhat weaker phylogenetic signal was also present. Environmental filtering was concluded to be the major force structuring this community of trees.
The DNA barcode phylogeny generated for the approximately 300 species of trees on Barro Colorado Island in Panama has served as a template for a number of investigations of functional traits. Studies of the presence of evolutionary signal in such characteristics as soil associations (Schreeg et al., 2010), leaf toughness (Westbrook et al., 2011), wood nitrogen concentration and life-history strategies (Martin et al., 2014), foliar spectral traits (McManus et al., 2016), and anti-herbivore defense traits (McManus et al., unpublished data) have all utilized the evolutionary information contained in the BCI community phylogeny. In general, the patterns of evolutionary signal varied in each of these functional traits across the tree species in the BCI plot. Lasky et al. (2014) also concluded that the association between evolutionary diversity and functional diversity changed through forest succession in tropical landscapes in Costa Rica. Taken together, these investigations suggest, as concluded by Swenson (2013), that phylogenetic indicators are not always tied to ecological determinants of community assembly. However, Swenson also noted that both phylogenetic- and trait-based approaches have greatly enhanced the understanding of community assembly and that their potential remains significant.
3.3 Species interactions: Identifying unknown partners
In order to fully understand the ecology and evolution of interactions among species in natural and human-altered environments, accurate and repeatable identifications of the interacting partners are imperative. Generalized interactions can be studied to some degree without clear identifications at the species-level of the organisms involved, i.e., only identifying to genus or family. Specialized interactions, including mutualisms and antagonisms, require unambiguous species identifications. The development of DNA barcodes as species-level markers has already begun to revolutionize our understanding of species interactions and the community networks they form, especially in tropical habitats where the most complex interactions have evolved.
One of the earliest applications of plant DNA barcodes to investigate species interactions was employed almost simultaneously in both temperate and tropical ecosystems. The belowground interactions of plants in a community with each other and with microbial communities in soils have been exceptionally problematic to investigate because of the difficulty of identifying plant roots based on morphology alone. However, once a DNA barcode library is developed for a community based on the presence of aboveground representatives, species-specific genetic identification of the belowground roots is facilitated. Kesanakurti et al. (2011) investigated the spatial distribution of root diversity after a DNA barcode library was developed for the flora of an old-field community in southern Ontario, Canada. Using the single DNA barcode marker rbcL, they were able to correctly identify 85% of the root fragments that they sampled in 1 m deep soil profiles and found that the belowground diversity was more highly structured ecologically than the aboveground diversity. With respect to community assembly of these species in this old-field habitat, both environmental filtering and competitive interactions were determinants of belowground plant distributions.
In a similar investigation in a more floristically diverse lowland tropical forest on Barro Colorado Island in Panama (Jones et al., 2011), the belowground distributions of all trees and lianas greater than 1 cm in diameter were mapped using a DNA barcode library already assembled for that flora (Kress et al., 2009). In this study the DNA barcode marker trnH-psbA proved to be quite effective in identifying both fine and small coarse roots taken from 12 soil cores spread across a single hectare of forest. The underground species distributions were then compared with the aboveground distributions of species. In general, species interactions and spatial overlap were greater belowground than expected based on aboveground stem densities (Fig. 3). Although this study raised several questions about methodology and analysis, it concluded that the potential for using DNA barcodes was high, which was similar to the conclusion reached in the temperate old-field study (Kesanakurti et al., 2011). Both studies also recognized that the application of next-generation sequencing technologies and metabarcoding will be required to streamline future studies of underground plant interactions (e.g., Hiiesalu et al., 2012).
Figure 3
The distribution of underground roots as determined by plant DNA barcodes. Map from Barro Colorado Island in Panama of the projected distribution of roots of four species in the top 20 cm of soil. The root sampling points at which roots of the focal species were found are indicated with stars, with size scaled to the frequency of the species in proportion mass of samples genotyped. The root sampling points at which no roots of the focal species were found are indicated by open diamonds. The color shows the expected root density of the focal species under a best-fit model, with red indicating the highest value, yellow intermediate, and white lowest. (from Jones et al., 2011).
Food web interactions have been greatly clarified with the application of DNA barcodes. Smith et al. (2011), using the CO1 DNA barcode marker, were able to verify the food web structure of the spruce budworm and its numerous parasitoids to understand the population dynamics of this major pest of trees in boreal forests. With regard to plant-herbivore interactions, several teams of ecologists have been able to demonstrate the utility of DNA barcodes to identify the diversity of host plants for herbivorous beetles in both neotropical (Jurado-Rivera et al., 2009; Pinzón-Navarro et al., 2010) and Asian tropical forests (Kishimoto-Yamada et al., 2013). However, these studies used a limited number of molecular markers and were only able to identify the hosts at the generic or familial level.
The most comprehensive analyses of the interactions between herbivorous beetles and their host plants have been conducted by García-Robledo and colleagues. The host-specific relationships between rolled-leaf beetles in the genera Cephaloleia and Chelobasis (Chrysomelidae) and plants in the order Zingiberales have been well studied, but the application of DNA barcodes to both the beetles and the hosts has provided a much more detailed and quantitative measure of these interactions (García-Robledo et al., 2013a; Fig. 4). One of the advantages of using a multi-locus DNA barcode is that the beetles can be identified to species at any of their life stages and not only as adults as in most previous investigations (García-Robledo et al., 2013b). Once the basic network of food web interactions is established using DNA barcodes, comparisons can be made across habitats, elevations, and temperature gradients. It has been shown in numerous cases (e.g., Hebert et al., 2004a) that DNA barcodes can detect the presence of cryptic species, especially in insects. This power of DNA barcoding has greatly improved the understanding of species boundaries in the rolled-leaf beetles, allowing for more precise mapping of the insect-host networks. The detection of these cryptic species clearly demonstrated that the elevational distributions and thermal tolerances of the beetles were much narrower than previously thought, which will have an impact on the food web networks as climate change alters both host and herbivore migrations (García-Robledo et al., 2016).
Figure 4
A plant-herbivore network based on DNA barcodes. Reconstruction of a network using DNA extracted from beetle gut contents. Rectangles represent insect herbivore and host plant species. Lines connect interacting species with colors representing the taxonomic resolution at which each host plant association was identified. Host plant associations were inferred from rbcL and ITS2 DNA fragments. Fragments were compared to host plant DNA barcode libraries containing sequences of all potential hosts in the study area. Total species of insect = 19; total species of plant = 28; total number of interactions = 74. (from García-Robledo et al., 2013a).
This detailed understanding of herbivore-host interactions using DNA barcodes has also been applied to large mammalian herbivores. Kartzinel et al. (2015) were able to determine the extent to which sympatric mammalian browsers and grazers in a semiarid African savannah partitioned their diets. Using DNA metabarcoding, they quantified diet breadth, composition, and overlap for seven co-occurring species ranging in size from elephants to dik-diks. Conclusions on competition and coexistence in these habitats based on earlier coarse-grained analyses were shown to be misleading according to the more fine-grained taxonomic data provided by the metabarcoding results. These same types of DNA barcoding protocols have also been adapted to tracking and identifying the vectors of bird-dispersed fruits and seeds in the field (Gonzalez-Varo et al., 2014) in order to build a quantifiable network of frugivores and seed dispersal interactions.
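As an illustration of how diet overlap between two herbivores might be quantified once gut or fecal DNA has been assigned to plant taxa, the Python sketch below computes Pianka's niche overlap index from per-taxon counts. This index is an assumption chosen for the example; the text above does not state which measure the cited study used. The herbivore and plant labels and all counts are hypothetical.

```python
from math import sqrt


def proportions(counts):
    """Convert per-taxon counts (e.g., sequence reads) to diet proportions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("diet has no observations")
    return {taxon: n / total for taxon, n in counts.items()}


def pianka_overlap(diet_a, diet_b):
    """Pianka's niche overlap: 0 = no shared food plants, 1 = identical diets."""
    pa, pb = proportions(diet_a), proportions(diet_b)
    shared = sum(pa.get(t, 0.0) * pb.get(t, 0.0) for t in set(pa) | set(pb))
    scale = sqrt(sum(p * p for p in pa.values()) * sum(p * p for p in pb.values()))
    return shared / scale


# Hypothetical counts of plant taxa detected in fecal samples.
grazer = {"grass_1": 80, "grass_2": 15, "forb_1": 5}
browser = {"shrub_1": 60, "tree_1": 30, "forb_1": 10}
mixed_feeder = {"grass_1": 40, "shrub_1": 40, "forb_1": 20}

grazer_vs_browser = pianka_overlap(grazer, browser)
grazer_vs_mixed = pianka_overlap(grazer, mixed_feeder)
```

The finer the taxonomic resolution of the plant identifications, the less likely it is that two herbivores feeding on different species within the same family will appear to share a diet, which is the point made above about coarse-grained analyses being misleading.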
3.4 Species boundaries and biodiversity discovery
Taxonomists have been using morphological features for the identification of both plants and animals since before the time of Carl Linnaeus. Yet, even after hundreds of years of work by taxonomists, perhaps only 20% of the species on earth have been formally recognized and named (Wilson, 2016). Much work remains to be done. DNA barcoding provides a relatively new and significant tool to aid in the determination of species boundaries and discovery of new taxa. Janzen and colleagues (e.g., Hebert et al., 2004a) have been pioneers in incorporating DNA barcode technologies for species discovery in the tropics, where the majority of biodiversity is found, especially in certain insect groups. DNA barcoding is now a standard in their suite of tools being used for a broad-scale inventory of the caterpillars, their food plants, and their parasitoids in Guanacaste, Costa Rica (Janzen et al., 2009). The discovery and delimitation of cryptic species in other groups of insects, such as beetles, is expanding our knowledge of tropical diversity and species interactions (e.g., García-Robledo et al., 2013b, 2016; see above).
Botanists have also applied DNA barcodes to species inventories even though the discriminatory power of the barcode markers for plants is less than the barcode markers for insects. Early studies (Gonzalez et al., 2009; Kress et al., 2009; Dexter et al., 2010) mostly focused on trees in tropical forest monitoring plots and demonstrated the difficulties, especially the low identification rates (e.g., 70%), of using DNA barcodes. The same studies also pointed out the significant gains in being able to more accurately identify sterile and juvenile specimens lacking traditional morphological features required for identification. Costion et al. (2011) applied a three-locus DNA barcode (rbcL, matK, and trnH-psbA) to estimate tree species diversity in a taxonomically poorly known tropical rain forest plot in Queensland, Australia. They concluded that DNA barcodes were a significant aid in rapid biodiversity assessment and determination of cryptic tree populations, even if they were not able to discriminate among all species in the plot. A similar study in a central African rain forest plot using the same DNA barcode markers recognized the high discriminatory power at the genus-level (95%–100%), but somewhat lower species-level success (71%–88%) in identification, especially in species-rich clades. A DNA barcode library of the local species in these plots, including multiple accessions of each species, greatly improved the successful identification at all taxonomic levels.
One of the major issues faced by plant taxonomists and ecologists attempting to use DNA barcodes in hyper-diverse tropical forests is that many species are new to science and therefore lack Latin binomials, and/or are members of poorly circumscribed species complexes that are difficult to identify even with traditional morphological data. Forest inventory plots that have been set up by ecologists to study forest dynamics of trees over time along elevational, latitudinal, or habitat gradients are riddled with “morphotypes” lacking verified scientific names. Keeping track of these morphotypes and comparing them among plots as well as comparing them to known species is often difficult and prone to error (Gomes et al., 2013), but can be greatly enhanced by building DNA barcode libraries of these taxa (Dick & Webb, 2012; Fig. 5A). The critical role in species identification and discovery played by herbarium voucher specimens, even if lacking flowers or fruits, and the field data associated with these collections cannot be overemphasized (Baker et al., 2017). Forest inventory plots in which trees are tagged for long-term monitoring allow taxonomists to resample and collect additional data from these individuals in the future if necessary. Standardizing the DNA barcode markers and bioinformatics tools being used in different forest inventory projects (e.g., RAINFOR [http://www.rainfor.org/], the Amazon Tree Diversity Network [ter Steege et al., 2013; Fig. 5B], CForBio [http://www.cfbiodiv.org/], and ForestGEO [Anderson-Teixeira et al., 2015]) will facilitate species discovery and taxonomic consistency across broad-scale geographic zones (Dick & Webb, 2012). So far, such standardizations have not been fully adopted.
Figure 5
Species discovery in forest dynamics and inventory plots. A, Summary of the workflow using plant DNA barcodes for species discovery (adapted from Dick and Webb, 2012). B, A map of Amazonia showing the location of the 1430 Amazon Tree Diversity Network plots. Orange circles indicate plots on terra firme; blue squares, plots on seasonally or permanently flooded terrain; yellow triangles, plots on white-sand podzols; gray circles, plots only used for tree density calculations. CA, central Amazonia; EA, eastern Amazonia; GS, Guyana Shield; SA, southern Amazonia; WAN, northern part of western Amazonia; WAS, southern part of western Amazonia. (from ter Steege et al., 2013).
A recent example of how DNA barcodes could play a decisive role in assisting taxonomic clarity is in the tree flora of the Amazon Basin of South America. ter Steege et al. (2013) assembled a massive data set on the distribution and abundance of trees from forest inventory plots across Amazonia based on traditional taxonomic concepts and identifications and concluded that only 1.4% (227 species) of the total estimated 16 000 tree species account for 50% of individual trees in the Amazon. These “hyperdominant” species in general have wide distributions across the region. The authors acknowledged that problems in their dataset with taxonomic identification of trees are widespread and lamented that the 5800 species of the rarest trees may never be properly identified, discovered, or described because of the lack of specimens with diagnostic flowers and fruits. DNA barcodes could provide a powerful tool to overcome these hurdles. We generated DNA barcodes for several of these hyperdominant species for which we had tissue samples from across their ranges and found that some formed well-supported clusters of samples within a genus, indicating consistent identification by taxonomists. In other species, samples were not clustered within the genus, suggesting that the outliers were either misidentified by taxonomists (often from sterile specimens) or that the species as circumscribed is not monophyletic and cryptic species may be present in these plots (Kress WJ et al., unpublished data). Therefore, the overall conclusions of ter Steege et al. (2013) on hyperdominance in the Amazonian tree flora may be in need of further study (see ter Steege et al., 2016). More widespread application of DNA barcodes in taxonomic investigations of tropical trees will provide more confidence in identifications and may even allow rapid discovery and description of unknown taxa in these species-rich forests.
3.5 DNA barcode forensics: Commercial products, endangered species, herbal supplements, and ethnobotany
The correct identification of plants and animals is as important in the non-scientific, commercial world as it is to ecologists and taxonomists. In what is broadly termed “DNA barcode forensics,” genetic markers are being employed to ensure commercial product identity and purity, to protect endangered species from illegal trade, and to document the use of forest plants by local people. For example, the use of DNA barcodes in determining species responsible for bird-strikes of commercial aircraft is now routine (Dove et al., 2008). More widespread is the utilization of these markers in the authentication of animal and other wild-collected commercial products sold in markets around the world (e.g., seafood: Nicole et al., 2012).
The desire for an accurate, reliable, and inexpensive tool for the identification of illegal timber products has been one of the driving forces in recent applications of DNA barcode technologies in several diverse regions of the world. Muellner et al. (2011) tested a number of possible DNA barcode markers for the identification of species of trees in the commercially important mahogany family (Meliaceae). Although most markers fell short of expectations for discriminating species, ITS was able to identify some species of this family that are listed in the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES). A higher level of discrimination was demonstrated among commercially important but threatened species of trees of the tropical dry forests of India. Nithaniyal et al. (2014) used the standard plant DNA barcode markers to accurately identify wood samples collected at timber processing plants in Andhra Pradesh and Tamil Nadu. This same success was demonstrated in timber species found in Araucaria rain forests of the southern Atlantic coast of Brazil (Bolson et al., 2015), which contain many threatened species of trees with high commercial importance, especially in the family Lauraceae. Most recently, DNA barcoding was employed to monitor illegal timber trade in the biodiversity hotspot of Madagascar, where species of Dalbergia (Fabaceae), the rosewoods, are under threat. The limitations of the standard genetic markers in identifying closely related species were discouraging in this genus, although some success was achieved (Hassold et al., 2016). Nonetheless, regulators are in general optimistic that DNA barcode tools will be of assistance in recognizing species currently protected by government legislation but under threat from illegal timber operations.
In addition to timber trees, DNA barcode libraries have been developed for other groups of threatened and endangered taxa listed in CITES, e.g., orchids (Lahaye et al., 2008), and it is expected that this technology will eventually become standard in the monitoring of illegal trade.
Timber is not the only commercial plant product in need of accurate species identifications by regulators and quality control specialists. Traditional medicines, teas, and herbal supplements together are an important and large component of the commercial market in biodiversity, locally, nationally, and internationally. It is estimated that medicinal plants account for over US$60 billion in annual revenues in the United States (see Newmaster et al., 2013, for a review of statistics on markets and use). Applications to monitor this market have been in development since the early days of plant DNA barcoding. Many investigations that applied DNA barcodes to commercial medicinals and herbal supplements concluded that in some cases the genetic markers used, which varied quite widely among studies, were not able to discriminate among species. However, more often the major obstacles were 1) the lack of comprehensive DNA barcode libraries required to make accurate comparisons of constituents of herbal teas and supplements and 2) the lack of standardized, accurate taxonomy and common names listed in the herbal literature, catalogs, and pharmacopeias. Stoeckle et al. (2011) could not identify many of the constituents in the herbal teas they tested using the standard markers rbcL and matK, but more problematic was the lack of comparable sequence data at that time for many of the plants found in the commercial products. The basic taxonomic problem of obsolete or dated nomenclature in the literature on traditional medicines, rather than species discrimination, was a major hurdle in a study of the local trade in medicinal roots in Northern Africa using plant DNA barcoding (de Boer et al., 2014).
Even when herbal products are pure and reliably identified to species at the point of local collection, the final products used by consumers are often mislabeled or adulterated with additional plant species. In a seminal study of the authenticity of herbal preparations in the United States, it was demonstrated using DNA barcodes that substitutes for black cohosh (Actaea racemosa; Ranunculaceae), a common herb used by post-menopausal women as a substitute for hormone replacement therapy, were present in only nine of 36 commercially available products that listed this species as a constituent (Baker et al., 2012). In a follow-up to this study, a comprehensive investigation of the authenticity of herbal supplements and their contamination was conducted by Newmaster et al. (2013). They built a well-documented DNA barcode library (rbcL and ITS2) of the top 42 plant species used in the commercial trade in herbal supplements and then carefully analyzed the constituents in 44 herbal products available on the market (Fig. 6). Their results, that 59% of the products contained plant species not listed on the labels (many of them “fillers”), not only aroused attention in the scientific world, but also made national news (see A. O'Connor, “Pills that aren't what they seem,” New York Times, Tuesday 5 November 2013) and resulted in a backlash from the herbal supplement community (Gafner et al., 2013). Newmaster et al. (2013) concluded by recommending that the commercial herbal industry routinely use DNA barcoding to verify the authenticity of constituents in all herbal products.
Figure 6
The application of DNA barcodes to test the purity of herbal supplements and medicinals. DNA barcode results from blind testing of the 44 herbal products representing 30 medicinal species of plants. (from Newmaster et al., 2013).
One arena that is only now receiving sufficient attention is the use of plant DNA barcodes in the documentation of traditional ethnobotanical knowledge of Indigenous people. A multifaceted project currently underway in the Sierra Nororiental de Puebla, Mexico (Amith J & Kress WJ, unpublished data) aims to combine local ethnobotanical knowledge, linguistics, and cultural history, with DNA barcode documentation of the regional flora to facilitate an understanding of traditional ecological knowledge of the Nahuat and Totonac communities. Anthropologists and ethnobotanists have been documenting such knowledge for centuries. The inclusion of DNA barcoding technologies in this type of work allows the construction of a botanical reference library that will greatly facilitate the collection and accurate identification of the local flora and will demonstrate how plants are named, classified, and used by Indigenous people.
3.6 Species and habitat conservation
One of the major challenges facing biologists today is conserving biodiversity under severe threat due to major habitat degradation and environmental change caused by humans. DNA barcoding, as a tool primarily for species identification, can be used in two specific ways to address biodiversity conservation: 1) as a means of more accurate and eventually more rapid biodiversity monitoring both before and after conservation actions, and 2) by providing data that will assist in estimations of phylogenetic diversity for setting conservation priorities (Krishnamurthy & Francis, 2012).
Making accurate taxonomic determinations for conservation monitoring can be greatly aided with plant DNA barcodes, especially in tropical biomes where biodiversity is poorly known and many species lack verified scientific names. As pointed out above with respect to herbal supplements and medicines, the deficiency of uniform taxonomy is a significant problem in assessing species diversity and identification in local market products. The same applies to poorly known tropical forests requiring conservation in which species identification is extremely difficult, especially when using non-fertile specimens often only labeled as “morphospecies” (Gomes et al., 2013). In such cases DNA barcoding offers a solution for more uniform identifications, although some logistical hurdles may still impede the widespread use of DNA barcodes in this fashion (Gonzalez et al., 2009).
With regard to determining conservation priorities, it has been demonstrated that plant DNA barcodes can play a key role in estimating species richness in the relatively poorly known northern tropical forests of Queensland, Australia (Costion et al., 2011). More recently the fragmented rain forest habitats in South Eastern Queensland, whose distributions and extent reflect both past climate change as well as recent agricultural use, have received renewed conservation attention. These forests are taxonomically rich at the generic-level and less so at the species-level, so that species richness may not be the most appropriate measure for setting conservation priorities. Shapcott et al. (2015, 2017) generated plant DNA barcodes (rbcL, matK, and trnH-psbA) for 770 species in 111 families that accounted for 86% of the rain forest flora in South Eastern Queensland and calculated phylogenetic diversity (PD; see Faith, 2008) measures for each of the 18 subregions in the area (Fig. 7). They concluded that PD was correlated with species richness across the subregions and used these estimates to prioritize subregions for conservation action. It was also determined, using the phylogenetic measures provided by the DNA barcode sequence data, that the local floristic patterns were consistent with both ancient ecological refugia (phylogenetically overdispersed species) and recent lineage range expansions (phylogenetically clustered species) that explained the conservation priorities (Howard et al., 2016).
Figure 7
Using DNA barcodes to map phylogenetic diversity for habitat conservation. Graphical representation derived from the phylogenetic tree for SE Queensland based on three DNA barcode markers indicating by colored bars the species present in each of the subregions. (from Shapcott et al., 2015).
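The phylogenetic diversity (PD) measure used for the subregions above can be illustrated with a small calculation. The following is a minimal sketch, not the procedure used by Shapcott et al.: it assumes PD is taken as the sum of branch lengths on the paths connecting a set of species to the root of the tree (one common convention), and the toy tree, branch lengths, and subregion names are hypothetical values chosen only for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical toy tree stored as: node -> (parent, length of branch to parent).
# The root has no parent and a branch length of zero.
Tree = Dict[str, Tuple[Optional[str], float]]

TOY_TREE: Tree = {
    "root": (None, 0.0),
    "n1": ("root", 1.0),
    "A": ("n1", 2.0),
    "B": ("n1", 3.0),
    "C": ("root", 5.0),
}


def faith_pd(tree: Tree, taxa: Iterable[str]) -> float:
    """Sum the branch lengths on the union of root-to-tip paths for the taxa."""
    visited = set()
    total = 0.0
    for taxon in taxa:
        node: Optional[str] = taxon
        # Walk toward the root, counting each branch only once.
        while node is not None and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


# Two hypothetical subregions with the same species richness (two species each).
subregions = {
    "subregion_1": ["A", "B"],  # close relatives
    "subregion_2": ["A", "C"],  # distant relatives
}

for name, species in subregions.items():
    print(name, len(species), faith_pd(TOY_TREE, species))
```

In this toy case both subregions hold two species, yet the second has the larger PD because its species span more of the tree. This is the sense in which PD can rank areas differently from species richness when a flora is rich in genera but poorer in species.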
Even though the earth may be undergoing its sixth major extinction with extinction rates over 1000 times normal (Wilson, 2016), observing a species extinction event is rare. Plant DNA barcodes were used to verify that a narrow range endemic tree in the family Rubiaceae, known only from two mature individuals on the island of Palau in Micronesia, was most likely a distinct species in the genus Timonius (Costion et al., 2016). Additional morphological and molecular data verified that this taxon was T. salsedoi Fosberg & Sachet described in the 1980s. In 2014 after a survey of the island where these two individuals were known to occur it was discovered that both trees had succumbed when a typhoon hit the area. Previously recommended as Critically Endangered by IUCN criteria, it is now suspected that this species is extinct (Costion et al., 2016).
DNA barcodes are only in their infancy as applications for understanding and enhancing conservation efforts. However, published studies to date suggest that standardized and comparable genetic information for species across broad geographic regions, such as the sequence data provided by DNA barcodes, is a powerful tool and can have a significant impact on basic research (e.g., Mi et al., 2012; Erickson et al., 2014; Pei et al., 2015) as well as on conservation monitoring and priority assessments in threatened habitats, in local communities, and across large geographic regions (e.g., Shapcott et al., 2015).
4 Tomorrow's Outlook for Plant DNA Barcoding
Since the time of their introduction into the botanical community over a decade ago DNA barcodes have been applied to a variety of investigations in both basic and applied research in plants. One of the main reasons that plant systematists have not yet universally accepted DNA barcoding as a core tool in their arsenal for identifying species is that no single marker is able to completely discriminate among species in most taxonomic groups. In contrast ecologists have been more willing to find new and unique applications of DNA barcodes to address some of their basic research questions because in general they work in systems made up of multiple lineages of plants that can be uniquely identified by a combination of DNA barcode loci. Looking to the future, plant DNA barcoding will advance in two key ways to serve the botanical community by: 1) building a more comprehensive global plant DNA barcode library for universal use, and 2) developing new markers and adopting new sequencing technologies.
4.1 Building the global plant DNA barcode library
When the first well-supported community phylogeny was constructed using plant DNA barcodes for the 296 species of trees in the 50 hectare forest dynamics plot on Barro Colorado Island in Panama (Kress et al., 2009), a light bulb went off in the minds of every community ecologist working in long-term forest monitoring plots around the world. Soon trees in plots across the globe were being DNA barcoded from the neotropics (Gonzalez et al., 2009; Kress et al., 2010) to Africa (Parmentier et al., 2013) to Asia (Pei et al., 2011; Huang et al., 2015). Eventually DNA barcode sequence data (rbcL, matK, and trnH-psbA) were generated and compared across 15 forest plots in the CTFS/ForestGEO network representing 1347 species of trees in both temperate and tropical habitats in seven different countries (Kress et al., 2012; Erickson et al., 2014). DNA barcodes have also been generated for many additional plots that are not part of this particular network. The CTFS/ForestGEO is emphasized here because it represents one of the largest networks of long-term forest monitoring plots that is implementing DNA barcoding as a standard protocol over more than 60 plots in 24 countries world-wide (Anderson-Teixeira et al., 2015). To date three-locus DNA barcodes have been generated for over 3000 species of plants in 28 plots; a complete DNA barcode library for all plots will include over 10 000 species of trees and probably two to three times as many lianas, shrubs, and herbs.
Populating the global plant DNA barcode library is one of the biggest challenges for the next decade. These forest monitoring plots represent a rich resource for building the plant DNA barcode library because in general they have well-verified identifications, vouchered collections, and individually tagged trees that can be re-visited by botanists if necessary. Two additional avenues for developing the library for plants include lineage-based efforts and floristic efforts. Individual taxonomists are also generating DNA barcodes for specific groups of plants as either trials for sequencing success using the standard markers (e.g., Chen et al., 2015, 2010; Wang et al., 2017) or as part of their basic molecular phylogenetic investigations in which the DNA barcode markers are used for understanding evolutionary relationships. Although many of these “DNA barcodes” may not receive the official GenBank DNA barcode designation, they are all adding to the library of sequences that complement the standard DNA barcode markers.
Recently major efforts have begun to generate DNA barcodes for entire regional floras. One of the most impressive is the library that has been built for identifying the vascular plants of Canada (Braukmann et al., 2017). Braukmann and colleagues successfully generated barcode sequence records for 96% of the 5108 species known from Canada. Each of the three markers they used (rbcL, matK, and ITS2) varied in its success of coverage across the species pool. Their results indicated that these markers were highly successful in identifying plants at the level of genus across the region and demonstrated best species discrimination in subregions of the highest floristic diversity. Other efforts to build floristic DNA barcode libraries are being conducted usually at the state or regional level and primarily in the temperate zone (e.g., Wisconsin, USA, Givnish T, pers. comm.).
The biggest hurdle in this approach to populating the global DNA barcode library is identifying funding resources to cover the sequencing and laboratory costs, which most often come from government funding agencies. Increasing interest is being shown by government bureaus that are responsible for regulating the transport of biological materials (e.g., the US Department of Agriculture) and crime investigation (e.g., the US Federal Bureau of Investigation). Achieving the goal of providing a universal library of DNA barcodes for all species of plants in the world is still far in the future, but once available, both basic and applied research will benefit greatly.
4.2 New DNA markers and new sequencing technologies
Speculation and predictions on the future direction of plant DNA barcoding began almost simultaneously with the initiation of studies applying these markers to questions in taxonomy, evolution, and ecology, including the relationship between locus-based DNA barcodes and genomic approaches to species identification (Kress & Erickson, 2008a). The need for both advanced sequencing technologies and efficient database design and search strategies for species identification was recognized.
One exciting modification of DNA barcoding is appropriately called “metabarcoding” or “eDNA” (Taberlet et al., 2012), which employs genetic markers for the identification of organisms in environmental samples, such as soil, sea water, or coral reefs (Leray & Knowlton, 2015). Successful identification of organisms in these environments usually requires very short and unique genetic markers (often not the standard DNA barcode sequence regions) or “mini-barcodes,” which use a sub-region of the standardized markers, to overcome the problem of degraded DNA in these samples (Hajibabaei & McKenna, 2012). The same techniques have also been applied to studies using ancient DNA (Willerslev et al., 2007). However, it is also possible to use some of the standard plant DNA barcode markers (e.g., rbcL and ITS2) to determine the composition of plant species in a community by analyzing soil samples (Jones et al., 2011; Fahner et al., 2016). The field of metabarcoding is rapidly developing through improvements in methodology, such as the recovery, amplification, and sequencing of short DNA fragments. In addition, creating new bioinformatics tools for transforming a list of DNA sequences present in a sample into a list of identifiable species is a formidable task, but one that will eventually be adequately addressed.
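The bioinformatics step of turning a list of sequences into a list of species can be sketched in a few lines. The example below is only an illustration under stated assumptions: it scores each read against a reference library by shared k-mers (Jaccard similarity), a deliberately simple stand-in for the alignment-based searches that real pipelines typically rely on. The sequences, species names, k-mer length, and similarity threshold are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set


def kmers(seq: str, k: int = 4) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared k-mers divided by all k-mers seen in either sequence."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign(read: str, library: Dict[str, str], k: int = 4,
           min_similarity: float = 0.8) -> Optional[str]:
    """Return the best-matching reference name, or None if below threshold."""
    read_kmers = kmers(read, k)
    best_name, best_score = None, 0.0
    for name, reference in library.items():
        score = jaccard(read_kmers, kmers(reference, k))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_similarity else None


def summarize(reads: List[str], library: Dict[str, str]) -> Counter:
    """Count reads per assigned species; unmatched reads are 'unassigned'."""
    return Counter(assign(read, library) or "unassigned" for read in reads)


# Hypothetical reference library (made-up sequences, not real barcodes).
LIBRARY = {
    "Species one": "ATGTCACCACAAACAGAGACTAAAGC",
    "Species two": "GGCTTACCGTTAGGCATTCGATCCAA",
}

# Hypothetical sample: two reads match the library, one matches nothing.
READS = [
    "ATGTCACCACAAACAGAGACTAAAGC",
    "GGCTTACCGTTAGGCATTCGATCCAA",
    "TTTTTTTTTTTTTTTT",
]

print(summarize(READS, LIBRARY))
```

The "unassigned" category is the part worth noticing: a read can only be named if a sufficiently similar sequence is already in the reference library, which is why incomplete libraries limit what such a tool can report regardless of the matching method chosen.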
The combination, complementation, and extension of employing the standard single- or multi-locus DNA barcodes with next generation sequencing (NGS) technologies has been inevitable. The divide between specimen-based DNA barcoding and environment-based metabarcoding as described above has been in part responsible for this turn to NGS. It has been suggested that genome skimming (i.e., low-coverage shotgun sequencing) of both plastid and nuclear regions as an “extended DNA barcode” may serve as the bridge between standard DNA barcoding and whole genome sequences as the ultimate in species identification (Coissac et al., 2016; Hollingsworth et al., 2016; Fig. 8). Such “mega-barcoding” will not only circumvent the need for PCR, but will also provide an increased level of genetic data that can serve other purposes besides species identification (e.g., phylogenetic resolution).
Figure 8
Plant DNA barcoding moves towards genomics. Overview of the experimental procedures for implementing extended DNA barcoding based on one gigabase of sequence reads produced by shotgun sequencing of genomic DNA. (adapted from Coissac et al., 2016).
However, for plants some researchers are also advocating a focus on chloroplast genome sequencing as “super-barcodes” to eventually replace the locus-based approach. Li et al. (2015) provide a thorough review of locus-based developments and suggest a new approach to plant DNA barcoding that combines these super-barcodes with the design and selection of “specific-barcode” loci for individual species groups. They term this the “1 + 1 Model” for plant DNA barcoding. However, they also recognize that this method, even if it will provide a reliable barcode for accurate plant identification, “is not yet resource-effective and does not yet offer the speed of analysis provided by single locus barcodes to unspecialized laboratory facilities.” Indeed, their model may not be plant DNA barcoding at all as originally envisioned, as it offers a very idiosyncratic methodology and not a rapid and universal approach for species identification.
The implementation of other sequencing technologies, such as microfluidic PCR-based target enrichment, which may offer a faster and less expensive option for large-scale multi-locus plant DNA barcoding (Gostel M, pers. comm.), is indicative of the current state of innovation in genomics. Many of these methodologies are still in their infancy and may yet prove to advance our ability to apply genetic markers to fulfill the goals of DNA barcoding. However, as we seek new methods we must not lose sight of the original purpose of DNA barcoding, namely species identification! In plants, as many have pointed out, universal species discrimination may never be possible with a locus-based approach; neither plastid data alone nor plastid data combined with a significant amount of information from the nuclear genome will suffice. However, there will always be a tradeoff between the ability to provide absolute universal species discrimination and the level of effort and cost required to achieve that goal. As taxonomists, ecologists, and applied scientists we must ask ourselves if 70–90% species discrimination with standard DNA barcoding methods is sufficient if the cost is only 10% of the cost of whole genome sequencing. Are the current technologies adequate and appropriate for most goals envisioned for DNA barcoding? Maybe they are. However, as technological advances rapidly decrease costs and increase efficiencies, maybe they will not be. The near future will provide a quick and final answer.
I would like to thank the many co-authors, collaborators, post-docs, technicians, and interns who I have had the pleasure of working with to advance the field of DNA barcoding. I am especially indebted to those colleagues who have provided inspiration, encouragement, advice, and assistance along the way, including Stuart Davies, Dave Erickson, Paul Hebert, Peter Hollingsworth, Dan Janzen, Kristen Lehman, De-Zhu Li, Ida Lopez, Scott Miller, Nancai Pei, Carlos García-Robledo, David Schindel, Alison Shapcott, Nate Swenson, and Joe Wright. I have no conflicts of interest in the publication of this work.