Nineteenth Australasian Weeds Conference

The development of a DNA barcode system for species identification of Conyza spp. (fleabane)

Karen Alpen¹, David Gopurenko¹,², Hanwen Wu¹,², Brendan J. Lepschi³ and Leslie A. Weston¹
¹ Graham Centre for Agricultural Innovation, Locked Bag 588, Wagga Wagga, NSW 2678, Australia
² NSW Department of Primary Industries, Private Mail Bag, Wagga Wagga, NSW 2650, Australia
³ Australian National Herbarium, Centre for Australian National Biodiversity Research, CSIRO Plant Industry, GPO Box 1600, Canberra, ACT 2601, Australia
(dalpen@bigpond.com.au)

Summary The genus Conyza Less. includes numerous species of invasive annual weeds that are a threat to the cropping regions of Australia. Conyza species are successful ruderal invaders, tolerating a wide range of climates, soils and habitats. There are eight recorded species of Conyza in Australia, the most prevalent and invasive being C. bonariensis (L.) Cronquist (flax leaf fleabane), C. sumatrensis (Retz.) E.Walker (tall fleabane) and C. canadensis (L.) Cronquist (Canadian fleabane). These species exhibit differences in susceptibility to commonly applied post emergent herbicides, and herbicide resistance has been confirmed in flax leaf fleabane in New South Wales and other eastern states of Australia. Herbicide application is most effective at an early stage of plant growth before flowering but identification of Conyza species using morphological characters at these early stages is often only achievable to the genus level. This study explored the utility of DNA barcoding as a method for accurate and rapid species identification of Conyza species in Australia to facilitate management of these weeds. It assessed the ability of one nuclear (ITS) and three chloroplast gene regions (rbcL, matK, and trnL-trnF) to discriminate between Conyza species from populations collected across Australia, and also from herbarium voucher specimens for each species.
The results showed that a combination of ITS and rbcL DNA barcode regions generally provided a suitable platform for potential identification of Conyza at the species level.

Keywords Flax leaf fleabane, Canadian fleabane, tall fleabane, Conyza bonariensis, C. canadensis, C. sumatrensis, C. parva, C. bilbaoana, C. aegyptiaca, C. leucantha, C. primulifolia, DNA sequence analysis, DNA barcoding, ITS, rbcL, trnL-trnF, matK.

INTRODUCTION
DNA barcoding is a method used for genetically identifying taxonomically described species based on sequence content at a standard comparable genomic region (Hebert and Gregory 2005). A single gene region (mitochondrial CO1 gene) has been established as the standard DNA barcode for species identification in animals, but the discovery of a similar single gene DNA barcode for plants has proved more challenging. Multiple gene regions are almost always required. The chloroplast gene regions rbcL and matK were recommended by the CBOL Plant Working Group (2009) as a standard two-locus DNA barcode for identification of plants, but for some genera these gene regions are not powerful species discriminators and additional or alternative gene regions must be identified and used. The aim of this study was to explore several gene regions for their potential use as DNA barcodes for identification of Conyza species.

MATERIALS AND METHODS
DNA sequences were sourced from Conyza specimens held in the Australian National Herbarium, Canberra and the Queensland Herbarium, Brisbane, and from previously identified field collected or greenhouse specimens. Sequences were also sourced from publicly available gene sequence databases such as GenBank and the Barcode of Life Data Systems (BOLD). Where possible, specimens used for sequence analysis were obtained from a wide geographical area in order to capture the depth of genetic variation within each species at the particular DNA region under scrutiny.
Eighty-nine specimens were obtained; of these, 76 generated suitable sequences for further analysis (C. bonariensis n = 39; C. sumatrensis n = 21; C. canadensis n = 6; C. parva n = 3; C. bilbaoana J.Remy n = 2; C. primulifolia (Lam.) Cuatrec. & Lourteig n = 1; C. aegyptiaca (L.) Aiton n = 2; C. leucantha (D.Don) Ludlow n = 2).

DNA extraction, PCR and sequencing
Specimen tissue samples (<0.4 cm²) were digested overnight at 55°C in 240 µL of 1% DX digest enzyme/DXT (v/v) digest solution (Qiagen). DNA was extracted using a Corbett Research X-tractor Gene™ (CAS-1820) robot with recommended Qiagen QIAxtractor DNA plasticware and associated DX solid tissue DNA extraction buffers.

PCR was used to amplify chloroplast (rbcL, matK and trnL-trnF) and nuclear (ITS) gene regions using the forward and reverse primers shown in Table 1. PCR products were visualised for quality and approximate size using a UV transilluminator (BioRad Molecular Imager® Gel Doc™ XR System) after electrophoresis of 2.5 µL of PCR products and reference size markers through a 1.5% agarose gel immersed in 1% TAE buffer at 190V for seven minutes. Quality PCR products were picked and outsourced to the Australian Genome Research Facility (AGRF) in Brisbane, Australia for purification and bi-directional sequencing.

Sequence editing, alignment and analysis
Forward and reverse chromatograms at each gene were assembled by specimen ID and aligned to a comparative reference sequence (derived from GenBank) using Lasergene® SeqMan Pro™ software. The chromatograms were first checked for read quality and used to detect and confirm sequence polymorphisms. Sequences identified at AGRF as containing low sequence signal strength (relative to background fluorescent noise), or less than acceptable pre-determined levels of signal homogeneity at >50% of the nucleotides in the sequence read, were discarded.
Low quality nucleotide sites observed within sequences which passed these initial criteria were manually scored as unknown sites. Sequences were trimmed at both ends of the alignment to remove primer sequences and poor quality read. Quality checked sequences were further aligned using the CLUSTALW algorithm with default parameters as implemented in BioEdit 7.0.

Four methods were used to assess the ability of each gene region to discriminate among the species: i) neighbour-joining (NJ) tree, ii) presence of a DNA barcode gap, iii) BLAST of sequences against GenBank, and iv) diagnostic character method.

RESULTS
DNA sequences were recovered from all four gene regions, rbcL, matK, trnL-trnF and ITS, with high success rates across species and specimens within species. All four gene regions had an amplification success rate >85% among specimens while sequencing success rates ranged from 93% to 100% of PCR products.

NJ tree analysis
Tree based analysis using NJ was used to determine the extent of species monophyly at each gene region. A species was considered to be discriminated if all its specimen sequences formed a single, well supported (>70% bootstrap support) monophyletic clade distinct from all other species. The three chloroplast gene regions, rbcL, matK, and trnL-trnF, did not generate NJ trees with monophyletic clusters for each species at this level of support and hence were unable to discriminate among all the Conyza species. The nuclear ITS region exhibited the greatest species discrimination and was able to separate five of the eight species.

DNA barcode gap analysis
A DNA barcode gap is observed among species at a gene when the minimum interspecific distance is higher than the maximum intraspecific distance (Hebert et al. 2004). The greater the overlap between intraspecific and interspecific genetic distances, the less effective a gene region will be as a DNA barcode for species discrimination (Meyer and Paulay 2005).
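The barcode gap criterion defined above can be sketched for a single pair of species as follows. This is a minimal illustration only: it assumes uncorrected p-distances (the distance model used in the study is not stated here), the toy sequences are invented, and the function names `p_distance` and `barcode_gap` are hypothetical. The last line simply counts the pairwise species comparisons available among eight species.

```python
from itertools import combinations
from math import comb


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap or an unknown base are skipped.
    Uncorrected p-distance is an assumption made for this sketch.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x in "ACGT" and y in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def barcode_gap(seqs_a, seqs_b):
    """Return (max intraspecific, min interspecific, gap present) for two species."""
    intra = [
        p_distance(s, t)
        for group in (seqs_a, seqs_b)
        for s, t in combinations(group, 2)
    ]
    inter = [p_distance(s, t) for s in seqs_a for t in seqs_b]
    max_intra = max(intra, default=0.0)  # 0.0 if both species have one sequence
    min_inter = min(inter)
    # A gap exists only if every between-species distance exceeds
    # every within-species distance.
    return max_intra, min_inter, min_inter > max_intra


# Invented 10 bp alignments, for illustration only.
species_x = ["ACGTACGTAC", "ACGTACGTAT"]
species_y = ["ACGAACGAAC", "ACGAACGAAC"]
species_z = ["ACGTACGTTT"]

print(barcode_gap(species_x, species_y))  # gap present
print(barcode_gap(species_x, species_z))  # distances overlap, no gap

# Number of pairwise species comparisons among eight species.
N_SPECIES_PAIRS = comb(8, 2)
```

In the second comparison the smallest between-species distance equals the largest within-species distance, so the two distributions touch and no gap is reported.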
Each of the three chloroplast gene regions, rbcL, matK, and trnL-trnF, had considerable overlap between intraspecific and interspecific distances, indicating absence of a DNA barcode gap. The ITS region also displayed a region of overlap but it was marginal compared to those of the chloroplast regions. Additional analysis of the ITS region identified absence of a DNA barcode gap at five of the 28 possible pairwise species comparisons.

Table 1. Primers used in PCR.

Gene region   Primer (direction)   Sequence (5'–3')
rbcL          rbcLF(F)¹            ATGTCACCACAAACAGAGACTAAAGC
rbcL          rbcLajf634(R)²       GAAACGGTCTCTCCAACGCAT
matK          3FX_KIM(F)³          CGTACAGTACTTTTGTGTTTACGNG
matK          1RX_KIM(R)³          ACCCAGTCCATCTGGAAATCTTGGTNC
trnL-trnF     Ucp-e(F)⁴            GGTTCAAGTCCCTCTATCCC
trnL-trnF     Ucp-f(R)⁴            ATTTGAACTGGTGACACGAG
ITS           ITS5a(F)⁵            TATCATTTAGAGGAAGGAG
ITS           ITS4(R)⁵             GCATATCAATAAGCGGAGGA
¹ (Kress and Erickson 2007); ² (Fazekas et al. 2008); ³ (Dunning and Savolainen 2010); ⁴ (Taberlet et al. 1991); ⁵ (Baldwin 1992).

Sequences queried against GenBank
Sample sequences at each gene region were queried against preexisting sequence accessions at the GenBank database using the BLAST algorithm. The absence of some Conyza species in the GenBank database at particular gene regions prevented some sequences from being further scrutinised. The chloroplast regions had less ability than the ITS region to discriminate at the species level, with both positive and unique identification rates of 0%, 85% and 78% for the rbcL, matK and trnL-trnF regions respectively. The ITS region was able to positively identify 100% of the sequences queried against preexisting specimen accessions. The complete failure of the rbcL gene region to identify species correctly can be attributed to two GenBank records which have potentially been misidentified. These two GenBank sequences were also anomalies in the NJ tree analysis.

Diagnostic character based analysis
An analysis of the ITS sequences present among the surveyed Conyza species identified several polymorphic nucleotide positions that were unique in character state (A, C, G or T) for particular Conyza species and therefore potentially useful as diagnostic nucleotide characters for distinguishing species. Four of the species, C. aegyptiaca, C. leucantha, C. parva and C. primulifolia, have multiple nucleotide sites which can uniquely distinguish each of them from the other Conyza species under review. Conyza aegyptiaca has at least thirteen unique nucleotide sites that are fixed for a character observed at all specimens within the species but absent at all other surveyed Conyza; C. leucantha possesses seven such sites, while C. parva and C. primulifolia each possess two. Conyza canadensis has four nucleotide sites that can be used to distinguish this species from C. sumatrensis, C. bonariensis and C. bilbaoana. Conyza sumatrensis has one site that allows for separation from C. bonariensis and C. bilbaoana. In summary, diagnostic nucleotide sites exist in the ITS region that can separate all the Conyza species reviewed in this study except for C. bonariensis and C. bilbaoana.

Regarding C. bonariensis and C. bilbaoana, two nucleotide sites (positions 143 and 551; Table 2) were of further interest. The two nucleotide sites distinguished C. bilbaoana from all Australian sourced C. bonariensis samples, but not from the GenBank accessions of C. bonariensis sourced from Hawaii and Taiwan. These two overseas C. bonariensis specimens differed from the Australian sourced C. bonariensis at these two sites, and shared the same nucleotide type as that of C. bilbaoana.

A review of the rbcL gene region for diagnostic nucleotide sites identified one potentially unique diagnostic site that may differentiate C. bonariensis from all other species, including C. bilbaoana, but the results were confounded by the two potentially misidentified GenBank specimens discussed previously in the section titled ‘Sequences queried against GenBank’.

DISCUSSION
All four methods of analysis demonstrated that the chloroplast gene regions (rbcL, matK and trnL-trnF) individually were poor species discriminators of Conyza. The nuclear ITS region, however, was superior in all cases.

NJ trees are regularly used in DNA barcoding for species discrimination (Van Velzen et al. 2012). In the case of Conyza, NJ trees based on individual and concatenated gene regions were unable to differentiate among all eight species. NJ analysis of the ITS region provided greater resolution than at the chloroplast regions and supported the monophyly of five of the eight morpho-species. The genetic distance relationships among C. bonariensis, C. bilbaoana and C. sumatrensis were ambiguous and poorly resolved at all gene regions using NJ and bootstrap analysis. Conyza bonariensis was paraphyletic with respect to both C. bilbaoana and C. sumatrensis; the latter two species were each well supported as reciprocally monophyletic sister species but they only differed from one another by a few nucleotides and were nested within the clade containing all C. bonariensis. There was, however, no sharing of identical DNA barcode sequences among these three species. The close genetic relationship among the three species indicates they recently diverged from a shared common ancestor; it is therefore likely the paraphyly of C. bonariensis relative to C. sumatrensis and C. bilbaoana is due to incomplete sorting of ancestral polymorphisms among the species (Fazekas et al. 2009). The absence of well resolved species monophyly also indicates there has been insufficient time for novel and unique mutations to accumulate within the species at the surveyed genes (Van Velzen et al. 2012). Our results indicate that genetic distance based methodologies, such as NJ, cannot accurately identify all species within Conyza at the genes surveyed here.
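The monophyly criterion, and the way a paraphyletic species fails it, can be illustrated with a minimal sketch. The rooted tree below is a hypothetical topology written as nested tuples that only mimics the pattern described above (C. sumatrensis and C. bilbaoana as sister clades nested among C. bonariensis); it is not the tree recovered in this study, the specimen labels are invented, and bootstrap support is ignored.

```python
def leaves(node):
    """Return the set of leaf labels under a node of a nested-tuple tree."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def is_monophyletic(tree, members):
    """True if some clade of the rooted tree contains exactly `members`."""
    members = set(members)

    def walk(node):
        if leaves(node) == members:
            return True
        if isinstance(node, str):
            return False
        return any(walk(child) for child in node)

    return walk(tree)


# Hypothetical rooted topology; labels are invented specimen names.
tree = ("can1", ("bon1", ("bon2", (("sum1", "sum2"), ("bil1", "bil2")))))

species = {
    "C. canadensis": {"can1"},
    "C. bonariensis": {"bon1", "bon2"},
    "C. sumatrensis": {"sum1", "sum2"},
    "C. bilbaoana": {"bil1", "bil2"},
}

for name, specimens in species.items():
    print(name, is_monophyletic(tree, specimens))
```

In this toy tree no clade contains the C. bonariensis specimens and nothing else, so that species fails the criterion, while the two nested species each form their own clade.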
Diagnostic nucleotide character methods have been found to outperform genetic distance based methods (such as NJ trees) when dealing with recently diverged species, as these methods look for unique nucleotide site characters in the DNA sequence that are fixed within a species and different from those observed in other species within a genus (Lowenstein et al. 2009).

Table 2. Polymorphic nucleotide sites of interest in the ITS alignment at C. bonariensis in comparison to C. bilbaoana.

Species          Sample source                                          Position 143   Position 551
C. bonariensis   37 × Australian specimens plus 1 × overseas specimen   C              A
C. bonariensis   2 × GenBank accessions*                                A              G
C. bilbaoana     2 × Australian specimens                               A              G
* Specimens from Hawaii and Taiwan.

Diagnostic sites at the ITS region were identified that could separate all species except C. bonariensis and C. bilbaoana. The separation of C. bonariensis from C. bilbaoana using the ITS region was thwarted by polymorphisms present in the population distribution of C. bonariensis at two nucleotide sites (Table 2). At both sites, all 38 C. bonariensis samples (primarily Australian samples) carried a fixed nucleotide which differed from that of C. bilbaoana. However, two C. bonariensis GenBank sequences sampled from outlying Pacific localities carried different nucleotides from the Australian C. bonariensis at both sites, and these nucleotides were shared with C. bilbaoana. Confirmation of the taxonomic identity of these two overseas specimens is required.

This failure of the ITS region to separate C. bonariensis from C. bilbaoana necessitated the review of the chloroplast regions for further diagnostic sites. One site in rbcL proved successful in the task of discriminating C. bonariensis from all other species under review. However, this is subject to further clarification of the potentially incorrect identification of GenBank specimens as previously discussed.
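The effect of the two GenBank accessions on the candidate diagnostic sites can be sketched with the character states reported in Table 2. This is a minimal illustration, not the procedure used in the study: each sequence is reduced to the two alignment columns of interest (positions 143 and 551), and the function name `diagnostic_sites` and the variable names are hypothetical.

```python
def diagnostic_sites(target, others):
    """Return 0-based columns where all `target` sequences share one base
    (A, C, G or T) that never occurs in `others` at that column."""
    sites = []
    for col in range(len(target[0])):
        target_states = {seq[col] for seq in target}
        other_states = {seq[col] for seq in others}
        fixed = len(target_states) == 1 and target_states <= set("ACGT")
        if fixed and target_states.isdisjoint(other_states):
            sites.append(col)
    return sites


# Each two-letter string holds the bases at positions 143 and 551 (Table 2).
POSITIONS = (143, 551)
bonariensis_main = ["CA"] * 38     # 37 Australian + 1 overseas specimen
bonariensis_genbank = ["AG"] * 2   # Hawaii and Taiwan accessions
bilbaoana = ["AG"] * 2             # Australian specimens

# Without the two GenBank accessions both positions look diagnostic.
narrow = diagnostic_sites(bilbaoana, bonariensis_main)
print([POSITIONS[i] for i in narrow])

# With wider geographical sampling neither position remains diagnostic.
wide = diagnostic_sites(bilbaoana, bonariensis_main + bonariensis_genbank)
print([POSITIONS[i] for i in wide])
```

The first call reports both positions and the second reports none, which is the pattern described in the text: sites that appear fixed under narrow sampling can disappear once more of a species' range is included.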
These results highlight the importance of wide geographical sampling in DNA barcoding studies. In the absence of the two overseas C. bonariensis ITS samples, it may have been erroneously concluded that the ITS region alone was able to separate all eight Conyza species existing in Australia. Once specimens from a wider geographical area were included in this study, intraspecific variation at the ITS region increased, eliminating several of the potentially diagnostic nucleotide sites available for species discrimination. Additional samples of C. bonariensis and C. bilbaoana from the species’ natural distribution and other exotic locations outside of Australia are required to ensure intraspecific variation has been adequately surveyed, and to further assist in clarification of the genetic relationship between the two species.

Subject to the results of future additional sampling and clarification of the two potentially misidentified GenBank samples, a combination of the ITS and rbcL gene regions is proposed as a suitable multi-locus DNA barcode for genetic identification of species in the Conyza genus. This recommendation is based on the presence of key diagnostic nucleotide sites within these regions that individually or in combination are able to identify each of the eight Conyza species found in Australia.

ACKNOWLEDGMENTS
We thank the Graham Centre for Agricultural Innovation, which funded this project via a Research Initiative Grant, Wagga Wagga Agricultural Institute for providing access to the DNA barcoding research laboratory, and Tony Bean of the Queensland Herbarium for providing material of C. aegyptiaca and C. leucantha.

REFERENCES
Baldwin, B.G. (1992). Phylogenetic utility of the internal transcribed spacers of nuclear ribosomal DNA in plants: An example from the Compositae. Molecular Phylogenetics and Evolution 1, 3-16.
CBOL Plant Working Group. (2009). A DNA barcode for land plants. Proceedings of the National Academy of Sciences 106, 12794-7.
Dunning, L.T. and Savolainen, V. (2010). Broad-scale amplification of matK for DNA barcoding plants, a technical note. Botanical Journal of the Linnean Society 164, 1-9.
Fazekas, A.J., Burgess, K.S., Kesanakurti, P.R., Graham, S.W., Newmaster, S.G., Husband, B.C., Percy, D.M., Hajibabaei, M. and Barrett, S.C. (2008). Multiple multilocus DNA barcodes from the plastid genome discriminate plant species equally well. PLoS ONE 3, e2802.
Fazekas, A.J., Kesanakurti, P.R., Burgess, K.S., Percy, D.M., Graham, S.W., Barrett, S.C.H., Newmaster, S.G., Hajibabaei, M. and Husband, B.C. (2009). Are plant species inherently harder to discriminate than animal species using DNA barcoding markers? Molecular Ecology Resources 9, 130-9.
Hebert, P.D. and Gregory, T.R. (2005). The promise of DNA barcoding for taxonomy. Systematic Biology 54, 852-9.
Hebert, P.D., Stoeckle, M.Y., Zemlak, T.S. and Francis, C.M. (2004). Identification of birds through DNA barcodes. PLoS Biology 2, e312.
Kress, W.J. and Erickson, D.L. (2007). A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS ONE 2, e508.
Lowenstein, J.H., Amato, G. and Kolokotronis, S.-O. (2009). The real maccoyii: identifying tuna sushi with DNA barcodes – contrasting characteristic attributes and genetic distances. PLoS ONE 4, e7866.
Meyer, C.P. and Paulay, G. (2005). DNA barcoding: error rates based on comprehensive sampling. PLoS Biology 3, e422.
Taberlet, P., Gielly, L., Pautou, G. and Bouvet, J. (1991). Universal primers for amplification of three non-coding regions of chloroplast DNA. Plant Molecular Biology 17, 1105-9.
Van Velzen, R., Weitschek, E., Felici, G. and Bakker, F.T. (2012). DNA barcoding of recently diverged species: relative performance of matching methods. PLoS ONE 7, e30490.