UTILLdb, a Pisum sativum in silico forward and reverse genetics tool

Richard Thompson

ABSTRACT: The systematic characterization of gene functions in species recalcitrant to Agrobacterium-based transformation, like Pisum sativum, remains a challenge. To develop a high throughput forward and reverse genetics tool in pea, we have constructed a reference ethylmethane sulfonate mutant population and developed a database, UTILLdb, that contains phenotypic as well as sequence information on mutant genes. UTILLdb can be searched

Open Access et al. Dalmais 2008 Volume 9, Issue 2, Article R43 Method UTILLdb, a Pisum sativum in silico forward and reverse genetics tool Marion Dalmais¤*, Julien Schmidt¤*, Christine Le Signor¤†, Francoise Moussy†, Judith Burstin†, Vincent Savois†, Gregoire Aubert†, Veronique Brunaud*, Yannick de Oliveira*, Cecile Guichard*, Richard Thompson† and Abdelhafid Bendahmane* Addresses: *Unité de Recherche en Génomique Végétale, UMR INRA-CNRS, Rue Gaston Crémieux, 91057 Evry Cedex, France. †INRA, Unite Mixte de Recherche en Génétique et Ecophysiologie des Légumineuses (INRA-ENESAD), Domaine d'Epoisses, 21110 Bretenières, France. ¤ These authors contributed equally to this work. Correspondence: Abdelhafid Bendahmane. Email: bendahm@evry.inra.fr Published: 26 February 2008 Received: 29 November 2007 Revised: 17 January 2008 Accepted: 26 February 2008 Genome Biology 2008, 9:R43 (doi:10.1186/gb-2008-9-2-r43) The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2008/9/2/R43 © 2008 Dalmais et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Pea genetics database <p>UTILLdb lation.</p> is a database of phenotypic and sequence information on mutant genes from a reference Pisum sativum EMS-mutant popu- Abstract The systematic characterization of gene functions in species recalcitrant to Agrobacterium-based transformation, like Pisum sativum, remains a challenge. To develop a high throughput forward and reverse genetics tool in pea, we have constructed a reference ethylmethane sulfonate mutant population and developed a database, UTILLdb, that contains phenotypic as well as sequence information on mutant genes. UTILLdb can be searched online for TILLING alleles, through the BLAST tool, or for phenotypic information about mutants by keywords. Background Mutational approaches have been widely exploited in breeding and basic research. In the genomic era, the completion of the sequencing of several plant genomes has enabled the development of reverse genetics strategies, where one first identifies a target gene based on the functional annotation of its sequence, and then proceeds with the phenotypic characterization of mutant alleles. Several mutagenesis techniques are dedicated to this approach, notably RNA interference suppression [1,2] and insertional mutagenesis by transposon tagging [3,4] or Agrobacterium T-DNA insertion [5]. These methods, however, are still mainly based on Agrobacterium T-DNA vectors and, thus, rely on the ability of a given plant species to be transformed. On the other hand, chemical mutagenesis based on an alkylating agents like ethylmethane sulfonate (EMS) [6] provides an easy and cost-effective way to saturate a genome with mutations. TILLING (targeting induced local lesions in genomes) uses EMS mutagenesis coupled with gene-specific detection of single-nucleotide mutations [7-9]. This reverse genetic strategy encompasses all types of organisms [10-14] and can be automated in a high throughput mode, which is an absolute necessity to match the speed of candidate gene discovery. The success of the TILLING approach relies on the construction of high quality mutant libraries. Ideally, the mutant population is phenotyped so that in silico analysis of the mutant lines can be carried out. To date, phenotypic databases can be found for tomato [15], rice [16], Lotus japonicus [13] and Arabidopsis [17], and a searchable collection of phenotypic mutants is available for Zea mays [18], Pisum sativum [19] and Arabidopsis thaliana [20]. Genome Biology 2008, 9:R43 http://genomebiology.com/2008/9/2/R43 Genome Biology 2008, Pea (P. sativum) belongs to the Leguminoseae family, which provides excellent dietary components with health-promoting benefits and offers the important ecological advantage of contributing to the development of low input farming systems by fixing atmospheric nitrogen and further minimizing the need for external inputs when used as a break crop. Since Gregor Mendel's groundbreaking work on the theories of heredity, pea has been extensively used for basic research, in particular in the fields of seed biology and plant architecture. In many studied examples, legume genes were shown to have novel functions compared to those described for related Arabidopsis genes. Detailed characterization of these legume genes will help our understanding of cross-species gene function [21]. However, functional gene validation by transformation is impractical due to the difficulty of transforming pea using Agrobacterium. This situation renders pea an ideal candidate for TILLING. Although several pea EMS mutant populations already exist, they are unsuitable for a genomic approach as they have not been prepared or maintained under rigorously controlled conditions and suffer from crosscontamination. Hence, there is a need for a high-quality P. sativum genetic mutant reference collection, which could be used for both forward and reverse genetics studies. Within the frame of the European Grain Legumes Integrated Project [22], we have developed such a population by mutagenizing P. sativum cultivar Caméor with EMS, and establishing an associated TILLING platform and phenotype database, UTILLdb. was observed between these two doses with a tendency toward higher seed production with 16 mM EMS, so a final dose of 20 mM EMS was used for population production. The mean number of seeds per pod was also slightly higher for the plants treated with 16 mM than for those treated with 24 mM EMS. The high rate of arrested embryos in pods of M1 plants treated with EMS doses of 16-24 mM attested to its good mutagenesis efficacy. Out of 8,600 M1 plants, more than 4,817 lines that had produced more than 5 M2 seeds each were individually harvested. To produce M3 seeds, four M2 seeds per M1 plant were sown in two-liter pots and M3 seeds were harvested from two sister plants, referred to as A and B. Leaf material was harvested from the healthiest looking plant, referred to as A (Figure 1). Seed stocks were sent to the Grain Legumes stock center in Dijon for multiplication, distribution and long-term storage of the lines. Results Production of Caméor mutant population Caméor is an early-flowering garden pea cultivar that completes its reproductive cycle within four months, permitting three successive generations a year under greenhouse conditions. Although pea is predominantly self-fertilizing, some residual cross-pollination can occur. In order to avoid contamination, 100 Caméor plants, derived from single seeds, were analyzed for genetic uniformity using a set of 16 short sequence repeat markers distributed over every arm of the seven predicted pea chromosomes [23] and left to set seeds in insect-proof greenhouses. In total, 10,000 Caméor seeds were produced and used to create the mutant population. In order to balance maximum mutation density with acceptable plant survival rate, we first conducted a 'kill-curve' analysis on batches of 100 seeds, using a range of doses from 8 to 57 mM EMS. Most treated first generation mutant (M1) plants exhibited retarded growth at an early seedling stage, but all of them recovered. Thirty plants from each treatment were then grown until maturity and assessed for fertility and seed production. A high loss of fertility was observed at the highest doses, with less than 30% of plants fertile at doses higher than 32 mM EMS. The highest EMS doses allowing 50% of plants to set seeds, 16 mM and 24 mM, were retained and tested on large batches of seeds (Table 1). Little difference Volume 9, Issue 2, Article R43 Dalmais et al. R43.2 Phenotyping of the Caméor mutant population As we intended to create a reference mutant collection that could be used for forward and reverse genetics, we carried out a systematic phenotyping of the mutant population. Our phenotype scoring was based on visual characterization of four plants per M2 family at key developmental stages, from germination until fruit maturation. To facilitate the phenotype scoring we defined a phenotype ontology adapted to pea. This phenotyping tool does not cover all phenotypic alterations (for example, no root evaluation was carried out) and was constructed for high-throughput scoring of many mutant lines in a relatively short growing season. The vocabulary used to describe the mutant plants was organized in a hierarchical tree and is composed of 107 subcategories of phenotypes clustered at different levels. The complete list of the vocabulary used is shown in Additional data file 1 and the number of lines found in each major phenotype category is shown in Table 2. Out of the 4,817 M2 families, 1,840 showed a visible phenotype, which represents 38% of the lines. Among the lines that showed a visible phenotype, 45% were scored for a single phenotype and 55% displayed multiple phenotypes, that is, they fall into more than one major phenotype category (Figure 2a). This rate of pleiotropy is an underestimation as the phenotypic characterization is based on high-throughput visual observation of only four mutant lines per M2 family. Detailed morphological and biochemical characterization of higher numbers of plants per M2 family would result in more phenotypic effects per mutant and, thus, a higher rate of pleiotropy. The most commonly observed phenotypes are related to stem size, leaf and plant architecture, followed by those related to cotyledons, stipules and seeds, with the least abundant phenotypes being related to flowers, plantlet architecture and petiole morphology (Figure 2b). Examples of phenotypes corresponding to the primary categories described are shown in Figure 3. Genome Biology 2008, 9:R43 http://genomebiology.com/2008/9/2/R43 Genome Biology 2008, EMS treatment 8600 M1 plants Single seed descent 4817 M2 seed lots B C Dalmais et al. R43.3 has not yet been sequenced, acquisition of the genomic sequences of target genes is facilitated by the high degree of synteny between pea and the model plant Medicago truncatula, which is being sequenced [24]. The CODDLE program (Codons Optimized to Discover Deleterious Lesions [25,26]) combined with the PRIMER3 tool [27] are used to define the best amplicon for TILLING. PCR products used for TILLING have a maximum size of about 1,500 bp and, therefore, longer genes are divided into several amplicons. To reduce variation in the quality and the quantity of the PCR amplification product due to the pea genome complexity and low amount of genomic DNA used in PCR, nested PCR is performed. Mutations are detected in the amplified targets using the mismatch-specific endonuclease ENDO1, as described previously [28]. Individual mutant lines are identified following a pool deconvolution step, and then the mutated base is identified by sequencing. Seeds A Volume 9, Issue 2, Article R43 D Growing and phenotyping of 4 M2 plants per M1 plant DNA extraction from plant A and collection of M3 seeds from plant A and B separately Figure 1 Establishment of pea EMS mutant library Establishment of pea EMS mutant library. Caméor seeds were EMS mutagenized. Out of 8,600 M1 plants self-fertilized in an insect-proof glasshouse, 4,817 produced more than 5 M2 seeds each. Four M2 seeds, referred to as A-D, per M1 parent were grown to maturity and scored for phenotypes. DNA was extracted from the plants referred to as A, which were left to set M3 seeds. As a backup, M3 seeds were harvested from the sister B plants. The collected M3 seeds were sent to the Grain Legumes Biological Resource Center for distribution, maintenance of the lines and long-term storage of the mutant library. Caméor TILLING platform To set up the pea TILLING platform, DNA samples were prepared from 4,704 M2 plants, each representing an independent family and organized in pools of 8 M2 families. One key factor in TILLING is the availability of the annotated genomic sequence of the gene to be tilled. Even though the pea genome A primary objective in a mutagenesis project is to generate a saturated resource where every locus is mutated and represented by multiple alleles. To evaluate the existence of multiple alleles per locus, we screened for mutations in the pea Methyl transferase 1 gene (PsMet1) [29]. Three amplicons of 1,383, 1,310 and 1,149 bp were tilled (Figure 4) and 96 mutants were identified (Figure 5). Sequence analysis of the mutations showed that 6 were intronic, 37 silent, 50 missense and 3 nonsense mutations (Figure 4b). Although characterization of PsMet1 mutants is beyond the scope of this article, we found that retrieval of the mutant alleles from the A plant M3 seed stocks was successful, without the need to use backup M3 seed stocks collected from the sister B plants (Figure 1). The exonic mutants were mostly present as heterozygotes (79 out of 90 mutations), but 11 lines were homozygous for the mutations. As expected with EMS mutagenesis, these mutations were distributed relatively evenly within the screened amplicons (Figure 4b). To further evaluate the quality of the mutant population, we extended the TILLING screen to another 19 genes and identified 371 point mutations in those genes (Table 3). As expected for EMS, all the mutations were G:C to A:T transitions [6,30]. Induced mutations discovered in exons consisted of 66.75% missense, 28.51% silent and 4.74% stop mutations (Table 4). Although the number of observed missense mutations was bigger than the amount predicted by CODDLE (63.80%), we recovered stop mutations in a slightly lower proportion than predicted (6.90%). As many tilled amplicons harbor intronic segments, some recovered mutations were intronic. Although some of these could potentially affect the efficiency of mRNA splicing, such an impact is unpredictable. Thus, intronic mutants were not characterized further. In contrast, the large number of non-synonymous mutations recovered is of interest as they may lead to gain- or loss-of-function phenotypes. Such mutations will also permit dissection of the function of the protein with respect to its sub-domain structure. Genome Biology 2008, 9:R43 http://genomebiology.com/2008/9/2/R43 Genome Biology 2008, Volume 9, Issue 2, Article R43 Dalmais et al. R43.4 Table 1 Effect of EMS Dose of EMS Total M1 seeds sown 0 mM 16 mM 20 mM 24 mM 100 1000 4000 3600 Percentage of M1 plants setting seeds 100% 61% 63% 58% Percentage of M1 plants yielding more than 5 seeds 100% 56% 52% 39% 3% 45% 49% 52% 4.83 ± 0.91 2.00 ± 0.86 0.91 ± 1.30 0.79 ± 1.93 Percentage of arrested embryos in pods of M1 plants Mean number of seeds per pod (± SD) Effect of the concentration of EMS on M2 seed setting, on the frequency of arrested embryos (data are expressed as the percentage of total seeds analyzed) and on the mean number of seeds per pod in the M1 generation (200 pods analyzed per dose). SD, standard deviation. We calculated the mutation frequency in the 20 targeted genes (Table 3) according to Greene et al. [6]: mutation frequency equals the size of the amplicon multiplied by the total number of samples screened divided by the total number of identified mutants. We estimated the average mutation rate to be one mutation every 200 kb. This mutation density is 1.5 times higher than the rate of one mutation per 300 kb reported for Arabidopsis, the best characterized TILLING mutant population to date [6]. Therefore, the 16-24 mM dose of EMS used to create the pea mutant population appears to be an adequate dose for TILLING. On average, we identified 34 alleles per tilled gene (after normalization to TILLING of the entire population). Considering that about half of missense mutations should have a deleterious effect on a typical Table 2 Number of M2 families affected in the major categories and subcategories of phenotypes Major category Subcategory 1 Color 172 Shape 32 Cotyledon No. of families 2 Plantlet architecture Architecture 7 3 Plant architecture Architecture 316 Branching type 205 4 Leaf Color 610 Shape and arrangements 387 Appearance 253 Size 81 77 5 Stipule Size/color/shape 6 Petiole Petiole 7 Stem Stem size 8 9 Flower Seed 6 1,447 Shape 36 protein [31], 25 alleles per tilled kilobase would be sufficient for phenotypic analyses. Setup of the UTILLdb database We scored 4,817 lines in the mutant population for phenotypic alterations using 107 subcategories of phenotypes. In TILLING screens we searched for mutations in 20 genes and identified 467 alleles. In order to manage and integrate the expanding data from both the phenotype recordings and TILLING target genes, we implemented the database UTILLdb. UTILLdb was developed according to a relational database system, interconnecting four main modules: lines, phenotype categories, sequences and mutations. Two main types of data are accessible, the morphological phenotypes of mutants and the sequences of tilled genes and corresponding alleles, when available. UTILLdb may be searched using a sequence, through a BLAST tool [32] or for a phenotypic feature using a keyword search. The outcome of the search is shown as a table of results that displays the phenotype of each line, with associated pictures and mutated sequence if it exists. Thus, the user could ask whether lines that share mutations in a specific gene share the same phenotypes and vice versa. As we expect the phenotypic characterization of the TILLING mutants to become more detailed as they are analyzed by UTILLdb users, UTILLdb was designed so that the passport data of the mutant lines can be extended or modified as needed. UTILLdb is publicly accessible through a web interface [33]. A link is implemented to facilitate seed ordering. UTILLdb serves also as an entry point for users wishing to have their favorite gene tilled on the Caméor TILLING platform. Results from those screens as well as the phenotype of the mutants identified will be implemented in UTILLdb. Flower morphology 24 Flowering time 4 Discussion Reproductive organs 12 Mutant population for forward and reverse genetics Seed color 2 Shape 4 Size 66 EMS-mutagenized populations have been created for different crops with, in many cases, multiple populations per crop. Information on the quality of the mutagenesis and the production and maintenance of the seed stocks are, however, often unavailable. We have constructed a reference EMS mutant population from P. sativum cultivar Caméor under Genome Biology 2008, 9:R43 http://genomebiology.com/2008/9/2/R43 Genome Biology 2008, Volume 9, Issue 2, Article R43 Dalmais et al. R43.5 plant phenotype ontology [34,35], a hierarchical description intended to develop a vocabulary that describes anatomy, morphology, and growth and developmental stages of a flowering plant, for the main reason that the plant phenotype ontology vocabulary is not yet adapted to describe mutant morphological traits in a crop like pea. Instead, the vocabulary used to describe the pea mutant plants was inspired from previous investigations of mutant collections (tomato [15], lotus [13], barley [36]) and adapted to pea. Tto exploit the mutant population using reverse genetics, genomic DNA was prepared from the mutant lines via highthroughput automated protocols, and organized in pools for bulked screening. Individuals with mutations in the gene of interest were isolated by systematic pool deconvolution. Genes and mutations were integrated in UTILLdb through a web interface, which allows for global analysis of the TILLING mutants in the collection. This database also serves as a portal for users to request materials or TILLING experiments. Saturation of the mutation screen Figure Distribution rate of pleiotropy 2 of phenotypic characteristics of the mutant population and Distribution of phenotypic characteristics of the mutant population and rate of pleiotropy. (a) Number of M2 families in each phenotypic group. The x-axis indicates the nine major phenotypic categories, listed in Table 2, and the y-axis indicates the total number of M2 families. Each bar represents the number of mutants in the corresponding category. The blue bar represents the quantity of pleiotropic mutants (having more than one phenotype), given by the first number in the category label. The red bar represents the non-pleiotropic mutants and is given by the second number in the category label. (b) Total number of M2 families (y-axis) sharing 1-5 major phenotypic categories (x-axis). The bar for one phenotypic category indicates how many mutants are categorized in only one phenotypic group (non-pleiotropic mutants), and the bars for the 2-5 phenotypic categories represent the number of mutants that share two to five phenotypes, respectively. In each case, the total number of mutants is indicated on the top of the bar. controlled conditions and developed a database, UTILLdb, which presents phenotypic data based on visual characterization of M2 plants from young seedling to fruit maturation stages. A hierarchical categorization of mutant phenotypes was used to describe the mutant plants. To facilitate the phenotype description, digital images were also recorded. We did not implement the previously published EMS mutagenesis causes primarily G:C to A:T transitions [30]. In the TILLING screen for mutations in PsMet1, we identified 90 independent exonic mutations in a sequence that contains 1,434 cytosines and guanines and this in a mutant population of 4,704 M2 families. Based on this we estimated the average frequency of mutations to be 1.33 × 105 (90/4,704 × 1,434). Given a genome size of 5,000 Mb and a 43.23% G:C content in the coding sequence of the pea genome [37], there are 2.2 × 109 bp susceptible to EMS mutagenesis. Assuming that all G:C base pairs are equally sensitive to EMS, we would expect approximately 2.93 × 104 mutations in each EMS-mutagenized M1 plant ((1.33 × 10-5) × (2.2 × 109)). We used the binomial distribution, P = 1 - (1 - F)N, to calculate the probability of finding a mutation in a given G:C base pair in our mutant population. In this formula, P is the probability of finding the mutation, F is the mutation frequency per base pair (1.33 × 10-5), and N is the number of M1 mutant lines (4,704). Using this formula we estimated the probability of finding one mutation in any given G:C base pair in the genome as 0.06%. Increasing the size of the mutant population to 50,000 M2 plants raises the probability of finding one mutation in any given G:C base pair in the genome to 52%. This number is relatively small and could be managed by our platform. In fact, 50,000 independent lines represent 65 DNA pool plates (96-wells) or only 16 plates (384-wells). This purely theoretical example shows that EMS mutagenesis coupled with TILLING is a very powerful tool for creating genetic diversity, especially if one considers that routine transformation of P. sativum has not yet been achieved and, hence, insertional mutagenesis is not an option. Analysis of mutants identified through TILLING The calculated overall mutation rate of one mutation every 200 kb found in our population is intermediate between the Genome Biology 2008, 9:R43 http://genomebiology.com/2008/9/2/R43 Genome Biology 2008, A. Plant (a) Plant 566 566 B. Plant (b) Plant 939 939 C. Plant (c) Plant 54 54 D. Plant (d) Plant 1236 1236 E. Plant (e) Plant 903 903 F. Plant (f) Plant 903 903 G. Plant (g) Plant 1567 1567 H. Plant (h) Plant 630 630 Figure 3 (see legend on next page) Genome Biology 2008, 9:R43 Volume 9, Issue 2, Article R43 Dalmais et al. R43.6 http://genomebiology.com/2008/9/2/R43 Genome Biology 2008, Volume 9, Issue 2, Article R43 Dalmais et al. R43.7 Figure 3 of Examples (seemutant previous phenotypes page) representing the nine major phenotypic groups Examples of mutant phenotypes representing the nine major phenotypic groups. (a) Plant 566: cotyledon color, albino. (b) Plant 939: plantlet architecture, bushy; plant architecture, hyper compact; leaf color, pale green; stem size, extreme dwarf. (c) Plant 54: plant architecture, determinate growth. (d) Plant 1,236: plant architecture, basal branching; leaf color, pale green, yellow; leaf size, medium; stem size, dwarf. (e, f) Plant 903: leaf, cone shaped at leaf base; flowers, sterile flowers. (g) Plant 1,567: leaf, distorted; stipule, silver-argentous. (h) Plant 630: flowers, cauliflower type inflorescence; flowers, abnormal all; stem, dwarf; leaf, upcurling. rate of one mutation per 300 kb reported for Arabidopsis [6] or Cenorhabditis elegans (1/293 kb) [38] and rice [39], and 2.5-fold higher than the rate of two mutations per megabase for TILLING in maize [40]. A much more saturated mutation density has been observed in tetraploid wheat (1/40 kb), hexaploid wheat (1/24 kb) [41] or Brassica napus (1/10 kb; unpublished data); however, such species are able to withstand much higher doses of EMS without obvious impact on survival or fertility rates, due to multiple gene redundancies in their polyploid genomes. In the TILLING screen, we recovered from 8 (Sym29) to 96 mutants (PsMetI) per tilled gene. Some genes (End1, TL) are obviously much more mutated than others (DOF2, eIF(iso)4e), despite the similarity of their GC content (36.6% for DOF2, 34% for TL). Of course, the propensity of a gene to withstand mutations without the resulting protein causing deleterious effects on the plant plays a major role and gametophytic lethal mutations will never be found in the population. However, we could see that some primer pairs used for screening gave a higher background noise than others, which affects the discrimination between true mutants and false positives on the polyacrylamide gel image, and reduces the number of mutants recovered. Nevertheless, our average score of 34 mutant alleles identified per tilled gene is higher than the 10 mutations per gene of Arabidopsis [6] or rice [39]. Screening for mutations in PsMet1 resulted in 96 alleles, of which 50 were missense and 3 non-sense mutations; in this case, the large number of mutations recovered is, at first sight, impressive, but the large gene size and targeted region (3,842 bp), together with the fact that we tilled the entire population (4,704 lines), accounts for this result. On the other hand, this example illustrates the strength of TILLING when it comes to finding a specific point mutation. Because of the high number of alleles we routinely identify, the possible impact of missense mutations on the function of a protein is assessed before systematic phenotyping of the mutant plants, using two different programs: SIFT (Sorting Intolerant From Tolerant) [42], which uses PSI-BLAST alignments, and PARSESNP (Project Aligned Related Sequences and Evaluate SNPs) [43], which provides a position-specific scoring matrix based on alignment blocks (Figure 4d). In the case of PsMet1, 13 out of the 50 missense mutations (23%) were predicted to have a major impact on the function of the protein. Thus, the corresponding 13 mutant lines are characterized first. In Arabidopsis, the MetI gene controls maintenance of CpG methylation [29]. It was previously shown that point mutations in AtMetI can lead to genome hypomethylation [29,44] with a variable impact on plant development, ranging from a late-flowering phenotype to reduced embryo viability. P. sativum has a genome mainly composed of non-coding repeated sequences [45], which are typically subjected to chromatinmediated epigenetic suppression of transcription [46], in which an elevated rate of DNA methylation plays a major role. We intend to investigate the stability of those regions in a hypomethylated context, that is, in PsmetI lines for which CpG methylation is altered. As we are currently amplifying our mutant lines in order to get homozygous mutants and characterize their phenotypes and DNA methylation levels, it is still too early to speculate on the observed versus predicted effect of the mutations according to SIFT. Conclusion In the 21st century, the need for crop improvement in order to face the growing demand of modern agriculture is increasing, while the social acceptance of so-called genetically modified or transgenic crops remains low. Besides, many plant species of agronomic importance are still unsuitable for Agrobacterium-based insertional mutation techniques, including pea. The development of TILLING technology, based on EMS mutagenesis, can contribute to overcoming this deficiency. Furthermore, as EMS generates an allelic series of the targeted genes it becomes possible to investigate the role of essential genes that are otherwise not likely to be recovered in genetic screens based on insertional mutagenesis. We have developed a complete tool that can be used for both forward (EMS saturated mutant collection and the associated phenotypic database) and reverse (high-throughput TILLING platform) genetics in pea, for both basic science or crop improvement. Hence, by opening it to the community, we hope to fulfill the expectations of both crop breeders and scientists who are using pea as their model of study. Materials and methods EMS treatment EMS was diluted to the chosen dose in deionized water. Bottles (Schott type) each containing 900 seeds immersed in 450 Genome Biology 2008, 9:R43 http://genomebiology.com/2008/9/2/R43 Genome Biology 2008, 1383 bp (a) 1310 bp Volume 9, Issue 2, Article R43 Dalmais et al. R43.8 1149 bp b (b) Figure 4 between predicted and obtained mutations Comparison Comparison between predicted and obtained mutations. (a) Output of the CODDLE program using as an example the PsMetI genomic sequence. Exons are represented by white boxes and introns by red lines. The CODDLE program was used to identify those regions of the gene in which G:C to A:T transitions are most likely to result in deleterious effects on the encoded protein (represented by the probability curve traced in tourquoise). The CODDLE algorithm is based on an evaluation of protein sequence conservation from comparison of database accessions of homologous proteins. For PsMetI, three fragments were chosen based on these CODDLE results (blue lines). External and internal primers were designed to amplify each region by nested PCR. (b) Graphic representation of mutations identified in the three regions of the gene PsMetI. This drawing was made using the PARSESNP program [43], which maps the mutation on a gene model to illustrate the distribution of mutations. Purple triangles represent silent mutations and black and red triangles represent missense and truncation mutations, respectively. Genome Biology 2008, 9:R43 http://genomebiology.com/2008/9/2/R43 Genome Biology 2008, Volume 9, Issue 2, Article R43 Dalmais et al. R43.9 Figure 5 screen TILLING TILLING screen. Example of a PsMetI TILLING screen on eightfold pooled pea DNA. The image of the cleavage reaction is collected from both channels (dyes IRD700 and IRD800). The sizes of the cleavage products (circled) from the two dye-labeled DNA strands (red or green) add up to the size of the full-length PCR product (top of the gel). PCR artifacts are distinguishable from true mutants by yellow points (red and green added) as they appear at the same size in both channels. The size of the cleavage product (the sizing ladder can be seen at the left and middle of the image) indicates approximately where the single nucleotide polymorphism is located in the fragment. ml of deionized water-EMS solution were placed on a rotary shaker (50 rpm) overnight (15 h soaking). The EMS solution was then removed and seeds were rinsed extensively 12 times for 30 minutes with gentle shaking. Plant growing conditions Pea (cultivar Caméor) seeds were sown in pots filled with sterile pouzzolane (inert medium, light volcanic grit) at a sowing depth of about 2 cm followed by abundant watering in greenhouse conditions. Plants were then automatically watered with a solution of 3.5:3.1:8.6 N:P:K. The temperature was maintained between 14°C at night and 30°C during daytime, with supplementary lighting to provide a 16 h day. Genomic DNA extraction and pooling Four pea leaf discs (diameter 10 mm) were collected in 96well plates containing 2 steel beads (4 mm) per well, and tissues were ground using a bead mill. Genomic DNA was isolated using the DNeasy 96 Plant Kit (Qiagen, Hilden, Germany). All genomic DNA was quantified on a 0.8% agarose gel using λ DNA (Invitrogen, Carlsbad, CA, USA) as a concentration reference. DNA samples were diluted tenfold and pooled eightfold in a 96-well format. A population of 4,704 arrayed DNAs from mutagenized individuals is presently available for screening. PCR amplification and mutation detection Figure 5 PCR amplification was based on nested-PCR and universal primers [14]. The first PCR amplification was a standard PCR reaction using target-specific primers and 4 ng of pea genomic DNA. One microliter of the first PCR served as a template for the second nested PCR amplification, using a mix of gene-specific inner primers carrying a universal M13 tail (CACGACGTTGTAAAACGAC for forward primers; GGATAACAATTTCACACAGG for reverse primers), in combination with M13 universal primers, M13F700 (CACGACGTTGTAAAACGAC) and M13R800 (GGATAACAATTTCACACAGG), labeled at the 5'end with infra-red dyes IRD700 and IRD800 (LI-COR®, Lincoln, NE, USA), respectively. This PCR was carried out using 0.1 μM of each primer, using the following two step cycling program: 94°C for 2 minutes, 10 cycles at 94°C for 15 s, primer-specific annealing temperature for 30 s and 72°C for 1 minute, followed by 25 cycles at 94°C for 15 s, 50°C for 30 s and 72°C for 1 minute, then a final extension of 5 minutes at 72°C. MutaGenome Biology 2008, 9:R43 http://genomebiology.com/2008/9/2/R43 Genome Biology 2008, Volume 9, Issue 2, Article R43 Dalmais et al. R43.10 Table 3 Tilled genes and mutation density in Caméor mutant population Tilled genes Ps CONSTANS-like a (PsCOLa) LectineA Sucrose transporter (SUT1) Amplicon size (bp) % of GC in exons Identified mutants Screened M2 families Mutation frequency 1/141 Kb 1,012 46.30% 11 1,536 971 40.80% 13 1,536 1/115 Kb 1,014 52.40% 12 1,536 1/130 Kb Cell wall invertase (cwINV) 1,612 41.50% 12 1,536 1/206 Kb Serine-threonine proteine kinase (Sym29) 2,457 44.00% 8 768 1/236 Kb Phosphoenolpyruvate carboxylase (PepC) 1,009 44.40% 25 3,072 1/124 Kb 870 39.10% 21 4,608 1/191 Kb 1,200 36.60% 9 3,072 1/410 Kb Lec1-like (L1L) DOF transcription factor 2 (PsDOF2) Trypsine inhibitor (TI1) 712 34.20% 13 3,840 1/210 Kb Pea albumine (PA2) 746 38.50% 9 3,072 1/255 Kb Anther specific protein (End1) 851 40.50% 31 3,072 1/84 Kb MADS box gene (PM10) 1,302 34.60% 20 4,608 1/300 Kb MADS box gene (PM2) 1,390 31.30% 28 4,608 1/229 Kb Tendril-less transcription factor (TL) 1,104 34.00% 28 3,072 1/121 Kb Eukaryotic translation initiation factor (eiF4e) 1,383 36.90% 36 4,608 1/177 Kb 772 36.70% 10 4,608 1/356 Kb Methyl transferase 1 (Met1) Eukaryotic translation initiation factor (eIF(iso)4e) 3,842 40.20% 96 4,704 1/188 Kb Retinoblastoma related (RBR) 1/112 Kb 2,959 40.80% 72 4,608 Late embryogenesis abundant protein (PsLEAM) 952 44.00% 17 4,608 1/258 Kb Heat shock protein 22 (HSP22) 622 45.66% 18 4,608 1/159 Kb 26,780 40.12% 467 - 1/200 Kb Total/mean Part or all of the Caméor mutant population was screened for mutations in the genes listed. The size of the screened amplicon, the number of mutants identified and the mutation frequency for each amplicon are indicated. The average mutation frequency was estimated to one mutation per 200 kb and is calculated as in Greene et al. [6], except that we have summed the sizes of all the amplicons and we divided by the total number of identified mutants. tion detection was carried out as described previously [28]. The nature of the mutations was identified by sequencing. Abbreviations CODDLE, Codons Optimized to Discover Deleterious Lesions; EMS, ethylmethane sulfonate; PARSESNP, Project Aligned Related Sequences and Evaluate SNPs; SIFT, Sorting Intolerant From Tolerant; TILLING, targeting induced local lesions in genomes. Authors' contributions CLS, FM, JB, GA and RT performed the EMS mutagenesis and took care of the plants; MD extracted the DNA; TILLING screens and analysis were done by MD and JS; VS, VB, YDO and CG set up UTILLdb; AB coordinated the study. The manuscript was written by JS, MD, CLS, VB and AB. Additional data files The following additional data are available with the online version of this paper. Additional data file 1 is a table providing Table 4 Mutation types All Silent Missense Truncation Percent expected (CODDLE) 100 29.30 63.80 6.90 Percent observed 100 28.51 66.75 4.74 Percent heterozygouscacro 86.60 27.7 56.4 2.5 Percent homozygous 13.34 4.78 8.06 0.5 Comparison of expected and observed types of mutations in tilled exonic regions; distribution between heterozygous and homozygous states in the mutant lines. The percentage of expected mutations was calculated by adding the results of CODDLE analysis, on the amplified regions only, for each gene. Genome Biology 2008, 9:R43 http://genomebiology.com/2008/9/2/R43 Genome Biology 2008, the pea mutant phenotype list used for describing and recording M2 mutant plant phenotypes in UTILLdb. Additional Pea mutant Click mutant here plant for data phenotype phenotypes filefile 1 listinused UTILLdb UTILLdb. for describing and recording M2 17. Acknowledgements This work was supported by the European Grain Legumes Integrated Project (FOOD-CT-2004-506223) and the European Commission FP6 Framework Programme. The authors wish to thank B Darchy for taking care of the plants; K Triques, P Audigier and S Chauvin for taking samples; M Nicolas and the GFPC TILLING team for useful discussions; and J Hofer and C Goldstein for useful comments on the manuscript. We are also grateful to the GLIP collaborators, who permitted us to present the screening data of their genes. 18. 19. 20. References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. Hilson P, Allemeersch J, Altmann T, Aubourg S, Avon A, Beynon J, Bhalerao RP, Bitton F, Caboche M, Cannoot B, Chardakov V, CognetHolliger C, Colot V, Crowe M, Darimont C, Durinck S, Eickhoff H, de Longevialle AF, Farmer EE, Grant M, Kuiper MTR, Lehrach H, Leon C, Leyva A, Lundeberg J, Lurin C, Moreau Y, Nietfeld W, Paz-Ares J, Reymond P, et al.: Versatile gene-specific sequence tags for Arabidopsis functional genomics: transcript profiling and reverse genetics applications. Genome Res 2004, 14:2176-2189. Waterhouse PM, Graham MW, Wang M-B: Virus resistance and gene silencing in plants can be induced by simultaneous expression of sense and antisense RNA. Proc Natl Acad Sci USA 1998, 95:13959-13964. Long D, Coupland G: Transposon tagging with Ac/Ds in Arabidopsis. Methods Mol Biol 1998, 82:315-328. May BP, Liu H, Vollbrecht E, Senior L, Rabinowicz PD, Roh D, Pan X, Stein L, Freeling M, Alexander D, Martienssen R: Maize-targeted mutagenesis: a knockout resource for maize. Proc Natl Acad Sci USA 2003, 100:11541-11546. Alonso JM, Stepanova AN, Leisse TJ, Kim CJ, Chen H, Shinn P, Stevenson DK, Zimmerman J, Barajas P, Cheuk R, Gadrinab C, Heller C, Jeske A, Koesema E, Meyers CC, Parker H, Prednis L, Ansari Y, Choy N, Deen H, Geralt M, Hazari N, Hom E, Karnes M, Mulholland C, Ndubaku R, Schmidt I, Guzman P, Aguilar-Henonin L, Schmid M, et al.: Genome-wide insertional mutagenesis of Arabidopsis thaliana. Science 2003, 301:653-657. Greene EA, Codomo CA, Taylor NE, Henikoff JG, Till BJ, Reynolds SH, Enns LC, Burtner C, Johnson JE, Odden AR, Comai L, Henikoff S: Spectrum of chemically induced mutations from a largescale reverse-genetic screen in Arabidopsis. Genetics 2003, 164:731-740. Henikoff S, Till BJ, Comai L: TILLING. Traditional mutagenesis meets functional genomics. Plant Physiol 2004, 135:630-636. Comai L, Henikoff S: TILLING: practical single-nucleotide mutation discovery. Plant J 2006, 45:684-694. McCallum CM, Comai L, Greene EA, Henikoff S: Targeting induced local lesions in genomes (TILLING) for plant functional genomics. Plant Physiol 2000, 123:439-442. Bentley A, MacLennan B, Calvo J, Dearolf CR: Targeted recovery of mutations in Drosophila. Genetics 2000, 156:1169-1173. Coghill EL, Hugill A, Parkinson N, Davison C, Glenister P, Clements S, Hunter J, Cox RD, Brown SDM: A gene-driven approach to the identification of ENU mutants in the mouse. Nat Genet 2002, 30:255-256. Colbert T, Till BJ, Tompa R, Reynolds S, Steine MN, Yeung AT, McCallum CM, Comai L, Henikoff S: High-throughput screening for induced point mutations. Plant Physiol 2001, 126:480-484. Perry JA, Wang TL, Welham TJ, Gardner S, Pike JM, Yoshida S, Parniske M: A TILLING reverse genetics tool and a web-accessible collection of mutants of the legume Lotus japonicus. Plant Physiol 2003, 131:866-871. Wienholds E, van Eeden F, Kosters M, Mudde J, Plasterk RHA, Cuppen E: Efficient target-selected mutagenesis in zebrafish. Genome Res 2003, 13:2700-2707. Menda N, Semel Y, Peled D, Eshed Y, Zamir D: In silico screening of a saturated mutation library of tomato. Plant J 2004, 38:861-872. Miyao A, Iwasaki Y, Kitano H, Itoh J-I, Maekawa M, Murata K, Yatou O, Nagato Y, Hirochika H: A large-scale collection of phenotypic data describing an insertional mutant population to 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. Volume 9, Issue 2, Article R43 Dalmais et al. R43.11 facilitate functional analysis of rice genes. Plant Mol Biol 2007, 63:625-635. Kuromori T, Wada T, Kamiya A, Yuguchi M, Yokouchi T, Imura Y, Takabe H, Sakurai T, Akiyama K, Hirayama T, Okada K, Shinozaki K: A trial of phenome analysis using 4000 Ds-insertional mutants in gene-coding regions of Arabidopsis. Plant J 2006, 47:640-651. Lawrence CJ, Seigfried TE, Brendel V: The Maize Genetics and Genomics Database. The community resource for access to diverse maize data. Plant Physiol 2005, 138:55-58. Lee JM, Davenport GF, Marshall D, Ellis THN, Ambrose MJ, Dicks J, van Hintum TJL, Flavell AJ: GERMINATE. A generic database for integrating genotypic and phenotypic information for plant genetic resource collections. Plant Physiol 2005, 139:619-631. Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, GarciaHernandez M, Huala E, Lander G, Montoya M, Miller N, Mueller LA, Mundodi S, Reiser L, Tacklind J, Weems DC, Wu Y, Xu I, Yoo D, Yoon J, Zhang P: The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res 2003, 31:224-228. Domoney C, Duc G, Ellis TN, Ferrandiz C, Firnhaber C, Gallardo K, Hofer J, Kopka J, Kuster H, Madueno F, Munier-Jolain NG, Mayer K, Thompson R, Udvardi M, Salon C: Genetic and genomic analysis of legume flowers and seeds. Curr Opin Plant Biol 2006, 9:133-141. The Grain Legumes Integrated Project [http://www.eugrain legumes.org/] Loridon K, McPhee K, Morin J, Dubreuil P, Pilet-Nayel M, Aubert G, Rameau C, Baranger A, Coyne C, Lejeune-Hénaut I, Burstin J: Microsatellite marker polymorphism and mapping in pea (Pisum sativum L.). Theor Appl Genet 2005, 111:1022-1031. Aubert G, Morin J, Jacquin F, Loridon K, Quillet M, Petit A, Rameau C, Lejeune-Hénaut I, Huguet T, Burstin J: Functional mapping in pea, as an aid to the candidate gene selection and for investigating synteny with the model legume Medicago truncatula. Theor Appl Genet 2006, 112:1024-1041. Till BJ, Reynolds SH, Greene EA, Codomo CA, Enns LC, Johnson JE, Burtner C, Odden AR, Young K, Taylor NE, Henikoff JG, Comai L, Henikoff S: Large-scale discovery of induced point mutations with high-throughput TILLING. Genome Res 2003, 13:524-530. CODDLE: Codons Optimized to Discover Deleterious LEsions [http://www.proweb.org/coddle] Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. In Bioinformatics Methods and Protocols: Methods in Molecular Biology Edited by: Krawetz SA, Misener S. Totowa, NJ: Humana Press; 2000:365-386. Triques K, Sturbois B, Gallais S, Dalmais M, Chauvin S, Clepet C, Aubourg S, Rameau C, Caboche M, Bendahmane A: Characterization of Arabidopsis thaliana mismatch specific endonucleases: application to mutation discovery by TILLING in pea. Plant J 2007, 51:1116-1125. Kankel MW, Ramsey DE, Stokes TL, Flowers SK, Haag JR, Jeddeloh JA, Riddle NC, Verbsky ML, Richards EJ: Arabidopsis MET1 cytosine methyltransferase mutants. Genetics 2003, 163:1109-1122. Krieg DR: Ethyl methanesulfonate-induced reversion of bacteriophage T4rII mutants. Genetics 1963, 48:561-580. Markiewicz P, Kleina LG, Cruz C, Ehret S, Miller JH: Genetic studies of the lac repressor. XIV. Analysis of 4000 altered Escherichia coli lac repressors reveals essential and non-essential residues, as well as "spacers" which do not require a specific sequence. J Mol Biol 1994, 240:421-433. Altschul S, Gish W, Miller W, Myers E, Lipman DJ: Basic local alignment search tool. J Mol Biol 1990, 215:403-410. UTILLdb: URGV TILLING database [http://urgv.evry.inra.fr/ UTILLdb] Ilic K, Kellogg EA, Jaiswal P, Zapata F, Stevens PF, Vincent LP, Avraham S, Reiser L, Pujar A, Sachs MM, Whitman NT, McCouch SR, Schaeffer ML, Ware DH, Stein LD, Rhee SY: The Plant Structure Ontology, a unified vocabulary of anatomy and morphology of a flowering plant. Plant Physiol 2007, 143:587-599. Jaiswal P, Avraham S, Ilic K, Kellogg E, McCouch S, Pujar A, Reiser L, Rhee S, Sachs M, Schaeffer M, Stein L, Stevens P, Vincent L, Ware D, Zapata F: Plant Ontology (PO): a controlled vocabulary of plant structures and growth stages. Comp Funct Genomics 2005, 6:388-397. Caldwell DG, McCallum N, Shaw P, Muehlbauer GJ, Marshall DF, Waugh R: A structured mutant population for forward and reverse genetics in Barley (Hordeum vulgare L.). Plant J 2004, Genome Biology 2008, 9:R43 http://genomebiology.com/2008/9/2/R43 37. 38. 39. 40. 41. 42. 43. 44. 45. 46. Genome Biology 2008, 40:143-150. Nakamura Y, Gojobori T, Ikemura T: Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res 2000, 28:292. Gilchrist E, O'Neil N, Rose A, Zetka M, Haughn G: TILLING is an effective reverse genetics technique for Caenorhabditis elegans. BMC Genomics 2006, 7:262. Till B, Cooper J, Tai T, Colowit P, Greene E, Henikoff S, Comai L: Discovery of chemically induced mutations in rice by TILLING. BMC Plant Biol 2007, 7:19. Till B, Reynolds S, Weil C, Springer N, Burtner C, Young K, Bowers E, Codomo C, Enns L, Odden A, Greene E, Comai L, Henikoff S: Discovery of induced point mutations in maize genes by TILLING. BMC Plant Biol 2004, 4:12. Slade AJ, Fuerstenberg SI, Loeffler D, Steine MN, Facciotti D: A reverse genetic, nontransgenic approach to wheat crop improvement by TILLING. Nat Biotechnol 2005, 23:75-81. Ng PC, Henikoff S: SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res 2003, 31:3812-3814. Taylor NE, Greene EA: PARSESNP: a tool for the analysis of nucleotide polymorphisms. Nucleic Acids Res 2003, 31:3808-3811. Xiao W, Custard KD, Brown RC, Lemmon BE, Harada JJ, Goldberg RB, Fischer RL: DNA methylation is critical for Arabidopsis embryogenesis and seed viability. Plant Cell 2006, 18:805-814. Ellis THN, Poyser SJ: An integrated and comparative view of pea genetic and cytogenetic maps. New Phytologist 2002, 153:17-25. Martienssen RA, Colot V: DNA methylation and epigenetic inheritance in plants and filamentous fungi. Science 2001, 293:1070-1074. Genome Biology 2008, 9:R43 Volume 9, Issue 2, Article R43 Dalmais et al. R43.12

RELATED PAPERS

RELATED TOPICS

Log In

UTILLdb, a Pisum sativum in silico forward and reverse genetics tool

UTILLdb, a Pisum sativum in silico forward and reverse genetics tool

UTILLdb, a Pisum sativum in silico forward and reverse genetics tool

UTILLdb, a Pisum sativum in silico forward and reverse genetics tool

UTILLdb, a Pisum sativum in silico forward and reverse genetics tool

Related Papers

RELATED PAPERS

RELATED TOPICS