The evolution of momordica cyclic peptides

Quentin Kaas


Pre-submission version. For final version see: Mol Biol Evol, 2014 Nov 6. pii: msu307. [Epub ahead of print] PMID: 25376175

Article (Discoveries)

Tunjung Mahatmanto,1 Joshua S. Mylne,2 Aaron G. Poth,1 Joakim Swedberg,1 Quentin Kaas,1 Hanno Schaefer,3 and David J. Craik*,1

1 Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4072, Australia
2 The University of Western Australia, School of Chemistry and Biochemistry & The ARC Centre of Excellence in Plant Energy Biology, 35 Stirling Highway, Crawley, Perth 6009, Australia
3 Plant Biodiversity Research, Technische Universität München, Emil-Ramann Strasse 2, 85354 Freising, Germany

*Corresponding author: David J. Craik
Address: Institute for Molecular Bioscience, 306 Carmody Road, Building 80, The University of Queensland, Brisbane, Queensland 4072, Australia
Telephone: +61 7 3346 2019
Fax: +61 7 3346 2101
E-mail: d.craik@imb.uq.edu.au

Abstract

Cyclic proteins have evolved for millions of years across kingdoms of life to confer structural stability over their acyclic counterparts while maintaining intrinsic functional properties. Here we show that cyclic mini-proteins (or peptides) from Momordica (Cucurbitaceae) seeds evolved in species that diverged from an African ancestor around 19 million years ago. The ability to achieve head-to-tail cyclization of Momordica cyclic peptides appears to have been acquired via a series of mutations in their acyclic precursor coding sequences following recent and independent gene expansion event(s). Evolutionary analysis of Momordica cyclic peptides reveals sites that are under selection, highlighting residues that are presumably constrained for maintaining their function as potent trypsin inhibitors. Molecular dynamics of Momordica cyclic peptides in complex with trypsin reveals site-specific residues involved in target binding. In a broader context, this study provides a basis for selecting Momordica species to further investigate the biosynthesis of the cyclic peptides and for constructing libraries that may be screened against evolutionarily related serine proteases implicated in human diseases.

Key words: Momordica seeds, cyclic peptides, serine protease inhibitors, evolution

Introduction

Head-to-tail or backbone cyclization confers peptides with resistance to proteolysis and peptides bearing this trait presumably evolved from ancestral acyclic peptides (Trabi and Craik 2002). Backbone cyclized peptides found in angiosperms are here categorized into three groups based on their structures (Fig. 1). Group 1 consists of cyclic peptides with three disulfide bonds that form a knotted core, i.e. the cyclotides (Craik et al. 1999). Group 2 consists of cyclic peptides with one disulfide bond, which includes SFTI-1 or sunflower trypsin inhibitor-1 (Luckett et al. 1999), SFT-L1 or SFTI-Like 1 (Mylne et al. 2011), and PDPs or PawS-derived peptides (Elliott et al. 2014). Members of both groups belong to the subclass homopolycyclopeptides type VIII of the plant cyclopeptides (Tan and Zhou 2006). Group 3 consists of cyclic peptides with no disulfide bonds referred to as orbitides (Arnison et al. 2013). Members of this group belong to the subclass homomonocyclopeptides type VI of the plant cyclopeptides (Tan and Zhou 2006).

All of the aforementioned cyclic peptides are gene-encoded and occur in distant families of angiosperms. Group 1 have been found in Rubiaceae, Violaceae, Cucurbitaceae, Fabaceae, and Solanaceae (Craik 2013); Group 2 in Asteraceae (Elliott et al. 2014); and Group 3 in Annonaceae, Caryophyllaceae, Euphorbiaceae, Lamiaceae, Linaceae, Phytolaccaceae, Rutaceae, Schizandraceae, and Verbenaceae (Arnison et al. 2013).
Despite being phylogenetically distant, the biosynthesis of the first two groups appears to have evolved in parallel, being channelled through transpeptidation by asparaginyl endopeptidase (AEP) that joins their ends (Saska et al. 2007; Gillon et al. 2008; Mylne et al. 2011; Mylne et al. 2012). This AEP-mediated cyclization requires a conserved proto-N-terminal Gly, proto-C-terminal Asx (i.e. Asn or Asp), small residue at P1′, and Xle (i.e. Leu or Ile) at P2′ (Mylne et al. 2011; Mylne et al. 2012). On the other hand, cyclization of the third group is mediated by peptide cyclase (PCY1), a serine protease-like enzyme (Barber et al. 2013).

The seeds of Momordica cochinchinensis (Cucurbitaceae) contain cyclic peptides that belong to the cyclotide group. MCoTI-I and -II (Momordica cochinchinensis trypsin inhibitor-I and II) are the first members of Momordica cyclic peptides to be discovered (Hernandez et al. 2000) and have been studied extensively, particularly for applications in the biomedical field. Interest in these peptides stems from the proteolytic stability conferred by their structural motif (Colgrave and Craik 2004) and the amenability of the residues comprising their intracysteine loops to mutation (Austin et al. 2009). Thus, in principle Momordica cyclic peptides can be used as highly stable grafting scaffolds. As both peptides have the ability to enter cells (Greenwood et al. 2007; Cascales et al. 2011; Contreras et al. 2011; D’Souza et al. 2014), they have been touted as potential vectors for the delivery of grafted epitopes with desired activities to intracellular targets (Ji et al. 2013). Examples of grafting applications include engineering of: (i) MCoTI-I into an anti-HIV agent (Aboye et al. 2012) and an antagonist of intracellular proteins Hdm2 and HdmX for suppressing tumor growth (Ji et al. 2013) and (ii) MCoTI-II into a β-tryptase inhibitor (Thongyoo et al. 2009; Sommerhoff et al. 2010) and human leukocyte elastase inhibitor (Thongyoo et al. 2009) for inflammatory disorders, and a pro-angiogenic agent for wound healing (Chan et al. 2011).

Despite these remarkable successes, the introduction of new activities onto the Momordica cyclic peptide scaffold remains challenging because the limiting structural and functional constraints are not yet fully understood. During the course of evolution, negative selection purges mutations that have deleterious effects on the structure of peptides, constraining their ability to acquire new function, which would otherwise be fixed under positive selection (Tokuriki and Tawfik 2009). Thus, knowledge of residues under selection should provide insights into the limitations of Momordica cyclic peptides to be engineered as scaffolds.

To understand the evolution of Momordica cyclic peptides, it is imperative that their distribution and diversity be traced and mapped. Momordica is a clade of c. 60 tropical and subtropical climbers and creepers that diverged from a common ancestor around 35 million years ago (Schaefer and Renner 2010) and underwent long-distance dispersal across Africa, Asia, and Australia (Schaefer et al. 2009). Being a crucial agent for plant dispersal, seeds play a key role in the speciation of plants, carrying within them genetic information for the establishment of new plants under spatio-temporally disparate environmental pressures. Tracing the distribution of cyclic peptides and their acyclic counterparts in Momordica species will allow us to determine when, where, and how the genes for their biosynthesis evolved. In turn, this knowledge may serve as a basis for selecting Momordica species to further investigate how the cyclic peptides arise, e.g. through comparative transcriptome analysis to identify gene sequences and enzymes essential for their processing. Furthermore, mapping the diversity of cyclic peptides in Momordica seeds will allow us to identify site-specific residues that are evolving under selection, i.e. negative selection, which maintains the existing structure, or positive selection, which favors the adoption of new function. This information may be particularly useful in the context of designing inhibitors of evolutionarily related serine proteases implicated in human diseases using the Momordica cyclic peptide scaffold.

In this study we describe the distribution and diversity of cyclic peptides and their acyclic counterparts in the seeds of 24 Momordica species and an outgroup species Siraitia grosvenorii (Fig. 2). We discover new TIPTOP (Two Inhibitor Peptide TOPologies) genes, which encode multiple cyclic peptide domains and terminate with an acyclic peptide domain (Mylne et al. 2012), and partial gene sequences, which we refer to as TIPRE (Tandem Inhibitor Peptide REpeats), that encode multiple acyclic peptide domains. We assemble transcripts that encode a single acyclic peptide domain like the previously reported TGTI-II (Towel Gourd Trypsin Inhibitor-II) cDNA (Ling et al. 1993; Mylne et al. 2012). In addition, we identify diagnostic peptides that correspond to the cyclic peptides and their acyclic counterparts.

Although Momordica underwent long-distance dispersal events during its speciation (Schaefer et al. 2009; Schaefer and Renner 2010), we found the sequence diversity of Momordica cyclic peptides to be low compared to other members of the cyclotide group (Kaas and Craik 2010). This conservation could be explained by the recentness of the event(s) that created the cyclic peptides or by the selection operating on the cyclic peptide domain repeats for maintaining their structure, and thus their documented function as potent trypsin inhibitors (Avrutina et al. 2005). As hydrogen bonding networks are known to play an important role in protein recognition (Lu et al. 1997), constraining mutations that could compromise the functional fold conferred by the hydrogen bond networks is vital.
Molecular dynamics of Momordica cyclic peptides in complex with trypsin reveal alterations in the intermolecular hydrogen bond network upon single amino acid substitutions, highlighting sites that have the potential to be engineered for selective target binding.

Results

In this study we traced the occurrence of cyclic peptides in the seeds of 24 Momordica species and an outgroup species Siraitia grosvenorii, mapped the residues under selection, and examined the effect of single amino acid substitutions on the intermolecular hydrogen bond network of selected naturally occurring Momordica cyclic peptides in complex with trypsin.

Precursors of Momordica Cyclic and Acyclic Peptides

PCR of Momordica genomic DNA using the primers that amplified TIPTOP genes from M. cochinchinensis and M. sphaeroidea (Mylne et al. 2012) resulted in new TIPTOP genes from two Asian Momordica, i.e. one from M. subangulata (TIPTOP4) and two from M. macrophylla (TIPTOP5 and TIPTOP6). TIPTOP4–6 respectively encode six, four, and five cyclic peptides, each terminating with an acyclic peptide (the list of the encoded peptides is given in Supplementary Table S1 and a representation of the precursors is given in Fig. 3A). Two of the encoded cyclic peptides, i.e. MCoTI-II and MCoTI-IV (hereafter we remove the MCo prefix because some of the peptides are also present in other Momordica species and use an Arabic numeral for simplicity, e.g. MCoTI-II becomes TI-2), have been reported (Hernandez et al. 2000; Mylne et al. 2012) whereas the others are new but have similar sequences that share the Asp-Gly cyclization point. Similarly, two of the encoded acyclic peptides, i.e. TI-5 and TI-6, have been reported (Mylne et al. 2012). The other acyclic peptide, i.e. TI-19, differs from TI-5 in that it has an additional N-terminal Gln. An alignment of Momordica cyclic peptides and their acyclic counterparts is given in Fig. 3B.
A new set of primers for conserved sequences within the endoplasmic reticulum (ER) signal and the acyclic peptide domain was designed because the first primer set could not amplify TIPTOP genes from the remaining Momordica genomic DNA. PCR with this new set of primers resulted in five partial gene sequences that appear to have undergone expansion similar to TIPTOP and thus were named TIPRE, for Tandem Inhibitor Peptide REpeats. Four of the partial gene sequences were found in the African M. anigosantha (TIPRE1–4) and one was found in the African M. friesiorum (TIPRE5). The list of the encoded peptides is given in Supplementary Table S1. The TIPRE peptides, i.e. TI-24–27, have similar sequences to the TIPTOP acyclic peptides but with four additional C-terminal residues like the TIPTOP cyclic peptides (Fig. 3B). This finding suggests that the acyclic peptides acquired features for cyclization following extension of their tail, which provided the target residues for AEP to then perform transpeptidation that ligates their C-terminus to their N-terminus.

Analysis of a previously reported African M. charantia seed transcriptome (Yang et al. 2010) revealed five transcripts (the translation of the transcripts is given in Supplementary Table S2), each encoding a single acyclic peptide domain. Two of the transcripts encode the acyclic peptides MCTI-I (Momordica charantia Trypsin Inhibitor-I) and MCTI-III (Hara et al. 1989; Hamato et al. 1995). One transcript encodes an acyclic peptide, which we refer to as TI-28, that is similar to MCTI-II (Hara et al. 1989) but with an extended N-terminus. Another transcript encodes an acyclic peptide, which we refer to as EI-1 (Elastase Inhibitor-1), that is similar to MCEI-IV (Momordica charantia Elastase Inhibitor-IV; Hamato et al. 1995) but differs in one residue following the N-terminal Glu.
The absence of a dedicated precursor for MCTI-II and MCEI-I–III (Supplementary Table S2), which respectively are shorter than TI-28 and EI-1 in their N-terminus (Fig. 3B), suggests that they are products of post-translational N-terminal trimming, a process that has been proposed to give rise to the acyclic peptides hedyotide B2–4 from the Rubiaceae Hedyotis biflora (Nguyen et al. 2011). On the other hand, the other transcript potentially encodes a new peptide, which we refer to as TI-23, as judged by the sequence similarity to the other Momordica cyclic peptides. Given the lack of features for AEP processing in its precursor (Supplementary Table S2), it would be interesting to confirm the presence of TI-23.

Identification of Peptides using a Targeted Proteomics Approach

A targeted search for Momordica cyclic peptides and their acyclic counterparts was aided by the observation of tandem MS for diagnostic peptides, which result from the digestion of reduced and alkylated peptides. For cyclic peptides, the diagnostic peptides are chymotrypsin digests harbouring sequence tags that extend over their cyclization points, i.e. Cys29 to Leu8 (for residue numbering please refer to Fig. 3B). For acyclic peptides, the diagnostic peptides result from trypsin, chymotrypsin, or endoproteinase Glu-C digestion. A list of the sequences of representative diagnostic peptides found is given in Supplementary Table S3.

The distribution of Momordica cyclic peptides and their acyclic counterparts is presented in Fig. 4. Tandem MS evidence for cyclic peptides was only found in the Asian M. cochinchinensis, M. macrophylla, M. denticulata, M. subangulata, and M. clarkeana and in a close relative African M. gilgiana (a representative tandem MS spectrum is given in Supplementary Fig. S1A). No evidence was found for TI-23 in the African M. charantia. For the African M. anigosantha TIPRE peptides, evidence was only found to support acyclic peptides, as would be expected judging from the lack of a proto-N-terminal Gly. Observed tandem MS spectra were consistent with a chymotrypsin and an endoproteinase Glu-C digest product that correspond to acyclic peptides having an N-terminal pyrolated glutamine (Supplementary Fig. S1B) and a four C-terminal residue extension (Supplementary Fig. S1C). Evidence for the acyclic trypsin inhibitors was found in all of the species analysed (Supplementary Table S3; representative tandem MS spectra are given in Supplementary Figs. S1D and S1E). For the acyclic elastase inhibitors, evidence was only found in the African M. leiocarpa, M. foetida, M. balsamina, and M. charantia (representative tandem MS spectra are given in Supplementary Figs. S1F and S1G).

Mapping of Sites Under Selection

To map the sites that are evolving under selection, we analyzed the number of synonymous substitutions per site (dS) and the number of nonsynonymous substitutions per site (dN) of the cyclic peptides. The sign of dN-dS indicates whether a particular site is evolving under negative selection (dN-dS below zero) or positive selection (dN-dS above zero). The dN-dS analysis of the cyclic peptides (Supplementary Table S4) revealed that two cysteines and 11 intracysteine residues are evolving under negative selection, which include four of five residues in Loop 2, one of three residues in Loop 3, one of five residues in Loop 5, and five of eight residues in Loop 6. On the other hand, eight intracysteine residues are evolving under positive selection, which include three of six residues in Loop 1, one of three residues in Loop 3, one of one residue in Loop 4, two of five residues in Loop 5, and one of eight residues in Loop 6. The remaining 13 residues (four cysteines, three residues in Loop 1, one residue in Loop 2, one residue in Loop 3, two residues in Loop 5, and two residues in Loop 6) are neutral.
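The per-loop counts in this paragraph can be cross-checked with a short tally. The sketch below is illustrative only: the sign rule for dN-dS follows the text, the counts are transcribed from the paragraph rather than from Supplementary Table S4, and the names `classify_site`, `REPORTED`, and `totals` are hypothetical.

```python
def classify_site(dn_minus_ds: float) -> str:
    """Classify a site by the sign of dN - dS, following the rule in the text."""
    if dn_minus_ds < 0:
        return "negative"
    if dn_minus_ds > 0:
        return "positive"
    return "neutral"


# Counts transcribed from the paragraph above (not from the supplementary data):
# region -> (negative, positive, neutral)
REPORTED = {
    "Cys":    (2, 0, 4),
    "Loop 1": (0, 3, 3),
    "Loop 2": (4, 0, 1),
    "Loop 3": (1, 1, 1),
    "Loop 4": (0, 1, 0),
    "Loop 5": (1, 2, 2),
    "Loop 6": (5, 1, 2),
}


def totals(counts):
    """Sum the per-region counts into one total per selection class."""
    return {
        "negative": sum(c[0] for c in counts.values()),
        "positive": sum(c[1] for c in counts.values()),
        "neutral": sum(c[2] for c in counts.values()),
    }


if __name__ == "__main__":
    summary = totals(REPORTED)
    print(summary)
    print("sites in total:", sum(summary.values()))
```

The totals come to 13 sites under negative selection, 8 under positive selection, and 13 neutral, which together account for a 34-residue peptide.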
This finding highlights the types of selection operating on the sites of the Momordica cyclic peptide scaffold (Fig. 5A).

Dynamics of the Hydrogen Bond Network of Selected Cyclic Peptides with Trypsin

Mutations provide an essential raw material for evolution and serve as a basis for acquiring new functions (Tokuriki and Tawfik 2009). As hydrogen bonds play a role in the recognition of inhibitors against their target proteins (Lu et al. 1997), examining the effect of mutations on the interaction of inhibitors with their target proteins is fundamental to understanding the mechanistic basis of their biological function. Here we examined the effect of single amino acid substitutions on the intermolecular hydrogen bonds of the Momordica cyclic peptides TI-1 and TI-2 in complex with trypsin using molecular dynamics. The mutations made to TI-1 were based on the sequences of TI-8, TI-18, and TI-21 whereas those made to TI-2 were based on the sequences of TI-10, TI-20, and TI-22.

Analysis of the dynamics of the cyclic peptides in complex with trypsin reveals site-specific residues that play a role in forming the intermolecular hydrogen bond network (Fig. 5B; list of donors, acceptors, and frequency of occupancy of the hydrogen bonds is given in Supplementary Table S5). One of the prominent features of Momordica cyclic peptides is that their main chains, i.e. of Loops 1 and 6, form the majority of the hydrogen bond network with trypsin. For TI-1, the residues involved in main chain hydrogen bonding with trypsin are Gly1, Gly2, Gly32, Cys4, Cys29, Pro5, Lys6, Ile7, Leu8, and Asp34 whereas the residues involved in side chain hydrogen bonding are Lys6, Gln9, Arg24, Asn26, and Ser31. For TI-2, the residues involved in main chain and side chain hydrogen bonding with trypsin are similar to TI-1, with the exception of Lys10 instead of Gly32 for the main chain and Tyr28 instead of Gln9 for the side chain.
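As an illustration of what a frequency of occupancy means, the sketch below treats it as the fraction of trajectory frames in which a given hydrogen bond is present, which is the usual convention but is an assumption here because the text does not spell out its definition. The per-frame flags, the tolerance `tol`, and the function names are hypothetical and do not come from the study's trajectories.

```python
def occupancy(frames):
    """Fraction of frames in which a hydrogen bond is present (True)."""
    frames = list(frames)
    if not frames:
        return 0.0
    return sum(1 for present in frames if present) / len(frames)


def compare(wild_type, mutant, tol=0.05):
    """Describe how one hydrogen bond changes between two simulations.

    `tol` is an arbitrary illustrative threshold below which a difference
    in occupancy is treated as no change.
    """
    wt, mu = occupancy(wild_type), occupancy(mutant)
    if wt == 0 and mu > 0:
        return "introduced"
    if wt > 0 and mu == 0:
        return "abolished"
    if mu - wt > tol:
        return "increased"
    if wt - mu > tol:
        return "decreased"
    return "unchanged"


if __name__ == "__main__":
    # Hypothetical per-frame presence flags for one donor-acceptor pair.
    wild_type = [True, False, False, False]
    mutant = [True, True, True, False]
    print(occupancy(wild_type), occupancy(mutant), compare(wild_type, mutant))
```

The four outcomes correspond to the kinds of change a substitution can cause: a bond that appears, a bond that disappears, or an existing bond whose occupancy rises or falls.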
Single amino acid substitutions in both cyclic peptides altered their hydrogen bond network with trypsin, notably through the introduction or abolition of a number of main chain and side chain hydrogen bonds and through increases or decreases in their frequency of occupancy (Fig. 5B). This result highlights the dynamics of the hydrogen bond network of Momordica cyclic peptides in complex with trypsin, providing insights into the potential role of site-specific residues for target binding.

Discussion

Momordica Cyclic Peptides Occur in Species That Diverged from an African Ancestor Around 19 Million Years Ago

TIPTOP genes were found following the report of the cyclic peptides MCoTI-I and MCoTI-II from M. cochinchinensis seeds (Hernandez et al. 2000). The unusual nature of how the genes are organized, i.e. having multiple repeats of cyclic peptide domains that terminate with an acyclic peptide domain, led to the hypothesis that they might have expanded from an ancestral gene via internal duplication event(s) (Mylne et al. 2012). Tracing the distribution of TIPTOP cyclic peptides may provide insights into when and where they emerged. A targeted search revealed the occurrence of cyclic peptides in the Asian M. cochinchinensis, M. macrophylla, M. denticulata, M. subangulata, and M. clarkeana and in a close relative African M. gilgiana (Fig. 4). The lack of evidence for cyclic peptides in other representative African taxa suggests that the cyclic peptides have arisen within species that descended from a common ancestor to the Asian and a close relative African species. The Asian Momordica are a result of a long-distance dispersal of an African ancestor that came back to Asia around 19 million years ago, an event that marks the divergence of the Asian species from their close relative African M. gilgiana (Schaefer and Renner 2010). Thus, the ancestral gene of TIPTOP presumably has been inherited from this African ancestor.
Interestingly, the expansion that created TIPTOP genes appears to have occurred recently and independently, as suggested by the highly conserved signal peptides (Fig. 6A), which are known to evolve rapidly (Li et al. 2009), and the distinct number of domain repeats and sequence of the encoded peptides. This scenario would require specific selective pressures operating on both the Asian species and the close relative African M. gilgiana.

Plausible Selective Advantage and Pressure Underlying the Expansion that Created TIPTOP Genes

The selective advantage conferred by the expansion that created TIPTOP genes might be related to tight regulatory control for expression of the encoded peptides. The repetitive nature of TIPTOP genes would allow the expression of multiple peptides from one transcript. On the other hand, the expansion would alter the RNA secondary structure, which is known to be one of the key determinants of post-transcriptional regulation in plants (Silverman et al. 2013). A recent study using the model plant Arabidopsis thaliana showed that LOW MOLECULAR WEIGHT CYSTEINE-RICH-encoding mRNA is amongst the highly structured mRNAs that tend to be degraded more frequently than less structured mRNAs (Li et al. 2012). Calculation of the folding energy of TIPTOP transcripts using Mfold (Zuker 2003) suggests that the expansion decreases the free energy for folding of TIPTOP transcripts into their secondary structures (Fig. 6B). Taken together, the expansion might have allowed the encoded peptides to be produced efficiently but not excessively, a trait that fits well with a defense response and storage function.

Resistance to invaders is amongst the most ancient traits that evolved through discriminating self from non-self (Staal and Dixelius 2007). Having a biological activity as potent inhibitors of trypsin (Avrutina et al. 2005), one of the main digestive enzymes of invaders, TIPTOP peptides may be regarded as anti-nutritive agents.
Given that many of the known seed-derived inhibitors are only active against digestive enzymes of insects but not against endogenous enzymes (Shewry and Casey 1999), it is tempting to speculate that the expansion that created TIPTOP genes might have been triggered by predatory cues. On the other hand, the high cysteine content of TIPTOP peptides suggests that they might also serve a dual function for storage purposes, providing sulphur along with nitrogen and carbon for germination and seedling growth (Shewry and Casey 1999). The exceptional stability of the cyclic cystine knot class of peptides (Colgrave and Craik 2004) might be related to their function as long-term storage proteins, supporting extended periods of dormancy, as most Cucurbitaceae have orthodox seeds, which can tolerate considerable desiccation and thus have greater longevity compared to recalcitrant seeds (Ellis 1991). Further studies would be needed to test this dual function hypothesis.

Mutations in the Acyclic Peptide Precursors: The Link to Cyclization

The emergence of backbone cyclized peptides of Groups 1 and 2 (Fig. 1) has been shown to be mediated by AEP (Saska et al. 2007; Gillon et al. 2008; Mylne et al. 2011; Mylne et al. 2012), a vacuolar processing enzyme (VPE) that cleaves after Asn and Asp (Hara-Nishimura et al. 1991; Hiraiwa et al. 1999), which respectively precede and end the TIPTOP cyclic peptide domains (Fig. 6C). The residues trailing the proto-C-terminus are usually a small residue at P1′ and Xle at P2′. Cyclization of the peptide backbone occurs through a transpeptidation that critically requires the presence of a proto-N-terminal Gly, which is thought to lack the steric hindrance of other side chains, thus enabling transpeptidation to occur (Mylne et al. 2011). The absence of these features in the precursors of the acyclic peptides means that transpeptidation by AEP will not occur; thus AEP acts as the constraining evolutionary channel for cyclization (Mylne et al. 2012).
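The four sequence features listed above can be expressed as a simple checklist. The sketch below is a hedged illustration, not a predictor of AEP processing: the membership of `SMALL_RESIDUES` (Gly, Ala, Ser) is an assumption, since the text names only Ala as an example of a small residue, and the function names and example inputs are hypothetical.

```python
# Assumption: "small residue" is taken here to mean Gly, Ala, or Ser.
SMALL_RESIDUES = frozenset("GAS")


def aep_features(n_term, c_term, p1_prime, p2_prime):
    """Report which of the four features named in the text are present.

    All arguments are one-letter amino acid codes.
    """
    return {
        "proto-N-terminal Gly": n_term == "G",
        "proto-C-terminal Asx": c_term in {"N", "D"},
        "small residue at P1'": p1_prime in SMALL_RESIDUES,
        "Xle at P2'": p2_prime in {"L", "I"},
    }


def has_all_features(n_term, c_term, p1_prime, p2_prime):
    """True only if every feature required for AEP-mediated cyclization is present."""
    return all(aep_features(n_term, c_term, p1_prime, p2_prime).values())


if __name__ == "__main__":
    # Cyclic-like domain: Gly start, Asp end, Ala at P1'. The Leu at P2' is
    # a hypothetical value chosen for illustration.
    print(has_all_features("G", "D", "A", "L"))
    # Acyclic-like domain: N-terminal Gln and Met at P1', as described for
    # the TIPRE peptides. The Leu at P2' is again hypothetical.
    print(aep_features("Q", "D", "M", "L"))
```

A domain missing any one feature fails the check, which mirrors the argument that the acyclic precursors are not substrates for transpeptidation.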
Alignment of the leader and mature peptide repeats of TIPTOP4 from the Asian M. subangulata and TIPRE4 from the African M. anigosantha (Fig. 6C) suggests that a series of mutations following internal gene duplication provided the features for AEP to perform transpeptidation. We hypothesize that the acyclic peptides first acquired an extension of their C-terminus (by insertion of the SXXD segment) followed by deletion of their N-terminal region (GVYXXXQR segment) and mutation of their P1′ proto-C-terminal trailing residue (Met to a small residue, in this case Ala), which would then lead to the predisposition of the precursors to cyclization by AEP. With an N-terminal Gln like the TIPTOP acyclic peptides and an extended C-terminus like the TIPTOP cyclic peptides, the TIPRE peptides may be regarded as “intermediates” in the evolution of TIPTOP cyclic peptides from their acyclic counterparts. This hypothesis is consistent with the phylogenetic analysis that places the African M. anigosantha close to the species in which TIPTOP cyclic peptides occur (Fig. 4; Schaefer and Renner 2010).

The absence of evidence for the putative cyclic peptide TI-23 in the African M. charantia might be due to the lack of common features for AEP processing in its precursor, which only harbors a single peptide domain and does not have residues trailing the proto-C-terminal Asn and Asx preceding the proto-N-terminal Gly (Supplementary Table S2). Thus, internal gene duplication may be considered as a stepping stone to the acquisition of features for transpeptidation by AEP, which the TIPTOP genes then acquired via a series of mutations during their course of evolution. Molecular phylogenetic analysis reveals that the transcript encoding TI-23 is distantly related to the other coding sequences (Fig. 6D), suggesting that it might be an ancestral vestige of a kindred evolutionary process that, in the Asian and a close relative African species, created the cyclic peptides.
Neofunctionalization of an Acyclic Peptide in the African Momordica Species

Gene duplication has been considered to be the main source of material for the emergence of new functions. The rate at which a gene duplication event occurs is considered high, with the duplicates being silenced within a few million years and the survivors being selected under strong negative pressure (Lynch and Conery 2000). The high sequence similarity of the transcripts encoding MCTI-I and MCTI-III and the transcripts encoding TI-28 and EI-1 (Supplementary Table S2) suggests that they have arisen via gene duplication. In the case of the transcripts encoding TI-28 and EI-1, the acyclic peptides have different biological activities, i.e. the former is a trypsin inhibitor whereas the latter is an elastase inhibitor. Because both of the acyclic peptides are expressed within the seeds of the African M. charantia, the duplicated gene may be considered to have undergone neofunctionalization, i.e. it acquired a new function that is preserved by natural selection. Sequence analysis reveals that the new function emerged from a single point mutation at the second position of the codon for the P1 residue, the residue that interacts with the S1 pocket (or active site) of the target enzyme. The G of the CGA codon that encodes Arg, which is the preferred P1 residue for trypsin (Krieger et al. 1974), is replaced by T, changing the codon to CTA, which encodes Leu (Fig. 6E), the residue preferred by elastase (Hara et al. 1989), assuming that elastase inhibition is the new function. Interestingly, the acyclic elastase inhibitors were only found in the African M. charantia and three closely related African species, i.e. M. balsamina, M. leiocarpa, and M. foetida, which diverged around 21 million years ago (Schaefer and Renner 2010). This finding suggests that neofunctionalization is a rare evolutionary fate for this class of peptides.
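The codon change described in this section can be reproduced with a few lines of code. This is a minimal sketch: `CODON_TABLE` contains only the Arg (CGN) and Leu (CTN) entries of the standard genetic code that are needed here, and the helper name `substitute` is hypothetical.

```python
# Subset of the standard genetic code covering the CGN (Arg) and CTN (Leu) codons.
CODON_TABLE = {
    "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
}


def substitute(codon, position, base):
    """Return the codon with the base at a 1-based position replaced."""
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2, or 3")
    index = position - 1
    return codon[:index] + base + codon[index + 1:]


if __name__ == "__main__":
    original = "CGA"                        # P1 codon of the trypsin inhibitor
    mutated = substitute(original, 2, "T")  # G -> T at the second position
    print(original, CODON_TABLE[original], "->", mutated, CODON_TABLE[mutated])
```

Because every CGN codon encodes Arg and every CTN codon encodes Leu, a G-to-T change at the second position gives Arg to Leu regardless of the third base.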
The Role of Site-Specific Residues of Momordica Cyclic Peptides

A range of studies have shown remarkable success in the use of Momordica cyclic peptides as scaffolds for developing novel therapeutics (Poth et al. 2013). To better exploit Momordica cyclic peptides for biomedical applications, it is imperative that their evolvability, i.e. the ability to acquire new function through structural changes (Tokuriki and Tawfik 2009), be understood. Given the function of Momordica cyclic peptides as potent trypsin inhibitors, knowledge of their evolvability is particularly useful for developing novel inhibitors of other evolutionarily related serine proteases, many of which play crucial roles in pathophysiological processes, such as inflammation and blood clotting (Bachovchin and Cravatt 2012). One of the approaches that can be used to understand the evolvability of Momordica cyclic peptides is by mapping their sequence diversity, which has been shaped by natural selection.

Evolutionary analysis reveals the type of selection operating on the sites of Momordica cyclic peptides. As shown in Fig. 5A, the majority of the sites are either neutral (no substitution) or under negative selection (substitutions were synonymous). Residues that occupy these sites are presumably preserved for maintaining the cyclic cystine knot structure. Indeed, four of the six cysteines that form the cystine knot core are neutral and two are under negative selection. The N-terminal Gly and C-terminal Asp that have been shown to be vital for cyclization by AEP (Mylne et al. 2011) are also under negative selection. The significance of preserving site-specific residues in Loop 6 might be related to the effect that cyclization has on the folding pathway of the peptides, facilitating the formation of the correct cysteine connectivities, thus presumably reducing the entropic losses upon folding compared to their acyclic counterparts (Daly et al. 1999).
Residues that occupy sites under positive selection are presumably paving the way to adopting a new function. The P1 site that defines the selectivity of Momordica cyclic peptides is evolving under positive selection (Fig. 5A) but the amino acid change has not introduced a new function, i.e. Lys and Arg are both preferred in the S1 pocket of trypsin (Krieger et al. 1974). The neutral selection operating on the P1′ site, which is occupied by Ile, a hydrophobic residue and thus preferred by trypsin (Kurth et al. 1997), suggests that Momordica cyclic peptides are co-evolving with trypsin, which has strictly conserved residues associated with its specificity in both prokaryotes and eukaryotes (Rypniewski et al. 1994).

Because the three-dimensional structure of serine proteases is highly conserved and their active sites are virtually the same, i.e. having the catalytic triad composed of Asp–His–Ser (Higaki et al. 1987), molecular dynamics of Momordica cyclic peptides with trypsin may serve as a model for studying the role of site-specific residues for target binding, particularly that imparted by hydrogen bonding. As shown in Fig. 5B, the hydrogen bond network at the interface of Momordica cyclic peptides with trypsin is primarily formed by the main chain of the cyclic peptides, with the P1, P2, P3, P5, and P2′ sites having a high frequency of hydrogen bond occupancy. Although this hydrogen bond network is common in serine protease–inhibitor complexes where the inhibitory loops are locked in an extended antiparallel β-sheet conformation (Hedstrom 2002), this canonical conformation is not adopted by the inhibitory loop of the Momordica cyclic peptide scaffold (Daly et al. 2013). The occupancy of Pro at P2 and Cys at P3, which do not contribute to side chain interactions with trypsin, suggests that the selectivity of Momordica cyclic peptides is mediated by other sites, e.g. their prime sites.
This characteristic is unique because S2 and S3 are known to determine the specificity of serine proteases, e.g. a hydrophobic residue is preferred at P2 by chymotrypsin (Brady and Abeles 1990) and at P3 by elastase (Thompson and Blout 1973). The P3′ site, which is under positive selection, might be important for target binding. Unlike Lys, Gln at P3′ introduces a side chain hydrogen bond, which may explain the higher trypsin inhibitory activity of TI-1 compared to TI-2 (Avrutina et al. 2005).
As the basis of evolution, mutations play a major role in forming the hydrogen bond network of protease–inhibitor interaction sites. Mutations that introduce new hydrogen bonds at these sites or increase the frequency of occupancy of existing ones are highly desirable. However, mutations can also have the opposite effect, and thus examining the effect of mutations on the hydrogen bond network at these interfaces is important. As shown in Fig. 5B, single amino acid substitutions altered the intermolecular main chain and side chain hydrogen bond network of Momordica cyclic peptides with trypsin. These alterations may serve as a basis for the identification of sites that have target binding potential, i.e. sites 9 in Loop 1; 24, 26, 28 in Loop 5; and 30, 31, 33 in Loop 6. This prediction agrees with a study showing that the aforementioned sites in Loops 1 and 5 play a role in binding with trypsin (Austin et al. 2009). In the context of drug design, this knowledge may translate into the design of site-specific libraries using the Momordica cyclic peptide scaffold. This design approach was successful for engineering kalata B1, the prototypic cyclotide found in Rubiaceae and Violaceae, into an antagonist of neuropilin-1 and -2, which are known to be regulators of vascular and lymphatic development (Getz et al. 2013).
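As a rough illustration of what a site-specific library could look like, the sketch below enumerates single-residue variants of the TI-2 sequence (as given in the Fig. 3B alignment) at the seven sites named above, and computes the size of a full combinatorial library over those sites. The restriction to the 20 proteinogenic amino acids, the helper names, and the choice to vary every site over the whole alphabet are assumptions made for illustration, not a library design used in this study.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 proteinogenic residues (assumed alphabet)
TI2 = "GGVCPKILKKCRRDSDCPGACICRGNGYCGSGSD"  # TI-2, numbered from the proto-N-terminal Gly
TARGET_SITES = [9, 24, 26, 28, 30, 31, 33]  # 1-based sites named in the text


def single_site_variants(seq, sites, alphabet=AMINO_ACIDS):
    """Yield (label, sequence) for every single substitution at the given sites."""
    for site in sites:
        wild_type = seq[site - 1]
        for aa in alphabet:
            if aa != wild_type:
                # Label follows the usual wild-type/site/new-residue notation, e.g. K9Q
                yield f"{wild_type}{site}{aa}", seq[:site - 1] + aa + seq[site:]


variants = dict(single_site_variants(TI2, TARGET_SITES))
full_library_size = len(AMINO_ACIDS) ** len(TARGET_SITES)

print(len(variants))      # number of single-site variants
print(variants["K9Q"])    # Lys-to-Gln change at site 9 (P3')
print(full_library_size)  # all residue combinations across the seven sites
```

The K9Q variant corresponds to the P3′ difference between TI-2 and TI-1 discussed above; a real library would normally be restricted further, for example by excluding Cys to protect the cystine knot.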
In summary, this study presents evidence suggesting that Momordica cyclic peptides evolved in species that diverged from an African ancestor around 19 million years ago. The findings provide a basis for selecting species to further investigate the biosynthetic origin of Momordica cyclic peptides. Knowledge of the genes that encode cyclic peptides and the enzymes involved in their maturation could potentially be used for their production in suitable host plants. In addition, this study showcases an interesting biological example of how natural selection, acting on mutations, is presumably operating to acquire features essential for cyclization of the acyclic peptides by AEP and to fine-tune the selectivity of cyclic peptides while maintaining their structure. This knowledge may find useful application in medicine, for designing inhibitors of evolutionarily related serine proteases implicated in human diseases using the Momordica cyclic peptide scaffold, or in agriculture, for designing improved pesticidal agents based on cyclotides (Poth et al. 2011). In the long run, knowledge of the biosynthesis and evolvability of Momordica cyclic peptides could translate into the production of “designer” peptide therapeutics in plant seeds.
Materials and Methods
Seed Material
Seeds of M. anigosantha Hook.f., M. boivinii Baill., M. cabrae (Cogn.) C. Jeffrey, M. calantha Gilg, M. camerounensis Keraudren, M. cissoides Planch. ex Benth., M. clarkeana King, M. cymbalaria Frenzl ex. Naudin, M. denticulata Miq., M. foetida Schumach., M. friesiorum (Harms) C. Jeffrey, M. gilgiana Cogn., M. humilis (Cogn.) C. Jeffrey, M. jeffreyana Keraudren, M. leiocarpa Gilg, M. macrophylla Gage, M. parvifolia Cogn., M. rostrata Zimm., M. silvatica Jongkind, M. subangulata Blume, M. trifoliolata Hook.f., and Siraitia grosvenorii (Swingle) C. Jeffrey ex. A.M. Lu & Zhi Y. Zhang were provided by Hanno Schaefer of the Technische Universität München. Seeds of M. balsamina L.
(reference number: 406803), M. charantia L. (reference number: 51359), and M. cochinchinensis (Lour.) Spreng. (reference number: 69291) were purchased from B & T World Seeds sarl, Paguignan, 34210 Aigues Vives, France.
Genomic DNA Extraction, Gene Cloning, and Sequencing
Momordica seeds were dehusked and finely ground in liquid nitrogen using a mortar and pestle. Genomic DNA was extracted using the Qiagen DNeasy Plant Mini Kit following the manufacturer's protocol. TIPTOP genes were amplified using primers JM482 (5′-CGT CTT GCT AGA GAA AGG GAG T-3′) and JM483 (5′-TCA GAA ACA GCA TAG CTT TCA C-3′) (Mylne et al. 2012). TIPRE sequences were amplified using primers TM P1 (5′-GAA ATG GAG AGC AAG AAG ATT CT-3′) and TM P5 (5′-AAG ATT CTA GGA CAG GCT CTT TG-3′). PCR products were purified using the QIAquick PCR Purification Kit (for single bands) or the QIAquick Gel Extraction Kit (for multiple bands). The purified PCR products were cloned into pGEM-T Easy (Promega) and sequenced at the Australian Genome Research Facility. A minimum of three independent clones was used to assemble each sequence using MacVector 12.7 software. SignalP 4.1 was used to predict the endoplasmic reticulum (ER) recognition site in the sequence (Petersen et al. 2011). Peptide cleavage sites were predicted based on the common features for processing and homology to previously reported peptides (Mylne et al. 2011; Mylne et al. 2012).
Transcriptome Analysis
Transcriptome sequencing data of the African M. charantia seeds (Yang et al. 2010) were accessed via the National Center for Biotechnology Information website (http://www.ncbi.nlm.nih.gov). Short Read Archive (SRA) data under the accession numbers SRX030203 (normalized sequence data) and SRX030204 (non-normalized sequence data) were used to assemble transcripts containing plant-derived cystine knot peptide sequences (Gracy et al. 2008) with MIRA 3.4 (Chevreux 2005). A minimum of three overlapping contigs was used to assemble the transcripts.
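One simple way to flag candidate cystine knot peptide domains in translated contigs is to search for the cysteine spacing seen in the peptide alignment of Fig. 3B. The sketch below is an illustrative assumption rather than the procedure used here: the regular expression encodes six cysteines separated by loops of 6, 5, 3, 1, and 4 to 5 residues, and the function name and the example sequence (the MCoTI-II precursor segment shown in Fig. 1) are chosen only for demonstration.

```python
import re

# Six cysteines with loop lengths 6, 5, 3, 1 and 4-5 (spacing assumed from the alignment)
KNOT_PATTERN = re.compile(r"C.{6}C.{5}C.{3}C.C.{4,5}C")


def find_knot_domains(protein_seq):
    """Return (start, end, match) tuples, 1-based inclusive, for each motif hit."""
    return [(m.start() + 1, m.end(), m.group())
            for m in KNOT_PATTERN.finditer(protein_seq)]


# Precursor segment around the MCoTI-II domain, as shown in Fig. 1
segment = "DINGGVCPKILKKCRRDSDCPGACICRGNGYCGSGSDALE"
for start, end, hit in find_knot_domains(segment):
    print(start, end, hit)
```

A pattern this strict would miss peptides with different loop lengths, so it is best treated as a first-pass filter before alignment against known sequences.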
Peptide Extraction and Fractionation
Peptides were extracted from Momordica seeds using a method based on acetonitrile/water/formic acid (25:24:1) as previously described (Mahatmanto et al. 2014). Targeted peptides were fractionated using a 3 cc cartridge, 200 mg sorbent, Waters Sep-Pak® C18 55–105 µm. Peptides were eluted with 1.5 mL of solvent B (90% v/v acetonitrile, 0.1% v/v formic acid) in a 10% gradient. Fractions containing the majority of the targeted peptides (Supplementary Figs. S2 and S3) were collected. Samples were lyophilized, redissolved in 1% v/v formic acid, and stored at 4°C until further analysis.
Ultra High Performance Liquid Chromatography (UHPLC)-Tandem Mass Spectrometry (MS/MS)
Samples were reduced with dithiothreitol (DTT; final concentration 10 mM; incubated at 60°C for 30 minutes under nitrogen), alkylated with iodoacetamide (IAM; final concentration 25 mM; incubated at 37°C for 30 minutes in the dark), and split for overnight digestion with trypsin (approximately 1 µg per 100 µg lyophilized sample in 100 mM ammonium bicarbonate, pH 8.0; incubated at 37°C), chymotrypsin (approximately 2 µg per 100 µg lyophilized sample in 100 mM Tris-HCl, 10 mM calcium chloride, pH 8.0; incubated at 30°C), and endoproteinase Glu-C (approximately 3 µg per 100 µg lyophilized sample in (i) 100 mM ammonium bicarbonate, pH 8.0 and (ii) 100 mM sodium phosphate, pH 7.8; both incubated at 37°C). Following reduction, alkylation, and digestion, samples were analysed on a Nexera uHPLC (Shimadzu) coupled to a TripleTOF 5600 mass spectrometer (AB SCIEX) equipped with a duo electrospray ion source. Data were processed using Analyst® TF 1.6 software (AB SCIEX). MS/MS spectra were searched against a custom-built database of plant-derived cystine knot peptides using ProteinPilot™ 4.0 software (AB SCIEX).
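When interpreting the mass spectrometry data, the expected monoisotopic mass of a peptide before and after reduction and alkylation can be estimated from its sequence. The sketch below assumes standard monoisotopic residue masses, a backbone-cyclized peptide (no terminal water), three disulfide bonds in the native form, and carbamidomethylation of every cysteine by iodoacetamide (+57.02146 Da each). The function names are illustrative and the calculation is not taken from the software used in this study.

```python
# Monoisotopic residue masses in Da (standard values)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
HYDROGEN = 1.00783
CARBAMIDOMETHYL = 57.02146  # mass added per cysteine by iodoacetamide


def native_mass(seq, cyclic=True, disulfides=3):
    """Mass of the oxidized peptide; a cyclic backbone carries no terminal water."""
    mass = sum(RESIDUE_MASS[aa] for aa in seq)
    if not cyclic:
        mass += WATER
    return mass - 2 * HYDROGEN * disulfides  # each disulfide bond removes two H


def alkylated_mass(seq, cyclic=True):
    """Mass after full reduction and carbamidomethylation of every cysteine."""
    reduced = native_mass(seq, cyclic=cyclic, disulfides=0)
    return reduced + CARBAMIDOMETHYL * seq.count("C")


TI2 = "GGVCPKILKKCRRDSDCPGACICRGNGYCGSGSD"  # TI-2 sequence from the Fig. 3B alignment
print(round(native_mass(TI2), 3))
print(round(alkylated_mass(TI2), 3))
```

Enzymatic cleavage of a single peptide bond in the cyclic backbone linearizes the peptide and adds one water, which corresponds to calling the functions with `cyclic=False`.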
Phylogenetic Analysis
The phylogeny of Momordica is based on a slightly simplified version of the sequence dataset of Schaefer & Renner (2010) with the addition of sequences for Siraitia grosvenorii from Kocyan et al. (2007) (Supplementary Data 1). For details on DNA sequencing, alignment and phylogeny estimation see Schaefer & Renner (2010).
The evolutionary relationship between TIPTOP genes, TIPRE sequences, and the transcripts from the African M. charantia seeds was inferred under Maximum Likelihood (ML) using the General Time Reversible model (Nei and Kumar 2000). The initial tree for the heuristic search using the Nearest-Neighbor-Interchange (NNI) method was generated automatically by applying the Neighbor-Join and BioNJ algorithms. The analysis was conducted in MEGA5 (Tamura et al. 2011). Nucleotide sequences used for this analysis are given in Supplementary Data 2.
Mapping of Sites Under Selection
Nucleotide sequences that encode the cyclic peptides were used to map sites that are under selection using MEGA5 (Tamura et al. 2011). The numbers of synonymous (s) and nonsynonymous (n) substitutions and the numbers of synonymous (S) and nonsynonymous (N) sites were estimated using joint Maximum Likelihood (ML) reconstructions of ancestral states under the Muse-Gaut (Muse and Gaut 1994) and General Time Reversible (Nei and Kumar 2000) models. The ML analysis of codons undergoing selection was performed through HyPhy (Kosakovsky Pond et al. 2005) using an automatically generated Neighbor-Joining tree. The probability of rejecting the null hypothesis of neutral evolution (P-value) was calculated as previously described (Suzuki and Gojobori 1999; Kosakovsky Pond and Frost 2005). Nucleotide sequences used for this analysis are given in Supplementary Data 3.
Molecular Dynamics
To calculate the average structure of selected cyclic peptides in complex with trypsin, the coordinates of the crystal structure of native MCoTI-II bound to trypsin (PDB#4GUX) were used as a reference (Daly et al. 2013).
Simulation of the complexes was performed as previously described (Swedberg et al. 2011).
RNA Secondary Structure Calculation
The folding energy of TIPTOP transcripts was calculated using Mfold (Zuker 2003) with default settings.
Accession Numbers
Sequence data from this work can be found in the GenBank database under the accession numbers KM408418 for M. subangulata TIPTOP4, KM408419 for M. macrophylla TIPTOP5, KM408420 for M. macrophylla TIPTOP6, KM408421 for M. anigosantha TIPRE1, KM408422 for M. anigosantha TIPRE2, KM408423 for M. anigosantha TIPRE3, KM408424 for M. anigosantha TIPRE4, KM408425 for M. friesiorum TIPRE5.
Supplementary Materials
Supplementary Tables S1, S2, S3, and S4; Supplementary Figs. S1, S2, and S3; and Supplementary Data 1, 2, and 3 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgements
This research was supported by a grant from the National Health and Medical Research Council (APP1009267) and the Australian Research Council (LP 130100550). D.J.C. is an NHMRC Professorial Research Fellow (APP1026501). J.S.M. is an ARC Future Fellow (FT120100013). J.S. is an NHMRC Early Career Fellow (APP1069819). T.M. is a recipient of an Endeavour Postgraduate Scholarship granted by the Australian Government. The authors thank E. Miles for comments on the manuscript, E.K. Gilding for helpful discussion, R. Widiatmojo for photo editing of Momordica seeds, and A. Jones of the Molecular and Cellular Proteomics Mass Spectrometry Facility at the Institute for Molecular Bioscience, The University of Queensland, for support and access to the facility.
Figure Legends
Fig. 1. Proposed classification for currently known backbone cyclized peptides in angiosperms. The cyclic peptide domains (coloured letters) with neighbouring residues are given below the structures. Group 1: Cyclic peptides with three disulfide bonds that form a knot, i.e.
Cys III-VI threading a ring formed by Cys I-IV, Cys II-V, and their interconnecting backbone. This structural motif is known as a cyclic cystine knot or CCK. The intracysteine residues that form the loops are labeled Loops 1–6. An example member of the group, i.e. MCoTI-II (Momordica cochinchinensis trypsin inhibitor-II, PDB#1IB9), is shown. Group 2: Cyclic peptides with one disulfide bond, e.g. SFTI-1 (sunflower trypsin inhibitor-1, PDB#1JBL). Cysteine connectivities are shown as yellow lines. Asparaginyl endopeptidase (AEP) target residues, i.e. Asp/Asn, are shown with arrows. Neighbouring residues (P1′ and P2′ sites) important for transpeptidation by AEP are shown with asterisks. Group 3: Cyclic peptides with no disulfide bond, i.e. the orbitides, e.g. segetalin A. Cleavage sites are shown with triangles. The peptide cyclization points are shown with black dots. (For interpretation of the references to color, please refer to the web version of this article.)
Fig. 2. Momordica seeds used in this study. The seeds are arranged from left to right, top to bottom, based on their phylogenetic relationship. Siraitia grosvenorii was included as a closely related outgroup species.
Fig. 3. Precursors and sequences of Momordica cyclic peptides and their acyclic counterparts. A. Schematic representations of the precursors for TIPTOP1 and TIPTOP5 are shown along with a typical precursor of an acyclic peptide from M. charantia. ER: endoplasmic reticulum signal. LP: leader peptide domain; this naming follows the recommended nomenclature for ribosomally synthesized and post-translationally modified peptides (RiPPs) (Arnison et al. 2013). aa: Amino acids. B. Sequence alignment of Momordica cyclic peptides and their acyclic counterparts. Peptides deduced from coding sequences with mass support found in this study (asterisks) are aligned with previously reported peptides (Joubert 1984; Hara et al. 1989; Hamato et al. 1995; Hernandez et al. 2000; Mylne et al. 2012).
Residues flanking the cyclization points are shown with black dots. The six conserved cysteine residues are shown with yellow dots. Residues are numbered from the N-terminus to the C-terminus.
Fig. 4. Distribution of Momordica cyclic peptides and their acyclic counterparts mapped on a Maximum Likelihood phylogeny estimate of the genus. Species analysed in this study are shown with bold letters. Species previously studied are shown with red letters. Species containing cyclic peptides and their acyclic counterparts are shown with coloured dots, i.e. green for cyclic trypsin inhibitors, orange for acyclic trypsin inhibitors, and red for acyclic elastase inhibitors. Superscript letters following the species names denote currently known information at the nucleotide (n) or peptide (p) level.
Fig. 5. Site selection and intermolecular hydrogen bond network mapping of Momordica cyclic peptides. A. Sites under selection: negative (purple), neutral (white), and positive (cyan). The sequence of TI-2 is used as a reference. The intracysteine residues that form the loops are labeled Loops 1–6. Sites are numbered from the proto-N-terminal Gly to the proto-C-terminal Asp. P1 and P1′ denote, respectively, the residues on the acyl and leaving group sides of the peptide bond that would be hydrolysed by trypsin. B. Sites involved in hydrogen bonding with trypsin. TI-1 and TI-2 were used as references for single amino acid substitutions (shown with asterisks) based on the sequences of TI-8, TI-18 and TI-21 for TI-1 and TI-10, TI-20, and TI-22 for TI-2. Intermolecular main chain and side chain hydrogen bonds are shown with coloured circles and triangles, respectively. The range of hydrogen bond frequency of occupancy is colour-coded as per legend.
Fig. 6. Sequence analysis for Momordica peptides. A. Alignment of the signal peptide sequence of TIPTOP precursors. B. Free folding energy of TIPTOP transcripts calculated using Mfold (Zuker 2003). C. Alignment of domain repeats. The leader and mature peptide domains of TIPTOP4 from the Asian M. subangulata and TIPRE4 from the African M. anigosantha are used as an example. The cyclic peptide domains (shown in green, TIPTOP4A–F) appear to be the result of insertion of a four-residue segment trailing the C-terminal and deletion of a segment within the N-terminal of the acyclic peptide domain (shown in orange, TIPTOP4G). TIPRE acyclic peptides acquired a C-terminal extension like that of the TIPTOP cyclic peptides, thus linking the evolution of cyclic peptides to their acyclic counterparts. Residues flanking the cyclization points are shown with black dots. The six conserved cysteine residues are shown with yellow dots. Cleavage sites are shown with triangles. D. Unrooted phylogram for TIPTOP genes, TIPRE sequences, and the transcripts from the African M. charantia seeds. E. Mutations at the P1 site of the African M. charantia peptides that led to the change from Arg to Leu.
References
Aboye TL, Ha H, Majumder S, Christ F, Debyser Z, Shekhtman A, Neamati N, Camarero JA. 2012. Design of a novel cyclotide-based CXCR4 antagonist with anti-human immunodeficiency virus (HIV)-1 activity. J Med Chem. 55:10729–10734.
Arnison PG, Bibb MJ, Bierbaum G, Bowers AA, Bugni TS, Bulaj G, Camarero JA, Campopiano DJ, Challis GL, Clardy J, et al. 2013. Ribosomally synthesized and posttranslationally modified peptide natural products: Overview and recommendations for a universal nomenclature. Nat Prod Rep. 30:108–160.
Austin J, Wang W, Puttamadappa S, Shekhtman A, Camarero JA. 2009.
Biosynthesis and biological screening of a genetically encoded library based on the cyclotide MCoTI-I. Chembiochem 10:2663–2670.
Avrutina O, Schmoldt HU, Gabrijelcic-Geiger D, Le Nguyen D, Sommerhoff CP, Diederichsen U, Kolmar H. 2005. Trypsin inhibition by macrocyclic and open-chain variants of the squash inhibitor MCoTI-II. Biol Chem. 386:1301–1306.
Bachovchin DA, Cravatt BF. 2012. The pharmacological landscape and therapeutic potential of serine hydrolases. Nat Rev Drug Discov. 11:52–68.
Barber CJ, Pujara PT, Reed DW, Chiwocha S, Zhang H, Covello PS. 2013. The two-step biosynthesis of cyclic peptides from linear precursors in a member of the plant family Caryophyllaceae involves cyclization by a serine protease-like enzyme. J Biol Chem. 288:12500–12510.
Brady K, Abeles RH. 1990. Inhibition of chymotrypsin by peptidyl trifluoromethyl ketones: Determinants of slow-binding kinetics. Biochemistry 29:7608–7617.
Cascales L, Henriques ST, Kerr MC, Huang YH, Sweet MJ, Daly NL, Craik DJ. 2011. Identification and characterization of a new family of cell-penetrating peptides: Cyclic cell-penetrating peptides. J Biol Chem. 286:36932–36943.
Chan LY, Gunasekera S, Henriques ST, Worth NF, Le SJ, Clark RJ, Campbell JH, Craik DJ, Daly NL. 2011. Engineering pro-angiogenic peptides using stable, disulfide-rich cyclic scaffolds. Blood 118:6709–6717.
Chevreux B. 2005. MIRA: An automated genome and EST assembler. PhD Thesis. Heidelberg: Ruprecht-Karls University.
Colgrave ML, Craik DJ. 2004. Thermal, chemical, and enzymatic stability of the cyclotide kalata B1: The importance of the cyclic cystine knot. Biochemistry 43:5965–5975.
Contreras J, Elnagar AYO, Hamm-Alvarez SF, Camarero JA. 2011. Cellular uptake of cyclotide MCoTI-I follows multiple endocytic pathways. J Control Release 155:134–143.
Craik DJ. 2013. Joseph Rudinger memorial lecture: Discovery and applications of cyclotides. J Pept Sci. 19:393–407.
Craik DJ, Daly NL, Bond T, Waine C. 1999.
Plant cyclotides: A unique family of cyclic and knotted proteins that defines the cyclic cystine knot structural motif. J Mol Biol. 294:1327–1336.
D’Souza C, Henriques ST, Wang CK, Craik DJ. 2014. Structural parameters modulating the cellular uptake of disulfide-rich cyclic cell-penetrating peptides: MCoTI-II and SFTI-1. Eur J Med Chem. (in press).
Daly NL, Love S, Alewood PF, Craik DJ. 1999. Chemical synthesis and folding pathways of large cyclic polypeptide: Studies of the cystine knot polypeptide kalata B1. Biochemistry 38:10606–10614.
Daly NL, Thorstholm L, Greenwood KP, King GJ, Rosengren KJ, Heras B, Martin JL, Craik DJ. 2013. Structural insights into the role of the cyclic backbone in a squash trypsin inhibitor. J Biol Chem. 288:36141–36148.
Elliott AG, Delay C, Liu H, Phua Z, Rosengren KJ, Benfield AH, Panero JL, Colgrave ML, Jayasena AS, Dunse KM. 2014. Evolutionary Origins of a Bioactive Peptide Buried within Preproalbumin. Plant Cell 26:981–995.
Ellis RH. 1991. The longevity of seeds. HortScience 26:1119–1125.
Getz JA, Cheneval O, Craik DJ, Daugherty PS. 2013. Design of a cyclotide antagonist of neuropilin-1 and -2 that potently inhibits endothelial cell migration. ACS Chem Biol. 8:1147–1154.
Gillon AD, Saska I, Jennings CV, Guarino RF, Craik DJ, Anderson MA. 2008. Biosynthesis of circular proteins in plants. Plant J. 53:505–515.
Gracy J, Le-Nguyen D, Gelly JC, Kaas Q, Heitz A, Chiche L. 2008. KNOTTIN: The knottin or inhibitor cystine knot scaffold in 2007. Nucleic Acids Res. 36:D314–319.
Greenwood KP, Daly NL, Brown DL, Stow JL, Craik DJ. 2007. The cyclic cystine knot miniprotein MCoTI-II is internalized into cells by macropinocytosis. Int J Biochem Cell Biol. 39:2252–2264.
Hamato N, Koshiba T, Pham T-N, Tatsumi Y, Nakamura D, Takano R, Hayashi K, Hong YM, Hara S. 1995. Trypsin and elastase inhibitors from bitter gourd (Momordica charantia LINN.) seeds: Purification, amino acid sequences, and inhibitory activities of four new inhibitors. J Biochem.
117:432–437.
Hara S, Makino J, Ikenaka T. 1989. Amino acid sequences and disulfide bridges of serine proteinase inhibitors from bitter gourd (Momordica charantia LINN.) seeds. J Biochem. 105:88–92.
Hara-Nishimura I, Inoue K, Nishimura M. 1991. A unique vacuolar processing enzyme responsible for conversion of several proprotein precursors into the mature forms. FEBS Lett. 294:89–93.
Hedstrom L. 2002. Serine protease mechanism and specificity. Chem Rev. 102:4501–4524.
Hernandez JF, Gagnon J, Chiche L, Nguyen TM, Andrieu JP, Heitz A, Hong TT, Pham TTC, Nguyen DL. 2000. Squash trypsin inhibitors from Momordica cochinchinensis exhibit an atypical macrocyclic structure. Biochemistry 39:5722–5730.
Higaki JN, Gibson BW, Craik CS. 1987. Evolution of catalysis in the serine proteases. Cold Spring Harb Symp Quant Biol. 52:615–621.
Hiraiwa N, Nishimura M, Hara-Nishimura I. 1999. Vacuolar processing enzyme is self-catalytically activated by sequential removal of the C-terminal and N-terminal propeptides. FEBS Lett. 447:213–216.
Ji Y, Majumder S, Millard M, Borra R, Bi T, Elnagar AY, Neamati N, Shekhtman A, Camarero J. 2013. In vivo activation of the p53 tumor suppressor pathway by an engineered cyclotide. J Am Chem Soc. 135:11623–11633.
Joubert FJ. 1984. Trypsin isoinhibitors from Momordica repens seeds. Phytochemistry 23:1401–1406.
Kaas Q, Craik DJ. 2010. Analysis and classification of circular proteins in Cybase. Pept Sci. 94:584–591.
Kocyan A, Zhang L-B, Schaefer H, Renner SS. 2007. A multi-locus chloroplast phylogeny for the Cucurbitaceae and its implications for character evolution and classification. Mol Phylogenet Evol. 44:553–577.
Kosakovsky Pond SL, Frost SDW. 2005. Not so different after all: A comparison of methods for detecting amino acid sites under selection. Mol Biol Evol. 22:1208–1222.
Kosakovsky Pond SL, Frost SDW, Muse SV. 2005. HyPhy: Hypothesis testing using phylogenies. Bioinformatics 21:676–679.
Krieger M, Kay LM, Stroud R. 1974.
Structure and specific binding of trypsin: Comparison of inhibited derivatives and a model for substrate binding. J Mol Biol. 83:209–230.
Kurth T, Ullmann D, Jakubke H-D, Hedstrom L. 1997. Converting trypsin to chymotrypsin: Structural determinants of S1' specificity. Biochemistry 36:10098–10104.
Li F, Zheng Q, Vandivier LE, Willmann MR, Chen Y, Gregory BD. 2012. Regulatory impact of RNA secondary structure across the Arabidopsis transcriptome. Plant Cell 24:4346–4359.
Li Y-D, Xie Z-Y, Du Y-L, Zhou Z, Mao X-M, Lv L-X, Li Y-Q. 2009. The rapid evolution of signal peptides is mainly caused by relaxed selection on non-synonymous and synonymous sites. Gene 436:8–11.
Ling M-H, Qi H-y, Chi C-w. 1993. Protein, cDNA, and genomic DNA sequences of the towel gourd trypsin inhibitor. A squash family inhibitor. J Biol Chem. 268:810–814.
Lu W, Qasim M, Laskowski M, Kent SB. 1997. Probing intermolecular main chain hydrogen bonding in serine proteinase-protein inhibitor complexes: Chemical synthesis of backbone-engineered turkey ovomucoid third domain. Biochemistry 36:673–679.
Luckett S, Garcia RS, Barker J, Konarev AV, Shewry P, Clarke A, Brady R. 1999. High-resolution structure of a potent, cyclic proteinase inhibitor from sunflower seeds. J Mol Biol. 290:525–533.
Lynch M, Conery JS. 2000. The evolutionary fate and consequences of duplicate genes. Science 290:1151–1155.
Mahatmanto T, Poth AG, Mylne JS, Craik DJ. 2014. A comparative study of extraction methods reveals preferred solvents for cystine knot peptide isolation from Momordica cochinchinensis seeds. Fitoterapia 95:22–33.
Muse SV, Gaut BS. 1994. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol Biol Evol. 11:715–724.
Mylne JS, Colgrave ML, Daly NL, Chanson AH, Elliott AG, McCallum EJ, Jones A, Craik DJ. 2011. Albumins and their processing machinery are hijacked for cyclic peptides in sunflower. Nat Chem Biol. 7:257–259.
Mylne JS, Chan LY, Chanson AH, Daly NL, Schaefer H, Bailey TL, Nguyencong P, Cascales L, Craik DJ. 2012. Cyclic peptides arising by evolutionary parallelism via asparaginyl-endopeptidase-mediated biosynthesis. Plant Cell 24:2765–2778.
Nei M, Kumar S. 2000. Molecular evolution and phylogenetics. Oxford University Press.
Nguyen GKT, Zhang S, Wang W, Wong CTT, Nguyen NTK, Tam JP. 2011. Discovery of a linear cyclotide from the bracelet subfamily and its disulfide mapping by top-down mass spectrometry. J Biol Chem. 286:44833–44844.
Petersen TN, Brunak S, von Heijne G, Nielsen H. 2011. SignalP 4.0: Discriminating signal peptides from transmembrane regions. Nat Methods 8:785–786.
Poth AG, Colgrave ML, Lyons RE, Daly NL, Craik DJ. 2011. Discovery of an unusual biosynthetic origin for circular proteins in legumes. Proc Natl Acad Sci. 108:10127–10132.
Poth AG, Chan LY, Craik DJ. 2013. Cyclotides as grafting frameworks for protein engineering and drug design applications. Pept Sci. 100:480–491.
Rypniewski WR, Perrakis A, Vorgias CE, Wilson KS. 1994. Evolutionary divergence and conservation of trypsin. Protein Eng. 7:57–64.
Saska I, Gillon AD, Hatsugai N, Dietzgen RG, Hara-Nishimura I, Anderson MA, Craik DJ. 2007. An asparaginyl endopeptidase mediates in vivo protein backbone cyclization. J Biol Chem. 282:29721–29728.
Schaefer H, Heibl C, Renner SS. 2009. Gourds afloat: A dated phylogeny reveals an Asian origin of the gourd family (Cucurbitaceae) and numerous oversea dispersal events. Proc R Soc B. 276:843–851.
Schaefer H, Renner SS. 2010. A three-genome phylogeny of Momordica (Cucurbitaceae) suggests seven returns from dioecy to monoecy and recent long-distance dispersal to Asia. Mol Phylogenet Evol. 54:553–560.
Shewry PR, Casey R. 1999. Seed proteins. Dordrecht: Springer.
Silverman IM, Li F, Gregory BD. 2013. Genomic era analyses of RNA secondary structure and RNA-binding proteins reveal their significance to post-transcriptional regulation in plants. Plant Sci.
205:55–62.
Sommerhoff CP, Avrutina O, Schmoldt HU, Gabrijelcic-Geiger D, Diederichsen U, Kolmar H. 2010. Engineered cystine knot miniproteins as potent inhibitors of human mast cell tryptase β. J Mol Biol. 395:167–175.
Staal J, Dixelius C. 2007. Tracing the ancient origins of plant innate immunity. Trends Plant Sci. 12:334–342.
Suzuki Y, Gojobori T. 1999. A method for detecting positive selection at single amino acid sites. Mol Biol Evol. 16:1315–1328.
Swedberg JE, de Veer SJ, Sit KC, Reboul CF, Buckle AM, Harris JM. 2011. Mastering the canonical loop of serine protease inhibitors: Enhancing potency by optimising the internal hydrogen bond network. PLoS ONE 6:e19302.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 28:2731–2739.
Tan N-H, Zhou J. 2006. Plant cyclopeptides. Chem Rev. 106:840–895.
Thompson RC, Blout ER. 1973. Dependence of the kinetic parameters for elastase-catalyzed amide hydrolysis on the length of peptide substrates. Biochemistry 12:57–65.
Thongyoo P, Bonomelli C, Leatherbarrow RJ, Tate EW. 2009. Potent inhibitors of β-tryptase and human leukocyte elastase based on the MCoTI-II scaffold. J Med Chem. 52:6197–6200.
Tokuriki N, Tawfik DS. 2009. Stability effects of mutations and protein evolvability. Curr Opin Struct Biol. 19:596–604.
Trabi M, Craik DJ. 2002. Circular proteins – no end in sight. Trends Biochem Sci. 27:132–138.
Yang P, Li X, Shipp MJ, Shockey JM, Cahoon EB. 2010. Mining the bitter melon (Momordica charantia L.) seed transcriptome by 454 analysis of non-normalized and normalized cDNA populations for conjugated fatty acid metabolism-related genes. BMC Plant Biol. 10:250.
Zuker M. 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31:3406–3415.
Fig. 1. [Figure not reproduced. Recoverable labels: Group 1, MCoTI-II (loops 1–6, cysteines I–VI), processed by AEP from the precursor segment ...DINGGVCPKILKKCRRDSDCPGACICRGNGYCGSGSDALE...; Groups 2 and 3, SFTI-I, processed by AEP from ...EDNGRCTKSIPPICFPDGLD..., and segetalin A, processed by PCY1 from ...KPQGVPVWAFQA.... Legend: cleavage sites; cyclization points; AEP target residues; residues important for transpeptidation by AEP.]

Fig. 2.

Fig. 3. [Figure not reproduced. Recoverable content follows.]
Panel A, precursor architecture:
TIPTOP1 (281 aa): ER, LP, TI-1, LP, TI-2, LP, TI-2, LP, TI-2, LP, TI-5
TIPTOP5 (283 aa): ER, LP, TI-18, LP, TI-2, LP, TI-2, LP, TI-4, LP, TI-19
MCTI-I (90 aa): typical acyclic peptide precursor
Panel B, cyclic peptide sequence alignment:
TI-1   GGVCPKILQRCRRDSDCPGACICRGNGYCGSGSD
TI-2   GGVCPKILKKCRRDSDCPGACICRGNGYCGSGSD
TI-4   GGACPRILKKCRRDSDCPGACVCQGNGYCGSGSD
TI-7   GGACPRILKKCRRDSDCPGACVCKGNGYCGSGSD
TI-8   GGVCPKILQRCRRDSDCPGACICLGNGYCGSGSD
TI-9   GGICPKILQRCRRDSDCPGACICRGNGYC--GSD
TI-10  GGVCPKILKKCRRDSDCPGACICRGNGYCSSGSD
TI-11  GGVCPKILKKCRHDSDCPGACICRGNEYCGSGSD
TI-12  GGACPRILKKCRRDSDCPGACICRGNGYCGSGSD
TI-13  GGACPKILQRCRRDSDCPSACICRGNGYCGSGSD *
TI-14  GGACPKILQKCRRDSDCPGACVCQGNGYCGSGSD *
TI-15  GGACPRILKQCRRDSDCPGACVCQGNGYCGSGSD *
TI-16  GGACPRILKQCRRDSDCPGACICQGNGYCGSGSD *
TI-17  GGACPRILKKCRRDSDCPGACVCRGNGYCGSGSD *
TI-18  GGICPKILQRCRRDSDCPGACICRGNGYCGSGSD *
TI-20  GGVCPKILKKCRHDSDCPGACICRGNGYCGSGSD *
TI-21  GGVCPKILQRCRRDSDCPGACICQGNGYCGSGSD *
TI-22  GGVCPRILKKCRRDSDCPGACICRGNGYCGSGSD *
Panel B, acyclic peptide sequence alignment (terminal alignment gaps omitted):
TI-3      ERACPRILKKCRRDSDCPGECICKENGYCG
TI-5      QRACPRILKKCRRDSDCPGECICKGNGYCG
TI-6      QRACPRILKKCRRDSDCPGECICQGNGYCG
TI-19     QQRACPRILKKCRRDSDCPGECICKGNGYCG *
TI-24     QRACPKILKRCRRDSDCPGACVCQDNGYCGSGGD *
TI-25     QRACPRILKRCSRDSDCPGACVCQDNGYCGSRGD *
TI-26     QRACPRILKRCRRDSDCPGACVCQGNGYCGSRGD *
TI-27     QRACPRILKRCRRDSDCPGACVCQDNGYCGSGGD *
CM-1      GICPRILMECKRDSDCLAQCVCKRQGYCG
CM-3      AICPRILVECKRDSDCPAQCICKRQGYCG
MCTI-A    RSCPRIWMECTRDSDCMAKCICV-AGHCG
MCTI-I    ERRCPRILKQCKRDSDCPGECICMAHGFCG
MCTI-III  ERGCPRILKQCKQDSDCPGECICMAHGFCG
MCTI-II   RICPRIWMECKRDSDCMAQCICV-DGHCG
TI-28     EEERICPRIWMECKRDSDCMAQCICV-DGHCG *
MCEI-I    RICPLIWMECKRDSDCLAQCICV-DGHCG
MCEI-II   ERICPLIWMECKRDSDCLAQCICV-DGHCG
MCEI-III  EERICPLIWMECKRDSDCLAQCICV-DGHCG
MCEI-IV   EEERICPLIWMECKRDSDCLAQCICV-DGHCG
EI-1      EDERICPLIWMECKRDSDCLAQCICV-DGHCG
Legend: residues flanking the cyclization points; conserved cysteine residues; * peptides deduced from coding sequences with mass support found in this study.

Fig. 4. [Figure not reproduced. Recoverable labels, in the order extracted: M. cochinchinensis (n,p), M. sphaeroidea (n), M. suringarii, M. macrophylla (n,p), M. denticulata (p), M. dioica, M. denudata, M. subangulata (n,p), M. renigera, M. laotica, M. clarkeana (p), M. enneaphylla, M. cissoides (p), M. gilgiana (p), M. anigosantha (n,p), M. friesiorum (n,p), M. nuda, M. pterocarpa (p), M. repens, M. corymbifera, M. spinosa, M. parvifolia (p), M. multiflora, M. silvatica (p), M. glabra, M. jeffreyana (p), M. camerounensis (p), M. trifoliolata (p), M. rostrata (p), M. littorea, M. cardiospermoides, M. dissecta, M. peteri, M. sessilifolia, M. kirkii, M. humilis (p), M. cymbalaria (p), M. boivinii (p), M. henriquesii, M. angustisepala, M. leiocarpa (p), M. foetida (p), M. involucrata, M. balsamina (p), M. welwitschii, M. angolensis, M. charantia (n,p), M. cabrae (p), M. calantha (p), M. obtusisepala, Siraitia grosvenorii (p). Groups marked: cyclic trypsin inhibitors; acyclic trypsin inhibitors; acyclic elastase inhibitors. Legend: n, information known at the nucleotide level; p, information known at the peptide level. Scale bar: 0.0030.]

Fig. 5. [Figure not reproduced. Recoverable labels: panel A, cyclic peptide backbone (residues 1–34, loops 1–6, P1 site) annotated by selection type (negative selection, neutral, positive selection) and by hydrogen-bonding frequency (86.0–99.9%, 72.0–85.9%, 58.0–71.9%, 44.0–57.9%, 30.0–43.9%, 16.0–29.9%, 1.0–15.9%); panel B, the same representation for TI-1, TI-2, TI-10, TI-20, TI-22, TI-18, TI-8, and TI-21.]
Fig. 6. [Figure not reproduced. Recoverable labels: alignment of the ER signal sequences of McoTIPTOP1, McoTIPTOP2, McoTIPTOP3, MspTIPTOP2, TIPTOP4, TIPTOP5, and TIPTOP6 (Mco: M. cochinchinensis; Msp: M. sphaeroidea); number of domain repeats (5–8) against free folding energy (kcal/mol, -300 to -500); TIPTOP precursor schematic (ER, LP, CyP, AcyP; domains 4A–4G, with n repeats of 4B–F) with the alignment of TIPTOP4A–G; TIPRE precursor schematic (ER, LP, AcyP; domains 4A–4D, with n repeats of 4B–D) with the alignment of TIPRE4A–D; tree of TIPTOP and TIPRE precursors with MCTI-I, MCTI-III, TI-23, TI-28, and EI-1; P1 site codons AGA, CGA, CTA (arginine, leucine). Legend: residues flanking the cyclization points; conserved cysteine residues; cleavage sites.]

Supplementary Tables and Figures

Table S1. New TIPTOP and TIPRE Precursors Found in Momordica Seeds.

Species: M. subangulata
Precursor: TIPTOP4 (full)
Peptides: TI-13, TI-14, TI-15, TI-16, TI-17, TI-4, TI-6
Sequence:
MESKKILLVVLVAMMLVATSTCFNDGDTTDLISDGRAQMDINGGA
CPKILQRCRRDSDCPSACICRGNGYCGSGSDAIDLISDSRAQIDI
NGGACPKILQKCRRDSDCPGACVCQGNGYCGSGSDALDLMSDGRA
QIDINGGACPRILKQCRRDSDCPGACVCQGNGYCGSGSDALDLMS
DGRAQIGINGGACPRILKQCRRDSDCPGACICQGNGYCGSGSDAL
DLMSDGRAQIDINGGACPRILKKCRRDSDCPGACVCRGNGYCGSG
SDALDLMSDGRAQIDINGGACPRILKKCRRDSDCPGACVCQGNGY
CGSGSDMIDLISDGGAQTGEDINGGGVYDKRQRACPRILKKCRRD
SDCPGECICQGNGYCG

Species listed for the precursors below: M. macrophylla, M. anigosantha, M. friesiorum

Precursor: TIPTOP5 (full)
Peptides: TI-18, TI-2, TI-2, TI-4, TI-19
Sequence:
MESKKILPVVLVAMMLIATSTGFNDGDTIDLISDGRAQIDINGGI
CPKILQRCRRDSDCPGACICRGNGYCGSGSDALEGLMSDGRAQID
INGGVCPKILKKCRRDSDCPGACICRGNGYCGSGSDALEGLMSDG
RAQIDINGGVCPKILKKCRRDSDCPGACICRGNGYCGSGSDALEG
LMSDGRAQIDINGGACPRILKKCRRDSDCPGACVCQGNGYCGSGS
DALEGLMSDAGAQTGEDINGGGVYDEKQQRACPRILKKCRRDSDC
PGECICKGNGYCG

Precursor: TIPTOP6 (full)
Peptides: TI-18, TI-2, TI-20, TI-21, TI-22, TI-5
Sequence:
MESKKILPVVLVAMMLVATSTGFNDGDTIDLISDGRAQIDINGGI
CPKILQRCRRDSDCPGACICRGNGYCGSGSDALEGLMSDGRAQID
INGGVCPKILKKCRRDSDCPGACICRGNGYCGSGSDALEGLMSDG
RAQIDINGGVCPKILKKCRHDSDCPGACICRGNGYCGSGSDALEG
LMSDGRAQIDINGGVCPKILQRCRRDSDCPGACICQGNGYCGSGS
DALEGLVSDGRAQIDINGGVCPRILKKCRRDSDCPGACICRGNGY
CGSGSDALEGLMSDAGAQTGEDINGGGVYDEKQRACPRILKKCRR
DSDCPGECICKGNGYCG

Precursor: TIPRE1 (partial)
Peptides: TI-24
Sequence:
AAALVAMMLVATSADFNGGDTIHLISNGRAQTGQDINSGGVYSEE
QRACPKILKRCRRDSDCPGACVCQDNGYCGSGGDIIDVVFDDRAQ
ASQDINGGGVYFEE

Precursor: TIPRE2 (partial)
Peptides: TI-24, TI-25
Sequence:
AAALVAMMLVATSADFNGGDTIHLISNGRAQTGQDINSGGVYSEE
QRACPKILKRCRRDSDCPGACVCQDNGYCGSGGDIIDVVFDDRAQ
ASQDINGGGVYFEEQRACPRILKRCSRDSDCPGACVCQDNGYCGS
RGDMIDIVLEGRAQTGQDINDGGVYSEE

Precursor: TIPRE3 (partial)
Peptides: TI-26, TI-27, TI-24
Sequence:
AAVLVATMLVATSADFNDGDTIDLISNDRAQTGQDINGGGVYSEE
QRACPRILKRCRRDSDCPGACVCQGNGYCGSRGDMIDIVLDGRAQ
IGEDINGGGVYSEEQRACPRILKRCRRDSDCPGACVCQDNGYCGS
GGDMIDVVLDSRAQTGQDINGGGVYSEEQRACPKILKRCRRDSDC
PGACVCQDNGYCGSGGDMIDIVLDGRAQTGEDINGGGVYSEE

Precursor: TIPRE4 (partial)
Peptides: TI-26, TI-27, TI-24, TI-27
Sequence:
AAVLVATMLVATSADFNDGDTIDLISNDRAQTGQDINGGGVYSEE
QRACPRILKRCRRDSDCPGACVCQGNGYCGSRGDMIDIVLDGRAQ
TGEDINGSGVYSEEQRACPRILKRCRRDSDCPGACVCQDNGYCGS
GGDMIDVILDNRAQTGQDINGGGVYSEEQRACPKILKRCRRDSDC
PGACVCQDNGYCGSGGDMIDIVLDGRAQTGEDINGSGVYSEEQRA
CPRILKRCRRDSDCPGACVCQDNGYCGSGGDMIDVILDNRAQTGQ
DISGGGVYSEE

Precursor: TIPRE5 (partial)
Peptides: TI-24
Sequence:
AAALVAMMLVATSADSNGGDTIHLISNGRAQTGQDINSGGVYSEE
QRACPKILKRCRRDSDCPGACVCQDNGYCGSGGDIIDVVFDDRAQ
ASQDINGGGVYFEE

Colour coding: Endoplasmic reticulum (ER) signal sequence is shown in cyan, leader peptide (LP) domain is shown in black, cyclic peptide domain is shown in green, and acyclic peptide domain is shown in orange. (For interpretation of the references to color, please refer to the web version of this article.)

Table S2. Precursors Assembled from Transcriptome Sequencing Data of M. charantia Seeds.

Transcript ID number: 1315_rep_c7201
Peptide: MCTI-I
Precursor sequence:
MESKKIFIVVALVAMMLVASSATFEEGDMRPLVSDDGAVAGQDMNDF
PRKMFVKVVYYENQRRCPRILKQCKRDSDCPGECICMAHGFCG

Transcript ID number: 1315_rep_c3939
Peptide: TI-28
Precursor sequence:
MESKKVVVVVAMVVMMLVAMSSAAFDDGGAETGEVNYYPRKMFIKIG
VYNEEERICPRIWMECKRDSDCMAQCICVDGHCG

Transcript ID number: 1315_rep_c7780
Peptide: MCTI-III
Precursor sequence:
MESKKIVVVVALVAMMLVATSAAFDEGDTRPTRPLVSDDGAVVGQGM
NDYPRKMFVKVVYYENQRGCPRILKQCKQDSDCPGECICMAHGFCG

Transcript ID number: 1315_rep_c6383
Peptide: EI-1
Precursor sequence:
MESKKVVVVVAMVVMMLVATSSAAFNDGRAETGEVNYPRKMFIKIGV
YNEDERICPLIWMECKRDSDCLAQCICVDGHCG

Transcript ID number: 1315_rep_c4362
Peptide: TI-23
Precursor sequence:
MEWKKFALVAIVGMLLMGASAQAGGAETVATEIQGRPRRMMRGGICP
RILMKCKKTSDCMAQCKCLSNGFCGSAPN

Colour coding: Endoplasmic reticulum (ER) signal sequence is shown in cyan, leader peptide (LP) domain is shown in black and acyclic peptide domain is shown in orange. The sites of N-terminal trimming are highlighted in red. No evidence was found to support TI-23, which is shown in gray.

Table S3. Sequences of Representative Diagnostic Peptides Identified in Momordica Seeds (Alphabetic Order)

Species  Diagnostic peptide sequence deduced by tandem MS (a)  Precursor mass, theoretical (Da)  Precursor mass, observed (Da)  Ion  Digestion enzyme (b)
M. anigosantha  <QRACPRIL  995.53  995.54  498.78  C
M. anigosantha  NGYCGSRGD  984.37  984.36  493.19  E
M. balsamina  EDERICPL  1030.48  1030.47  516.24  C
M. balsamina  RICPRIW  999.54  999.54  500.78  C
M. boivinii  <QRACPRIL  995.53  995.54  498.78  C
M. cabrae  <QRACPRIL  995.53  995.54  498.78  C
M. calantha  <QRACPKIL  984.55  984.59  493.30  C
M. camerounensis  GACPRILKK  1041.61  1041.53  521.77  T
M. charantia  EEERICPRIW  1386.67  1386.72  694.37  C
M. charantia  EDERICPL  1030.48  1030.49  516.25  C
M. cissoides  DSDCPGACVCR  1295.47  1295.49  648.75  T
M. clarkeana  CNSGSDGGACPKIL  1462.63  1462.64  732.33  C
M. clarkeana  <QRACPRIL  995.53  995.55  498.78  C
M. cochinchinensis  CNSGSDGGVCPKIL  1462.65  1462.64  732.33  C
M. cymbalaria  <ERVCPKILQE  1252.66  1252.70  418.57  E
M. denticulata  CGSGSDGGVCPKIL  1405.63  1405.65  703.83  C
M. denticulata  <QRACPRIL  995.53  995.55  498.78  C
M. foetida  <ERGCPRIL  981.52  981.53  491.77  C
M. foetida  RICPLIWQECKR  1689.84  1689.86  564.29  T
M. friesiorum  <QRACPKIL  967.53  967.51  484.76  C
M. gilgiana  CGSGKDGGACPKIL  1418.66  1418.66  710.34  C
M. gilgiana  <ERICPRIWME  1370.66  1370.70  686.36  E
M. humilis  <QQRACPR  915.43  915.45  458.73  T
M. jeffreyana  <ERVCPRIL  1023.56  1023.58  512.80  C
M. leiocarpa  <ERGCPRIL  981.52  981.54  491.78  C
M. leiocarpa  RICPLIWMDCKR  1662.82  1662.84  555.29  T
M. macrophylla  CGSGSDGGICPKIL  1419.65  1419.65  710.83  C
M. macrophylla  <QQRACPRIL  1123.59  1123.60  562.80  C
M. parvifolia  <QRACPKIL  967.53  967.53  484.77  C
M. rostrata  <ERGCPRIL  981.52  981.52  491.77  C
M. silvatica  <ERRCPRIL  1080.60  1080.60  541.31  C
M. subangulata  CGSGSDGGACPKIL  1377.60  1377.59  689.80  C
M. subangulata  <QRACPRIL  995.53  995.52  498.77  C
M. trifoliolata  <ERGCPRIL  981.52  981.51  491.76  C
S. grosvenorii  GRVCPRIL  969.55  969.54  485.78  C

a Amino acid modifications include pyrolated glutamine/glutamate (<Q/E), carbamidomethylated cysteine (C), dioxidized tryptophan (W), deamidated (Q), and oxidized methionine (M).
b Enzymes used for digestion are trypsin (T), chymotrypsin (C), or endoproteinase Glu-C (E). Peptide cleavage points are shown with arrows.

Table S4.
Sites of Momordica Cyclic Peptides That Are Under Selection

No.  Codon  Syn (s)  Nonsyn (n)  Syn sites (S)  Nonsyn sites (N)  dS  dN  dN-dS  P-value  Norm. dN-dS
1  GGT  1.000  0.000  1.000  2.000  1.000  0.000  -1.000  1.000  -2.154
2  GGT  0.000  0.000  1.000  2.000  0.000  0.000  0.000  N/A  0.000
3  GTC  2.000  4.000  0.994  2.006  2.012  1.994  -0.018  0.684  -0.039
4  TGT  0.000  0.000  0.706  2.197  0.000  0.000  0.000  N/A  0.000
5  CCC  0.000  0.000  1.000  2.000  0.000  0.000  0.000  N/A  0.000
6  AAA  0.000  1.000  0.793  2.136  0.000  0.468  0.468  0.729  1.008
7  ATC  0.000  0.000  0.955  2.045  0.000  0.000  0.000  N/A  0.000
8  TTG  0.000  0.000  1.505  1.398  0.000  0.000  0.000  N/A  0.000
9  CAG  0.000  1.000  0.799  1.867  0.000  0.535  0.535  0.700  1.153
10  AGA  0.000  4.000  0.796  2.081  0.000  1.922  1.922  0.274  4.139
11  TGC  1.000  0.000  0.670  2.050  1.493  0.000  -1.493  1.000  -3.215
12  AGG  2.000  0.000  1.005  1.982  1.989  0.000  -1.989  1.000  -4.284
13  CGC  1.000  1.000  0.993  2.007  1.007  0.498  -0.509  0.890  -1.096
14  GAC  0.000  0.000  0.669  2.331  0.000  0.000  0.000  N/A  0.000
15  TCC  3.000  0.000  1.000  2.000  3.000  0.000  -3.000  1.000  -6.461
16  GAT  1.000  0.000  0.705  2.295  1.419  0.000  -1.419  1.000  -3.056
17  TGC  0.000  0.000  0.669  2.045  0.000  0.000  0.000  N/A  0.000
18  CCC  1.000  0.000  1.000  2.000  1.000  0.000  -1.000  1.000  -2.154
19  GGT  0.000  1.000  0.990  2.010  0.000  0.498  0.498  0.670  1.072
20  GCA  0.000  0.000  1.000  2.000  0.000  0.000  0.000  N/A  0.000
21  TGT  0.000  0.000  0.706  2.197  0.000  0.000  0.000  N/A  0.000
22  ATT  0.000  3.000  0.848  2.152  0.000  1.394  1.394  0.369  3.002
23  TGC  0.000  0.000  0.669  2.045  0.000  0.000  0.000  N/A  0.000
24  CGG  0.000  5.000  1.164  1.676  0.000  2.984  2.984  0.072  6.426
25  GGG  4.000  0.000  1.000  1.967  4.000  0.000  -4.000  1.000  -8.614
26  AAC  0.000  0.000  0.669  2.331  0.000  0.000  0.000  N/A  0.000
27  GGG  0.000  1.000  0.998  2.000  0.000  0.500  0.500  0.667  1.077
28  TAT  0.000  0.000  0.706  2.000  0.000  0.000  0.000  N/A  0.000
29  TGC  1.000  0.000  0.672  2.057  1.489  0.000  -1.489  1.000  -3.207
30  GGT  0.000  3.000  0.980  2.018  0.000  1.487  1.487  0.305  3.202
31  AGC  1.000  2.000  0.674  2.322  1.483  0.861  -0.622  0.871  -1.339
32  GGT  1.000  0.000  1.000  1.996  1.000  0.000  -1.000  1.000  -2.154
33  AGC  0.000  0.000  0.669  2.331  0.000  0.000  0.000  N/A  0.000
34  GAC  1.000  0.000  0.670  2.330  1.493  0.000  -1.493  1.000  -3.215

Notes: s is the number of synonymous substitutions. n is the number of nonsynonymous substitutions. S is the number of synonymous sites. N is the number of nonsynonymous sites. dS is the number of synonymous substitutions per site (s/S). dN is the number of nonsynonymous substitutions per site (n/N). dN-dS indicates the type of selection operating on a codon, i.e. a (-) value for negative and a (+) value for positive selection. P-value is the probability of rejecting the null hypothesis of neutral evolution. Norm. dN-dS is the value of dN-dS normalized by taking into account the total number of substitutions in the tree.

Table S5. Inter- and Intramolecular Hydrogen Bond Network of Selected Momordica Cyclic Peptides in Complex with Trypsin.

Each row gives donor, acceptor, and frequency of occupancy (%).

TI-1, main chain hydrogen bonds:
Gly195-Main  Lys6-Main  90.50
Lys6-Main  Ser212-Main  90.09
Leu8-Main  Phe44-Main  84.71
Ser215-Side  Gly2-Main  73.01
Gly214-Main  Cys4-Main  73.05
Cys4-Main  Gly214-Main  64.77
Gln194-Side  Pro5-Main  61.21
Ser197-Main  Lys6-Main  40.22
Gln175-Side  Gly1-Main  29.03
Lys222-Side  Asp34-Main  6.46
Gln194-Side  Cys29-Main  6.11
Ser215-Side  Gly1-Main  6.01
Gln219-Side  Gly32-Main  5.37
Gln194-Side  Ile7-Main  4.64
Gly32-Main  Ser215-Side  2.45
Cys4-Main  Ser215-Side  1.16
Ser215-Side  Asp34-Main  1.04
TI-1, side chain hydrogen bonds:
Lys6-Side  Ser192-Main  90.69
Lys6-Side  Ser192-Side  44.31
Lys63-Side  Gln9-Side  29.17
Ser31-Side  Ser215-Main  17.70
Asn26-Side  Thr149-Side  15.64
Lys6-Side  Asp191-Side  12.81
Arg24-Side  Ser147-Main  8.45
Lys6-Side  Gly216-Main  5.64
Ser31-Side  Gln194-Side  4.26
Arg24-Side  Thr149-Side  3.63
Thr149-Side  Asn26-Side  2.75
Asn26-Side  Tyr151-Side  2.25
Gln194-Side  Ser31-Side  1.86
Gly216-Main  Ser31-Side  1.31

TI-21, main chain hydrogen bonds:
Cys4-Main  Gly214-Main  96.29
Lys6-Main  Ser212-Main  94.45
Gly195-Main  Lys6-Main  92.42
Leu8-Main  Phe44-Main  68.13
Gln194-Side  Pro5-Main  59.85
Gly214-Main  Cys4-Main  50.31
Ser197-Main  Lys6-Main  42.37
Arg10-Main  Tyr42-Side  31.39
Ser215-Side  Gly2-Main  25.43
Gln194-Side  Ile7-Main  18.73
Ser215-Side  Gly1-Main  13.26
Gln194-Side  Cys29-Main  9.11
Gln175-Side  Gly1-Main  1.01
TI-21, side chain hydrogen bonds:
Lys6-Side  Ser192-Main  90.34
Lys63-Side  Gln9-Side  65.21
Lys6-Side  Asp191-Side  29.33
Lys6-Side  Ser192-Side  24.49
Thr149-Side  Gln24-Side  23.21
Lys6-Side  Gly216-Main  18.31
Asn26-Side  Tyr151-Side  9.85
Gln24-Side  Thr149-Side  8.99
Asn26-Side  Thr149-Side  3.54
Gln24-Side  Tyr151-Side  1.28
Gln9-Side  Tyr42-Side  1.17

TI-8, main chain hydrogen bonds:
Lys6-Main  Ser212-Main  95.97
Gly195-Main  Lys6-Main  91.15
Cys4-Main  Gly214-Main  90.53
Gln194-Side  Pro5-Main  88.10
Ser215-Side  Gly2-Main  85.80
Leu8-Main  Phe44-Main  65.05
Gly214-Main  Cys4-Main  60.39
Ser197-Main  Lys6-Main  45.84
Gln194-Side  Ile7-Main  15.33
Lys222-Side  Asp34-Main  9.50
Gln194-Side  Cys29-Main  6.07
Gln219-Side  Asp34-Main  2.87
Lys222-Side  Ser33-Main  2.63
Gln175-Side  Gly2-Main  1.52
Gln219-Side  Gly32-Main  1.42
TI-8, side chain hydrogen bonds:
Lys6-Side  Ser192-Main  91.59
Lys6-Side  Ser192-Side  26.73
Asn26-Side  Tyr151-Side  26.59
Gly216-Main  Ser31-Side  18.61
Lys6-Side  Gly216-Main  13.63
Tyr42-Side  Gln9-Side  9.12
Lys6-Side  Asp191-Side  7.61
Asn26-Side  Thr149-Side  6.39
Thr149-Side  Asn26-Side  3.48
Ser31-Side  Gly216-Main  1.79

TI-18, main chain hydrogen bonds:
Lys6-Main  Ser212-Main  93.89
Gly195-Main  Lys6-Main  91.30
Ser215-Side  Gly2-Main  84.97
Cys4-Main  Gly214-Main  81.82
Gln194-Side  Pro5-Main  81.77
Gly214-Main  Cys4-Main  67.52
Leu8-Main  Phe44-Main  65.71
Ser197-Main  Lys6-Main  38.41
Gln194-Side  Ile7-Main  20.42
Arg10-Main  Tyr42-Side  19.83
Tyr42-Side  Leu8-Main  18.75
Gln194-Side  Cys29-Main  9.32
Gln175-Side  Gly1-Main  5.70
Ser215-Side  Gly1-Main  4.73
TI-18, side chain hydrogen bonds:
Lys6-Side  Ser192-Main  91.88
Lys6-Side  Ser192-Side  33.73
Lys63-Side  Gln9-Side  32.59
Ser31-Side  Ser215-Main  31.83
Lys6-Side  Asp191-Side  21.11
Lys6-Side  Gly216-Main  20.41
Asn26-Side  Thr149-Side  15.19
Asn26-Side  Tyr151-Side  4.93
Gly216-Main  Ser31-Side  2.79
Tyr151-Side  Asn26-Side  2.09
Gln9-Side  Tyr42-Side  1.59

TI-2, main chain hydrogen bonds:
Lys6-Main  Ser212-Main  94.11
Gly195-Main  Lys6-Main  93.85
Gln194-Side  Pro5-Main  83.55
Cys4-Main  Gly214-Main  78.19
Leu8-Main  Phe44-Main  65.95
Gly214-Main  Cys4-Main  60.75
Lys10-Main  Tyr42-Side  52.47
Ser215-Side  Gly2-Main  51.03
Gln194-Side  Ile7-Main  33.35
Ser197-Main  Lys6-Main  31.76
Gln175-Side  Gly1-Main  14.91
Ser215-Side  Gly1-Main  8.60
Gln194-Side  Cys29-Main  5.26
Gln219-Side  Asp34-Main  4.09
TI-2, side chain hydrogen bonds:
Lys6-Side  Ser192-Main  90.66
Lys6-Side  Gly216-Main  39.38
Lys6-Side  Asp191-Side  33.93
Gly216-Main  Ser31-Side  28.04
Lys6-Side  Ser192-Side  24.24
Ser31-Side  Ser215-Main  22.90
Asn26-Side  Thr149-Side  20.68
Arg24-Side  Ser146-Main  11.02
Cys217-Main  Ser31-Side  5.75
Asn26-Side  Tyr151-Side  4.30
Arg24-Side  Ser147-Side  3.50
Ser31-Side  Gln194-Side  3.00
Tyr151-Side  Tyr28-Side  1.34

TI-10, main chain hydrogen bonds:
Gly195-Main  Lys6-Main  95.69
Lys6-Main  Ser212-Main  94.43
Cys4-Main  Gly214-Main  92.87
Gln194-Side  Pro5-Main  78.63
Leu8-Main  Phe44-Main  70.87
Lys10-Main  Tyr42-Side  67.13
Gln194-Side  Ile7-Main  51.91
Ser215-Side  Gly2-Main  34.68
Ser197-Main  Lys6-Main  32.82
Gly214-Main  Cys4-Main  28.73
Ser215-Side  Gly1-Main  13.17
TI-10, side chain hydrogen bonds:
Lys6-Side  Ser192-Main  89.92
Asn26-Side  Thr149-Side  75.91
Arg24-Side  Ser147-Side  51.95
Ser30-Side  Ser147-Side  49.43
Lys6-Side  Gly216-Main  48.77
Lys6-Side  Asp191-Side  44.07
Asn26-Side  Tyr151-Side  21.81
Lys6-Side  Ser192-Side  15.03
Arg24-Side  Thr149-Side  7.20
Ser31-Side  Ser146-Main  4.41
Ser147-Side  Ser30-Side  1.15

TI-20, main chain hydrogen bonds:
Lys6-Main  Ser212-Main  93.69
Gly214-Main  Cys4-Main  84.80
Gly195-Main  Lys6-Main  83.48
Gln194-Side  Pro5-Main  78.67
Cys4-Main  Gly214-Main  68.39
Lys10-Main  Tyr42-Side  60.79
Leu8-Main  Phe44-Main  58.24
Gln175-Side  Gly1-Main  42.77
Ser197-Main  Lys6-Main  40.58
Ser215-Side  Gly2-Main  23.07
Gln194-Side  Ile7-Main  13.67
Ser215-Side  Asp34-Main  13.07
Gln194-Side  Cys29-Main  10.21
Gly216-Main  Gly32-Main  4.89
Ser215-Side  Gly1-Main  2.22
Gly2-Main  Ser215-Side  1.09
TI-20, side chain hydrogen bonds:
Lys6-Side  Ser192-Main  52.66
Gly216-Main  Ser31-Side  39.09
Lys6-Side  Ser192-Side  29.82
Lys6-Side  Asp191-Side  22.34
Asn26-Side  Thr149-Side  12.96
Ser31-Side  Gly216-Main  6.92
Asn26-Side  Tyr151-Side  5.75
Ser31-Side  Ser215-Main  3.63
Thr149-Side  Asn26-Side  2.41
Lys6-Side  Gly216-Main  2.07
Ser31-Side  Ser215-Side  1.26
Gly148-Main  Arg24-Side  1.05

TI-22, main chain hydrogen bonds:
Gly195-Main  Arg6-Main  90.72
Arg6-Main  Ser212-Main  89.79
Gln194-Side  Pro5-Main  84.67
Gly214-Main  Cys4-Main  68.64
Ser215-Side  Gly2-Main  65.75
Leu8-Main  Phe44-Main  58.81
Cys4-Main  Gly214-Main  44.78
Ser197-Main  Arg6-Main  30.41
Gln194-Side  Ile7-Main  25.15
Gln194-Side  Cys29-Main  17.91
Ser215-Side  Asp34-Main  1.84
Gly216-Main  Gly32-Main  1.05
TI-22, side chain hydrogen bonds:
Arg6-Side  Ser192-Side  97.80
Arg6-Side  Gly216-Main  90.45
Gly216-Main  Ser31-Side  72.25
Arg6-Side  Asp191-Side  34.59
Asn26-Side  Thr149-Side  21.11
Arg24-Side  Ser146-Main  14.29
Ser31-Side  Gly214-Main  7.87
Asn26-Side  Tyr151-Side  2.60
Thr149-Side  Asn26-Side  2.14
Asn26-Side  Ser147-Main  1.92
Ser33-Side  Gln219-Side  1.42
Arg6-Side  Ser192-Main  1.16
Arg24-Side  Ser147-Main  1.12

[Fig. S1 spectra not reproduced. Recoverable panel labels, with axes of m/z (Da) against intensity (%): A, CGSGSDGGICPKIL, MW = 1419.65, precursor 710.83 (2+); B, <QRACPKIL, MW = 967.53, precursor 484.77 (2+); C, NGYCGSRGD, MW = 984.36, precursor 493.19 (2+); D, <QQRACPRIL, MW = 1123.60, precursor 562.80 (2+); E, EEERICPRIW, MW = 1368.70, precursor 685.36 (2+); F, EDERICPL, MW = 1030.49, precursor 516.25 (2+); G, RICPLIW(2OX)QECKRD, MW = 1804.88, precursor 602.63 (3+). Cysteines are carbamidomethylated (CAM).]

Fig. S1. Representative tandem MS spectra of diagnostic peptides corresponding to Momordica cyclic peptides and their acyclic counterparts. Shown above are tandem MS spectra of diagnostic peptides corresponding to TI-18 from M. macrophylla (A), TIPRE peptides from M. anigosantha (B, C), TI-19 from M. macrophylla (D), TI-28 from M. charantia (E), EI-1 from M. charantia (F), and an elastase inhibitor from M. foetida (G). Precursor mass is shown with an asterisk. Series of b- and y-ions were used to deduce the sequence. Green and red indicate a 2+ charge state of the b- and y-ion, respectively. Amino acid modifications were observed, i.e. pyrolated (<), carbamidomethylated (CAM), and dioxidized (2OX).
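The per-codon quantities defined in the notes to Table S4 (dS = s/S, dN = n/N, and the sign of dN-dS) can be recomputed from the tabulated counts. The Python sketch below is an illustration only: the names `CodonRow`, `per_site_rates`, and `selection_sign` are hypothetical, and the three rows are copied from Table S4. Because S and N are rounded to three decimals in the table, recomputed values may differ from the printed ones in the last digit. The P-value and the normalized dN-dS are not reproduced, since the text does not give the procedure for them.

```python
from typing import NamedTuple, Tuple


class CodonRow(NamedTuple):
    """Counts for one codon, using the symbols from the notes to Table S4."""
    no: int    # codon number within the cyclic peptide domain
    s: float   # number of synonymous substitutions
    n: float   # number of nonsynonymous substitutions
    S: float   # number of synonymous sites
    N: float   # number of nonsynonymous sites


def per_site_rates(row: CodonRow) -> Tuple[float, float, float]:
    """Return (dS, dN, dN - dS), with dS = s/S and dN = n/N."""
    dS = row.s / row.S
    dN = row.n / row.N
    return dS, dN, dN - dS


def selection_sign(diff: float) -> str:
    """Map the sign of dN - dS to the direction named in the table notes."""
    if diff < 0:
        return "negative"
    if diff > 0:
        return "positive"
    return "no difference"


# Three rows copied from Table S4 (codons 3, 10, and 15).
ROWS = [
    CodonRow(3, 2.0, 4.0, 0.994, 2.006),
    CodonRow(10, 0.0, 4.0, 0.796, 2.081),
    CodonRow(15, 3.0, 0.0, 1.000, 2.000),
]

if __name__ == "__main__":
    for row in ROWS:
        dS, dN, diff = per_site_rates(row)
        print(f"codon {row.no}: dS={dS:.3f} dN={dN:.3f} "
              f"dN-dS={diff:.3f} ({selection_sign(diff)})")
```

The sign alone only indicates the direction of the difference; whether a site is considered to be under selection depends on the P-value column of Table S4.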
[Fig. S2 chromatograms not reproduced. Recoverable panel titles: Unfractionated; Fraction 1: 10–20% Solvent B; Fraction 2: 20–30% Solvent B; Fraction 3: 30–40% Solvent B; Fraction 4: 40–50% Solvent B. Axes: retention time (min, 5–70) against intensity (cps); a right-hand box in each panel gives relative abundance (%). Relative abundance values for TI-1, TI-2, TI-4, TI-5, TI-6, TI-7, and TI-8, in the order extracted (the panel to which each set belongs is not recoverable from the text):
93.6 ± 1.7, 79.8 ± 2.1, 93.4 ± 5.5, 97.2 ± 3.6, 97.1 ± 2.0, 92.4 ± 3.4, 75.2 ± 2.1
4.2 ± 0.6, 1.2 ± 0.0, 0.1 ± 0.0, BT, 0.1 ± 0.0, 0.1 ± 0.0, 18.0 ± 1.7
BT, 0.2 ± 0.0, BT, BT, BT, BT, 4.9 ± 1.0
93.9 ± 3.0, 79.7 ± 2.5, 90.0 ± 2.0, 94.0 ± 3.6, 96.0 ± 3.3, 91.1 ± 3.2, 75.4 ± 1.8
2.2 ± 0.3, 18.8 ± 0.8, 6.5 ± 0.3, 2.8 ± 0.1, 2.8 ± 0.2, 7.5 ± 0.3, 1.9 ± 0.2]

Fig. S2. Elution profiles of unfractionated and fractionated M. cochinchinensis seed extracts. The majority of the targeted peptides (coloured dots; the small dots represent the β-aspartyl isomers) are present in the 10–20% solvent B fraction. The relative abundances of the targeted peptides, as inferred from their peak intensities, are given in the right-hand box. BT: below threshold (10 cps). For details on the calculation see Mahatmanto et al. (2014).

[Fig. S3 spectra not reproduced. Recoverable panel labels, with axes of m/z (Da) against intensity (%): A, <ERRCPRIL, MW = 1080.59, precursor 541.30 (2+); B, RICPRIWM(OXI)E, MW = 1275.59, precursor 638.80 (2+); C, <ERGCPRIL, MW = 981.51, precursor 491.76 (2+). Cysteines are carbamidomethylated (CAM).]

Fig. S3. Representative tandem MS spectra of diagnostic peptides present in the 20–30% solvent B fraction of M. charantia seed extract. Shown above are tandem MS spectra of diagnostic peptides corresponding to MCTI-I (A), MCTI-II (B), and MCTI-III (C). Precursor mass is shown with an asterisk. Series of b- and y-ions were used to deduce the sequence. Green and red indicate a 2+ charge state of the b- and y-ion, respectively. Amino acid modifications were observed, i.e. pyrolated (<), carbamidomethylated (CAM), and oxidized (OXI).
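The captions of Figs. S1 and S3 state that series of b- and y-ions were used to deduce the peptide sequences. As a rough illustration of that bookkeeping, the Python sketch below computes the neutral mass, the precursor m/z, and the singly charged b- and y-ion series for the MCTI-III diagnostic peptide <ERGCPRIL. It makes several assumptions that are not taken from the manuscript: standard monoisotopic residue masses, carbamidomethylation (+57.021464 Da) on every cysteine, an N-terminal pyroglutamate formed by loss of H2O from Glu or NH3 from Gln, and hypothetical function names. It is not the authors' software or workflow.

```python
from typing import Dict, List, Tuple

# Standard monoisotopic residue masses in Da (assumed; not from the manuscript).
MONO: Dict[str, float] = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565    # H2O
AMMONIA = 17.026549  # NH3
PROTON = 1.007276    # mass of a proton
CAM = 57.021464      # carbamidomethyl group added to each cysteine


def residue_masses(seq: str, pyro_n_term: bool = False) -> List[float]:
    """Residue masses with CAM-cysteine and optional N-terminal pyroglutamate."""
    masses = [MONO[aa] + (CAM if aa == "C" else 0.0) for aa in seq]
    if pyro_n_term:
        if seq[0] == "E":
            masses[0] -= WATER      # Glu cyclizes with loss of water
        elif seq[0] == "Q":
            masses[0] -= AMMONIA    # Gln cyclizes with loss of ammonia
        else:
            raise ValueError("pyroglutamate requires N-terminal E or Q")
    return masses


def neutral_mass(masses: List[float]) -> float:
    """Monoisotopic mass of the intact peptide (residues plus one water)."""
    return sum(masses) + WATER


def mz(mass: float, charge: int) -> float:
    """m/z of the protonated species at the given charge state."""
    return (mass + charge * PROTON) / charge


def b_y_ions(masses: List[float]) -> Tuple[List[float], List[float]]:
    """Singly charged b- and y-ion m/z values (b1..b(n-1), y1..y(n-1))."""
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


if __name__ == "__main__":
    res = residue_masses("ERGCPRIL", pyro_n_term=True)
    mass = neutral_mass(res)
    b, y = b_y_ions(res)
    print(f"neutral mass {mass:.2f} Da, [M+2H]2+ {mz(mass, 2):.2f}")
    print("b ions:", [f"{v:.2f}" for v in b])
    print("y ions:", [f"{v:.2f}" for v in y])
```

Under these assumptions the computed neutral mass agrees with the theoretical precursor mass listed for <ERGCPRIL in Table S3 (981.52 Da), and the low-mass y-ions correspond to peak labels that survive in the Fig. S3 residue (for example 132.10 and 245.19).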
