The Evolution of Momordica Cyclic Peptides
Pre-submission version. For final version see:
Mol Biol Evol, 2014 Nov 6. pii: msu307. [Epub ahead of print]
PMID: 25376175
Article (Discoveries)
Tunjung Mahatmanto,1 Joshua S. Mylne,2 Aaron G. Poth,1 Joakim Swedberg,1 Quentin Kaas,1
Hanno Schaefer,3 and David J. Craik*,1
1 Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland
4072, Australia
2 The University of Western Australia, School of Chemistry and Biochemistry & The ARC
Centre of Excellence in Plant Energy Biology, 35 Stirling Highway, Crawley, Perth 6009,
Australia
3 Plant Biodiversity Research, Technische Universität München, Emil-Ramann Strasse 2,
85354 Freising, Germany
*Corresponding author: David J. Craik
Address: Institute for Molecular Bioscience, 306 Carmody Road, Building 80, The University
of Queensland, Brisbane, Queensland 4072, Australia
Telephone: +61 7 3346 2019
Fax: +61 7 3346 2101
E-mail: d.craik@imb.uq.edu.au
Abstract
Cyclic proteins have evolved for millions of years across kingdoms of life to confer structural
stability over their acyclic counterparts while maintaining intrinsic functional properties. Here
we show that cyclic mini-proteins (or peptides) from Momordica (Cucurbitaceae) seeds
evolved in species that diverged from an African ancestor around 19 million years ago. The
ability to achieve head-to-tail cyclization of Momordica cyclic peptides appears to have been
acquired via a series of mutations in their acyclic precursor coding sequences following recent
and independent gene expansion event(s). Evolutionary analysis of Momordica cyclic
peptides reveals sites that are under selection, highlighting residues that are presumably
constrained for maintaining their function as potent trypsin inhibitors. Molecular dynamics of
Momordica cyclic peptides in complex with trypsin reveals site-specific residues involved in
target binding. In a broader context, this study provides a basis for selecting Momordica
species to further investigate the biosynthesis of the cyclic peptides and for constructing
libraries that may be screened against evolutionarily related serine proteases implicated in
human diseases.
Key words: Momordica seeds, cyclic peptides, serine protease inhibitors, evolution
Introduction
Head-to-tail (backbone) cyclization confers resistance to proteolysis on peptides, and
peptides bearing this trait presumably evolved from ancestral acyclic peptides (Trabi and
Craik 2002). Backbone cyclized peptides found in angiosperms are here categorized into three
groups based on their structures (Fig. 1). Group 1 consists of cyclic peptides with three
disulfide bonds that form a knotted core, i.e. the cyclotides (Craik et al. 1999). Group 2
consists of cyclic peptides with one disulfide bond, which includes SFTI-1 or sunflower
trypsin inhibitor-1 (Luckett et al. 1999), SFT-L1 or SFTI-Like 1 (Mylne et al. 2011), and
PDPs or PawS-derived peptides (Elliott et al. 2014). Members of both groups belong to the
subclass homopolycyclopeptides type VIII of the plant cyclopeptides (Tan and Zhou 2006).
Group 3 consists of cyclic peptides with no disulfide bonds referred to as orbitides (Arnison et
al. 2013). Members of this group belong to the subclass homomonocyclopeptides type VI of
the plant cyclopeptides (Tan and Zhou 2006).
All of the aforementioned cyclic peptides are gene-encoded and occur in distant families of
angiosperms. Group 1 peptides have been found in Rubiaceae, Violaceae, Cucurbitaceae, Fabaceae,
and Solanaceae (Craik 2013); Group 2 in Asteraceae (Elliott et al. 2014); and Group 3 in
Annonaceae, Caryophyllaceae, Euphorbiaceae, Lamiaceae, Linaceae, Phytolaccaceae,
Rutaceae, Schizandraceae, and Verbenaceae (Arnison et al. 2013). Despite being
phylogenetically distant, the biosynthesis of the first two groups appears to have evolved in
parallel, being channelled through transpeptidation by asparaginyl endopeptidase (AEP) that
joins their ends (Saska et al. 2007; Gillon et al. 2008; Mylne et al. 2011; Mylne et al. 2012).
This AEP-mediated cyclization requires a conserved proto-N-terminal Gly, proto-C-terminal
Asx (i.e. Asn or Asp), small residue at P1′, and Xle (i.e. Leu or Ile) at P2′ (Mylne et al. 2011;
Mylne et al. 2012). On the other hand, cyclization of the third group is mediated by peptide
cyclase (PCY1), a serine protease-like enzyme (Barber et al. 2013).
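The sequence requirements for AEP-mediated cyclization listed above can be written as a simple check. The following is a minimal sketch rather than a method used in this study: the function name, the example fragments, and the choice of Gly, Ala, and Ser as "small" residues are assumptions made for illustration only.

```python
# Illustrative check of the AEP cyclization features described in the text.
# SMALL_RESIDUES and the example fragments are assumptions, not data from the study.

SMALL_RESIDUES = set("GAS")  # assumed definition of a "small" residue
ASX = set("ND")              # Asn or Asp
XLE = set("LI")              # Leu or Ile


def has_aep_features(domain: str, trailing: str) -> dict:
    """Report which AEP cyclization features are present.

    domain   -- peptide domain, written from proto-N-terminus to proto-C-terminus
    trailing -- residues that follow the proto-C-terminus (P1', P2', ...)
    """
    features = {
        "n_term_gly": domain[:1] == "G",
        "c_term_asx": domain[-1:] in ASX,
        "p1_prime_small": trailing[:1] in SMALL_RESIDUES,
        "p2_prime_xle": trailing[1:2] in XLE,
    }
    # True only if all four features above are present.
    features["all"] = all(features.values())
    return features


# Hypothetical fragments, chosen only to exercise the check.
print(has_aep_features("GGVCKD", "AL"))
print(has_aep_features("QRACKG", "ML"))
```

In this sketch the first fragment satisfies all four requirements, whereas the second lacks the proto-N-terminal Gly, the proto-C-terminal Asx, and a small residue at P1′.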
The seeds of Momordica cochinchinensis (Cucurbitaceae) contain cyclic peptides that belong
to the cyclotide group. MCoTI-I and -II (Momordica cochinchinensis trypsin inhibitor-I and -II) were the first members of Momordica cyclic peptides to be discovered (Hernandez et al.
2000) and have been studied extensively, particularly for applications in the biomedical field.
Interest in these peptides stems from the proteolytic stability conferred by their structural
motif (Colgrave and Craik 2004) and the amenability of the residues comprising their
intracysteine loops to mutation (Austin et al. 2009). Thus, in principle Momordica cyclic
peptides can be used as highly stable grafting scaffolds. As both peptides have the ability to
enter cells (Greenwood et al. 2007; Cascales et al. 2011; Contreras et al. 2011; D’Souza et al.
2014), they have been touted as potential vectors for the delivery of grafted epitopes with
desired activities to intracellular targets (Ji et al. 2013). Examples of grafting applications
include engineering of: (i) MCoTI-I into an anti-HIV agent (Aboye et al. 2012) and an
antagonist of intracellular proteins Hdm2 and HdmX for suppressing tumor growth (Ji et al.
2013) and (ii) MCoTI-II into a β-tryptase inhibitor (Thongyoo et al. 2009; Sommerhoff et al.
2010) and human leukocyte elastase inhibitor (Thongyoo et al. 2009) for inflammatory
disorders, and a pro-angiogenic agent for wound healing (Chan et al. 2011). Despite these
remarkable successes, the introduction of new activities onto the Momordica cyclic peptide
scaffold remains challenging because the limiting structural and functional constraints are not
yet fully understood. During the course of evolution, negative selection purges mutations that
have deleterious effects on the structure of peptides, constraining their ability to acquire new
functions through mutations that would otherwise be fixed under positive selection (Tokuriki and Tawfik
2009). Thus, knowledge of residues under selection should provide insights into the
limitations of Momordica cyclic peptides to be engineered as scaffolds.
To understand the evolution of Momordica cyclic peptides, it is imperative that their
distribution and diversity be traced and mapped. Momordica is a clade of c. 60 tropical and
subtropical climbers and creepers that diverged from a common ancestor around 35 million
years ago (Schaefer and Renner 2010) and underwent long-distance dispersal across Africa,
Asia, and Australia (Schaefer et al. 2009). Being a crucial agent for plant dispersal, seeds play
a key role in the speciation of plants, carrying within them genetic information for the
establishment of new plants under spatio-temporally disparate environmental pressures.
Tracing the distribution of cyclic peptides and their acyclic counterparts in Momordica
species will allow us to determine when, where, and how the genes for their biosynthesis
evolved. In turn, this knowledge may serve as a basis for selecting Momordica species to
further investigate how the cyclic peptides arise, e.g. through comparative transcriptome
analysis to identify gene sequences and enzymes essential for their processing. Furthermore,
mapping the diversity of cyclic peptides in Momordica seeds will allow us to identify site-specific residues that are evolving under selection, i.e. negative selection, which maintains the
existing structure, or positive selection, which favors the adoption of new function. This information may be
particularly useful in the context of designing inhibitors of evolutionarily related serine
proteases implicated in human diseases using the Momordica cyclic peptide scaffold.
In this study we describe the distribution and diversity of cyclic peptides and their acyclic
counterparts in the seeds of 24 Momordica species and an outgroup species Siraitia
grosvenorii (Fig. 2). We discover new TIPTOP (Two Inhibitor Peptide TOPologies) genes,
which encode multiple cyclic peptide domains and terminate with an acyclic peptide domain
(Mylne et al. 2012), and partial gene sequences, which we refer to as TIPRE (Tandem Inhibitor
Peptide REpeats), that encode multiple acyclic peptide domains. We assemble transcripts that
encode a single acyclic peptide domain like the previously reported TGTI-II (Towel Gourd
Trypsin Inhibitor-II) cDNA (Ling et al. 1993; Mylne et al. 2012). In addition, we identify
diagnostic peptides that correspond to the cyclic peptides and their acyclic counterparts.
Despite having undergone long-distance dispersal events during the speciation of Momordica
(Schaefer et al. 2009; Schaefer and Renner 2010), we found the sequence diversity of
Momordica cyclic peptides to be low compared to other members of the cyclotide group
(Kaas and Craik 2010). This conservation could be explained by the recentness of the event(s)
that created the cyclic peptides or by the selection operating on the cyclic peptide domain
repeats for maintaining their structure, and thus their documented function as potent trypsin
inhibitors (Avrutina et al. 2005). As hydrogen bonding networks are known to play an
important role in protein recognition (Lu et al. 1997), constraining mutations that could
compromise the functional fold conferred by the hydrogen bond networks is vital. Molecular
dynamics of Momordica cyclic peptides in complex with trypsin reveal alterations in the
intermolecular hydrogen bond network upon single amino acid substitutions, highlighting
sites that have the potential to be engineered for selective target binding.
Results
In this study we traced the occurrence of cyclic peptides in the seeds of 24 Momordica species
and an outgroup species Siraitia grosvenorii, mapped the residues under selection, and
examined the effect of single amino acid substitutions on the intermolecular hydrogen bond
network of selected naturally occurring Momordica cyclic peptides in complex with trypsin.
Precursors of Momordica Cyclic and Acyclic Peptides
PCR of Momordica genomic DNA using the primers that amplified TIPTOP genes from M.
cochinchinensis and M. sphaeroidea (Mylne et al. 2012) resulted in new TIPTOP genes from
two Asian Momordica, i.e. one from M. subangulata (TIPTOP4) and two from M.
macrophylla (TIPTOP5 and TIPTOP6). TIPTOP4–6 respectively encode six, four, and five
cyclic peptides, each terminating with an acyclic peptide (the list of the encoded peptides is
given in Supplementary Table S1 and a representation of the precursors is given in Fig. 3A).
Two of the encoded cyclic peptides, i.e. MCoTI-II and MCoTI-IV (hereafter we remove the
MCo prefix because some of the peptides are also present in other Momordica species and use
an Arabic numeral for simplicity, e.g. MCoTI-II becomes TI-2), have been reported
(Hernandez et al. 2000; Mylne et al. 2012) whereas the others are new but have similar
sequences that share the Asp-Gly cyclization point. Similarly, two of the encoded acyclic
peptides, i.e. TI-5 and TI-6, have been reported (Mylne et al. 2012). The other acyclic peptide,
i.e. TI-19, differs from TI-5 in that it has an additional N-terminal Gln. An alignment of
Momordica cyclic peptides and their acyclic counterparts is given in Fig. 3B.
A new set of primers for conserved sequences within the endoplasmic reticulum (ER) signal
and the acyclic peptide domain was designed because the first primer set could not amplify
TIPTOP genes from the remaining Momordica genomic DNA. PCR with this new set of
primers resulted in five partial gene sequences that appear to have undergone expansion
similar to TIPTOP and thus were named TIPRE, for Tandem Inhibitor Peptide REpeats. Four
of the partial gene sequences were found in the African M. anigosantha (TIPRE1–4) and one
was found in the African M. friesiorum (TIPRE5). The list of the encoded peptides is given in
Supplementary Table S1. The TIPRE peptides, i.e. TI-24–27, have similar sequences to the
TIPTOP acyclic peptides but with four additional C-terminal residues like the TIPTOP cyclic
peptides (Fig. 3B). This finding suggests that the acyclic peptides acquired features for
cyclization following extension of their tail, which provided the target residues for AEP to
then perform transpeptidation that ligates their C-terminus to their N-terminus.
Analysis of a previously reported African M. charantia seed transcriptome (Yang et al. 2010)
revealed five transcripts (the translation of the transcripts is given in Supplementary Table
S2), each encoding a single acyclic peptide domain. Two of the transcripts encode the acyclic
peptides MCTI-I (Momordica charantia Trypsin Inhibitor-I) and MCTI-III (Hara et al. 1989;
Hamato et al. 1995). One transcript encodes an acyclic peptide, which we refer to as TI-28,
that is similar to MCTI-II (Hara et al. 1989) but with an extended N-terminus. Another
transcript encodes an acyclic peptide, which we refer to as EI-1 (Elastase Inhibitor-1), that is
similar to MCEI-IV (Momordica charantia Elastase Inhibitor-IV (Hamato et al. 1995)) but
differs in one residue following the N-terminal Glu. The absence of a dedicated precursor for
MCTI-II and MCEI-I–III (Supplementary Table S2), which respectively are shorter than TI-28 and EI-1 at their N-terminus (Fig. 3B), suggests that they are products of post-translational
N-terminal trimming, a process that has been proposed to give rise to the acyclic peptides
hedyotide B2–4 from the Rubiaceae Hedyotis biflora (Nguyen et al. 2011). On the other hand,
the other transcript potentially encodes a new peptide, which we refer to as TI-23, as judged
by the sequence similarity to the other Momordica cyclic peptides. Given the lack of features
for AEP processing in its precursor (Supplementary Table S2), it would be interesting to
confirm the presence of TI-23.
Identification of Peptides using a Targeted Proteomics Approach
A targeted search for Momordica cyclic peptides and their acyclic counterparts was aided by
the observation of tandem MS for diagnostic peptides, which results from the digestion of
reduced and alkylated peptides. For cyclic peptides, the diagnostic peptides are chymotrypsin
digests harbouring sequence tags that extend over their cyclization points, i.e. Cys29 to Leu8
(for residue numbering please refer to Fig. 3B). For acyclic peptides, the diagnostic peptides
result from trypsin, chymotrypsin, or endoproteinase Glu-C digestion. A list of the sequences
of representative diagnostic peptides found is given in Supplementary Table S3. The
distribution of Momordica cyclic peptides and their acyclic counterparts is presented in Fig. 4.
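Because a cyclic peptide has no termini, a sequence tag that extends over the cyclization point reads from the last residues of the ring into the first. The following is a minimal sketch of that wrap-around indexing, using 1-based residue numbering as in the text; the function name and the toy ring are hypothetical and do not represent any peptide from this study.

```python
def circular_segment(ring: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive) read around a cyclic sequence.

    If start > end, the segment wraps across the cyclization point.
    """
    n = len(ring)
    if not (1 <= start <= n and 1 <= end <= n):
        raise ValueError("positions must lie within the ring")
    if start <= end:
        return ring[start - 1:end]
    # Wrap: take the tail of the ring, then continue from its beginning.
    return ring[start - 1:] + ring[:end]


# Toy 10-residue ring; the letters are placeholders, not a real peptide.
toy_ring = "ABCDEFGHIJ"
print(circular_segment(toy_ring, 8, 3))  # HIJABC
print(circular_segment(toy_ring, 2, 5))  # BCDE
```

Under the same indexing, and assuming a 34-residue ring, a tag running from position 29 to position 8 would span 14 residues.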
Tandem MS evidence for cyclic peptides was only found in the Asian M. cochinchinensis, M.
macrophylla, M. denticulata, M. subangulata, and M. clarkeana and in a close relative African
M. gilgiana (a representative tandem MS spectrum is given in Supplementary Fig. S1A). No
evidence was found for TI-23 in the African M. charantia. For the African M. anigosantha
TIPRE peptides, evidence was only found to support acyclic peptides, as would be expected
judging from the lack of a proto-N-terminal Gly. Observed tandem MS spectra were
consistent with a chymotrypsin and an endoproteinase Glu-C digest product that correspond
to acyclic peptides having an N-terminal pyroglutamate, i.e. a cyclized Gln (Supplementary Fig. S1B), and a
four C-terminal residue extension (Supplementary Fig. S1C). Evidence for the acyclic trypsin
inhibitors was found in all of the species analysed (Supplementary Table S3, representative
tandem MS spectra are given in Supplementary Figs. S1D and S1E). For the acyclic elastase
inhibitors, evidence was only found in the African M. leiocarpa, M. foetida, M. balsamina,
and M. charantia (representative tandem MS spectra are given in Supplementary Figs. S1F
and S1G).
Mapping of Sites Under Selection
To map the sites that are evolving under selection, we analyzed the number of synonymous
substitutions per site (dS) and the number of nonsynonymous substitutions per site (dN) of the
cyclic peptides. The sign of dN-dS indicates whether a particular site is evolving under
negative (dN-dS < 0) or positive (dN-dS > 0) selection. The dN-dS
analysis of the cyclic peptides (Supplementary Table S4) revealed that two cysteines and 11
intracysteine residues are evolving under negative selection, which include four of five
residues in Loop 2, one of three residues in Loop 3, one of five residues in Loop 5, and five of
eight residues in Loop 6. On the other hand, eight intracysteine residues are evolving under
positive selection, which include three of six residues in Loop 1, one of three residues in Loop
3, one of one residue in Loop 4, two of five residues in Loop 5, and one of eight residues in
Loop 6. The remaining 13 residues (four cysteines, three residues in Loop 1, one residue in
Loop 2, one residue in Loop 3, two residues in Loop 5, and two residues in Loop 6) are
neutral. This finding highlights the types of selection operating on the sites of the Momordica
cyclic peptide scaffold (Fig. 5A).
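The sign rule for dN-dS and the tallies reported above can be sketched as follows. This is an illustration under stated assumptions, not the analysis pipeline used here: the function name, the tolerance, and the per-site example values are hypothetical, whereas the counts per loop are those given in the text.

```python
from collections import Counter


def classify_site(dn: float, ds: float, tol: float = 1e-9) -> str:
    """Classify a site from the sign of dN - dS, as described in the text."""
    diff = dn - ds
    if diff < -tol:
        return "negative"
    if diff > tol:
        return "positive"
    return "neutral"


# Hypothetical per-site (dN, dS) values, for illustration only.
example_sites = {"site_a": (0.0, 1.2), "site_b": (0.8, 0.0), "site_c": (0.0, 0.0)}
labels = {name: classify_site(dn, ds) for name, (dn, ds) in example_sites.items()}
print(labels)

# Tallies reported in the text, by selection type and location.
reported = {
    "negative": {"Cys": 2, "Loop 2": 4, "Loop 3": 1, "Loop 5": 1, "Loop 6": 5},
    "positive": {"Loop 1": 3, "Loop 3": 1, "Loop 4": 1, "Loop 5": 2, "Loop 6": 1},
    "neutral": {"Cys": 4, "Loop 1": 3, "Loop 2": 1, "Loop 3": 1, "Loop 5": 2, "Loop 6": 2},
}
totals = {kind: sum(counts.values()) for kind, counts in reported.items()}
per_region = Counter()
for counts in reported.values():
    per_region.update(counts)

print(totals)            # {'negative': 13, 'positive': 8, 'neutral': 13}
print(dict(per_region))  # residues accounted for in each region
```

The three categories sum to 34 sites, and the per-region totals match the loop sizes quoted in the text (six, five, three, one, five, and eight residues for Loops 1 to 6, plus six cysteines).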
Dynamics of the Hydrogen Bond Network of Selected Cyclic Peptides with Trypsin
Mutations provide an essential raw material for evolution and serve as a basis for acquiring
new functions (Tokuriki and Tawfik 2009). As hydrogen bonds play a role in the recognition
of inhibitors against their target proteins (Lu et al. 1997), examining the effect of mutations
on the interaction of inhibitors with their target proteins is fundamental to understanding
the mechanism of their biological function. Here we examined the effect of single amino acid
substitutions on the intermolecular hydrogen bonds of the Momordica cyclic peptides TI-1 and
TI-2 in complex with trypsin using molecular dynamics. The mutations made to TI-1 were
based on the sequence of TI-8, TI-18 and TI-21 whereas those in TI-2 were based on the
sequence of TI-10, TI-20, and TI-22.
Analysis of the dynamics of the cyclic peptides in complex with trypsin reveals site-specific
residues that play a role in forming the intermolecular hydrogen bond network (Fig. 5B; list
of donors, acceptors, and frequency of occupancy of the hydrogen bonds is given in
Supplementary Table S5). One of the prominent features of Momordica cyclic peptides is that
their main chains, i.e. of Loops 1 and 6, form the majority of the hydrogen bond network with
trypsin. For TI-1, the residues involved in main chain hydrogen bonding with trypsin are
Gly1, Gly2, Gly32, Cys4, Cys29, Pro5, Lys6, Ile7, Leu8, and Asp34 whereas the residues involved in side chain
hydrogen bonding are Lys6, Gln9, Arg24, Asn26, and Ser31. For TI-2, the residues involved in
main chain and side chain hydrogen bonding with trypsin are similar to TI-1, with the
exception of Lys10 instead of Gly32 for the main chain and Tyr28 instead of Gln9 for the side
chain. Single amino acid substitutions in both cyclic peptides altered their hydrogen bond
network with trypsin, notably by introducing or abolishing a number of the main chain and
side chain hydrogen bonds and by increasing or decreasing their frequency of occupancy
(Fig. 5B). This result highlights the dynamics of the hydrogen bond network of
Momordica cyclic peptides in complex with trypsin, providing insights into the potential role
of site-specific residues for target binding.
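The frequency of occupancy of a hydrogen bond can be read as the fraction of trajectory frames in which that bond is present, and the effect of a substitution as the change in that fraction. The following is a minimal sketch of this bookkeeping, not the simulation or analysis software used here: the function names and the bond labels are hypothetical, and detecting hydrogen bonds in each frame (distance and angle criteria) is assumed to have been done beforehand.

```python
from typing import Dict, FrozenSet, List, Tuple

HBond = Tuple[str, str]  # (donor label, acceptor label)


def occupancy(frames: List[FrozenSet[HBond]]) -> Dict[HBond, float]:
    """Fraction of frames in which each hydrogen bond is present."""
    if not frames:
        return {}
    counts: Dict[HBond, int] = {}
    for frame in frames:
        for bond in frame:
            counts[bond] = counts.get(bond, 0) + 1
    return {bond: n / len(frames) for bond, n in counts.items()}


def compare(wild: Dict[HBond, float], mutant: Dict[HBond, float]) -> Dict[HBond, str]:
    """Label each bond as introduced, abolished, increased, decreased, or unchanged."""
    changes: Dict[HBond, str] = {}
    for bond in set(wild) | set(mutant):
        w, m = wild.get(bond, 0.0), mutant.get(bond, 0.0)
        if w == 0.0:
            changes[bond] = "introduced"
        elif m == 0.0:
            changes[bond] = "abolished"
        elif m > w:
            changes[bond] = "increased"
        elif m < w:
            changes[bond] = "decreased"
        else:
            changes[bond] = "unchanged"
    return changes


# Hypothetical bonds and frames, for illustration only.
a = ("peptide:site1:N", "trypsin:X:O")
b = ("peptide:site2:NZ", "trypsin:Y:O")
c = ("peptide:site3:OG", "trypsin:Z:N")
wt_frames = [frozenset({a, b}), frozenset({a}), frozenset({a, b}), frozenset({a})]
mut_frames = [frozenset({a, c}), frozenset({a}), frozenset({c}), frozenset({a, c})]

changes = compare(occupancy(wt_frames), occupancy(mut_frames))
for bond, label in sorted(changes.items()):
    print(bond, label)
```

In this toy example, bond `a` drops from an occupancy of 1.0 to 0.75, bond `b` is abolished, and bond `c` is introduced.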
Discussion
Momordica Cyclic Peptides Occur in Species That Diverged from an African Ancestor
Around 19 Million Years Ago
TIPTOP genes were found following the report of the cyclic peptides MCoTI-I and MCoTI-II
from M. cochinchinensis seeds (Hernandez et al. 2000). The unusual nature of how the genes
are organized, i.e. having multiple repeats of cyclic peptide domains that terminate with an
acyclic peptide domain, led to the hypothesis that they might have expanded from an ancestral
gene via internal duplication event(s) (Mylne et al. 2012). Tracing the distribution of TIPTOP
cyclic peptides may provide insights into when and where they emerged.
A targeted search revealed the occurrence of cyclic peptides in the Asian M. cochinchinensis,
M. macrophylla, M. denticulata, M. subangulata, and M. clarkeana and in a close relative
African M. gilgiana (Fig. 4). The lack of evidence for cyclic peptides in other representative
African taxa suggests that the cyclic peptides have arisen within species that descended from
a common ancestor to the Asian and a close relative African species. The Asian Momordica
are a result of a long-distance dispersal of an African ancestor that came back to Asia around
19 million years ago – an event that marks the divergence of the Asian species from their
close relative African M. gilgiana (Schaefer and Renner 2010). Thus, the ancestral gene of
TIPTOP presumably has been inherited from this African ancestor. Interestingly, the
expansion that created TIPTOP genes appears to have occurred recently and independently, as
suggested by the highly conserved signal peptides (Fig. 6A), which are known to evolve
rapidly (Li et al. 2009), and the distinct number of domain repeats and sequence of the
encoded peptides. This scenario would require specific selective pressures operating on both
the Asian species and the close relative African M. gilgiana.
Plausible Selective Advantage and Pressure Underlying the Expansion that Created
TIPTOP Genes
The selective advantage conferred by the expansion that created TIPTOP genes might be
related to tight regulatory control for expression of the encoded peptides. The repetitive nature
of TIPTOP genes would allow the expression of multiple peptides from one transcript. On the
other hand, the expansion would alter the RNA secondary structure, which is known to be one
of the key determinants to post-transcriptional regulation in plants (Silverman et al. 2013). A
recent study using the model plant Arabidopsis thaliana showed that LOW MOLECULAR
WEIGHT CYSTEINE-RICH-encoding mRNA is amongst the highly structured mRNA that
tends to be degraded more frequently than less structured mRNA (Li et al. 2012). Calculation
of the folding energy of TIPTOP transcripts using Mfold (Zuker 2003) suggests that the
expansion decreases the free energy for folding of TIPTOP transcripts into their secondary
structures (Fig. 6B). Taken together, the expansion might have allowed the encoded peptides
to be produced efficiently but not excessively – a trait that fits well with a defense response
and storage function.
Resistance to invaders is amongst the most ancient traits that evolved through discriminating
self from non-self (Staal and Dixelius 2007). Having a biological activity as potent inhibitors
of trypsin (Avrutina et al. 2005), one of the main digestive enzymes of invaders, TIPTOP
peptides may be regarded as anti-nutritive agents. Given that many of the known seed-derived
inhibitors are only active against digestive enzymes of insects but not against endogenous
enzymes (Shewry and Casey 1999), it is tempting to speculate that the expansion that created
TIPTOP genes might have been triggered by predatory cues. On the other hand, the high
cysteine content of TIPTOP peptides suggests that they might also serve a dual function for
storage purposes, providing sulphur along with nitrogen and carbon for germination and
seedling growth (Shewry and Casey 1999). The exceptional stability of the cyclic cystine knot
class of peptides (Colgrave and Craik 2004) might be related to their function as long-term
storage proteins, supporting extended periods of dormancy as most Cucurbitaceae have
orthodox seeds – they can tolerate considerable desiccation and thus have greater longevity
compared to recalcitrant seeds (Ellis 1991). Further studies would be needed to test this dual
function hypothesis.
Mutations in the Acyclic Peptide Precursors: The Link to Cyclization
The emergence of backbone cyclized peptides of Groups 1 and 2 (Fig. 1) has been shown to
be mediated by AEP (Saska et al. 2007; Gillon et al. 2008; Mylne et al. 2011; Mylne et al.
2012), a vacuolar processing enzyme (VPE) that cleaves at Asn and Asp (Hara-Nishimura et al.
1991; Hiraiwa et al. 1999), residues which respectively precede and end the TIPTOP cyclic peptide
domains (Fig. 6C). The residues trailing the proto-C-terminus are usually a small residue at
P1′ and Xle at P2′. Cyclization of the peptide backbone occurs through a transpeptidation that
critically requires the presence of a proto-N-terminal Gly, which is thought to lack the steric
hindrance of other side chains thus enabling transpeptidation to occur (Mylne et al. 2011).
The absence of these features in the precursors of the acyclic peptides means that
transpeptidation by AEP will not occur, thus AEP acts as the constraining evolutionary
channel for cyclization (Mylne et al. 2012).
Alignment of the leader and mature peptide repeats of TIPTOP4 from the Asian M.
subangulata and TIPRE4 from the African M. anigosantha (Fig. 6C) suggests that a series of
mutations following internal gene duplication provided the features for AEP to perform
transpeptidation. We hypothesize that the acyclic peptides first acquired an extension of their
C-terminus (by insertion of the SXXD segment) followed by deletion of their N-terminal
region (GVYXXXQR segment) and mutation of their P1′ proto-C-terminal trailing residue
(Met to a small residue, in this case Ala), which would then lead to the predisposition of the
precursors to cyclization by AEP. With an N-terminal Gln like the TIPTOP acyclic peptides
and an extended C-terminus like the TIPTOP cyclic peptides, the TIPRE peptides may be
regarded as “intermediates” in the evolution of TIPTOP cyclic peptides from their acyclic
counterparts. This hypothesis is consistent with the phylogenetic analysis that places the
African M. anigosantha close to the species in which TIPTOP cyclic peptides occur (Fig. 4;
Schaefer and Renner 2010).
The absence of evidence for the putative cyclic peptide TI-23 in the African M. charantia
might be due to the lack of common features for AEP processing in its precursor, which only
harbors a single peptide domain and does not have residues trailing the proto-C-terminal Asn
and Asx preceding the proto-N-terminal Gly (Supplementary Table S2). Thus, internal gene
duplication may be considered as a stepping stone to the acquisition of features for
transpeptidation by AEP, which the TIPTOP genes then acquired via a series of mutations
during their course of evolution. Molecular phylogenetic analysis reveals that the transcript
encoding TI-23 is distantly related to the other coding sequences (Fig. 6D), suggesting that it
might be an ancestral vestige of a kindred evolutionary process that, in the Asian and a close
relative African species, created the cyclic peptides.
Neofunctionalization of an Acyclic Peptide in the African Momordica Species
Gene duplication has been considered to be the main source of material for the emergence of
new functions. The rate at which a gene duplication event occurs is considered high, with the
duplicates being silenced within a few million years and the survivors being selected under
strong negative pressure (Lynch and Conery 2000). The high sequence similarity between the
transcripts encoding MCTI-I and MCTI-III, and between those encoding TI-28 and EI-1
(Supplementary Table S2), suggests that they have arisen via gene duplication. In the case of
the transcripts encoding TI-28 and EI-1, the acyclic peptides have different biological
activities, i.e. the former is a trypsin inhibitor whereas the latter is an elastase inhibitor. Because
both of the acyclic peptides are expressed within the seeds of the African M. charantia, the
duplicated gene may be considered to have undergone neofunctionalization, i.e. it acquired a
new function that is preserved by natural selection.
Sequence analysis reveals that the new function emerged from a single point mutation at the
second position of the codon for the P1 site, i.e. the site that interacts with the S1 pocket (or
active site) of the target enzyme. The G of the CGA that encodes Arg, which is the preferred P1
residue for trypsin (Krieger et al. 1974), is replaced by T, thus changing it into Leu (Fig. 6E),
which is preferred for elastase (Hara et al. 1989), assuming that elastase inhibition is the new
function. Interestingly, the acyclic elastase inhibitors were only found in the African M.
charantia and three closely related African species i.e. M. balsamina, M. leiocarpa, and M.
foetida, which diverged around 21 million years ago (Schaefer and Renner 2010). This
finding suggests that neofunctionalization is a rare evolutionary fate for this class of peptides.
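The P1 substitution described above can be reproduced with the standard genetic code. The following is a minimal sketch: the function name is hypothetical, and the codon table is deliberately partial, covering only the two codons named in the text.

```python
# Partial codon table (standard genetic code): only the codons needed here.
CODON_TABLE = {"CGA": "Arg", "CTA": "Leu"}


def mutate_codon(codon: str, position: int, base: str) -> str:
    """Return the codon with the base at `position` (1-based) replaced by `base`."""
    if len(codon) != 3 or position not in (1, 2, 3):
        raise ValueError("expected a 3-base codon and a position of 1, 2, or 3")
    return codon[:position - 1] + base + codon[position:]


original = "CGA"
mutated = mutate_codon(original, 2, "T")
print(original, CODON_TABLE[original], "->", mutated, CODON_TABLE[mutated])
# CGA Arg -> CTA Leu
```

A single G to T change at the second codon position is therefore sufficient to switch the P1 residue from Arg to Leu.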
The Role of Site-Specific Residues of Momordica Cyclic Peptides
A range of studies have shown remarkable success in the use of Momordica cyclic peptides as
scaffolds for developing novel therapeutics (Poth et al. 2013). To better exploit Momordica
cyclic peptides for biomedical applications, it is imperative that their evolvability, i.e. the
ability to acquire new function through structural changes (Tokuriki and Tawfik 2009), be
understood. Given the function of Momordica cyclic peptides as potent trypsin inhibitors,
knowledge of their evolvability is particularly useful for developing novel inhibitors of other
evolutionarily related serine proteases – many of which play crucial roles in
pathophysiological processes, such as inflammation and blood clotting (Bachovchin and
Cravatt 2012).
One of the approaches that can be used to understand the evolvability of Momordica cyclic
peptides is by mapping their sequence diversity, which has been shaped by natural selection.
Evolutionary analysis reveals the type of selection operating on the sites of Momordica cyclic
peptides. As shown in Fig. 5A, the majority of the sites are either neutral (no substitution) or
under negative selection (substitutions were synonymous). Residues that occupy these sites
are presumably preserved for maintaining the cyclic cystine knot structure. Indeed, four of the
six cysteines that form the cystine knot core are neutral and two are under negative selection.
The N-terminal Gly and C-terminal Asp that have been shown to be vital for cyclization by
AEP (Mylne et al. 2011) are also under negative selection. The significance of preserving
site-specific residues in Loop 6 might be related to the effect that cyclization has on the
folding pathway of the peptides, facilitating the formation of the correct cysteine
connectivities, thus presumably reducing the entropic losses upon folding compared to their
acyclic counterparts (Daly et al. 1999).
Residues that occupy sites under positive selection are presumably paving the way to
adopting a new function. The P1 site that defines the selectivity of Momordica cyclic peptides
is evolving under positive selection (Fig. 5A) but the amino acid change has not introduced a
new function, i.e. Lys and Arg are both preferred in the S1 pocket of trypsin (Krieger et al.
1974). The neutral selection operating on the P1′ site, which is occupied by Ile – a hydrophobic
residue thus preferred by trypsin (Kurth et al. 1997), suggests that Momordica cyclic peptides
are co-evolving with trypsin, which has strictly conserved residues associated with its
specificity in both prokaryotes and eukaryotes (Rypniewski et al. 1994). Because the three
dimensional structure of serine proteases is highly conserved and their active sites are
virtually the same, i.e. having the catalytic triad composed of Asp–His–Ser (Higaki et al.
1987), molecular dynamics of Momordica cyclic peptides with trypsin may serve as a model
for studying the role of site-specific residues for target binding, particularly that imparted by
hydrogen bonding.
As shown in Fig. 5B, the hydrogen bond network at the interface of Momordica cyclic
peptides with trypsin is primarily formed by the main chain of the cyclic peptides, with the P1,
P2, P3, P5, and P2′ sites showing a high frequency of hydrogen bond occupancy. Although
this hydrogen bond network is common in serine protease–inhibitor complexes where the
inhibitory loops are locked in an extended antiparallel β-sheet conformation (Hedstrom 2002),
this canonical conformation is not adopted by the inhibitory loop of the Momordica cyclic
peptide scaffold (Daly et al. 2013). The occupancy of Pro at P2 and Cys at P3, which do not
contribute to side chain interactions with trypsin, suggests that the selectivity of Momordica
cyclic peptides is mediated by other sites, e.g. their prime sites. This characteristic is unique
because S2 and S3 are known to determine the specificity of serine proteases, e.g. a
hydrophobic residue at P2 is preferred by chymotrypsin (Brady and Abeles 1990) whereas at
P3 by elastase (Thompson and Blout 1973). The P3′ site, which is under positive selection,
might be important for target binding. Unlike Lys, the occupancy of Gln at P3′ introduces a
side chain hydrogen bond and thus may explain the higher trypsin inhibitory activity of TI-1
compared to TI-2 (Avrutina et al. 2005).
As the basis of evolution, mutations play a major role in forming the hydrogen bond network
of protease–inhibitor interaction sites. Mutations that introduce new hydrogen bonds at these
sites or increase the frequency of occupancy of existing ones are highly desirable. However,
mutations can also have the opposite effect, and thus examining the effect of mutations on the
hydrogen bond network at these interfaces is important. As shown in Fig. 5B, single amino
acid substitutions altered the intermolecular main chain and side chain hydrogen bond
network of Momordica cyclic peptides with trypsin. These alterations may serve as a basis for
the identification of sites that have target binding potential, i.e. sites 9 in Loop 1; 24, 26, 28 in
Loop 5; and 30, 31, 33 in Loop 6. This prediction agrees with a study that showed that the
aforementioned sites in Loops 1 and 5 play a role in binding with trypsin (Austin et al. 2009).
In the context of drug design, this knowledge may translate into the design of site-specific
libraries using the Momordica cyclic peptide scaffold. This design approach was successful
for engineering kalata B1, the prototypic cyclotide found in Rubiaceae and Violaceae, into an
antagonist of neuropilin-1 and -2, which are known to be regulators of vascular and lymphatic
development (Getz et al. 2013).
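The single amino acid substitutions discussed above can be located by direct comparison of aligned sequences. The following is a minimal Python sketch, assuming the TI sequences as given in Fig. 3B and the site numbering of Fig. 5 (from the proto-N-terminal Gly); the function name `substitutions` is illustrative and is not part of the analysis pipeline used in this study.

```python
# Cyclic peptide sequences taken from the alignment in Fig. 3B.
SEQUENCES = {
    "TI-1":  "GGVCPKILQRCRRDSDCPGACICRGNGYCGSGSD",
    "TI-2":  "GGVCPKILKKCRRDSDCPGACICRGNGYCGSGSD",
    "TI-8":  "GGVCPKILQRCRRDSDCPGACICLGNGYCGSGSD",
    "TI-10": "GGVCPKILKKCRRDSDCPGACICRGNGYCSSGSD",
    "TI-18": "GGICPKILQRCRRDSDCPGACICRGNGYCGSGSD",
    "TI-20": "GGVCPKILKKCRHDSDCPGACICRGNGYCGSGSD",
    "TI-21": "GGVCPKILQRCRRDSDCPGACICQGNGYCGSGSD",
    "TI-22": "GGVCPRILKKCRRDSDCPGACICRGNGYCGSGSD",
}


def substitutions(reference: str, variant: str):
    """Return (site, reference residue, variant residue) for every mismatch.

    Sites are 1-based. Both sequences must already be aligned and of equal
    length (no gaps are handled in this sketch).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        (site, ref_res, var_res)
        for site, (ref_res, var_res) in enumerate(zip(reference, variant), start=1)
        if ref_res != var_res
    ]


if __name__ == "__main__":
    # Variants compared against the reference they differ from (Fig. 5B).
    pairs = {
        "TI-1": ["TI-8", "TI-18", "TI-21"],
        "TI-2": ["TI-10", "TI-20", "TI-22"],
    }
    for ref, variants in pairs.items():
        for var in variants:
            print(ref, var, substitutions(SEQUENCES[ref], SEQUENCES[var]))
```

Each variant differs from its reference at a single site, whereas TI-1 and TI-2 differ from each other at sites 9 and 10.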
In summary, this study presents evidence that suggests that Momordica cyclic peptides
evolved in species that diverged from an African ancestor around 19 million years ago. The
findings provide a basis for selecting species to further investigate the biosynthetic origin of
Momordica cyclic peptides. Knowledge of the genes that encode cyclic peptides and enzymes
involved in their maturation could potentially be used for their production in suitable host
plants. In addition, this study showcases an interesting biological example of how natural
selection – as imparted by mutations – is presumably operating to acquire features essential
for cyclization of the acyclic peptides by AEP and to fine-tune the selectivity of cyclic
peptides while maintaining their structure. This knowledge may find useful application in
medicine for designing inhibitors of evolutionarily related serine proteases implicated in
human diseases using the Momordica cyclic peptide scaffold or in agriculture for designing
improved pesticidal agents based on cyclotides (Poth et al. 2011). In the long run, knowledge
of the biosynthesis and evolvability of Momordica cyclic peptides could translate into the
production of “designer” peptide therapeutics in plant seeds.
Materials and Methods
Seed Material
Seeds of M. anigosantha Hook.f., M. boivinii Baill., M. cabrae (Cogn.) C. Jeffrey, M.
calantha Gilg, M. camerounensis Keraudren, M. cissoides Planch. ex Benth., M. clarkeana
King, M. cymbalaria Fenzl ex Naudin, M. denticulata Miq., M. foetida Schumach., M.
friesiorum (Harms) C. Jeffrey, M. gilgiana Cogn., M. humilis (Cogn.) C. Jeffrey, M.
jeffreyana Keraudren, M. leiocarpa Gilg, M. macrophylla Gage, M. parvifolia Cogn., M.
rostrata Zimm., M. silvatica Jongkind, M. subangulata Blume, M. trifoliolata Hook.f., and
Siraitia grosvenorii (Swingle) C. Jeffrey ex A.M. Lu & Zhi Y. Zhang were provided by
Hanno Schaefer of the Technische Universität München. Seeds of M. balsamina L. (reference
number: 406803), M. charantia L. (reference number: 51359), and M. cochinchinensis (Lour.)
Spreng. (reference number: 69291) were purchased from B & T World Seeds sarl, Paguignan,
34210 Aigues Vives, France.
Genomic DNA Extraction, Gene Cloning, and Sequencing
Momordica seeds were dehusked and finely ground in liquid nitrogen using a mortar and
pestle. Genomic DNA was extracted using the DNeasy Plant Mini Kit (Qiagen) following the
manufacturer's protocol. TIPTOP genes were amplified using primers JM482
(5′-CGT CTT GCT AGA GAA AGG GAG T-3′) and JM483 (5′-TCA GAA ACA GCA TAG
CTT TCA C-3′) (Mylne et al. 2012). TIPRE sequences were amplified using primers TM P1
(5′-GAA ATG GAG AGC AAG AAG ATT CT-3′) and TM P5 (5′-AAG ATT CTA GGA
CAG GCT CTT TG-3′). PCR products were purified using QIAquick PCR Purification Kit
(for single band) and QIAquick Gel Extraction Kit (for multiple bands). The purified PCR
products were cloned into pGEM-T Easy (Promega) and sequenced at the Australian Genome
Research Facility. A minimum of three independent clones was used to assemble the
sequence using MacVector 12.7 software. SignalP 4.1 was used to predict the endoplasmic
reticulum (ER) recognition site in the sequence (Petersen et al. 2011). Peptide cleavage sites
were predicted based on the common features for processing and homology to previously
reported peptides (Mylne et al. 2011; Mylne et al. 2012).
Transcriptome Analysis
Transcriptome sequencing data of the African M. charantia seeds (Yang et al. 2010) were
accessed via the National Center for Biotechnology Information website
(http://www.ncbi.nlm.nih.gov). Short Read Archive (SRA) data under the accession numbers
SRX030203 (normalized sequence data) and SRX030204 (non-normalized sequence data)
were used to assemble transcripts containing plant-derived cystine knot peptide sequences
(Gracy et al. 2008) with MIRA 3.4 (Chevreux 2005). A minimum of three overlapping
contigs was used to assemble the transcripts.
Peptide Extraction and Fractionation
Peptides were extracted from Momordica seeds using a method based on
acetonitrile/water/formic acid (25:24:1) as previously described (Mahatmanto et al. 2014).
Targeted peptides were fractionated using a Waters Sep-Pak® C18 cartridge (3 cc, 200 mg
sorbent, 55–105 µm). Peptides were eluted with 1.5 mL of solvent B (90% v/v acetonitrile, 0.1%
v/v formic acid) in a 10% gradient. Fractions containing the majority of the targeted peptides
(Supplementary Figs. S2 and S3) were collected. Samples were lyophilized, redissolved in
1% v/v formic acid, and stored at 4°C until further analysis.
Ultra High Performance Liquid Chromatography (UHPLC)-Tandem Mass
Spectrometry (MS/MS)
Samples were reduced with dithiothreitol (DTT; final concentration 10 mM; incubated at
60°C for 30 minutes under nitrogen), alkylated with iodoacetamide (IAM; final concentration
25 mM; incubated at 37°C for 30 minutes in the dark), and split for overnight digestion with
trypsin (approximately 1 µg per 100 µg lyophilized sample in 100 mM ammonium
bicarbonate, pH 8.0; incubated at 37°C), chymotrypsin (approximately 2 µg per 100 µg
lyophilized sample in 100 mM Tris-HCl, 10 mM calcium chloride, pH 8.0; incubated at
30°C), and endoproteinase Glu-C (approximately 3 µg per 100 µg lyophilized sample in (i)
100 mM ammonium bicarbonate, pH 8.0 and (ii) 100 mM sodium phosphate, pH 7.8; both
incubated at 37°C). Following reduction, alkylation, and digestion, samples were analysed on
a Nexera uHPLC (Shimadzu) coupled to a TripleTOF 5600 mass spectrometer (AB SCIEX)
equipped with a duo electrospray ion source. Data were processed using Analyst® TF 1.6
software (AB SCIEX). MS/MS spectra were searched against a custom-built database of
plant-derived cystine knot peptides using ProteinPilot™ 4.0 software (AB SCIEX).
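As a worked illustration of the mass shifts relevant to this workflow, the sketch below computes monoisotopic masses for a peptide in its native (disulfide-bonded), reduced, and reduced and alkylated states, for both a cyclic and a linear backbone. It assumes standard monoisotopic residue masses, carbamidomethylation of every cysteine by iodoacetamide (+57.021464 Da per Cys), and the TI-2 sequence from Fig. 3B; the function `peptide_mass` is a hypothetical helper and does not reproduce the ProteinPilot search.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565           # added once for the free termini of a linear chain
HYDROGEN = 1.007825         # two H are lost per disulfide bond
CARBAMIDOMETHYL = 57.021464  # iodoacetamide adduct per cysteine


def peptide_mass(sequence: str, cyclic: bool = False, state: str = "reduced") -> float:
    """Monoisotopic mass of a peptide.

    state: "native" (all Cys paired in disulfides), "reduced" (free thiols),
    or "alkylated" (reduced, then every Cys carbamidomethylated).
    """
    mass = sum(RESIDUE_MASS[res] for res in sequence)
    if not cyclic:
        mass += WATER  # a head-to-tail cyclic backbone has no terminal H and OH
    n_cys = sequence.count("C")
    if state == "native":
        mass -= 2 * HYDROGEN * (n_cys // 2)
    elif state == "alkylated":
        mass += CARBAMIDOMETHYL * n_cys
    elif state != "reduced":
        raise ValueError(f"unknown state: {state}")
    return mass


if __name__ == "__main__":
    ti2 = "GGVCPKILKKCRRDSDCPGACICRGNGYCGSGSD"
    for state in ("native", "reduced", "alkylated"):
        print(state, round(peptide_mass(ti2, cyclic=True, state=state), 4))
```

Under these assumptions, a cyclic peptide is lighter than its linear counterpart by one water (18.0106 Da), and full reduction and alkylation of the six cysteines shifts the native mass by 6 × 57.0215 Da plus the six hydrogens gained on reduction.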
Phylogenetic Analysis
The phylogeny of Momordica is based on a slightly simplified version of the sequence dataset
of Schaefer & Renner (2010) with addition of sequences for Siraitia grosvenorii from Kocyan
et al. (2007) (Supplementary Data 1). For details on DNA sequencing, alignment and
phylogeny estimation see Schaefer & Renner (2010).
The evolutionary relationship between TIPTOP genes, TIPRE sequences, and the transcripts
from the African M. charantia seeds was inferred under Maximum Likelihood (ML) using the
General Time Reversible model (Nei and Kumar 2000). The initial tree for heuristic search
using Nearest-Neighbor-Interchange (NNI) method was generated automatically by applying
Neighbor-Join and BioNJ algorithms. The analysis was conducted in MEGA5 (Tamura et al.
2011). Nucleotide sequences used for this analysis are given in Supplementary Data 2.
Mapping of Sites Under Selection
Nucleotide sequences that encode the cyclic peptides were used to map sites that are under
selection using MEGA5 (Tamura et al. 2011). The numbers of synonymous (s) and
nonsynonymous (n) substitutions and the synonymous (S) and nonsynonymous (N) sites were
estimated using joint Maximum Likelihood (ML) reconstructions of ancestral states under the
Muse-Gaut (Muse and Gaut 1994) and General Time Reversible (Nei and Kumar 2000)
models. Maximum likelihood estimates of selection at each codon were obtained with HyPhy
(Kosakovsky Pond et al. 2005) using an automatically generated Neighbor-Joining tree. The probability of
rejecting the null hypothesis of neutral evolution (P-value) was calculated as previously
described (Suzuki and Gojobori 1999; Kosakovsky Pond and Frost 2005). Nucleotide
sequences used for this analysis are given in Supplementary Data 3.
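The distinction between sites with no substitution, synonymous substitutions, and nonsynonymous substitutions can be illustrated by a simple per-codon comparison. The sketch below is a counting illustration only, assuming the standard genetic code and a gap-free, in-frame alignment; it does not implement the ancestral state reconstruction or the likelihood-based test performed in MEGA5 and HyPhy, and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_site(codons):
    """Classify one aligned codon site across several sequences."""
    observed = {codon.upper() for codon in codons}
    if len(observed) == 1:
        return "no substitution"
    residues = {CODON_TABLE[codon] for codon in observed}
    return "synonymous" if len(residues) == 1 else "nonsynonymous"


def classify_alignment(sequences):
    """Classify every codon site of a gap-free, in-frame nucleotide alignment."""
    length = len(sequences[0])
    if length % 3 or any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be in frame and of equal length")
    return [
        classify_site([seq[i:i + 3] for seq in sequences])
        for i in range(0, length, 3)
    ]


if __name__ == "__main__":
    # Hypothetical toy alignment: Gly-Lys-Ile versus Gly-Arg-Ile.
    toy = ["GGTAAAATT", "GGCAGAATT"]
    print(classify_alignment(toy))
```

In the toy alignment, the first codon differs without changing the encoded Gly, the second changes Lys to Arg, and the third is identical in both sequences.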
Molecular Dynamics
To calculate the average structure of selected cyclic peptides in complex with trypsin, the
coordinates of the crystal structure of native MCoTI-II bound to trypsin (PDB#4GUX) were used
as a reference (Daly et al. 2013). Simulation of the complexes was performed as previously
described (Swedberg et al. 2011).
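The hydrogen bond frequency of occupancy reported in Fig. 5B can be understood as the percentage of simulation frames in which a given hydrogen bond is present. The sketch below is a minimal illustration under that assumption; the per-frame data are hypothetical, the geometric criteria used to call a hydrogen bond are not specified here, and the bins are those of the colour scale in Fig. 5.

```python
from typing import List, Optional

# Colour-scale bins (%) of hydrogen bond frequency of occupancy shown in Fig. 5.
OCCUPANCY_BINS = [
    (86.0, 99.9), (72.0, 85.9), (58.0, 71.9), (44.0, 57.9),
    (30.0, 43.9), (16.0, 29.9), (1.0, 15.9),
]


def occupancy(frames: List[bool]) -> float:
    """Percentage of frames in which the hydrogen bond is present."""
    if not frames:
        raise ValueError("at least one frame is required")
    return 100.0 * sum(frames) / len(frames)


def occupancy_bin(percent: float) -> Optional[str]:
    """Return the Fig. 5 bin label for an occupancy, or None if outside all bins."""
    rounded = round(percent, 1)  # bins are defined to one decimal place
    for low, high in OCCUPANCY_BINS:
        if low <= rounded <= high:
            return f"{low}–{high}%"
    return None


if __name__ == "__main__":
    # Hypothetical trajectory: the bond is present in 9 of 10 frames.
    frames = [True] * 9 + [False]
    percent = occupancy(frames)
    print(percent, occupancy_bin(percent))
```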
RNA Secondary Structure Calculation
The folding energy of TIPTOP transcripts was calculated using Mfold (Zuker 2003) with
default settings.
Accession Numbers
Sequence data from this work can be found in the GenBank database under the accession
numbers KM408418 for M. subangulata TIPTOP4, KM408419 for M. macrophylla
TIPTOP5, KM408420 for M. macrophylla TIPTOP6, KM408421 for M. anigosantha
TIPRE1, KM408422 for M. anigosantha TIPRE2, KM408423 for M. anigosantha TIPRE3,
KM408424 for M. anigosantha TIPRE4, KM408425 for M. friesiorum TIPRE5.
Supplementary Materials
Supplementary Tables S1, S2, S3, and S4; Supplementary Figs. S1, S2, and S3; and
Supplementary Data 1, 2, and 3 are available at Molecular Biology and Evolution online
(http://www.mbe.oxfordjournals.org/).
Acknowledgements
This research was supported by a grant from the National Health and Medical Research
Council (APP1009267) and the Australian Research Council (LP 130100550). D.J.C. is an
NHMRC Professorial Research Fellow (APP1026501). J.S.M. is an ARC Future Fellow
(FT120100013). J.S. is an NHMRC Early Career Fellow (APP1069819). T.M. is a recipient
of an Endeavour Postgraduate Scholarship granted by the Australian Government. The
authors thank E. Miles for comments on the manuscript, E.K. Gilding for helpful discussion,
R. Widiatmojo for photo editing of Momordica seeds, and A. Jones of the Molecular and
Cellular Proteomics Mass Spectrometry Facility at the Institute for Molecular Bioscience, The
University of Queensland, for support and access to the facility.
Figure Legends
Fig. 1. Proposed classification for currently known backbone cyclized peptides in
angiosperms. The cyclic peptide domains (coloured letters) with neighbouring residues are
given below the structures. Group 1: Cyclic peptides with three disulfide bonds that form a
knot, i.e. Cys III-VI threading a ring formed by Cys I-IV, Cys II-V, and their interconnecting
backbone. This structural motif is known as a cyclic cystine knot or CCK. The intracysteine
residues that form the loops are labeled Loops 1–6. An example member of the group, i.e.
MCoTI-II (Momordica cochinchinensis trypsin inhibitor-II, PDB#1IB9), is shown. Group 2:
Cyclic peptides with one disulfide bond, e.g. SFTI-1, (sunflower trypsin inhibitor-1,
PDB#1JBL). Cysteine connectivities are shown as yellow lines. Asparaginyl endopeptidase
(AEP) target residues, i.e. Asp/Asn, are shown with arrows. Neighbouring residues (P1′ and
P2′ sites) important for transpeptidation by AEP are shown with asterisks. Group 3: Cyclic
peptides with no disulfide bond, i.e. the orbitides, e.g. segetalin A. Cleavage sites are shown
with triangles. The peptide cyclization points are shown with black dots. (For interpretation of
the references to color, please refer to the web version of this article.)
Fig. 2. Momordica seeds used in this study. The seeds are arranged from left to right, top to
bottom, based on their phylogenetic relationship. Siraitia grosvenorii was included as a
closely related outgroup species.
Fig. 3. Precursors and sequences of Momordica cyclic peptides and their acyclic counterparts.
A. Schematic representation of the precursors for TIPTOP1 and TIPTOP5 are shown along
with a typical precursor of an acyclic peptide from M. charantia. ER: endoplasmic reticulum
signal. LP: leader peptide domain – this naming follows the recommended nomenclature for
ribosomally synthesized and post-translationally modified peptides (RiPPs) (Arnison et al.
2013). aa: Amino acids. B. Sequence alignment of Momordica cyclic peptides and their
acyclic counterparts. Peptides deduced from coding sequences with mass support found in
this study (asterisks) are aligned with previously reported peptides (Joubert 1984; Hara et al.
1989; Hamato et al. 1995; Hernandez et al. 2000; Mylne et al. 2012). Residues flanking the
cyclization points are shown with black dots. The six conserved cysteine residues are shown
with yellow dots. Residues are numbered from the N-terminus to the C-terminus. (For
interpretation of the references to color, please refer to the web version of this article.)
Fig. 4. Distribution of Momordica cyclic peptides and their acyclic counterparts mapped on a
Maximum Likelihood phylogeny estimate of the genus. Species analysed in this study are
shown with bold letters. Species previously studied are shown with red letters. Species
containing cyclic peptides and their acyclic counterparts are shown with coloured dots, i.e.
green for cyclic trypsin inhibitors, orange for acyclic trypsin inhibitors, and red for acyclic
elastase inhibitors. Superscript letters following the species names denote currently known
information at the nucleotide (n) or peptide (p) level. (For interpretation of the references to
color, please refer to the web version of this article.)
Fig. 5. Site selection and intermolecular hydrogen bond network mapping of Momordica
cyclic peptides. A. Sites under selection: negative (purple), neutral (white), and positive
(cyan). The sequence of TI-2 is used as a reference. The intracysteine residues that form the
loops are labeled Loops 1–6. Sites are numbered from the proto-N-terminal Gly to the
proto-C-terminal Asp. P1 and P1′ denote the residues on the acyl and leaving group sides, respectively, of
the peptide bond that would be hydrolysed by trypsin. B. Sites involved in hydrogen bonding
with trypsin. TI-1 and TI-2 were used as references for single amino acid substitutions (shown
with asterisks) based on the sequences of TI-8, TI-18 and TI-21 for TI-1 and TI-10, TI-20,
and TI-22 for TI-2. Intermolecular main chain and side chain hydrogen bonds are shown with
coloured circles and triangles, respectively. The range of hydrogen bond frequency of
occupancy is colour-coded as per legend. (For interpretation of the references to color, please
refer to the web version of this article.)
Fig. 6. Sequence analysis for Momordica peptides. A. Alignment of the signal peptide
sequence of TIPTOP precursors. B. Free folding energy of TIPTOP transcripts calculated
using Mfold (Zuker 2003). C. Alignment of domain repeats. The leader and mature peptide
domains of TIPTOP4 from the Asian M. subangulata and TIPRE4 from the African
M. anigosantha are used as an example. The cyclic peptide domains (shown in green, TIPTOP4A–
F) appear to be the result of insertion of a four-residue segment trailing the C-terminus and
deletion of a segment within the N-terminus of the acyclic peptide domain (shown in orange,
TIPTOP4G). TIPRE acyclic peptides acquired a C-terminal extension like that of the
TIPTOP cyclic peptides, thus linking the evolution of cyclic peptides to their acyclic
counterparts. Residues flanking the cyclization points are shown with black dots. The six
conserved cysteine residues are shown with yellow dots. Cleavage sites are shown with
triangles. D. Unrooted phylogram for TIPTOP genes, TIPRE sequences, and the transcripts
from the African M. charantia seeds. E. Mutations at P1 site of the African M. charantia
peptides that led to the change from Arg to Leu. (For interpretation of the references to color,
please refer to the web version of this article.)
References
Aboye TL, Ha H, Majumder S, Christ F, Debyser Z, Shekhtman A, Neamati N, Camarero JA.
2012. Design of a novel cyclotide-based CXCR4 antagonist with anti-human
immunodeficiency virus (HIV)-1 activity. J Med Chem. 55:10729–10734.
Arnison PG, Bibb MJ, Bierbaum G, Bowers AA, Bugni TS, Bulaj G, Camarero JA,
Campopiano DJ, Challis GL, Clardy J, et al. 2013. Ribosomally synthesized and
post-translationally modified peptide natural products: Overview and recommendations for a
universal nomenclature. Nat Prod Rep. 30:108–160.
Austin J, Wang W, Puttamadappa S, Shekhtman A, Camarero JA. 2009. Biosynthesis and
biological screening of a genetically encoded library based on the cyclotide MCoTI-I.
Chembiochem 10:2663–2670.
Avrutina O, Schmoldt HU, Gabrijelcic-Geiger D, Le Nguyen D, Sommerhoff CP,
Diederichsen U, Kolmar H. 2005. Trypsin inhibition by macrocyclic and open-chain
variants of the squash inhibitor MCoTI-II. Biol Chem. 386:1301–1306.
Bachovchin DA, Cravatt BF. 2012. The pharmacological landscape and therapeutic potential
of serine hydrolases. Nat Rev Drug Discov. 11:52–68.
Barber CJ, Pujara PT, Reed DW, Chiwocha S, Zhang H, Covello PS. 2013. The two-step
biosynthesis of cyclic peptides from linear precursors in a member of the plant family
Caryophyllaceae involves cyclization by a serine protease-like enzyme. J Biol Chem.
288:12500–12510.
Brady K, Abeles RH. 1990. Inhibition of chymotrypsin by peptidyl trifluoromethyl ketones:
Determinants of slow-binding kinetics. Biochemistry 29:7608–7617.
Cascales L, Henriques ST, Kerr MC, Huang YH, Sweet MJ, Daly NL, Craik DJ. 2011.
Identification and characterization of a new family of cell-penetrating peptides: Cyclic
cell-penetrating peptides. J Biol Chem. 286:36932–36943.
Chan LY, Gunasekera S, Henriques ST, Worth NF, Le SJ, Clark RJ, Campbell JH, Craik DJ,
Daly NL. 2011. Engineering pro-angiogenic peptides using stable, disulfide-rich cyclic
scaffolds. Blood 118:6709–6717.
Chevreux B. 2005. MIRA: An automated genome and EST assembler. PhD Thesis.
Heidelberg: Ruprecht-Karls University.
Colgrave ML, Craik DJ. 2004. Thermal, chemical, and enzymatic stability of the cyclotide
kalata B1: The importance of the cyclic cystine knot. Biochemistry 43:5965–5975.
Contreras J, Elnagar AYO, Hamm-Alvarez SF, Camarero JA. 2011. Cellular uptake of
cyclotide MCoTI-I follows multiple endocytic pathways. J Control Release 155:134–
143.
Craik DJ. 2013. Joseph Rudinger memorial lecture: Discovery and applications of cyclotides.
J Pept Sci. 19:393–407.
Craik DJ, Daly NL, Bond T, Waine C. 1999. Plant cyclotides: A unique family of cyclic and
knotted proteins that defines the cyclic cystine knot structural motif. J Mol Biol.
294:1327–1336.
D’Souza C, Henriques ST, Wang CK, Craik DJ. 2014. Structural parameters modulating the
cellular uptake of disulfide-rich cyclic cell-penetrating peptides: MCoTI-II and SFTI-1.
Eur J Med Chem. (in press).
Daly NL, Love S, Alewood PF, Craik DJ. 1999. Chemical synthesis and folding pathways of
large cyclic polypeptide: Studies of the cystine knot polypeptide kalata B1. Biochemistry
38:10606–10614.
Daly NL, Thorstholm L, Greenwood KP, King GJ, Rosengren KJ, Heras B, Martin JL, Craik
DJ. 2013. Structural insights into the role of the cyclic backbone in a squash trypsin
inhibitor. J Biol Chem. 288:36141–36148.
Elliott AG, Delay C, Liu H, Phua Z, Rosengren KJ, Benfield AH, Panero JL, Colgrave ML,
Jayasena AS, Dunse KM. 2014. Evolutionary Origins of a Bioactive Peptide Buried
within Preproalbumin. Plant Cell 26:981–995.
Ellis RH. 1991. The longevity of seeds. HortScience 26:1119–1125.
Getz JA, Cheneval O, Craik DJ, Daugherty PS. 2013. Design of a cyclotide antagonist of
neuropilin-1 and -2 that potently inhibits endothelial cell migration. ACS Chem Biol.
8:1147–1154.
Gillon AD, Saska I, Jennings CV, Guarino RF, Craik DJ, Anderson MA. 2008. Biosynthesis
of circular proteins in plants. Plant J. 53:505–515.
Gracy J, Le-Nguyen D, Gelly JC, Kaas Q, Heitz A, Chiche L. 2008. KNOTTIN: The knottin
or inhibitor cystine knot scaffold in 2007. Nucleic Acids Res. 36:D314–319.
Greenwood KP, Daly NL, Brown DL, Stow JL, Craik DJ. 2007. The cyclic cystine knot
miniprotein MCoTI-II is internalized into cells by macropinocytosis. Int J Biochem Cell
Biol. 39:2252–2264.
Hamato N, Koshiba T, Pham T-N, Tatsumi Y, Nakamura D, Takano R, Hayashi K, Hong Y-M,
Hara S. 1995. Trypsin and elastase inhibitors from bitter gourd (Momordica charantia
LINN.) seeds: Purification, amino acid sequences, and inhibitory activities of four new
inhibitors. J Biochem. 117:432–437.
Hara S, Makino J, Ikenaka T. 1989. Amino acid sequences and disulfide bridges of serine
proteinase inhibitors from bitter gourd (Momordica charantia LINN.) seeds. J Biochem.
105:88–92.
Hara-Nishimura I, Inoue K, Nishimura M. 1991. A unique vacuolar processing enzyme
responsible for conversion of several proprotein precursors into the mature forms. FEBS
Lett. 294:89–93.
Hedstrom L. 2002. Serine protease mechanism and specificity. Chem Rev. 102:4501–4524.
Hernandez JF, Gagnon J, Chiche L, Nguyen TM, Andrieu JP, Heitz A, Hong TT, Pham TTC,
Nguyen DL. 2000. Squash trypsin inhibitors from Momordica cochinchinensis exhibit an
atypical macrocyclic structure. Biochemistry 39:5722–5730.
Higaki JN, Gibson BW, Craik CS. 1987. Evolution of catalysis in the serine proteases. Cold
Spring Harb Symp Quant Biol. 52:615–621.
Hiraiwa N, Nishimura M, Hara-Nishimura I. 1999. Vacuolar processing enzyme is
self-catalytically activated by sequential removal of the C-terminal and N-terminal
propeptides. FEBS Lett. 447:213–216.
Ji Y, Majumder S, Millard M, Borra R, Bi T, Elnagar AY, Neamati N, Shekhtman A,
Camarero J. 2013. In vivo activation of the p53 tumor suppressor pathway by an
engineered cyclotide. J Am Chem Soc. 135:11623–11633.
Joubert FJ. 1984. Trypsin isoinhibitors from Momordica repens seeds. Phytochemistry
23:1401–1406.
Kaas Q, Craik DJ. 2010. Analysis and classification of circular proteins in Cybase. Pept Sci.
94:584–591.
Kocyan A, Zhang L-B, Schaefer H, Renner SS. 2007. A multi-locus chloroplast phylogeny
for the Cucurbitaceae and its implications for character evolution and classification. Mol
Phylogenet Evol. 44:553–577.
Kosakovsky Pond SL, Frost SDW. 2005. Not so different after all: A comparison of methods
for detecting amino acid sites under selection. Mol Biol Evol. 22:1208–1222.
Kosakovsky Pond SL, Frost SDW, Muse SV. 2005. HyPhy: Hypothesis testing using
phylogenies. Bioinformatics 21:676–679.
Krieger M, Kay LM, Stroud R. 1974. Structure and specific binding of trypsin: Comparison
of inhibited derivatives and a model for substrate binding. J Mol Biol. 83:209–230.
Kurth T, Ullmann D, Jakubke H-D, Hedstrom L. 1997. Converting trypsin to chymotrypsin:
Structural determinants of S1' specificity. Biochemistry 36:10098–10104.
Li F, Zheng Q, Vandivier LE, Willmann MR, Chen Y, Gregory BD. 2012. Regulatory impact
of RNA secondary structure across the Arabidopsis transcriptome. Plant Cell 24:4346–
4359.
Li Y-D, Xie Z-Y, Du Y-L, Zhou Z, Mao X-M, Lv L-X, Li Y-Q. 2009. The rapid evolution of
signal peptides is mainly caused by relaxed selection on non-synonymous and
synonymous sites. Gene 436:8–11.
Ling M-H, Qi H-y, Chi C-w. 1993. Protein, cDNA, and genomic DNA sequences of the towel
gourd trypsin inhibitor. A squash family inhibitor. J Biol Chem. 268:810–814.
Lu W, Qasim M, Laskowski M, Kent SB. 1997. Probing intermolecular main chain hydrogen
bonding in serine proteinase-protein inhibitor complexes: Chemical synthesis of
backbone-engineered turkey ovomucoid third domain. Biochemistry 36:673–679.
Luckett S, Garcia RS, Barker J, Konarev AV, Shewry P, Clarke A, Brady R. 1999.
High-resolution structure of a potent, cyclic proteinase inhibitor from sunflower seeds. J Mol
Biol. 290:525–533.
Lynch M, Conery JS. 2000. The evolutionary fate and consequences of duplicate genes.
Science 290:1151–1155.
Mahatmanto T, Poth AG, Mylne JS, Craik DJ. 2014. A comparative study of extraction
methods reveals preferred solvents for cystine knot peptide isolation from Momordica
cochinchinensis seeds. Fitoterapia 95:22–33.
Muse SV, Gaut BS. 1994. A likelihood approach for comparing synonymous and
nonsynonymous nucleotide substitution rates, with application to the chloroplast genome.
Mol Biol Evol. 11:715–724.
Mylne JS, Colgrave ML, Daly NL, Chanson AH, Elliott AG, McCallum EJ, Jones A, Craik
DJ. 2011. Albumins and their processing machinery are hijacked for cyclic peptides in
sunflower. Nat Chem Biol. 7:257–259.
Mylne JS, Chan LY, Chanson AH, Daly NL, Schaefer H, Bailey TL, Nguyencong P, Cascales
L, Craik DJ. 2012. Cyclic peptides arising by evolutionary parallelism via
asparaginyl-endopeptidase-mediated biosynthesis. Plant Cell 24:2765–2778.
Nei M, Kumar S. 2000. Molecular evolution and phylogenetics. New York: Oxford University Press.
Nguyen GKT, Zhang S, Wang W, Wong CTT, Nguyen NTK, Tam JP. 2011. Discovery of a
linear cyclotide from the bracelet subfamily and its disulfide mapping by top-down mass
spectrometry. J Biol Chem. 286:44833–44844.
Petersen TN, Brunak S, von Heijne G, Nielsen H. 2011. SignalP 4.0: Discriminating signal
peptides from transmembrane regions. Nat Methods 8:785–786.
Poth AG, Colgrave ML, Lyons RE, Daly NL, Craik DJ. 2011. Discovery of an unusual
biosynthetic origin for circular proteins in legumes. Proc Natl Acad Sci. 108:10127–
10132.
Poth AG, Chan LY, Craik DJ. 2013. Cyclotides as grafting frameworks for protein
engineering and drug design applications. Pept Sci. 100:480–491.
Rypniewski WR, Perrakis A, Vorgias CE, Wilson KS. 1994. Evolutionary divergence and
conservation of trypsin. Protein Eng. 7:57–64.
Saska I, Gillon AD, Hatsugai N, Dietzgen RG, Hara-Nishimura I, Anderson MA, Craik DJ.
2007. An asparaginyl endopeptidase mediates in vivo protein backbone cyclization. J
Biol Chem. 282:29721–29728.
Schaefer H, Heibl C, Renner SS. 2009. Gourds afloat: A dated phylogeny reveals an Asian
origin of the gourd family (Cucurbitaceae) and numerous oversea dispersal events. Proc
R Soc B. 276:843–851.
Schaefer H, Renner SS. 2010. A three-genome phylogeny of Momordica (Cucurbitaceae)
suggests seven returns from dioecy to monoecy and recent long-distance dispersal to
Asia. Mol Phylogenet Evol. 54:553–560.
Shewry PR, Casey R. 1999. Seed proteins. Dordrecht: Springer.
Silverman IM, Li F, Gregory BD. 2013. Genomic era analyses of RNA secondary structure
and RNA-binding proteins reveal their significance to post-transcriptional regulation in
plants. Plant Sci. 205:55–62.
Sommerhoff CP, Avrutina O, Schmoldt HU, Gabrijelcic-Geiger D, Diederichsen U, Kolmar
H. 2010. Engineered cystine knot miniproteins as potent inhibitors of human mast cell
tryptase β. J Mol Biol. 395:167–175.
Staal J, Dixelius C. 2007. Tracing the ancient origins of plant innate immunity. Trends Plant
Sci. 12:334–342.
Suzuki Y, Gojobori T. 1999. A method for detecting positive selection at single amino acid
sites. Mol Biol Evol. 16:1315–1328.
Swedberg JE, de Veer SJ, Sit KC, Reboul CF, Buckle AM, Harris JM. 2011. Mastering the
canonical loop of serine protease inhibitors: Enhancing potency by optimising the
internal hydrogen bond network. PLoS ONE 6: e19302.
Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: Molecular
evolutionary genetics analysis using maximum likelihood, evolutionary distance, and
maximum parsimony methods. Mol Biol Evol. 28:2731–2739.
Tan N-H, Zhou J. 2006. Plant cyclopeptides. Chem Rev. 106:840–895.
Thompson RC, Blout ER. 1973. Dependence of the kinetic parameters for elastase-catalyzed
amide hydrolysis on the length of peptide substrates. Biochemistry 12:57–65.
Thongyoo P, Bonomelli C, Leatherbarrow RJ, Tate EW. 2009. Potent inhibitors of β-tryptase
and human leukocyte elastase based on the MCoTI-II scaffold. J Med Chem. 52:6197–
6200.
Tokuriki N, Tawfik DS. 2009. Stability effects of mutations and protein evolvability. Curr
Opin Struct Biol. 19:596–604.
Trabi M, Craik DJ. 2002. Circular proteins—no end in sight. Trends Biochem Sci. 27:132–
138.
Yang P, Li X, Shipp MJ, Shockey JM, Cahoon EB. 2010. Mining the bitter melon
(Momordica charantia L.) seed transcriptome by 454 analysis of non-normalized and
normalized cDNA populations for conjugated fatty acid metabolism-related genes. BMC
Plant Biol. 10:250.
Zuker M. 2003. Mfold web server for nucleic acid folding and hybridization prediction.
Nucleic Acids Res. 31:3406–3415.
Group 1: MCoTI-II (Cys I–VI; Loops 1–6). Sequence with neighbouring residues: ...DINGGVCPKILKKCRRDSDCPGACICRGNGYCGSGSDALE... (AEP).
Group 2: SFTI-1. Sequence with neighbouring residues: ...EDNGRCTKSIPPICFPDGLD... (AEP).
Group 3: Segetalin A. Sequence with neighbouring residues: ...KPQGVPVWAFQA... (PCY1).
Key: cleavage sites; cyclization points; AEP target residues; residues important for transpeptidation by AEP.
Fig. 1.
Fig. 2.
Fig. 3. [Figure not reproduced; recoverable content follows.]

A. Precursor architecture (ER, endoplasmic reticulum signal; LP, leader peptide)
TIPTOP1 (281 aa): ER, LP, TI-1, LP, TI-2, LP, TI-2, LP, TI-2, LP, TI-5 (four cyclic peptide domains followed by one acyclic peptide domain)
TIPTOP5 (283 aa): ER, LP, TI-18, LP, TI-2, LP, TI-2, LP, TI-4, LP, TI-19 (four cyclic peptide domains followed by one acyclic peptide domain)
MCTI-I (90 aa): typical acyclic peptide precursor

B. Cyclic peptide sequence alignment (positions 1–34)
TI-1      GGVCPKILQRCRRDSDCPGACICRGNGYCGSGSD
TI-2      GGVCPKILKKCRRDSDCPGACICRGNGYCGSGSD
TI-4      GGACPRILKKCRRDSDCPGACVCQGNGYCGSGSD
TI-7      GGACPRILKKCRRDSDCPGACVCKGNGYCGSGSD
TI-8      GGVCPKILQRCRRDSDCPGACICLGNGYCGSGSD
TI-9      GGICPKILQRCRRDSDCPGACICRGNGYC--GSD
TI-10     GGVCPKILKKCRRDSDCPGACICRGNGYCSSGSD
TI-11     GGVCPKILKKCRHDSDCPGACICRGNEYCGSGSD
TI-12     GGACPRILKKCRRDSDCPGACICRGNGYCGSGSD
TI-13     GGACPKILQRCRRDSDCPSACICRGNGYCGSGSD *
TI-14     GGACPKILQKCRRDSDCPGACVCQGNGYCGSGSD *
TI-15     GGACPRILKQCRRDSDCPGACVCQGNGYCGSGSD *
TI-16     GGACPRILKQCRRDSDCPGACICQGNGYCGSGSD *
TI-17     GGACPRILKKCRRDSDCPGACVCRGNGYCGSGSD *
TI-18     GGICPKILQRCRRDSDCPGACICRGNGYCGSGSD *
TI-20     GGVCPKILKKCRHDSDCPGACICRGNGYCGSGSD *
TI-21     GGVCPKILQRCRRDSDCPGACICQGNGYCGSGSD *
TI-22     GGVCPRILKKCRRDSDCPGACICRGNGYCGSGSD *

Acyclic peptide sequence alignment
TI-3      --ERACPRILKKCRRDSDCPGECICKENGYCG----
TI-5      --QRACPRILKKCRRDSDCPGECICKGNGYCG----
TI-6      --QRACPRILKKCRRDSDCPGECICQGNGYCG----
TI-19     -QQRACPRILKKCRRDSDCPGECICKGNGYCG---- *
TI-24     --QRACPKILKRCRRDSDCPGACVCQDNGYCGSGGD *
TI-25     --QRACPRILKRCSRDSDCPGACVCQDNGYCGSRGD *
TI-26     --QRACPRILKRCRRDSDCPGACVCQGNGYCGSRGD *
TI-27     --QRACPRILKRCRRDSDCPGACVCQDNGYCGSGGD *
CM-1      ---GICPRILMECKRDSDCLAQCVCKRQGYCG----
CM-3      ---AICPRILVECKRDSDCPAQCICKRQGYCG----
MCTI-A    ---RSCPRIWMECTRDSDCMAKCICV-AGHCG----
MCTI-I    --ERRCPRILKQCKRDSDCPGECICMAHGFCG----
MCTI-III  --ERGCPRILKQCKQDSDCPGECICMAHGFCG----
MCTI-II   ---RICPRIWMECKRDSDCMAQCICV-DGHCG----
TI-28     EEERICPRIWMECKRDSDCMAQCICV-DGHCG---- *
MCEI-I    ---RICPLIWMECKRDSDCLAQCICV-DGHCG----
MCEI-II   --ERICPLIWMECKRDSDCLAQCICV-DGHCG----
MCEI-III  -EERICPLIWMECKRDSDCLAQCICV-DGHCG----
MCEI-IV   EEERICPLIWMECKRDSDCLAQCICV-DGHCG----
EI-1      EDERICPLIWMECKRDSDCLAQCICV-DGHCG---- *

Highlighting in the original figure marks the residues flanking the cyclization points and the conserved cysteine residues.
* Peptides deduced from coding sequences with mass support found in this study
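The conserved cysteine residues highlighted in the cyclic peptide alignment of Fig. 3 can be located programmatically. The following is a minimal sketch, assuming a small subset of the aligned sequences (TI-1, TI-2, TI-9, TI-10) is representative; the function name `conserved_columns` is hypothetical and is not part of the authors' workflow.

```python
# Illustrative only: four sequences copied from the cyclic peptide alignment.
# TI-9 carries a two-residue gap, written "--" as in the alignment.
ALIGNMENT = {
    "TI-1":  "GGVCPKILQRCRRDSDCPGACICRGNGYCGSGSD",
    "TI-2":  "GGVCPKILKKCRRDSDCPGACICRGNGYCGSGSD",
    "TI-9":  "GGICPKILQRCRRDSDCPGACICRGNGYC--GSD",
    "TI-10": "GGVCPKILKKCRRDSDCPGACICRGNGYCSSGSD",
}


def conserved_columns(alignment, residue="C"):
    """Return 1-based alignment columns where every sequence has `residue`."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    return [i + 1 for i in range(length) if all(s[i] == residue for s in seqs)]


if __name__ == "__main__":
    print(conserved_columns(ALIGNMENT))
```

For these four sequences the six cysteine columns coincide with the TGT/TGC codons listed in Table S4.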
M. cochinchinensis n,p
M. sphaeroidea n
Cyclic trypsin inhibitors
Acyclic trypsin inhibitors
M. suringarii
M. macrophylla n,p
M. denticulata p
M. dioica
M. denudata
M. subangulata n,p
M. renigera
M. laotica
Acyclic elastase inhibitors
n Information known at the nucleotide level
p Information known at the peptide level
M. clarkeana p
M. enneaphylla
M. cissoides p
M. gilgiana p
M. anigosantha n,p
M. friesiorum n,p
M. nuda
M. pterocarpa
p
M. repens
M. corymbifera
M. spinosa
M. parvifolia p
M. multiflora
M. silvatica p
M. glabra
M. jeffreyana p
M. camerounensis p
M. trifoliolata p
M. rostrata p
M. littorea
M. cardiospermoides
M. dissecta
M. peteri
M. sessilifolia
M. kirkii
M. humilis p
M. cymbalaria p
p
M. boivinii
M. henriquesii
M. angustisepala
M. leiocarpa p
M. foetida p
M. involucrata
M. balsamina p
M. welwitschii
M. angolensis
M. charantia n,p
M. cabrae p
M. calantha p
M. obtusisepala
Siraitia grosvenorii p
0.0030
Fig. 4.
Fig. 5. [Figure not reproduced. Labels recovered from the original: panel A shows the cyclic peptide backbone (residues 1–34, loops 1–6) with the P1 residue marked. Legend "Selection Type": negative selection, neutral, positive selection. Legend "H-Bonding Frequency": 1.0–15.9%, 16.0–29.9%, 30.0–43.9%, 44.0–57.9%, 58.0–71.9%, 72.0–85.9%, 86.0–99.9%. Panel B shows the peptides TI-1, TI-2, TI-10, TI-20, TI-22, TI-18, TI-8, and TI-21.]
Fig. 6. [Figure not reproduced; recoverable content follows.]

ER signal sequences (Mco: M. cochinchinensis; Msp: M. sphaeroidea)
McoTIPTOP1  MESKKILPVVLVAMMLVATSTG
McoTIPTOP2  MESKKILPVVLVAMMLVATSTG
McoTIPTOP3  MESKKILPVVLVAMMLVATSTG
MspTIPTOP2  MESKKILPVVLVAMMLVATSTG
TIPTOP4     MESKKILLVVLVAMMLVATSTC
TIPTOP5     MESKKILPVVLVAMMLIATSTG
TIPTOP6     MESKKILPVVLVAMMLVATSTG

Precursor schematics (ER, ER signal; LP, leader peptide; CyP, cyclic peptide; AcyP, acyclic peptide)
TIPTOP: ER; LP-CyP (4A); LP-CyP x n repeats (4B–F); LP-AcyP (4G)
TIPRE: ER; LP-AcyP (4A); LP-AcyP x n repeats (4B–D)

Domain alignment (highlighting in the original marks residues flanking the cyclization points, conserved cysteine residues, and cleavage sites)
TIPTOP4A  FNDGDTTDLISDGRAQM--DINGG--------ACPKILQRCRRDSDCPSACICRGNGYCGSGSD
TIPTOP4B  -----AIDLISDSRAQI--DINGG--------ACPKILQKCRRDSDCPGACVCQGNGYCGSGSD
TIPTOP4C  -----ALDLMSDGRAQI--DINGG--------ACPRILKQCRRDSDCPGACVCQGNGYCGSGSD
TIPTOP4D  -----ALDLMSDGRAQI--GINGG--------ACPRILKQCRRDSDCPGACICQGNGYCGSGSD
TIPTOP4E  -----ALDLMSDGRAQI--DINGG--------ACPRILKKCRRDSDCPGACVCRGNGYCGSGSD
TIPTOP4F  -----ALDLMSDGRAQI--DINGG--------ACPRILKKCRRDSDCPGACVCQGNGYCGSGSD
TIPTOP4G  -----MIDLISDGGAQTGEDINGGGVYDKRQRACPRILKKCRRDSDCPGECICQGNGYCG----
TIPRE4A   FNDGDTIDLISNDRAQTGQDINGGGVYSEEQRACPRILKRCRRDSDCPGACVCQGNGYCGSRGD
TIPRE4B   -----MIDIVLDGRAQTGEDINGSGVYSEEQRACPRILKRCRRDSDCPGACVCQDNGYCGSGGD
TIPRE4C   -----MIDVILDNRAQTGQDINGGGVYSEEQRACPKILKRCRRDSDCPGACVCQDNGYCGSGGD
TIPRE4D   -----MIDIVLDGRAQTGEDINGSGVYSEEQRACPRILKRCRRDSDCPGACVCQDNGYCGSGGD

Plot axes: Number of Domain Repeats (5–8) and Free Folding Energy (kcal/mol; -300 to -500), with points for McTIPTOP1, McTIPTOP2, McTIPTOP3, MspTIPTOP2, TIPTOP4, TIPTOP5, and TIPTOP6.

Tree tip labels: MCTI-I/III, TI-28, EI-1, TI-23, TIPRE1–5, McTIPTOP1–3, MspTIPTOP2, TIPTOP4–6.

P1 Site: codons AGA, CGA, CTA; amino acids arginine, leucine.
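The P1-site labels of Fig. 6 list three codons (AGA, CGA, CTA) and two amino acids (arginine, leucine). As a hedged illustration, assuming the panel contrasts arginine and leucine codons at the P1 position, the sketch below translates the three codons with the standard genetic code and counts the nucleotide differences between them. The helper names are hypothetical.

```python
from itertools import combinations

# Subset of the standard genetic code covering the three codons in the panel.
CODON_TABLE = {"AGA": "Arg", "CGA": "Arg", "CTA": "Leu"}


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    if len(a) != len(b):
        raise ValueError("codons must have equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    for c1, c2 in combinations(CODON_TABLE, 2):
        print(c1, CODON_TABLE[c1], "->", c2, CODON_TABLE[c2],
              "differences:", hamming(c1, c2))
```

Under this reading, a single nucleotide change (CGA to CTA) is enough to switch the encoded residue from arginine to leucine, whereas AGA and CGA are synonymous.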
Supplementary Tables and Figures
Table S1. New TIPTOP and TIPRE Precursors Found in Momordica Seeds.
Columns: Species; Precursor; Peptides; Sequence
M. subangulata
TIPTOP4 (full)
TI-13, TI-14, TI-15, TI-16, TI-17, TI-4, TI-6
MESKKILLVVLVAMMLVATSTCFNDGDTTDLISDGRAQMDINGGA
CPKILQRCRRDSDCPSACICRGNGYCGSGSDAIDLISDSRAQIDI
NGGACPKILQKCRRDSDCPGACVCQGNGYCGSGSDALDLMSDGRA
QIDINGGACPRILKQCRRDSDCPGACVCQGNGYCGSGSDALDLMS
DGRAQIGINGGACPRILKQCRRDSDCPGACICQGNGYCGSGSDAL
DLMSDGRAQIDINGGACPRILKKCRRDSDCPGACVCRGNGYCGSG
SDALDLMSDGRAQIDINGGACPRILKKCRRDSDCPGACVCQGNGY
CGSGSDMIDLISDGGAQTGEDINGGGVYDKRQRACPRILKKCRRD
SDCPGECICQGNGYCG
M. macrophylla
M. anigosantha
M. friesiorum
TIPTOP5
(full)
TI-18
TI-2
TI-2
TI-4
TI-19
MESKKILPVVLVAMMLIATSTGFNDGDTIDLISDGRAQIDINGGI
CPKILQRCRRDSDCPGACICRGNGYCGSGSDALEGLMSDGRAQID
INGGVCPKILKKCRRDSDCPGACICRGNGYCGSGSDALEGLMSDG
RAQIDINGGVCPKILKKCRRDSDCPGACICRGNGYCGSGSDALEG
LMSDGRAQIDINGGACPRILKKCRRDSDCPGACVCQGNGYCGSGS
DALEGLMSDAGAQTGEDINGGGVYDEKQQRACPRILKKCRRDSDC
PGECICKGNGYCG
TIPTOP6
(full)
TI-18
TI-2
TI-20
TI-21
TI-22
TI-5
MESKKILPVVLVAMMLVATSTGFNDGDTIDLISDGRAQIDINGGI
CPKILQRCRRDSDCPGACICRGNGYCGSGSDALEGLMSDGRAQID
INGGVCPKILKKCRRDSDCPGACICRGNGYCGSGSDALEGLMSDG
RAQIDINGGVCPKILKKCRHDSDCPGACICRGNGYCGSGSDALEG
LMSDGRAQIDINGGVCPKILQRCRRDSDCPGACICQGNGYCGSGS
DALEGLVSDGRAQIDINGGVCPRILKKCRRDSDCPGACICRGNGY
CGSGSDALEGLMSDAGAQTGEDINGGGVYDEKQRACPRILKKCRR
DSDCPGECICKGNGYCG
TIPRE1
(partial)
TI-24
AAALVAMMLVATSADFNGGDTIHLISNGRAQTGQDINSGGVYSEE
QRACPKILKRCRRDSDCPGACVCQDNGYCGSGGDIIDVVFDDRAQ
ASQDINGGGVYFEE
TIPRE2
(partial)
TI-24
TI-25
AAALVAMMLVATSADFNGGDTIHLISNGRAQTGQDINSGGVYSEE
QRACPKILKRCRRDSDCPGACVCQDNGYCGSGGDIIDVVFDDRAQ
ASQDINGGGVYFEEQRACPRILKRCSRDSDCPGACVCQDNGYCGS
RGDMIDIVLEGRAQTGQDINDGGVYSEE
TIPRE3
(partial)
TI-26
TI-27
TI-24
AAVLVATMLVATSADFNDGDTIDLISNDRAQTGQDINGGGVYSEE
QRACPRILKRCRRDSDCPGACVCQGNGYCGSRGDMIDIVLDGRAQ
IGEDINGGGVYSEEQRACPRILKRCRRDSDCPGACVCQDNGYCGS
GGDMIDVVLDSRAQTGQDINGGGVYSEEQRACPKILKRCRRDSDC
PGACVCQDNGYCGSGGDMIDIVLDGRAQTGEDINGGGVYSEE
TIPRE4
(partial)
TI-26
TI-27
TI-24
TI-27
AAVLVATMLVATSADFNDGDTIDLISNDRAQTGQDINGGGVYSEE
QRACPRILKRCRRDSDCPGACVCQGNGYCGSRGDMIDIVLDGRAQ
TGEDINGSGVYSEEQRACPRILKRCRRDSDCPGACVCQDNGYCGS
GGDMIDVILDNRAQTGQDINGGGVYSEEQRACPKILKRCRRDSDC
PGACVCQDNGYCGSGGDMIDIVLDGRAQTGEDINGSGVYSEEQRA
CPRILKRCRRDSDCPGACVCQDNGYCGSGGDMIDVILDNRAQTGQ
DISGGGVYSEE
TIPRE5
(partial)
TI-24
AAALVAMMLVATSADSNGGDTIHLISNGRAQTGQDINSGGVYSEE
QRACPKILKRCRRDSDCPGACVCQDNGYCGSGGDIIDVVFDDRAQ
ASQDINGGGVYFEE
Colour coding:
Endoplasmic reticulum (ER) signal sequence is shown in cyan, leader peptide (LP) domain is shown in black,
cyclic peptide domain is shown in green, and acyclic peptide domain is shown in orange. (For interpretation of
the references to color, please refer to the web version of this article.)
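The peptide column of Table S1 lists the domains encoded by each precursor. As a rough cross-check, the sketch below counts how often each listed domain occurs in the TIPTOP5 precursor sequence given in the table. The domain sequences are taken from the alignments in Fig. 3; the function `find_all` is a hypothetical helper, and plain substring matching is an assumption that ignores any processing of the mature peptides.

```python
# TIPTOP5 precursor, joined from the wrapped lines of Table S1.
TIPTOP5 = (
    "MESKKILPVVLVAMMLIATSTGFNDGDTIDLISDGRAQIDINGGI"
    "CPKILQRCRRDSDCPGACICRGNGYCGSGSDALEGLMSDGRAQID"
    "INGGVCPKILKKCRRDSDCPGACICRGNGYCGSGSDALEGLMSDG"
    "RAQIDINGGVCPKILKKCRRDSDCPGACICRGNGYCGSGSDALEG"
    "LMSDGRAQIDINGGACPRILKKCRRDSDCPGACVCQGNGYCGSGS"
    "DALEGLMSDAGAQTGEDINGGGVYDEKQQRACPRILKKCRRDSDC"
    "PGECICKGNGYCG"
)

# Domain sequences as listed in the Fig. 3 alignments.
DOMAINS = {
    "TI-18": "GGICPKILQRCRRDSDCPGACICRGNGYCGSGSD",
    "TI-2":  "GGVCPKILKKCRRDSDCPGACICRGNGYCGSGSD",
    "TI-4":  "GGACPRILKKCRRDSDCPGACVCQGNGYCGSGSD",
    "TI-19": "QQRACPRILKKCRRDSDCPGECICKGNGYCG",
}


def find_all(haystack, needle):
    """Return 0-based start indices of every occurrence of needle."""
    starts, i = [], haystack.find(needle)
    while i != -1:
        starts.append(i)
        i = haystack.find(needle, i + 1)
    return starts


if __name__ == "__main__":
    print("precursor length:", len(TIPTOP5))
    for name, seq in DOMAINS.items():
        print(name, "occurrences:", len(find_all(TIPTOP5, seq)))
```

The counts agree with the peptide list for TIPTOP5 (TI-18, TI-2, TI-2, TI-4, TI-19), and the joined sequence has the 283-residue length given for TIPTOP5 in Fig. 3.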
Table S2. Precursors Assembled from Transcriptome Sequencing Data of M. charantia Seeds.
Transcript ID number  Peptide    Precursor Sequence
1315_rep_c7201        MCTI-I     MESKKIFIVVALVAMMLVASSATFEEGDMRPLVSDDGAVAGQDMNDF
                                 PRKMFVKVVYYENQRRCPRILKQCKRDSDCPGECICMAHGFCG
1315_rep_c3939        TI-28      MESKKVVVVVAMVVMMLVAMSSAAFDDGGAETGEVNYYPRKMFIKIG
                                 VYNEEERICPRIWMECKRDSDCMAQCICVDGHCG
1315_rep_c7780        MCTI-III   MESKKIVVVVALVAMMLVATSAAFDEGDTRPTRPLVSDDGAVVGQGM
                                 NDYPRKMFVKVVYYENQRGCPRILKQCKQDSDCPGECICMAHGFCG
1315_rep_c6383        EI-1       MESKKVVVVVAMVVMMLVATSSAAFNDGRAETGEVNYPRKMFIKIGV
                                 YNEDERICPLIWMECKRDSDCLAQCICVDGHCG
1315_rep_c4362        TI-23      MEWKKFALVAIVGMLLMGASAQAGGAETVATEIQGRPRRMMRGGICP
                                 RILMKCKKTSDCMAQCKCLSNGFCGSAPN
Colour coding:
Endoplasmic reticulum (ER) signal sequence is shown in cyan, leader peptide (LP) domain is shown in black
and acyclic peptide domain is shown in orange. The sites of N-terminal trimming are highlighted in red. No
evidence was found to support TI-23, which is shown in gray. (For interpretation of the references to color,
please refer to the web version of this article.)
Table S3. Sequences of Representative Diagnostic Peptides Identified in Momordica Seeds (Alphabetic Order)
Columns: Species; Diagnostic Peptide Sequence Deduced by Tandem MS (a); Precursor Mass (Da): Theoretical, Observed, Ion; Digestion Enzyme (b)

Species             Peptide sequence  Theoretical  Observed  Ion     Enzyme
M. anigosantha      <QRACPRIL          995.53       995.54   498.78  C
                    NGYCGSRGD          984.37       984.36   493.19  E
M. balsamina        EDERICPL          1030.48      1030.47   516.24  C
                    RICPRIW            999.54       999.54   500.78  C
M. boivinii         <QRACPRIL          995.53       995.54   498.78  C
M. cabrae           <QRACPRIL          995.53       995.54   498.78  C
M. calantha         <QRACPKIL          984.55       984.59   493.30  C
M. camerounensis    GACPRILKK         1041.61      1041.53   521.77  T
M. charantia        EEERICPRIW        1386.67      1386.72   694.37  C
                    EDERICPL          1030.48      1030.49   516.25  C
M. cissoides        DSDCPGACVCR       1295.47      1295.49   648.75  T
M. clarkeana        CNSGSDGGACPKIL    1462.63      1462.64   732.33  C
                    <QRACPRIL          995.53       995.55   498.78  C
M. cochinchinensis  CNSGSDGGVCPKIL    1462.65      1462.64   732.33  C
M. cymbalaria       <ERVCPKILQE       1252.66      1252.70   418.57  E
M. denticulata      CGSGSDGGVCPKIL    1405.63      1405.65   703.83  C
                    <QRACPRIL          995.53       995.55   498.78  C
M. foetida          <ERGCPRIL          981.52       981.53   491.77  C
                    RICPLIWQECKR      1689.84      1689.86   564.29  T
M. friesiorum       <QRACPKIL          967.53       967.51   484.76  C
M. gilgiana         CGSGKDGGACPKIL    1418.66      1418.66   710.34  C
                    <ERICPRIWME       1370.66      1370.70   686.36  E
M. humilis          <QQRACPR           915.43       915.45   458.73  T
M. jeffreyana       <ERVCPRIL         1023.56      1023.58   512.80  C
M. leiocarpa        <ERGCPRIL          981.52       981.54   491.78  C
                    RICPLIWMDCKR      1662.82      1662.84   555.29  T
M. macrophylla      CGSGSDGGICPKIL    1419.65      1419.65   710.83  C
                    <QQRACPRIL        1123.59      1123.60   562.80  C
M. parvifolia       <QRACPKIL          967.53       967.53   484.77  C
M. rostrata         <ERGCPRIL          981.52       981.52   491.77  C
M. silvatica        <ERRCPRIL         1080.60      1080.60   541.31  C
M. subangulata      CGSGSDGGACPKIL    1377.60      1377.59   689.80  C
                    <QRACPRIL          995.53       995.52   498.77  C
M. trifoliolata     <ERGCPRIL          981.52       981.51   491.76  C
S. grosvenorii      GRVCPRIL           969.55       969.54   485.78  C

a Amino acid modifications include pyrolated glutamine/glutamate (<Q/E), carbamidomethylated cysteine (C), dioxidized tryptophan (W), deamidated (Q), and oxidized methionine (M).
b Enzymes used for digestion are trypsin (T), chymotrypsin (C), or endoproteinase Glu-C (E). Peptide cleavage points are shown with arrows (). (For interpretation of the references to color, please refer to the web version of this article.)
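The relationship between the "Observed" and "Ion" columns of Table S3 can be sketched as follows, assuming (the table does not state this explicitly) that "Observed" is the neutral peptide mass and "Ion" is the m/z of the corresponding multiply protonated ion. The function names and the tolerance are hypothetical choices for illustration.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral mass."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def infer_charge(neutral_mass, observed_mz, max_charge=4, tol=0.02):
    """Return the charge state whose predicted m/z matches, or None."""
    for z in range(1, max_charge + 1):
        if abs(mz(neutral_mass, z) - observed_mz) <= tol:
            return z
    return None


if __name__ == "__main__":
    # (observed mass, ion m/z) pairs taken from Table S3
    rows = [(995.54, 498.78), (1252.70, 418.57), (1689.86, 564.29)]
    for mass, ion in rows:
        z = infer_charge(mass, ion)
        print(mass, ion, "charge:", z, "predicted m/z:", round(mz(mass, z), 2))
```

Under this assumption most entries correspond to doubly charged ions, while a few (for example the M. cymbalaria and M. foetida trypsin-digest entries shown above) are consistent with triply charged ions.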
Table S4. Sites of Momordica Cyclic Peptides That Are Under Selection
No.  Codon  Syn (s)  Nonsyn (n)  Syn sites (S)  Nonsyn sites (N)  dS     dN     dN-dS   P-value  Norm. dN-dS
1    GGT    1.000    0.000       1.000          2.000             1.000  0.000  -1.000  1.000    -2.154
2    GGT    0.000    0.000       1.000          2.000             0.000  0.000   0.000  N/A       0.000
3    GTC    2.000    4.000       0.994          2.006             2.012  1.994  -0.018  0.684    -0.039
4    TGT    0.000    0.000       0.706          2.197             0.000  0.000   0.000  N/A       0.000
5    CCC    0.000    0.000       1.000          2.000             0.000  0.000   0.000  N/A       0.000
6    AAA    0.000    1.000       0.793          2.136             0.000  0.468   0.468  0.729     1.008
7    ATC    0.000    0.000       0.955          2.045             0.000  0.000   0.000  N/A       0.000
8    TTG    0.000    0.000       1.505          1.398             0.000  0.000   0.000  N/A       0.000
9    CAG    0.000    1.000       0.799          1.867             0.000  0.535   0.535  0.700     1.153
10   AGA    0.000    4.000       0.796          2.081             0.000  1.922   1.922  0.274     4.139
11   TGC    1.000    0.000       0.670          2.050             1.493  0.000  -1.493  1.000    -3.215
12   AGG    2.000    0.000       1.005          1.982             1.989  0.000  -1.989  1.000    -4.284
13   CGC    1.000    1.000       0.993          2.007             1.007  0.498  -0.509  0.890    -1.096
14   GAC    0.000    0.000       0.669          2.331             0.000  0.000   0.000  N/A       0.000
15   TCC    3.000    0.000       1.000          2.000             3.000  0.000  -3.000  1.000    -6.461
16   GAT    1.000    0.000       0.705          2.295             1.419  0.000  -1.419  1.000    -3.056
17   TGC    0.000    0.000       0.669          2.045             0.000  0.000   0.000  N/A       0.000
18   CCC    1.000    0.000       1.000          2.000             1.000  0.000  -1.000  1.000    -2.154
19   GGT    0.000    1.000       0.990          2.010             0.000  0.498   0.498  0.670     1.072
20   GCA    0.000    0.000       1.000          2.000             0.000  0.000   0.000  N/A       0.000
21   TGT    0.000    0.000       0.706          2.197             0.000  0.000   0.000  N/A       0.000
22   ATT    0.000    3.000       0.848          2.152             0.000  1.394   1.394  0.369     3.002
23   TGC    0.000    0.000       0.669          2.045             0.000  0.000   0.000  N/A       0.000
24   CGG    0.000    5.000       1.164          1.676             0.000  2.984   2.984  0.072     6.426
25   GGG    4.000    0.000       1.000          1.967             4.000  0.000  -4.000  1.000    -8.614
26   AAC    0.000    0.000       0.669          2.331             0.000  0.000   0.000  N/A       0.000
27   GGG    0.000    1.000       0.998          2.000             0.000  0.500   0.500  0.667     1.077
28   TAT    0.000    0.000       0.706          2.000             0.000  0.000   0.000  N/A       0.000
29   TGC    1.000    0.000       0.672          2.057             1.489  0.000  -1.489  1.000    -3.207
30   GGT    0.000    3.000       0.980          2.018             0.000  1.487   1.487  0.305     3.202
31   AGC    1.000    2.000       0.674          2.322             1.483  0.861  -0.622  0.871    -1.339
32   GGT    1.000    0.000       1.000          1.996             1.000  0.000  -1.000  1.000    -2.154
33   AGC    0.000    0.000       0.669          2.331             0.000  0.000   0.000  N/A       0.000
34   GAC    1.000    0.000       0.670          2.330             1.493  0.000  -1.493  1.000    -3.215
Notes:
s is the number of synonymous substitutions. n is the number of nonsynonymous substitutions. S is the number of
synonymous sites. N is the number of nonsynonymous sites. dS is the number of synonymous substitutions per site
(s/S). dN is the number of nonsynonymous substitutions per site (n/N). dN-dS indicates the type of selection
operating on a codon, i.e. a negative (-) value for negative selection and a positive (+) value for positive selection.
P-value is the probability of rejecting the null hypothesis of neutral evolution. Norm. dN-dS is the value of dN-dS
normalized to take into account the total number of substitutions in the tree.
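The per-codon quantities defined in the notes (dS = s/S, dN = n/N, and their difference) can be reproduced from the first four numeric columns of Table S4. The sketch below is illustrative only: the function name is hypothetical, the sign-based label follows the note on dN-dS, and neither the P-value nor the normalization step is reproduced because the notes do not give their formulas.

```python
def site_selection(s, n, S, N):
    """Return (dS, dN, dN - dS, label) for one codon.

    s, n: synonymous / nonsynonymous substitution counts
    S, N: numbers of synonymous / nonsynonymous sites
    """
    dS = s / S
    dN = n / N
    diff = dN - dS
    if diff < 0:
        label = "negative"
    elif diff > 0:
        label = "positive"
    else:
        label = "neutral"
    return dS, dN, diff, label


if __name__ == "__main__":
    # Codons 3, 10 and 11 of Table S4: (s, n, S, N)
    rows = {3: (2.0, 4.0, 0.994, 2.006),
            10: (0.0, 4.0, 0.796, 2.081),
            11: (1.0, 0.0, 0.670, 2.050)}
    for codon, values in rows.items():
        dS, dN, diff, label = site_selection(*values)
        print(codon, round(dS, 3), round(dN, 3), round(diff, 3), label)
```

Small discrepancies in the last decimal place relative to the table can arise for some codons because the tabulated S and N values are themselves rounded.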
Table S5. Inter- and Intramolecular Hydrogen Bond Network of Selected Momordica Cyclic Peptides in
Complex with Trypsin.
Columns: Cyclic Peptide; Main chain hydrogen bond (Donor, Acceptor, Frequency of Occupancy (%)); Side chain hydrogen bond (Donor, Acceptor, Frequency of Occupancy (%))

TI-1
Main chain hydrogen bond                 Side chain hydrogen bond
Donor         Acceptor       Occ. (%)    Donor         Acceptor       Occ. (%)
Gly195-Main   Lys6-Main      90.50       Lys6-Side     Ser192-Main    90.69
Lys6-Main     Ser212-Main    90.09       Lys6-Side     Ser192-Side    44.31
Leu8-Main     Phe44-Main     84.71       Lys63-Side    Gln9-Side      29.17
Ser215-Side   Gly2-Main      73.01       Ser31-Side    Ser215-Main    17.70
Gly214-Main   Cys4-Main      73.05       Asn26-Side    Thr149-Side    15.64
Cys4-Main     Gly214-Main    64.77       Lys6-Side     Asp191-Side    12.81
Gln194-Side   Pro5-Main      61.21       Arg24-Side    Ser147-Main     8.45
Ser197-Main   Lys6-Main      40.22       Lys6-Side     Gly216-Main     5.64
Gln175-Side   Gly1-Main      29.03       Ser31-Side    Gln194-Side     4.26
Lys222-Side   Asp34-Main      6.46       Arg24-Side    Thr149-Side     3.63
Gln194-Side   Cys29-Main      6.11       Thr149-Side   Asn26-Side      2.75
Ser215-Side   Gly1-Main       6.01       Asn26-Side    Tyr151-Side     2.25
Gln219-Side   Gly32-Main      5.37       Gln194-Side   Ser31-Side      1.86
Gln194-Side   Ile7-Main       4.64       Gly216-Main   Ser31-Side      1.31
Gly32-Main    Ser215-Side     2.45
Cys4-Main     Ser215-Side     1.16
Ser215-Side   Asp34-Main      1.04

TI-21
Main chain hydrogen bond                 Side chain hydrogen bond
Donor         Acceptor       Occ. (%)    Donor         Acceptor       Occ. (%)
Cys4-Main     Gly214-Main    96.29       Lys6-Side     Ser192-Main    90.34
Lys6-Main     Ser212-Main    94.45       Lys63-Side    Gln9-Side      65.21
Gly195-Main   Lys6-Main      92.42       Lys6-Side     Asp191-Side    29.33
Leu8-Main     Phe44-Main     68.13       Lys6-Side     Ser192-Side    24.49
Gln194-Side   Pro5-Main      59.85       Thr149-Side   Gln24-Side     23.21
Gly214-Main   Cys4-Main      50.31       Lys6-Side     Gly216-Main    18.31
Ser197-Main   Lys6-Main      42.37       Asn26-Side    Tyr151-Side     9.85
Arg10-Main    Tyr42-Side     31.39       Gln24-Side    Thr149-Side     8.99
Ser215-Side   Gly2-Main      25.43       Asn26-Side    Thr149-Side     3.54
Gln194-Side   Ile7-Main      18.73       Gln24-Side    Tyr151-Side     1.28
Ser215-Side   Gly1-Main      13.26       Gln9-Side     Tyr42-Side      1.17
Gln194-Side   Cys29-Main      9.11       Gln175-Side   Gly1-Main       1.01

TI-8
Main chain hydrogen bond                 Side chain hydrogen bond
Donor         Acceptor       Occ. (%)    Donor         Acceptor       Occ. (%)
Lys6-Main     Ser212-Main    95.97       Lys6-Side     Ser192-Main    91.59
Gly195-Main   Lys6-Main      91.15       Lys6-Side     Ser192-Side    26.73
Cys4-Main     Gly214-Main    90.53       Asn26-Side    Tyr151-Side    26.59
Gln194-Side   Pro5-Main      88.10       Gly216-Main   Ser31-Side     18.61
Ser215-Side   Gly2-Main      85.80       Lys6-Side     Gly216-Main    13.63
Leu8-Main     Phe44-Main     65.05       Tyr42-Side    Gln9-Side       9.12
Gly214-Main   Cys4-Main      60.39       Lys6-Side     Asp191-Side     7.61
Ser197-Main   Lys6-Main      45.84       Asn26-Side    Thr149-Side     6.39
Gln194-Side   Ile7-Main      15.33       Thr149-Side   Asn26-Side      3.48
Lys222-Side   Asp34-Main      9.50       Ser31-Side    Gly216-Main     1.79
Gln194-Side   Cys29-Main      6.07
Gln219-Side   Asp34-Main      2.87
Lys222-Side   Ser33-Main      2.63
Gln175-Side   Gly2-Main       1.52
Gln219-Side   Gly32-Main      1.42
TI-18
Main chain hydrogen bond                 Side chain hydrogen bond
Donor         Acceptor       Occ. (%)    Donor         Acceptor       Occ. (%)
Lys6-Main     Ser212-Main    93.89       Lys6-Side     Ser192-Main    91.88
Gly195-Main   Lys6-Main      91.30       Lys6-Side     Ser192-Side    33.73
Ser215-Side   Gly2-Main      84.97       Lys63-Side    Gln9-Side      32.59
Cys4-Main     Gly214-Main    81.82       Ser31-Side    Ser215-Main    31.83
Gln194-Side   Pro5-Main      81.77       Lys6-Side     Asp191-Side    21.11
Gly214-Main   Cys4-Main      67.52       Lys6-Side     Gly216-Main    20.41
Leu8-Main     Phe44-Main     65.71       Asn26-Side    Thr149-Side    15.19
Ser197-Main   Lys6-Main      38.41       Asn26-Side    Tyr151-Side     4.93
Gln194-Side   Ile7-Main      20.42       Gly216-Main   Ser31-Side      2.79
Arg10-Main    Tyr42-Side     19.83       Tyr151-Side   Asn26-Side      2.09
Tyr42-Side    Leu8-Main      18.75       Gln9-Side     Tyr42-Side      1.59
Gln194-Side   Cys29-Main      9.32
Gln175-Side   Gly1-Main       5.70
Ser215-Side   Gly1-Main       4.73

TI-2
Main chain hydrogen bond                 Side chain hydrogen bond
Donor         Acceptor       Occ. (%)    Donor         Acceptor       Occ. (%)
Lys6-Main     Ser212-Main    94.11       Lys6-Side     Ser192-Main    90.66
Gly195-Main   Lys6-Main      93.85       Lys6-Side     Gly216-Main    39.38
Gln194-Side   Pro5-Main      83.55       Lys6-Side     Asp191-Side    33.93
Cys4-Main     Gly214-Main    78.19       Gly216-Main   Ser31-Side     28.04
Leu8-Main     Phe44-Main     65.95       Lys6-Side     Ser192-Side    24.24
Gly214-Main   Cys4-Main      60.75       Ser31-Side    Ser215-Main    22.90
Lys10-Main    Tyr42-Side     52.47       Asn26-Side    Thr149-Side    20.68
Ser215-Side   Gly2-Main      51.03       Arg24-Side    Ser146-Main    11.02
Gln194-Side   Ile7-Main      33.35       Cys217-Main   Ser31-Side      5.75
Ser197-Main   Lys6-Main      31.76       Asn26-Side    Tyr151-Side     4.30
Gln175-Side   Gly1-Main      14.91       Arg24-Side    Ser147-Side     3.50
Ser215-Side   Gly1-Main       8.60       Ser31-Side    Gln194-Side     3.00
Gln194-Side   Cys29-Main      5.26       Tyr151-Side   Tyr28-Side      1.34
Gln219-Side   Asp34-Main      4.09

TI-10
Main chain hydrogen bond                 Side chain hydrogen bond
Donor         Acceptor       Occ. (%)    Donor         Acceptor       Occ. (%)
Gly195-Main   Lys6-Main      95.69       Lys6-Side     Ser192-Main    89.92
Lys6-Main     Ser212-Main    94.43       Asn26-Side    Thr149-Side    75.91
Cys4-Main     Gly214-Main    92.87       Arg24-Side    Ser147-Side    51.95
Gln194-Side   Pro5-Main      78.63       Ser30-Side    Ser147-Side    49.43
Leu8-Main     Phe44-Main     70.87       Lys6-Side     Gly216-Main    48.77
Lys10-Main    Tyr42-Side     67.13       Lys6-Side     Asp191-Side    44.07
Gln194-Side   Ile7-Main      51.91       Asn26-Side    Tyr151-Side    21.81
Ser215-Side   Gly2-Main      34.68       Lys6-Side     Ser192-Side    15.03
Ser197-Main   Lys6-Main      32.82       Arg24-Side    Thr149-Side     7.20
Gly214-Main   Cys4-Main      28.73       Ser31-Side    Ser146-Main     4.41
Ser215-Side   Gly1-Main      13.17       Ser147-Side   Ser30-Side      1.15

TI-20
Main chain hydrogen bond                 Side chain hydrogen bond
Donor         Acceptor       Occ. (%)    Donor         Acceptor       Occ. (%)
Lys6-Main     Ser212-Main    93.69       Lys6-Side     Ser192-Main    52.66
Gly214-Main   Cys4-Main      84.80       Gly216-Main   Ser31-Side     39.09
Gly195-Main   Lys6-Main      83.48       Lys6-Side     Ser192-Side    29.82
Gln194-Side   Pro5-Main      78.67       Lys6-Side     Asp191-Side    22.34
Cys4-Main     Gly214-Main    68.39       Asn26-Side    Thr149-Side    12.96
Lys10-Main    Tyr42-Side     60.79       Ser31-Side    Gly216-Main     6.92
Leu8-Main     Phe44-Main     58.24       Asn26-Side    Tyr151-Side     5.75
Gln175-Side   Gly1-Main      42.77       Ser31-Side    Ser215-Main     3.63
Ser197-Main   Lys6-Main      40.58       Thr149-Side   Asn26-Side      2.41
Ser215-Side   Gly2-Main      23.07       Lys6-Side     Gly216-Main     2.07
Gln194-Side   Ile7-Main      13.67       Ser31-Side    Ser215-Side     1.26
Ser215-Side   Asp34-Main     13.07       Gly148-Main   Arg24-Side      1.05
Gln194-Side   Cys29-Main     10.21
Gly216-Main   Gly32-Main      4.89
Ser215-Side   Gly1-Main       2.22
Gly2-Main     Ser215-Side     1.09
TI-22
Main chain hydrogen bond                 Side chain hydrogen bond
Donor         Acceptor       Occ. (%)    Donor         Acceptor       Occ. (%)
Gly195-Main   Arg6-Main      90.72       Arg6-Side     Ser192-Side    97.80
Arg6-Main     Ser212-Main    89.79       Arg6-Side     Gly216-Main    90.45
Gln194-Side   Pro5-Main      84.67       Gly216-Main   Ser31-Side     72.25
Gly214-Main   Cys4-Main      68.64       Arg6-Side     Asp191-Side    34.59
Ser215-Side   Gly2-Main      65.75       Asn26-Side    Thr149-Side    21.11
Leu8-Main     Phe44-Main     58.81       Arg24-Side    Ser146-Main    14.29
Cys4-Main     Gly214-Main    44.78       Ser31-Side    Gly214-Main     7.87
Ser197-Main   Arg6-Main      30.41       Asn26-Side    Tyr151-Side     2.60
Gln194-Side   Ile7-Main      25.15       Thr149-Side   Asn26-Side      2.14
Gln194-Side   Cys29-Main     17.91       Asn26-Side    Ser147-Main     1.92
Ser215-Side   Asp34-Main      1.84       Ser33-Side    Gln219-Side     1.42
Gly216-Main   Gly32-Main      1.05       Arg6-Side     Ser192-Main     1.16
                                         Arg24-Side    Ser147-Main     1.12
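The occupancy values in Table S5 can be mapped onto the seven "H-Bonding Frequency" classes listed in the Fig. 5 legend. The sketch below is a minimal illustration, assuming the legend classes are contiguous bins defined by their lower edges; the function name and the handling of values below 1% are assumptions, not the authors' procedure.

```python
from bisect import bisect_right

# Lower and upper edges of the H-bonding frequency classes from the Fig. 5 legend.
BINS = [(1.0, 15.9), (16.0, 29.9), (30.0, 43.9), (44.0, 57.9),
        (58.0, 71.9), (72.0, 85.9), (86.0, 99.9)]
LOWER_EDGES = [lo for lo, _ in BINS]


def occupancy_class(percent):
    """Return the (low, high) legend class for an occupancy, or None if outside."""
    if percent >= 100.0:
        return None
    index = bisect_right(LOWER_EDGES, percent) - 1
    if index < 0:
        return None  # below the lowest class (less than 1.0%)
    return BINS[index]


if __name__ == "__main__":
    # Three occupancies from the TI-1 block of Table S5
    for value in (90.50, 44.31, 5.64):
        print(value, occupancy_class(value))
```

Binning by lower edge assigns two-decimal values such as 15.95 to the class whose printed upper bound is 15.9, which is one reasonable reading of the legend.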
[Fig. S1 spectra not reproduced. Axes: m/z (Da) versus intensity (%), with b- and y-ion series annotated. Diagnostic peptides recovered from the panels (annotated sequence; MW; precursor ion m/z and charge):
CCAM G S G S D G G I CCAM P K I L; MW = 1419.65; Prec. 710.83 (2+)
<Q R A CCAM P K I L; MW = 967.53; Prec. 484.77 (2+)
N G Y CCAM G S R G D; MW = 984.36; Prec. 493.19 (2+)
<Q Q R A CCAM P R I L; MW = 1123.60; Prec. 562.80 (2+)
E D E R I CCAM P L; MW = 1030.49; Prec. 516.25 (2+)
E E E R I CCAM P R I W; MW = 1368.70; Prec. 685.36 (2+)
R I CCAM P L I W2OX Q E CCAM K R D; MW = 1804.88; Prec. 602.63 (3+)]
Fig. S1. Representative tandem MS spectra of diagnostic peptides corresponding to Momordica cyclic peptides
and their acyclic counterparts. Shown above are tandem MS spectra of diagnostic peptides corresponding to TI-18 from M. macrophylla (A), TIPRE peptides from M. anigosantha (B, C), TI-19 from M. macrophylla (D), TI-28 from M. charantia (E), EI-1 from M. charantia (F), and an elastase inhibitor from M. foetida (G). Precursor
mass is shown with an asterisk. Series of b- and y-ions were used to deduce the sequence. Green and red
indicate a 2+ charge state of the b- and y-ion, respectively. Amino acid modifications were observed, i.e.
pyrolated (<), carbamidomethylated (CAM), and dioxidized (2OX). (For interpretation of the references to color,
please refer to the web version of this article.)
[Fig. S2 chromatograms not reproduced. Panels: Unfractionated; Fraction 1: 10–20% Solvent B; Fraction 2: 20–30% Solvent B; Fraction 3: 30–40% Solvent B; Fraction 4: 40–50% Solvent B. Axes: retention time (min, 5–70), intensity (cps), and relative abundance (%). Targeted peptides: TI-1, TI-2, TI-4, TI-5, TI-6, TI-7, TI-8. Relative abundance values printed in the right-hand boxes, each set in the peptide order above and in the order extracted from the figure:
93.6 ± 1.7, 79.8 ± 2.1, 93.4 ± 5.5, 97.2 ± 3.6, 97.1 ± 2.0, 92.4 ± 3.4, 75.2 ± 2.1
4.2 ± 0.6, 1.2 ± 0.0, 0.1 ± 0.0, BT, 0.1 ± 0.0, 0.1 ± 0.0, 18.0 ± 1.7
BT, 0.2 ± 0.0, BT, BT, BT, BT, 4.9 ± 1.0
93.9 ± 3.0, 79.7 ± 2.5, 90.0 ± 2.0, 94.0 ± 3.6, 96.0 ± 3.3, 91.1 ± 3.2, 75.4 ± 1.8
2.2 ± 0.3, 18.8 ± 0.8, 6.5 ± 0.3, 2.8 ± 0.1, 2.8 ± 0.2, 7.5 ± 0.3, 1.9 ± 0.2]
Fig. S2. Elution profiles of unfractionated and fractionated M. cochinchinensis seed extracts. The majority of the
targeted peptides (coloured dots; the small dots represent the β-aspartyl isomers) are present in the 10–20%
solvent B fraction. The relative abundances of the targeted peptides, as inferred from their peak intensities, are given in the
right-hand box. BT: Below threshold (10 cps). For details on the calculation see Mahatmanto et al. (2014). (For
interpretation of the references to color, please refer to the web version of this article.)
[Fig. S3 spectra not reproduced. Axes: m/z (Da) versus intensity (%), with b- and y-ion series annotated. Diagnostic peptides recovered from the panels (annotated sequence; MW; precursor ion m/z and charge):
A: <E R R CCAM P R I L; MW = 1080.59; Prec. 541.30 (2+)
B: R I CCAM P R I W MOXI E; MW = 1275.59; Prec. 638.80 (2+)
C: <E R G CCAM P R I L; MW = 981.51; Prec. 491.76 (2+)]
Fig. S3. Representative tandem MS spectra of diagnostic peptides present in the 20–30% solvent B fraction of
M. charantia seed extract. Shown above are tandem MS spectra of diagnostic peptides corresponding to MCTI-I (A),
MCTI-II (B), and MCTI-III (C). Precursor mass is shown with an asterisk. Series of b- and y-ions were used to
deduce the sequence. Green and red indicate a 2+ charge state of the b- and y-ion, respectively. Amino acid
modifications were observed, i.e. pyrolated (<), carbamidomethylated (CAM), and oxidized (OXI). (For
interpretation of the references to color, please refer to the web version of this article.)