Introduction

Benzoin is a resin secreted from the trunks of Styrax plants and is used as a medicine1,2. The main chemical constituents of benzoin are balsamic acids, lignans, terpenes, and steroids, among other compounds3,4,5,6. Pharmacological studies have shown that benzoin has anti-inflammatory, antipyretic, antitumor, analgesic, and estrogen-promoting activities, protects against cerebral hypoxia, and promotes blood–brain barrier permeability7,8,9,10. The genus Styrax Linn. comprises approximately 130 species of trees and shrubs distributed worldwide, mainly in Asia (China), Southeast Asia (Vietnam, Thailand, and Sumatra, Indonesia), and North America11,12.

During the Tang Dynasty in ancient China, benzoin was recorded as a spice in the Book of Jìn and as a medicine in the Tang Materia Medica (Xīnxiū Běn Cǎo)13,14. It has been used medicinally for more than 700 years and appears in approximately 93 prescriptions, such as zhìbǎo boluses and storax pills in the Tàimín Huìmín and Pharmaceutical Bureau Formula and dàhuóluò boluses in the Sages' Salvation Records15,16. Benzoin has well-established uses in traditional medicine, and several national pharmacopoeias, including those of China, the United States, Europe, and India, describe its specifications and tests. The Chinese medicinal records identify benzoin as derived from Styrax tonkinensis (Pierre) Craib ex Hart., whereas the European Pharmacopoeia recognizes both S. tonkinensis and Styrax benzoin Dryander. The United States Pharmacopoeia and the Indian Pharmacopoeia record benzoin as derived from S. benzoin or S. paralleloneurus Perkins (trade name Sumatra benzoin), or from S. tonkinensis or other species of the genus Styrax Linn. (trade name Siam benzoin)17,18,19,20,21. Benzoin is commonly used as a tincture22. The British Pharmacopoeia specifies Sumatra benzoin for Benzoin Inhalation and Compound Benzoin Tincture23. The United States Pharmacopoeia also describes a compound benzoin tincture but does not specify the type of benzoin to be used, whereas the Swiss and Indian Pharmacopoeias describe a simple benzoin tincture prepared from Siam benzoin. Benzoin is also used in other official and proprietary medicinal preparations, such as over-the-counter cough suppressants, cold and flu preventatives, lotions, mouthwashes, and antibacterial powders24, and as an additive in aromatherapy25. In addition, it is widely used in food, flavorings, and daily chemical products. In food, benzoin serves mainly as a flavoring agent, for example to create chocolate flavor; in Denmark and Switzerland, it is used to flavor baked goods, where it fixes other flavors and adds spiciness26,27.

Benzoin is approved for use in food in the United States. Like all food additives, it is subject to periodic safety review by the Joint FAO/WHO Expert Committee on Food Additives28. In terms of quantity, however, the largest uses of benzoin are as an incense substitute in religious ceremonies and in the manufacture of fragrances, which are blended into a wide range of end products such as personal hygiene, personal care, and household products5,29. Siam benzoin produces a lighter-colored extract than Sumatra benzoin for use in the fragrance, flavor, and pharmaceutical industries, where it is used to prepare resins, tinctures, and other extracts30. We therefore infer that differences in the species of origin may influence the traditional uses of benzoin.

The resin of one or more Styrax Linn. species is accepted as "benzoin", for example in the United States Pharmacopoeia. In China, this medicinal material was historically imported mainly via the Silk Road, and it is still largely imported today. Currently, the species identity of benzoin sold in the Chinese domestic herbal market is uncertain, and the classification of such medicinal and spice plants is generally unclear. Moreover, in the North Sumatra region of Indonesia, benzoin is produced by first clearing shrubs and parasitic plants from around the tree, tapping the bark, and then cutting the opening with a knife. After 3–4 months, usually in summer and autumn, the resin is cut off, dried for 1–2 weeks, and cleaned to remove excess bark. The collected resin is then pooled, graded by quality, and sold31. Because this process is not strictly standardized, confusion between resins from different Styrax species is evident32. A method for identifying the species of origin of benzoin is therefore needed.

In recent years, research on benzoin quality has focused mainly on chemical methods for analyzing volatile components and determining aromatic lipid content, such as gas chromatography–mass spectrometry, solid-phase microextraction, headspace sampling, and high-performance liquid chromatography–frit fast atom bombardment mass spectrometry33,34. In addition, existing quality standards for medicinal materials specify benzoin quality control indicators, including identification, total ash, loss on drying, alcohol-insoluble matter, and benzoic acid content. These conventional indicators are used mainly for quality conformity testing and cannot identify the species of origin of benzoin.

DNA barcoding has received increasing attention as a rapid and accurate method of species identification in which a standard DNA fragment from the specimen of interest is compared against a reference library of DNA barcodes from taxonomically identified species. Its key feature is that one or a few suitable gene regions can accurately identify most species across an entire genus or family, with high repeatability and stability35,36,37. Chen et al.38 established a DNA barcode identification system for traditional Chinese medicinal materials. The ITS2 region is short and easy to amplify, has been widely used in systematic research as part of the ITS, and was found to have the highest species-level resolution among the candidate DNA barcodes tested38. However, molecular studies of benzoin-producing Styrax species have so far focused mainly on chromosome size, ploidy, the chloroplast genome, phylogeny, and geographic evolution39,40,41,42.
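The library comparison at the heart of DNA barcoding can be illustrated with a nearest-neighbour assignment. The Python sketch below is a minimal illustration, not the pipeline used in this study: it assumes the query and reference barcodes are already aligned to a common length, uses the uncorrected p-distance, and the reference names and sequences are hypothetical placeholders.

```python
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two pre-aligned sequences.

    Alignment columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable (gap-free) sites")
    return differences / compared


def assign_species(query: str, library: Dict[str, str]) -> Tuple[str, float]:
    """Return the reference label closest to the query and its p-distance."""
    distances = {name: p_distance(query, ref) for name, ref in library.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]


if __name__ == "__main__":
    # Hypothetical toy library; real ITS2 barcodes are roughly 200-300 bp.
    library = {"species_A": "ACGTACGTAC", "species_B": "ACGTTCGTAA"}
    print(assign_species("ACGTACGTCC", library))  # ('species_A', 0.1)
```

Practical barcoding pipelines usually rely on BLAST searches or tree-based methods together with a similarity threshold; the nearest-neighbour rule is simply the most basic form of the comparison against a reference library.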

Species identification by molecular diagnostic approaches depends on extracting DNA from the target material, yet previous reports have not addressed species identification of resinous benzoin medicinal materials. Benzoin is a resin secreted by injured Styrax trees. In principle, the resin does not contain tree tissue, so DNA is difficult to extract. However, on examining commercially available benzoin, we found that the resin contains bark-like residues, which may be bark of the Styrax tree. These residues become embedded through a process similar to amber formation, but because the resin is not completely lithified, the material is called semipetrified amber43,44,45. The residues stick to the resin, which hinders DNA extraction. We therefore explored DNA extraction from these residues and used ITS2 primary and secondary structures as molecular diagnostic markers to identify the species. This study provides a new approach to identifying the species that produce the semipetrified amber medicinal material benzoin, based on original information recovered from these residues. To our knowledge, this is the first report showing that combining different types of information may help determine the botanical origin of benzoin.

Results

DNA concentration, sequence analysis and species identification

We initially attempted to extract DNA directly from crushed benzoin resin, but the DNA concentration was insufficient for subsequent tests (results not shown). We therefore scraped the bark-like residues off the resin with a knife and treated each residue as a separate sample for molecular diagnostic identification. This avoids sequencing failures caused by mixing material from different residues (Fig. 1).

Figure 1

Process of pretreatment and molecular diagnostic identification of bark-like residues on benzoin.

In total, 40 individual bark-like residues from 27 batches of benzoin were analyzed. DNA concentrations ranged from 8.0 to 203.8 ng/μL, and DNA purity (A260/A280) ranged from 1.70 to 2.08. Sample AX151 had the lowest concentration (8.0 ng/μL) but a purity of 1.89 and was successfully amplified by PCR (Table 1). These results suggest that pretreatment with 95% ethanol mitigated the effects of the resin on DNA extraction and amplification. DNA from bark-like residues embedded in resin is likely to be degraded even when its concentration and purity meet the requirements for downstream experiments. We therefore amplified the short nuclear ITS2 region, a universal barcode, and all DNA samples were successfully amplified and sequenced. After the 5.8S and 26S flanking regions were removed from the sequencing reads, the complete ITS2 sequences were 217–230 bp long.
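The DNA quantity and quality figures above, and the GC content listed in Table 1, come from standard calculations. The Python sketch below illustrates them. The factor of 50 ng/μL per A260 unit is the usual conversion for double-stranded DNA at a 1 cm path length. The acceptance thresholds in `passes_qc` are hypothetical values chosen for illustration, not criteria stated in this study.

```python
def dsdna_concentration(a260: float, dilution: float = 1.0) -> float:
    """Approximate dsDNA concentration in ng/uL (A260 of 1.0 ~ 50 ng/uL, 1 cm path)."""
    return a260 * 50.0 * dilution


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280 ratio; about 1.8 is generally taken to indicate pure DNA."""
    return a260 / a280


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


def passes_qc(conc_ng_ul: float, ratio: float,
              min_conc: float = 5.0, ratio_range=(1.7, 2.1)) -> bool:
    """Hypothetical acceptance rule: thresholds are illustrative only."""
    low, high = ratio_range
    return conc_ng_ul >= min_conc and low <= ratio <= high


if __name__ == "__main__":
    conc = dsdna_concentration(0.16)      # about 8.0 ng/uL
    ratio = purity_ratio(0.16, 0.0847)    # about 1.89
    print(round(conc, 1), round(ratio, 2), passes_qc(conc, ratio))  # 8.0 1.89 True
    print(gc_content("GGCCAATT"))         # 50.0
```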

Table 1 DNA concentration and purity of samples, and ITS2 length and GC content after sequencing.

The 40 samples were identified by BLAST comparison against GenBank, which returned species of the genera Styrax Linn., Dimocarpus Lour., Aquilaria Lam., Ageratum L., Glycine Willd., Acronychia J. R. Forst. & G. Forst., Trema Lour., and Musa L. with sequence identities above 92% (Table 2). The BLAST and secondary structure analyses showed that 30 samples were derived from the genus Styrax Linn., of which 16 were identified as S. tonkinensis and 14 as S. japonicus. The remaining 10 samples were derived from other genera. For sample AX174, the BLAST and secondary structure analyses indicated different species, but both placed the sample in the genus Trema Lour. Samples AX071 and AX151 were identified as Aquilaria sinensis (Lour.) Spreng., AX102 and AX215 as Dimocarpus longan Lour., and AX172 and AX201 as Acronychia pedunculata (L.) Miq. AX171 matched both Glycine soja Siebold & Zucc. and Glycine max (L.) Merr. AX122 was identified as Ageratum conyzoides L., and AX231 as Musa acuminata Colla. These species occur mainly in tropical and subtropical regions, consistent with the benzoin production areas. Because benzoin is a sticky resin, we infer that tissue from these plants may have become mixed with the harvested benzoin.
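Identifying many residues by BLAST is usually done from tabular output. The sketch below assumes the default 12-column tabular format of BLAST+ `-outfmt 6`: query ID, subject ID, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. It keeps the highest-bitscore hit for each query and discards hits below an identity cut-off. The 92% default only mirrors the lowest identity reported above and is not presented as this study's criterion. The query and subject IDs in the example are hypothetical.

```python
import csv
from typing import Dict, Iterable, NamedTuple

# Default 12 columns of BLAST+ tabular output (-outfmt 6); ID columns named qseqid/sseqid here.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


class Hit(NamedTuple):
    subject: str
    pident: float
    bitscore: float


def best_hits(lines: Iterable[str], min_identity: float = 92.0) -> Dict[str, Hit]:
    """Return the highest-bitscore hit per query with identity >= min_identity."""
    best: Dict[str, Hit] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(FIELDS, row))
        hit = Hit(rec["sseqid"], float(rec["pident"]), float(rec["bitscore"]))
        if hit.pident < min_identity:
            continue
        query = rec["qseqid"]
        if query not in best or hit.bitscore > best[query].bitscore:
            best[query] = hit
    return best


if __name__ == "__main__":
    example = [
        "query1\tref_A\t99.1\t225\t2\t0\t1\t225\t10\t234\t1e-110\t400",
        "query1\tref_B\t95.0\t225\t11\t0\t1\t225\t12\t236\t1e-95\t380",
        "query2\tref_C\t85.0\t210\t31\t2\t1\t210\t5\t214\t1e-60\t300",
    ]
    for q, h in best_hits(example).items():
        print(q, h.subject, h.pident)  # only query1 -> ref_A 99.1 is kept
```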

Table 2 Results of BLAST comparison and secondary structure analysis of ITS2 sequences of samples.

We assigned the 14 samples identified by BLAST as S. japonicus to group 1 and the 16 samples identified as S. tonkinensis to group 2. Aligning the sample ITS2 sequences with S. japonicus and S. tonkinensis sequences from GenBank revealed 20 variable sites (Table 3). Sites 13, 24, 46, 75, 78, 102, 104, 126, and 219 can serve as species-specific diagnostic sites for these two species. However, the collected samples also differed from the GenBank sequences at some sites. For example, group 2 had G at site 7 and C at site 189, whereas the GenBank S. japonicus and S. tonkinensis sequences had A at site 7 and T at site 189. In addition, group 2 matched S. japonicus at two sites (32 and 209), and two sites (60 and 185) were variable.
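Finding variable and species-diagnostic sites in an alignment, as done for Table 3, can be sketched as follows. This is an illustrative Python sketch, not the MEGA procedure used in the study. It assumes pre-aligned sequences of equal length and reports 1-based positions. When finding variable sites, it treats a gap as an additional character state. It calls a site diagnostic only when each group is fixed for a different base, and it excludes gap columns. The toy sequences are hypothetical.

```python
from typing import List


def _check(seqs: List[str]) -> int:
    """Return the common alignment length, or raise if lengths differ."""
    lengths = {len(s) for s in seqs}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    return lengths.pop()


def variable_sites(alignment: List[str]) -> List[int]:
    """1-based columns with more than one character state (gaps count as a state)."""
    n = _check(alignment)
    return [i + 1 for i in range(n) if len({s[i].upper() for s in alignment}) > 1]


def diagnostic_sites(group_a: List[str], group_b: List[str]) -> List[int]:
    """1-based columns where each group is fixed for a different, non-gap base."""
    n = _check(group_a + group_b)
    sites = []
    for i in range(n):
        a = {s[i].upper() for s in group_a}
        b = {s[i].upper() for s in group_b}
        if len(a) == 1 and len(b) == 1 and a != b and "-" not in a | b:
            sites.append(i + 1)
    return sites


if __name__ == "__main__":
    group_a = ["ACGTA", "ACGTA"]   # hypothetical species A sequences
    group_b = ["ACCTG", "ACCTA"]   # hypothetical species B sequences
    print(variable_sites(group_a + group_b))     # [3, 5]
    print(diagnostic_sites(group_a, group_b))    # [3]
```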

Table 3 BLAST comparison of base difference sites of ITS2 sequences from Styrax Linn. samples with S. japonicus and S. tonkinensis.

Analysis of the ITS2 secondary structure

The ITS2 Database was used to predict, by homology modeling, the secondary structures of 48 major species of the genus Styrax Linn. All ITS2 secondary structures folded into the typical form of a central loop with four helices (I, II, III, and IV). Helix III was the longest, followed by helices I, II, and IV. Helix II was the most conserved, and the central loop was relatively conserved and rich in purines46,47,48. The ITS2 secondary structure predictions for the 40 individual samples are shown in Table 4. Homology prediction indicated that 30 samples were derived from species of the genus Styrax Linn. (16 identified as S. tonkinensis and 14 as S. japonicus) and 10 from species of other genera. The ITS2 secondary structures of S. tonkinensis and S. japonicus identified in the benzoin samples are shown in Fig. 2A. The helix positions and angles differ markedly between the two species, and helices I and III are the most variable because of multiple unpaired bases and positional differences. Helices II and IV have essentially the same shape but also contain unpaired bases that cause differences. S. tonkinensis has three slightly different secondary structures (Fig. 2A a), b), and c)). The red dashed boxes mark the base pairs between the base of helix III and the first unpaired loop on the helix arm in S. tonkinensis a) and b), and these differences arise from differences in the position of helix III. The three S. tonkinensis structures also differ at individual sites, for example A, C, and T at position 32 in helix I (black arrows). In addition, hemi-compensatory base changes (hCBCs) were identified in helix III (109/179: C-G → T-G, red arrows). We then compared the sequences of the two sample groups defined by BLAST identification.
The ITS2 secondary structure of group 1 was identical to that of S. japonicus except for an hCBC in helix II (73/84: G-C → G-T) (Table 4). The predicted ITS2 secondary structures of group 2 matched those of S. tonkinensis, but the position of helix III clearly differed and some base sites varied. For example, sample AX051 differed from the other 15 samples at base 60 in the main loop (A) and by the presence of an hCBC in helix III (103–185: C-C → C-G), but was identical at the remaining sites. Compared with S. tonkinensis a) and b), group 2 has the following differences:
- G at sites 7 and 60 of the main loop.
- T at site 32 of the top asymmetric loop in helix I.
- C at site 77 of the top asymmetric loop in helix II.
- No A-T base pair at 100/188 in helix III, and an hCBC in helix III (103–185: C-C → C-G).
- A at site 209 of the top asymmetric loop in helix IV.

Compared with S. tonkinensis c), group 2 has G at sites 7 and 60 of the main loop, T-T at sites 32–33 of the top asymmetric loop in helix I, different base pairs at 100–106/182–188 in helix III, and an hCBC in helix IV (200–215: T-G → C-G) (Table 4). Thus, the ITS2 secondary structures of S. tonkinensis and S. japonicus differ substantially, and this information may allow more precise species identification. The ITS2 secondary structures of the remaining 46 species of the genus Styrax Linn. are shown in Fig. 2B. The sizes, positions, and angles of the four helices and the bases within them vary among the 38 species, except in four species pairs: S. calvescens and S. formosanus, S. formosanus and S. dasyanthus, S. formosanus and S. confusus, and S. confusus and S. hemsleyanus. Within each of these pairs, the primary sequences are identical. We attempted to distinguish their secondary structures by homology-based folding. However, the ITS2 secondary structures of these four pairs were also identical, as expected when the input sequences are identical.
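The hCBC calls above depend on classifying how a base pair changes between two homologous structures. The Python sketch below encodes the commonly used definitions. A compensatory base change (CBC) alters both nucleotides of a pair while keeping it a canonical Watson–Crick or G–U pair. A hemi-CBC alters only one nucleotide and still preserves pairing. The helper is hypothetical and written for illustration; it does not necessarily match the criteria implemented in the software used in this study. T is read as U.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def _norm(pair: str) -> tuple:
    """Turn a two-base pair such as 'C-G' or 'CG' into ('C', 'G'), reading T as U."""
    letters = pair.upper().replace("-", "").replace("T", "U")
    if len(letters) != 2:
        raise ValueError(f"expected two bases, got {pair!r}")
    return letters[0], letters[1]


def classify_pair_change(before: str, after: str) -> str:
    """Classify a change at one paired position as 'identical', 'hCBC', 'CBC'
    or 'pairing not conserved'."""
    b, a = _norm(before), _norm(after)
    if b not in CANONICAL or a not in CANONICAL:
        return "pairing not conserved"  # one of the two states is not a canonical pair
    changed = (b[0] != a[0]) + (b[1] != a[1])
    return {0: "identical", 1: "hCBC", 2: "CBC"}[changed]


if __name__ == "__main__":
    for before, after in [("C-G", "T-G"), ("G-C", "G-T"), ("G-C", "A-T")]:
        print(before, "->", after, classify_pair_change(before, after))
    # C-G -> T-G hCBC
    # G-C -> G-T hCBC
    # G-C -> A-T CBC
```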

Table 4 Comparison of base differences on the helices of S. tonkinensis and S. japonicus with the samples.
Figure 2

(A) Comparison of the ITS2 secondary structures of S. tonkinensis and S. japonicus from the benzoin samples. The S. tonkinensis secondary structures a), b), and c) differ slightly from one another. Arrows indicate hCBC sites (red) and differing base sites (black), and red dashed boxes mark the differing base pairs in helix III of S. tonkinensis. (B) ITS2 secondary structures of the 46 taxa analyzed in this study other than S. tonkinensis and S. japonicus. The species name is given below each structure.

The species present in each of the 27 benzoin batches were inferred by combining the BLAST results and ITS2 secondary structure analyses of the 40 individual samples (Table 5). Batch AX01 contained two species of the genus Styrax Linn., S. japonicus and S. tonkinensis. No Styrax Linn. species were detected in four batches (AX15, AX20, AX23, and AX25), which contained only other plant species. This does not necessarily mean that these samples were not benzoin, because only the embedded residues were analyzed and the resin itself was not. Contamination of commercially available benzoin was prominent, affecting 29.6% of the batches, which is likely related to harvesting and transportation at the place of origin.
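The batch-level contamination figure can be derived from the sample-level calls. The sketch below assumes, hypothetically, that the first four characters of a residue ID identify its batch (for example, AX172 belongs to AX17). The text does not state this naming rule. Applied to the ten non-Styrax residue IDs listed in the Results, the rule gives 8 distinct batches, and 8/27 ≈ 29.6%.

```python
from typing import Iterable, Tuple


def batch_of(sample_id: str) -> str:
    """Hypothetical rule: batch ID = first four characters of the residue ID."""
    return sample_id[:4]


def contaminated_batches(non_styrax_ids: Iterable[str], total_batches: int) -> Tuple[int, float]:
    """Count batches containing at least one non-Styrax residue and their percentage."""
    batches = {batch_of(s) for s in non_styrax_ids}
    return len(batches), 100.0 * len(batches) / total_batches


if __name__ == "__main__":
    # Residues identified as non-Styrax genera in the Results section.
    non_styrax = ["AX071", "AX151", "AX102", "AX215", "AX172",
                  "AX201", "AX171", "AX122", "AX231", "AX174"]
    n, pct = contaminated_batches(non_styrax, total_batches=27)
    print(n, round(pct, 1))  # 8 29.6
```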

Table 5 Comprehensive determination of species in 27 batches of benzoin samples.

Discussion

The core of Chinese medicine identification is the evaluation of species authenticity and quality, which bears directly on drug safety and on quality assurance in the Chinese medicine industry. Traditional identification, exemplified by the use of classical morphological taxonomy to establish the origin of Chinese herbs, relies on descriptions of individual traits and on macroscopic observation, and its conclusions are often imperfect49. This limitation stems from its dependence on specialized taxonomic knowledge and largely empirical judgment. Microscopic, physicochemical, and other mainstream methods for identifying Chinese medicinal materials also have limitations in distinguishing species within the same genus. For example, microscopic identification requires specialized expertise, and physicochemical identification is time-consuming50. Molecular diagnostic technology, exemplified by DNA barcoding, can accurately identify and characterize species and offers a simple, precise method with clear judgment criteria. It is therefore well suited to accurately identifying Chinese medicinal materials and their source species without specialist taxonomic knowledge or reliance on ambiguous morphological characters36.

Most Chinese herbs and decoction pieces undergo further processing, during which DNA degradation becomes more severe. This is particularly true of resinous herbs such as benzoin, which forms when secretions from injured Styrax Linn. trees solidify and consists mainly of balsamic acids, lignans, and terpenoids1,5,51,52,53. Even after primary processing, bark or other plant tissue fragments remain in the resin. However, the resin is insoluble in water, which makes DNA extraction more difficult. The most critical step for molecular diagnostics is therefore to extract high-quality DNA from the samples. In preliminary experiments, DNA was extracted directly from ground benzoin. The purity and concentration of most DNA samples were low and did not meet experimental requirements, and amplification was unsatisfactory: most amplifications failed, and some sequences showed mixed signals. This failure may reflect low DNA yield due to poor DNA preservation or PCR inhibition by the resin. We therefore collected the bark-like residues directly from the resin and used each residue as a separate sample. After treatment with 95% ethanol, DNA was successfully extracted from these residues and purified using a plant DNA kit. The 95% ethanol treatment is intended to remove as much as possible of the material that interferes with DNA release, including other impurities. Previous studies have mainly explored DNA extraction from dried wood (Lobeliaceae), archaeological wood remains, ancient wood (oak), endangered wood (prismatic wood), and other samples. They showed that although wood DNA may be severely degraded, it can be extracted after method optimization54,55,56,57. For resinous medicinal materials, improved DNA extraction methods have been reported for Draconis sanguis (resin exuded from fruits) and agarwood (resin-containing wood)58,59. The present study is the first to report extraction of DNA suitable for downstream experiments from bark-like residues embedded in resin exuded by wounded trees. The results also show that resinous materials need to be pretreated before DNA extraction.

In recent years, DNA barcodes have been widely used to identify the species of medicinal materials. The candidate barcodes are mostly the chloroplast markers matK, rbcL, psbA-trnH, trnL, and trnK, the nuclear ITS and ITS2 regions, and combinations of these38,60,61,62. However, for samples with severely degraded and fragmented DNA, barcodes with long amplification products often fail to amplify. Selecting barcodes with appropriate PCR product lengths is therefore crucial in molecular diagnostic studies, especially for medicinal materials with degraded or fragmented DNA, such as resins or woody stems. ITS2, a nuclear sequence generally 200–300 bp long, is suitable for identification at the genus and species levels. Moreover, its secondary structure is relatively conserved. The single-stranded transcript folds back on itself into stem-loop structures, with paired bases forming the stems and unpaired bases forming the loops, and a mature ITS2 database and structure prediction software are available63,64. In species identification, secondary structure prediction may usefully complement phylogenetic trees based on the primary sequence by avoiding or excluding misleading effects of paralogs or pseudogenes on the primary structure65. Although ITS2 secondary structure analysis is an early molecular diagnostic technique, it remains valid for species identification as a complement to DNA barcoding.
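Predicted ITS2 structures are commonly exchanged in dot-bracket notation. Matching parentheses mark paired bases (stems), and dots mark unpaired bases (loops). To illustrate the stem-loop description above, the hypothetical Python helper below recovers the base pairs and unpaired positions from such a string. It handles only '(', ')' and '.', and does not support the extra bracket types that some tools use for pseudoknots.

```python
from typing import List, Tuple


def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return 1-based (i, j) base pairs encoded by a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)                 # opening base waits for its partner
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # pair with the most recent '('
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def unpaired_positions(structure: str) -> List[int]:
    """1-based positions of unpaired (loop) bases."""
    return [pos for pos, ch in enumerate(structure, start=1) if ch == "."]


if __name__ == "__main__":
    # Hypothetical toy structure with two short stem-loops.
    s = "(((...)))..((..))"
    print(parse_dot_bracket(s))    # [(1, 9), (2, 8), (3, 7), (12, 17), (13, 16)]
    print(unpaired_positions(s))   # [4, 5, 6, 10, 11, 14, 15]
```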

In the present study, 40 individual bark-like residues from 27 benzoin batches were analyzed by BLAST searches of the ITS2 primary sequence and by secondary structure analysis. Thirty residues came from the genus Styrax Linn. and were identified as S. tonkinensis and S. japonicus. Ten samples were identified as species of other genera, possibly because of mixing during resin collection and transport at the place of origin. However, these identifications of other genera remain uncertain because only ITS2 was used as the barcode in this study, which limits the reliable information available. The study therefore only indicates the possibility of such contamination. Additional markers (e.g., chloroplast genome sequences) are needed in future work to obtain more genetic information and improve the reliability of identification. In addition, morphological, taxonomic, or anatomical methods should be combined with comparative studies of the tissues of these species growing in the field to confirm more accurately the species information contained in the bark-like residues.

By comparing the ITS2 sequences of the two Styrax species identified in benzoin with GenBank sequences, we found 9 sites that can serve as species-specific diagnostic sites for these two species. These sites could be used to develop specific DNA probes for the accurate identification of species of the genus Styrax Linn., providing a good basis for authenticating benzoin products. Overall, this study provides a new approach to identifying the species of origin of the semipetrified amber medicinal resin benzoin by recovering original information from the embedded residues.

Materials and methods

Sample collection

Twenty-seven batches of benzoin medicinal materials were collected from medicinal material markets and drugstores in China, and these collections were permitted and legal. All benzoin samples were identified by Professor Yangyang Liu of the Hainan Branch of the Institute of Medicinal Plants, Chinese Academy of Medical Sciences. The collected samples were irregular in shape and often agglomerated into clumps. Their surface was orange-yellow or yellowish white. The material was fragrant and gritty when chewed, very brittle and easily broken by hand, and contained bark residues inside the resin. For batches with abundant bark residues, two to four residues were analyzed as separate replicates, giving a total of 40 individual bark-like residues as the final experimental materials (Table 6). All voucher specimens of bark-like residues are deposited in the herbarium of the Aromatic Southern Medicine Identification Center, Hainan Branch, Institute of Medicinal Plants, Chinese Academy of Medical Sciences. In addition, 56 ITS2 sequences from 48 species of Styrax Linn. were downloaded from GenBank for comparative analysis (Table 7).

Table 6 Information on benzoin samples collected from medicinal markets and drugstores.
Table 7 ITS2 sequences of species of the genus Styrax Linn. in GenBank.

DNA extraction, amplification, and sequencing

The bark-like residue in the benzoin resin was scraped off with a knife and placed in a disposable Petri dish. The residue was soaked twice in an appropriate volume of 95% ethanol for 5 min each time, washed with sterile water, and dried (Fig. 1). The sample was cut into pieces and placed in a 2 mL centrifuge tube with two steel balls. It was then frozen in liquid nitrogen and ground in a high-throughput tissue grinder for 5 min. DNA was extracted with the HP Plant DNA Kit (OMEGA Bio-tek) using a modified protocol in which 1000 μL of Buffer CPL and 10 μL of β-mercaptoethanol were added, and the remaining steps followed the manufacturer's instructions. DNA was purified using the MicroElute® DNA Clean-Up Kit. Total DNA concentration was measured, and DNA purity (A260/A280 ratio) was calculated. The extracted DNA was stored at −20 °C until use. ITS2 was amplified using the primers ITS2-F (ATGCGATACTTGGTGTGAAT) and ITS2-R (GACGCTTCTCCAGACTACAAT)39. The PCR program was as follows:
- Initial denaturation at 94 °C for 5 min.
- 40 cycles of denaturation at 94 °C for 30 s, annealing at 56 °C for 30 s, and extension at 72 °C for 45 s.
- Final extension at 72 °C for 10 min.
- Storage at −20 °C.

PCR products were sequenced bidirectionally by Guangzhou Aike Biotechnology Co., Ltd. (Guangzhou, China).
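Two quick checks on the PCR set-up above can be computed directly. The Python sketch below reports primer length, GC content, and a rough melting temperature from the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C. This rule is only a crude approximation for short oligonucleotides, and it is not necessarily how the 56 °C annealing temperature was chosen. The sketch also sums the programmed hold times of the cycling protocol and ignores ramp times.

```python
from typing import Dict

PRIMERS = {
    "ITS2-F": "ATGCGATACTTGGTGTGAAT",
    "ITS2-R": "GACGCTTCTCCAGACTACAAT",
}


def primer_stats(seq: str) -> Dict[str, float]:
    """Length, GC percentage and Wallace-rule Tm (degrees C) of a DNA primer."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return {
        "length": len(seq),
        "gc_percent": 100.0 * gc / len(seq),
        "tm_wallace": 2 * at + 4 * gc,
    }


def program_hold_time_s(initial_s: int = 300, cycles: int = 40,
                        cycle_steps_s=(30, 30, 45), final_s: int = 600) -> int:
    """Total programmed hold time in seconds (ramping between temperatures ignored)."""
    return initial_s + cycles * sum(cycle_steps_s) + final_s


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        stats = primer_stats(seq)
        print(name, stats["length"], round(stats["gc_percent"], 1), stats["tm_wallace"])
    # ITS2-F 20 40.0 56
    # ITS2-R 21 47.6 62
    print(program_hold_time_s() / 60, "min")  # 85.0 min
```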

Data analysis

Sequence assembly was performed in CodonCode Aligner V3.7.1 (CodonCode Corporation, USA), and primer regions and low-quality sequences were removed. Both the newly sequenced and the GenBank-downloaded sequences were annotated and trimmed. The 5.8S and 26S regions were removed using a hidden Markov model (HMM)-based method to obtain the complete ITS2 sequences63. The complete ITS2 sequences obtained by sequencing were used for BLAST analysis against GenBank and were aligned in MEGA 6.0. Variable sites were recorded, and the species information of the samples was analyzed. The ITS2 secondary structures of all species were predicted using the homology model of the ITS2 Database64, and hCBCs (such as C-G → C-A or T-T → T-C) were identified.
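Low-quality read ends, as mentioned above, are often trimmed with a modified Mott algorithm. Each base scores `cutoff - P(error)`, where P(error) = 10^(-Q/10) and Q is the base's Phred quality, and the read is trimmed to the maximum-scoring contiguous segment. The Python sketch below implements this idea with an assumed cut-off of 0.05. It is not CodonCode Aligner's own implementation, whose trimming settings are not reported here, and the quality values in the example are hypothetical.

```python
from typing import List, Tuple


def mott_trim(quals: List[int], cutoff: float = 0.05) -> Tuple[int, int]:
    """Return half-open (start, end) bounds of the maximum-scoring segment.

    Each base scores cutoff - 10**(-Q/10); a Kadane-style scan finds the
    contiguous run with the highest total score. (0, 0) means nothing is kept.
    """
    best_sum, best = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += cutoff - 10 ** (-q / 10)
        if run_sum < 0:
            run_sum, run_start = 0.0, i + 1   # restart after a losing stretch
        elif run_sum > best_sum:
            best_sum, best = run_sum, (run_start, i + 1)
    return best


if __name__ == "__main__":
    seq = "TTACGTGCA"                          # hypothetical read
    quals = [5, 8, 30, 35, 40, 38, 30, 10, 6]  # hypothetical Phred scores
    start, end = mott_trim(quals)
    print(start, end, seq[start:end])          # 2 7 ACGTG
```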