High species discrimination in Pedicularis (Orobanchaceae): Plastid genomes and traditional barcodes equally effective via parsimony-informative sites
You Wua,b,h, Rong Liuc,h, Wei-Jia Wanga,b, De-Zhu Lic,d, Kevin S. Burgesse,*, Wen-Bin Yub,f,g,**, Hong Wanga,***     
a. Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China;
b. Center for Integrative Conservation & Yunnan Key Laboratory for the Conservation of Tropical Rainforests and Asian Elephants, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla 666303, Yunnan, China;
c. Plant Germplasm and Genomics Centre, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China;
d. Center for Interdisciplinary Biodiversity Research & College of Forestry, Shandong Agricultural University, Tai'an 271018, Shandong, China;
e. Department of Biomedical Sciences, Mercer University School of Medicine, Columbus, GA 31901, USA;
f. Yunnan International Joint Laboratory for the Conservation and Utilization of Tropical Timber Tree Species, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla 666303, Yunnan, China;
g. Southeast Asia Biodiversity Research Institute, Chinese Academy of Sciences, Mengla 666303, Yunnan, China;
h. University of Chinese Academy of Sciences, Huairou District, Beijing 101408, China
Abstract: Complete plastid genomes have been proposed as potential "super-barcodes" for plant identification and delineation, particularly in cases where standard DNA barcodes may be insufficient. However, few studies have systematically addressed how taxonomic complexity, especially in rapidly radiating lineages with intricate evolutionary histories, might influence the efficacy of plastome-scale barcodes. Pedicularis is a hyperdiverse genus in the Himalaya-Hengduan Mountains, and previous studies have demonstrated high discriminatory power of the standard barcodes within this genus. Therefore, Pedicularis serves as a model for investigating the key plastome-sequence characteristics and biological phenomena that determine species-discrimination capacity. In this study, we evaluated 292 plastomes representing 96 Pedicularis species to compare the discriminatory power of complete plastid genomes with that of standard DNA barcodes. Our results revealed that the traditional standard barcode combination (nrITS + matK + rbcL + trnH-psbA) achieved the highest discrimination rate (81.25%), closely followed by the plastid large single copy (LSC) region (80.21%), and then by the full plastome, the supermatrix of protein-coding genes, and the hypervariable regions (79.17% each). Notably, the matK and ycf1 genes alone could each discriminate 78.13% of species. The key determinants of species discrimination were alignment length (AL) and the proportion of parsimony-informative sites (PPIS) in combination, and conserved genes under relaxed selection exhibited stronger discriminatory capacity. Unlike previous studies that demonstrated superior discrimination rates of plastome-scale barcodes, this study reveals a notable exception: differences between traditional DNA barcodes and plastome-scale barcodes were minimal, which appears to be linked to the specific biological habits of Pedicularis and may reflect unique evolutionary patterns in its plastid genome.
Keywords: Pedicularis; Himalaya-Hengduan Mountains; Plastid genome; Standard barcodes; Plastome-scale barcodes; Species identification
1. Introduction

The mitochondrial cytochrome oxidase 1 (CO1) gene serves as a highly effective DNA barcode for species delimitation in insects (Foottit et al., 2008; Adeniran et al., 2019), fishes (Ward et al., 2005; Decru et al., 2016), birds (Kerr et al., 2009; Tizard et al., 2019), as well as other animal groups (Govender et al., 2022; Velo-Antón et al., 2022). In plants, however, the CO1 gene exhibits significantly lower species delimitation efficacy, partially attributable to the low substitution rates characteristic of plant mitochondrial genomes (Cho et al., 2004; Fazekas et al., 2009; Lewin et al., 2022). Consequently, multiple plastid DNA regions (trnL-F, rpoC1, rbcL, matK, trnH-psbA) and the nuclear ribosomal internal transcribed spacer (nrITS), as well as their combinations, have been proposed as candidate DNA barcodes for plants (Chase et al., 2007; Kress and Erickson, 2007; Fazekas et al., 2008; Bruni et al., 2010; Wang et al., 2010; Burgess et al., 2011; Hollingsworth et al., 2011; Li et al., 2011; Zhang et al., 2012). While numerous studies demonstrate relatively high species delimitation success rates for traditional standard DNA barcodes within some investigated plant genera and families (Wang et al., 2010; Li et al., 2011; Liu et al., 2011, 2017; Pang et al., 2011; Yu et al., 2011; Zuo et al., 2011; Chen et al., 2015; Xu et al., 2015; Yuan et al., 2015; Gogoi et al., 2020; Le et al., 2020), the delimitation rates markedly decline in plant lineages characterized by recent rapid radiation or suspected hybridization events (Roy et al., 2010; Yu et al., 2015, 2022; Fu et al., 2022; Zhang et al., 2023a). Notably, recent rapid radiation compromises species delimitation efficacy primarily through incomplete lineage sorting and insufficient genetic divergence due to short speciation intervals, compounded by morphological convergence under accelerated adaptive evolution (Robins et al., 2014; Xu et al., 2018; Yao et al., 2024).
Moreover, hybridization between species is prevalent in plants, with at least 25% of plant species (Rieseberg, 1997; Mallet, 2005) – particularly among recently diverged lineages – demonstrating hybridization and introgression patterns with congeners. Therefore, it is challenging to discriminate all species on the basis of genetic differentiation alone.

The accelerated advancement of next-generation sequencing technologies has catalyzed the development of innovative strategies to overcome the limitations of traditional standard DNA barcodes (Nock et al., 2011; Li et al., 2015; Hollingsworth et al., 2016). The use of the complete plastid genome (plastome-scale barcodes) – also referred to as ultra-barcoding (Kane et al., 2012), organelle-scale barcodes (Yang et al., 2013), next-generation DNA barcoding (Fu et al., 2022) or super-barcodes (Li et al., 2015; Yu et al., 2022) – has been shown to improve species delimitation efficacy compared to traditional standard barcodes in plants. Implementation of genome skimming protocols, complemented by optimized bioinformatic workflows and pipeline standardization, has enabled cost-effective deployment of these "super-barcode" approaches (Dierckxsens et al., 2017; Sarmashghi et al., 2019; Jin et al., 2020). Therefore, plastome-scale barcodes present significant potential to enhance species discrimination capacity for hyperdiverse and taxonomically complex genera. Nevertheless, accumulating empirical findings challenge the conventional paradigm of plastome-based phylogenomic efficacy in resolving species boundaries, with limitations attributable to pervasive biological phenomena including interspecific plastome introgression via recurrent hybridization, polyploid speciation, and ancestral incomplete lineage sorting maintained through rapid radiations (Fazekas et al., 2009; Hollingsworth et al., 2011, 2016; Yu et al., 2022; Lv et al., 2023; Su et al., 2023; Zhang et al., 2024a, b; Zhu et al., 2024).

Pedicularis L. (Lamiales: Orobanchaceae), one of the largest hemiparasitic angiosperm genera, consists of approximately 700 recognized species predominantly distributed across temperate and alpine ecosystems in the northern hemisphere (POWO, 2025; Wang and Yu, 2024). Notably, over 60% of Pedicularis species diversity is concentrated in the Himalaya-Hengduan Mountains region (Tsoong, 1963; Wang and Yu, 2024), which serves as the global center of species richness and diversification for this genus. A previous DNA barcoding study of 88 Pedicularis species from China documented that nrITS alone achieved a discrimination rate of 78.4% across the genus, while combining nrITS with other plastid barcodes (matK, rbcL or trnH-psbA) increased the discrimination rate to 81.8% (Yu et al., 2011). The highly diverse floral traits of Pedicularis can promote reproductive isolation among species (Eaton et al., 2012; Huang and Shi, 2013; Liang et al., 2018; Zhang et al., 2024a, b), and the uplift of the Himalaya-Hengduan Mountains created geographical isolation, which accelerates species divergence and maintains high species richness and co-existence (Xing and Ree, 2017). However, pervasive biological phenomena such as incomplete lineage sorting, hybridization, and chloroplast genome capture events have been well documented in the genus (e.g., Pedicularis Sect. Cyathophora in Yu et al., 2013; Pedicularis siphonantha complex in Liu et al., 2022), posing a challenge to species delineation by potentially generating discordance between morphological and molecular data. Nevertheless, the species discrimination rate using traditional standard DNA barcodes in Pedicularis reaches 81.8%, which is regarded as a high level for plants given the prevalence of hybridization, introgression, incomplete lineage sorting, and other reticulate evolutionary processes (Mallet, 2005; Mallet et al., 2016).
Given this, Pedicularis serves as an exemplary system for addressing the following three key questions: (1) Can plastome-scale barcodes enhance species discrimination capacity in Pedicularis compared to traditional standard DNA barcodes? (2) If not, what are the key factors (sequence characteristics or evolutionary selection) influencing species discrimination capacity across the plastome sequences in this genus? (3) What biological factors limit the barcode discrimination rate when species cannot be discriminated?

2. Materials and methods

2.1. Taxon sampling

In this study, we analyzed 292 individuals representing 96 species of Pedicularis, with at least two samples per species. These samples cover 11 Groups and 48 Series under Tsoong's (1963) classification system (Table S1) and represent 12 out of 13 clades within the phylogenetic framework (Yu et al., 2015). DNA materials came from multiple sources, including silica gel-dried fresh leaves mainly collected from the Himalaya-Hengduan Mountains region in Southwest China, supplemented by herbarium vouchers curated at the Herbarium of Kunming Institute of Botany (KUN), Chinese Academy of Sciences (CAS), and the Herbarium of the Institute of Botany (PE), CAS. Of the 292 samples used in this study, 213 were newly sequenced and assembled. The remaining 79 samples were obtained from concurrent studies, including 37 samples from Liu et al. (2024a) and 42 samples from W.-J. Wang et al. (unpublished data). For the outgroups, we downloaded the complete chloroplast genomes of five species from NCBI, i.e., Lindenbergia philippensis (Cham.) Benth. (HG530133.1), Phtheirospermum japonicum (Thunb.) Kanitz (MN075943.1), Rehmannia piasezkii Maxim. (KX636161.1), Triaenophora shennongjiaensis Xiao D. Li, Y.Y. Zan & J.Q. Li (MH071405.1), and Wightia speciosissima (D. Don) Merr. (MK381318.1). Additionally, to validate the accuracy of our species discrimination results, we retrieved all available complete chloroplast genomes of Pedicularis species from NCBI, resulting in a dataset of 44 accessions representing 37 species (Table S5).

2.2. DNA extraction, sequencing, assembly and annotation

For the 213 new samples, total genomic DNA was extracted from silica gel-dried leaves using a modified CTAB method (Doyle and Doyle, 1987), and genomic DNA of herbarium materials was extracted using the DNAsecure Plant Kit (TIANGEN). The purified DNA was fragmented to about 350–500 bp for library construction using the NEBNext Ultra II DNA Library Prep Kit for Illumina. Using the Illumina HiSeq 2500 platform, 150-bp paired-end reads totaling more than 2 Gb of raw data were generated for each sample.

Both plastome and nrITS sequences were assembled from the clean data using the GetOrganelle toolkit (Jin et al., 2020) with the following parameters: "-w 0.65 -R 30 (or 50) -k 21,35,55,65,85,105". The complete plastome sequences were checked in Geneious v.8.1 (Kearse et al., 2012). Fragmented contigs were concatenated into complete plastome sequences in Geneious using the plastome sequence of Pedicularis tongolensis Franch. (MZ264887) as the reference. All plastomes were automatically annotated using PGA (Qu et al., 2019) and then manually adjusted in Geneious using the P. tongolensis plastome as the reference. Detailed information on plastomes, sampling, vouchers, and classification for all samples is listed in Table S1.

2.3. DNA alignment and dataset preparation

First, whole plastome sequences of all Pedicularis samples, with one inverted repeat (IR) copy removed, were aligned using MAFFT v.7.0 (Katoh and Standley, 2013). Then, we used DnaSP v.6.12 (Rozas et al., 2017) to assess highly variable regions and calculate nucleotide diversity (π) of the whole plastome matrix (WPM). The step and window sizes were set to 200 bp and 600 bp, respectively. Highly variable regions (HVR) were selected for evaluating barcodes. According to the plastome characteristics and previous studies of Pedicularis, seven datasets, including standard barcodes, super-barcodes and other potential combinations, were used to evaluate species discrimination: A: WPM (whole plastome matrix); B: LSC (the large single copy region of the plastome); C: CDS (supermatrix of plastome coding DNA sequences); D: matK + rbcL + trnH-psbA (combination of three frequently used plastid DNA barcodes); E: HVR (matK + trnE-psbD + rpl20-clpP + ycf1, see details in Fig. S1); F: nrITS (internal transcribed spacer of nuclear ribosomal DNA, consisting of ITS1-5.8S-ITS2); G: nrITS + matK + rbcL + trnH-psbA. In addition, we constructed a comprehensive plastome matrix H, which included the 292 samples used in this study and an additional 43 samples downloaded from NCBI. One of the 44 initially downloaded NCBI sequences was excluded following BLAST analysis, as it was determined to belong to the genus Dracocephalum L. (Lamiaceae). Each DNA region was aligned using MAFFT v.7.0 (Katoh and Standley, 2013). We then determined the aligned length (bp), the number and percentage of variable sites, and the number and percentage of parsimony-informative sites using MEGA X (Kumar et al., 2018) (Table 1).
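
As a rough illustration of the sliding-window scan described above, the following Python sketch computes nucleotide diversity (π) in 600-bp windows moved in 200-bp steps. It is not the DnaSP implementation: the function names are hypothetical, and the treatment of gaps and ambiguous bases (pairs containing them are not counted as differences, while the denominator stays the full window length) is a simplifying assumption.

```python
from itertools import combinations

VALID = set("ACGT")  # assumption: only unambiguous bases are compared

def pairwise_differences(a, b):
    """Count sites where two aligned sequences carry different unambiguous bases."""
    return sum(1 for x, y in zip(a, b) if x in VALID and y in VALID and x != y)

def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    length = len(seqs[0])
    if not pairs or length == 0:
        return 0.0
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    return total / (len(pairs) * length)

def sliding_window_pi(seqs, window=600, step=200):
    """Return (start, end, pi) per window; coordinates are 0-based, end exclusive."""
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    last_start = max(length - window, 0)  # only full windows are scanned
    results = []
    for start in range(0, last_start + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start, start + len(chunk[0]), nucleotide_diversity(chunk)))
    return results
```

Windows with the highest π values would be the candidates for hypervariable regions; where to draw that cutoff is left open here.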

Table 1 Characteristics of seven datasets, including three plastome-scale barcodes and four traditional DNA barcodes.
| Alignment | Dataset | Aligned length (bp) | No. of variable sites (% of aligned length) | No. of parsimony-informative sites (% of aligned length) |
|---|---|---|---|---|
| WPM | A | 167,851 | 53,074 (31.62) | 41,762 (24.88) |
| LSC | B | 113,881 | 36,739 (32.26) | 28,936 (25.41) |
| CDS | C | 58,280 | 15,411 (26.44) | 12,265 (21.04) |
| matK + rbcL + trnH-psbA | D | 4128 | 1571 (38.06) | 1259 (30.50) |
| HVR | E | 12,809 | 5277 (41.98) | 4191 (32.72) |
| nrITS | F | 656 | 385 (58.69) | 342 (52.13) |
| nrITS + matK + rbcL + trnH-psbA | G | 4784 | 1956 (40.89) | 1601 (33.47) |

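
The sequence characteristics reported in Table 1 can be illustrated with a small Python sketch that counts variable and parsimony-informative sites in an alignment. It uses the common definition of a parsimony-informative site (at least two character states, each present in at least two sequences); ignoring gaps and ambiguity codes is an assumption made here, and MEGA X may treat such sites differently. The function name `site_stats` is hypothetical.

```python
from collections import Counter

def site_stats(seqs):
    """Summarize an alignment given as equal-length strings.

    Returns aligned length (AL), number and percentage of variable sites
    (NVS, PVS) and of parsimony-informative sites (NPIS, PPIS).
    """
    valid = set("ACGT")  # assumption: gaps and ambiguity codes are ignored
    seqs = [s.upper() for s in seqs]
    length = len(seqs[0])
    variable = informative = 0
    for column in zip(*seqs):
        counts = Counter(base for base in column if base in valid)
        if len(counts) >= 2:
            variable += 1
            # informative: at least two states that each occur at least twice
            if sum(1 for n in counts.values() if n >= 2) >= 2:
                informative += 1
    return {
        "AL": length,
        "NVS": variable,
        "PVS": 100 * variable / length,
        "NPIS": informative,
        "PPIS": 100 * informative / length,
    }
```
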
2.4. Species delineation using genetic distance and phylogenetic tree approaches

We applied two widely used methods to evaluate species delimitation rates: distance-based and tree-based. For the distance-based method, we conducted analyses using the "Best match/Best close match" models in TaxonDNA (Meier et al., 2006). The best match model finds the closest match for each sample based on genetic distance; the identification is considered successful if the two samples have the same species name. The best close match model additionally applies a threshold value derived from all intraspecific distances; samples without a match below the threshold value are considered unidentified. For the tree-based method, we reconstructed phylogenies using the maximum likelihood (ML) approach in RAxML v.8.2.12 (Stamatakis, 2014). We used the GTRCAT model with 1000 bootstrap replicates. A species was considered successfully identified when all samples assigned to it on morphological grounds were resolved as a monophyletic group with a bootstrap support value ≥ 50%. We further recorded all such monophyletic species that achieved ≥ 70% bootstrap support.
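
The logic of the two distance-based criteria can be sketched in Python as follows. This is a simplified illustration, not TaxonDNA itself: it uses uncorrected p-distances, the function names and outcome labels are hypothetical, and the threshold for the best close match variant is passed in by the caller rather than derived from the intraspecific distance distribution. It assumes at least two records.

```python
def p_distance(a, b):
    """Proportion of differing sites among positions with unambiguous bases."""
    valid = set("ACGT")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in valid and y in valid:
            compared += 1
            diffs += x != y
    return diffs / compared if compared else 1.0

def best_match(query, records, threshold=None):
    """Classify one query against all other records.

    Each record is (sample_id, species, sequence). With threshold=None this
    mimics a best match test; with a threshold, a best close match test.
    """
    qid, qspecies, qseq = query
    dists = [(p_distance(qseq, seq), sp) for sid, sp, seq in records if sid != qid]
    best = min(d for d, _ in dists)
    if threshold is not None and best > threshold:
        return "no match"
    nearest_species = {sp for d, sp in dists if d == best}
    if nearest_species == {qspecies}:
        return "correct"
    if qspecies in nearest_species:
        return "ambiguous"  # equally close to conspecific and other species
    return "incorrect"

def success_rate(records, threshold=None):
    """Percentage of samples whose closest match is unambiguously conspecific."""
    outcomes = [best_match(r, records, threshold) for r in records]
    return 100 * outcomes.count("correct") / len(outcomes)
```

Because every sample is queried against all others, a species represented by a single sample could never be scored as correct, which is one reason sampling at least two individuals per species matters for this kind of test.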

2.5. Correlation analyses between sequence characteristics and species discrimination

To evaluate correlations between sequence characteristics and species discrimination, we reconstructed phylogenies of 66 plastid CDS genes, two pseudogenes, and three intergenic regions using the neighbor-joining algorithm in RapidNJ v.2.3.2 (Simonsen et al., 2008) and counted the number of identified species (NIS). We also assessed the following sequence characteristics for the 71 genomic loci: alignment length (AL), number of variable sites (NVS), percentage of variable sites (PVS), number of parsimony-informative sites (NPIS) and percentage of parsimony-informative sites (PPIS). Using the "cor" function in the R package stats, we conducted correlation analyses and visualized the correlation coefficients using the R package corrplot. Then, taking NIS as the response variable and the remaining variables as predictors, we conducted an All-Subsets Regression (ASR) analysis to obtain the best model using the "regsubsets" function in the R package leaps.
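
The analysis itself was run in R; the following from-scratch Python sketch only illustrates the idea behind it, namely Pearson correlation followed by an exhaustive search over predictor subsets ranked by adjusted R². All function names are hypothetical, and ranking by adjusted R² is an assumption based on the quantity plotted in Fig. 4B. The sketch does not guard against perfectly collinear predictors (such as a pair with r = 1.00), which would make the normal equations singular.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def adjusted_r2(columns, y):
    """Fit y ~ intercept + columns by least squares; return adjusted R^2."""
    n, p = len(y), len(columns)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    XtX = [[sum(r[a] * r[b] for r in X) for b in range(p + 1)] for a in range(p + 1)]
    Xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p + 1)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in X]
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def best_subset(predictors, y):
    """Return (adjusted R^2, subset of names) for the best-scoring subset."""
    names = list(predictors)
    scored = []
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            scored.append((adjusted_r2([predictors[v] for v in subset], y), subset))
    return max(scored)
```

In use, `predictors` would map names such as "AL" or "PPIS" to per-locus values and `y` would hold the per-locus NIS.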

2.6. Selective pressure analysis

To explore whether genetic factors may influence species discrimination, we conducted selective pressure analysis for 66 CDS genes. We used MACSE v.2 to align each CDS gene with Phtheirospermum japonicum (Thunb.) Kanitz as the reference. Then, the "Simple Ka/Ks Calculator" function of TBtools (Chen et al., 2023) was used to calculate the ratio of non-synonymous substitutions (Ka) to synonymous substitutions (Ks). Ka/Ks > 1 indicates positive selection, Ka/Ks < 1 indicates purifying selection and Ka/Ks = 1 indicates neutral evolution. If Ka > 0 and Ks = 0, the value of Ka/Ks is reported as NA.
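
The interpretation rules above can be written as a small Python helper. This is only an illustration of the decision rule applied to values already computed by a tool such as TBtools; the function name is hypothetical, and treating every case with Ks = 0 (including Ka = 0) as NA is an assumption, since the text only specifies the case Ka > 0.

```python
def classify_ka_ks(ka, ks):
    """Return (ratio, label) for one gene in one pairwise comparison.

    The ratio is None and the label "NA" when Ks is zero, because the
    ratio is then undefined.
    """
    if ks == 0:
        return None, "NA"
    ratio = ka / ks
    if ratio > 1:
        label = "positive selection"
    elif ratio < 1:
        label = "purifying selection"
    else:
        label = "neutral evolution"
    return ratio, label
```
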

We also used the HyPhy program (Pond et al., 2005) with the hypothesis testing framework RELAX (Wertheim et al., 2015) on Datamonkey (https://www.datamonkey.org/) to test selection strength (relaxation or intensification). We set five species (Lindenbergia philippensis, Phtheirospermum japonicum, Rehmannia piasezkii, Triaenophora shennongjiaensis, Wightia speciosissima) as reference branches and all samples of Pedicularis as test branches. K < 1 indicates relaxation of selection, K = 1 indicates no change in selection intensity, and K > 1 indicates intensification of selection. Moreover, we examined whether discriminatory power differed between genes experiencing different selection intensities (relaxation vs. intensification).

3. Results

3.1. Dataset characteristics

The plastid genomes of the 213 samples produced in this study were each assembled into a complete circular genome. Across the 292 plastomes of Pedicularis, genome size ranged from 140,174 bp to 162,218 bp, with 74–97 protein-coding genes, eight rRNA genes and 36–39 tRNA genes (Table S1). The LSC region ranged from 78,698 bp to 83,843 bp, whereas the IR region varied in length and the SSC region was absent in a few samples (e.g., P. chenocephala W2021138 and P. lutescens LR202070803). The GC content of Pedicularis species ranged from 37.8% to 38.6% (Table S1).

Of the seven datasets, the WPM dataset (A) had the greatest numbers of variable sites and parsimony-informative (PI) sites (53,074 (31.62%) variable sites and 41,762 (24.88%) PI sites), whereas the nrITS dataset (F) had the highest percentages of variable sites and PI sites (385 (58.69%) variable sites and 342 (52.13%) PI sites) (Table 1). Among the plastid datasets, the HVR dataset (E) had the highest percentages of variable sites and PI sites (5277 (41.98%) variable sites and 4191 (32.72%) PI sites), followed by the combination of three traditional plastid barcodes (dataset D: matK + rbcL + trnH-psbA; 1571 (38.06%) variable sites and 1259 (30.50%) PI sites), which exceeded the LSC dataset (B) in the percentage of PI sites (36,739 (32.26%) variable sites and 28,936 (25.41%) PI sites). The CDS dataset (C) had the lowest percentages (15,411 (26.44%) variable sites and 12,265 (21.04%) PI sites) (Table 1).

In addition, in Pedicularis, the largest hemiparasitic genus of Orobanchaceae, 14 genes have undergone varying degrees of gene loss and pseudogenization, including eleven NDH (NAD(P)H dehydrogenase) genes associated with photosynthesis, ccsA, accD and ycf15. Based on the degree of pseudogenization and loss, the state of each gene is categorized into four stages: (0) full length with function, (1) full length with a premature stop codon, (2) partial loss, and (3) complete loss (Fig. 2).

Fig. 1 Morphological diversity of Pedicularis species. (a) P. salviiflora; (b) P. rex; (c) P. integrifolia; (d) P. leptosiphon; (e) P. oederi; (f) P. lutescens; (g) P. vialii; (h) P. cranolopha.

Fig. 2 Maximum likelihood tree inferred using the supermatrix of plastid coding sequences (dataset C). The terminal branches of the phylogenetic tree without species name labels represent samples that could not be identified using this dataset; orange diamonds indicate unidentified samples based on all the datasets in this study. The 14-layer circular diagram surrounding the phylogenetic tree represents 14 genes: accD, ccsA, ndhA, ndhB, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK and ycf15, from the inside out in turn. The four colors correspond to the state of each gene: (0) full length with function, (1) full length with a premature stop codon, (2) partial loss, and (3) complete loss. The outermost gray bar chart encircling the tree indicates the degree of pseudogenization, with taller bars reflecting a higher degree of pseudogenization.
3.2. Species delineation

3.2.1. Tree-based method

Ninety-six species could be attributed to 12 clades using the seven datasets. Six of the datasets (A, B, C, D, E, and G) produced topologies relatively consistent with that of Yu et al. (2015) (Figs. S2–S6 and S8). The remaining dataset (F) produced different topologies with low support values among clades (Fig. S7). Moreover, all datasets had high support values for species monophyly.

Based on the ML method, the plastome-scale barcodes (datasets A, B and C) showed discrimination rates comparable to those of the standard barcode combinations (datasets D and G) and the HVR dataset (E). Among the seven datasets, dataset G exhibited the highest discrimination rate, distinguishing 78 species (81.25% of the 96 species; percentages hereafter refer to this total), followed by dataset B (77 species, 80.21%) and datasets A, C and E (76 species each, 79.17%). Dataset D discriminated 74 species (77.08%), while dataset F discriminated 60 species (62.50%) (Table 2). Under the stricter ≥ 70% bootstrap threshold, three fewer species were discriminated in each of the traditional barcode datasets (D, F and G) and one fewer in each of the LSC and CDS datasets, whereas the results for the WPM and HVR datasets remained unchanged (Table 2 and Fig. 3).

Table 2 Summary of species discrimination rates (percentage) for seven datasets based on tree-based and distance-based methods. BS, bootstrap support (%).
| Alignment | Dataset | Tree method, BS ≥ 50 | Tree method, BS ≥ 70 | Distance method, best match (BM) | Distance method, best close match (BCM) |
|---|---|---|---|---|---|
| WPM | A | 76/96 (79.17%) | 76/96 (79.17%) | 84.93% | 84.24% |
| LSC | B | 77/96 (80.21%) | 76/96 (79.17%) | 86.30% | 85.95% |
| CDS | C | 76/96 (79.17%) | 75/96 (78.13%) | 88.01% | 88.01% |
| matK + rbcL + trnH-psbA | D | 74/96 (77.08%) | 71/96 (73.96%) | 81.16% | 80.82% |
| HVR | E | 76/96 (79.17%) | 76/96 (79.17%) | 79.10% | 79.10% |
| nrITS | F | 60/96 (62.50%) | 57/96 (59.38%) | 82.19% | 81.50% |
| nrITS + matK + rbcL + trnH-psbA | G | 78/96 (81.25%) | 75/96 (78.13%) | 89.38% | 88.69% |
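
The tree-based percentages in Table 2 follow directly from the number of monophyletic species divided by the 96 species sampled. A minimal Python check is given below; the function name is hypothetical, and round-half-up rounding to two decimals is an assumption chosen because it reproduces values such as 75/96 = 78.13%.

```python
from decimal import Decimal, ROUND_HALF_UP

def discrimination_rate(n_identified, n_total=96):
    """Percentage of species discriminated, rounded half-up to two decimals."""
    pct = Decimal(100 * n_identified) / Decimal(n_total)
    return pct.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

`Decimal` is used instead of the built-in `round` because binary floating point rounds exact halves to the nearest even digit, which would turn 78.125 into 78.12.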

Fig. 3 The heatmap illustrates the species discrimination rate of 71 gene regions across seven datasets. Blue vs red rectangles denote successful and unsuccessful identifications, respectively. (A) On the far left is a phylogenetic tree of the 96 Pedicularis species identified in this study (numbers in parentheses represent the number of gene regions identified for that species) (B) The central portion shows gene regions arranged in order of decreasing discrimination rate (in parenthesis) from left to right; the grey trend line is also shown. (C) This section, enclosed by the dotted line on the far left, represents the species identification of the seven datasets.

For dataset H, which includes the 43 samples downloaded from NCBI, 23 of the downloaded samples represent 20 species that were also sampled in our study and could therefore be checked for consistency with our accessions. Of these 23 samples, three (marked in red in Fig. S9) appear to be misidentified or mislabeled (error rate: 13.04%), and four samples (marked in blue in Fig. S9) had phylogenetic placements inconsistent with the phylogenetic study of Yu et al. (2015). Overall, eight of the 44 initially downloaded NCBI sequences (error rate: 18.18%) showed clear identification errors. This minimum observed error rate of 18% underscores the necessity for rigorous species identification within the genus Pedicularis and highlights the urgent need to establish a comprehensive, verified reference dataset.

3.2.2. Distance-based method

In the distance-based method, all datasets showed high discrimination rates with relatively small differences among them. Under the best match model, dataset G had the highest discrimination rate (89.38%), followed by dataset C (88.01%), and dataset E had the lowest (79.10%) (Table 2). The best close match model showed a similar trend: dataset G again achieved the highest discrimination rate (88.69%) and the coding sequence dataset C ranked second (88.01%), followed by LSC (85.95%), WPM (84.24%), nrITS (81.50%), matK + rbcL + trnH-psbA (80.82%) and HVR (79.10%) (Table 2).

3.3. Correlation and selective pressure analyses

We assessed species discrimination efficacy (NIS) by analyzing 71 genomic loci (including 66 CDS genes, two pseudogenes, and three intergenic regions). The matK and ycf1 genes exhibited the highest discriminatory power, each distinguishing 75 species (78.13%), whereas the petN gene showed limited utility, discriminating only one species (1.04%) (Tables S2 and S3). Correlation analyses revealed strong positive associations between AL and both NVS (r = 0.88) and NPIS (r = 0.87), but weak correlations with proportional metrics (PVS: r = 0.14; PPIS: r = 0.13). Species discrimination capacity (NIS) showed moderate correlations with all five variables analyzed (r = 0.47–0.67). Notably, PVS and PPIS showed a strong linear relationship (r = 0.98), while NVS and NPIS were perfectly correlated (r = 1.00). To further disentangle these interdependencies, the ASR analysis selected the composite model integrating AL and PPIS as optimal, indicating their synergistic influence on species discrimination efficacy (NIS) (Fig. 4).

Fig. 4 Correlation analyses between sequence characteristics and species discrimination based on 71 gene regions. The variables are aligned length (AL), the number of variable sites (NVS), the percentage of variable sites (PVS), the number of parsimony-informative sites (NPIS), the percentage of parsimony-informative sites (PPIS), and the number of identified species (NIS). (A) Correlation analysis of the six variables. The strength of the correlation is indicated by the size of the pie chart in each square, the depth of the color, and the corresponding number; larger pie charts correspond to darker shades, and higher numbers signify stronger correlations (blue indicates a positive correlation; red indicates a negative correlation). (B) All-Subsets Regression (ASR) analysis using NIS as the response variable and the remaining variables as predictors. The ordinate represents the adjusted R² of the model with different combinations of variables, and the abscissa represents the variables. Different colored rectangles indicate different levels of explanation.

The selective pressure analyses revealed that all genes were subject to purifying selection in nearly all examined species, with the exception of the atpF, infA, rps18, petL, and rpl23 genes in some species (Fig. 5 and Table S6). The range of Ka/Ks values tended to widen as the species discrimination of a gene declined (Fig. 5). The RELAX analyses revealed that 36 genes underwent intensified selection, while 30 genes experienced relaxed selection (Fig. 6). Genes under intensified selection showed lower efficacy of species discrimination, with an average of 29.17 identified species (median = 27.50). In contrast, genes subject to relaxed selection exhibited higher efficacy of species discrimination, with an average of 40.43 identified species (median = 46.50) (Fig. 6). The difference between these two gene groups was statistically significant (Mann–Whitney U test, P = 0.041).
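
The group comparison reported above can be illustrated with a pure-Python sketch of a two-sided Mann–Whitney U test. It uses the normal approximation with a tie correction and no continuity correction, which is an assumption and may differ from the exact variant used for the published P-value. The function names and the `{gene: (K, NIS)}` input layout are hypothetical.

```python
from math import erfc, sqrt

def split_by_relax_k(genes):
    """Split {gene: (K, NIS)} into NIS lists for relaxed (K < 1) and intensified (K > 1) genes."""
    relaxed = [nis for k, nis in genes.values() if k < 1]
    intensified = [nis for k, nis in genes.values() if k > 1]
    return relaxed, intensified

def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided P) using the normal approximation."""
    combined = sorted((v, g) for g, grp in enumerate((x, y)) for v in grp)
    ranks = [0.0] * len(combined)
    tie_term = 0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u - mu) / sigma
    return u, erfc(abs(z) / sqrt(2))
```

With real data, the two lists returned by `split_by_relax_k` would be passed to `mann_whitney_u`; for small samples an exact test would be preferable to the normal approximation.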

Fig. 5 Results of the selective pressure analysis. Two components are shown: (1) The section above the dotted line displays the outcome of the selection strength assessment by RELAX. The accompanying histogram illustrates the type of selective pressure, with blue indicating intensified selection and red indicating relaxed selection. (2) The section below the dotted line displays the ratio of non-synonymous substitutions (Ka) to synonymous substitutions (Ks) computed with TBtools. The box plots depict the range of Ka/Ks values. Genes are ranked by their discrimination ability, which decreases from left to right.

Fig. 6 Boxplot showing the difference in discriminatory power between genes experiencing different selection intensities (relaxation vs. intensification). Of the 66 CDS genes tested, 36 underwent intensified selection (n = 36) and 30 underwent relaxed selection (n = 30). The ordinate is the number of species identified by each gene. The red line represents the median, and the green triangle represents the mean. The P-value is the result of the Mann–Whitney U test (significant difference).
4. Discussion

4.1. Parsimony-informative sites determining the power of species discrimination

The phylogenetic tree-based method for species discrimination in Pedicularis revealed that the supermatrix combining nrITS and three traditional standard barcodes (dataset G) achieved the highest discrimination rate at 81.25%, outperforming the WPM dataset (dataset A) and the LSC dataset (dataset B) by discriminating two and one additional species, respectively. Furthermore, the matK dataset and the concatenated matrix of three standard plastid barcodes (dataset D: matK + rbcL + trnH-psbA) successfully discriminated 75 (78.13%) and 74 species (77.08%), respectively. Previous studies have shown that plastome-scale barcodes generally exhibit higher discrimination rates than standard DNA barcodes. For example, plastome barcodes improved species discrimination from 33% to 55% in Rhododendron L. (Ericaceae) (Fu et al., 2022), from 58% to 68% in Cymbidium Sw. (Orchidaceae) (Zhang et al., 2023a), from 0% to 27% in Schima Reinw. (Theaceae) (Yu et al., 2022), and from 56% to 78% in Calligonum L. (Polygonaceae) (Song et al., 2020). However, our findings present a notable exception in which species discrimination rates differ only minimally between traditional DNA barcodes and plastome-scale barcodes. This discrepancy appears to be linked to the specific biological habits of Pedicularis and may reflect unique patterns of its plastid genome.

Pedicularis, the largest hemiparasitic and herbaceous genus within the family Orobanchaceae (Lamiales) (Li, 1951), has undergone extensive pseudogenization and gene loss (Zhang et al., 2020; Li et al., 2021). These genomic changes have led to instability in plastid genome structure, particularly evident in the expansion and contraction of the inverted repeat (IR) and the small single copy (SSC) regions (Zhang et al., 2020; Li et al., 2021). Accordingly, the more evolutionarily stable large single copy (LSC) region (dataset B) has higher PVS and PPIS than the complete plastid genome and can discriminate one more species (77 of 96 species) than the complete plastid genome (dataset A: 76 of 96 species) in Pedicularis. However, what is particularly noteworthy is that the supermatrix datasets combining standard DNA barcodes (i.e., datasets D and G) exhibited discriminatory power comparable to the plastome-scale barcodes (i.e., datasets A, B, and C), and the single matK gene or the ycf1 gene also showed high discriminatory power (75 of 96 species). These findings indicate that simply increasing the number of variable sites does not necessarily enhance species discrimination in Pedicularis. Instead, barcodes with more parsimony-informative sites and a higher PPIS serve as critical determinants for successful species discrimination. Four standard barcodes were originally proposed as universal plant barcodes because of their sufficient variation, the feasibility of PCR amplification and sequencing, and their high performance in species discrimination (Kress et al., 2005; CBOL Plant Working Group, 2009; Li et al., 2011). However, these genes exhibit dramatically elevated evolutionary rates in parasitic plants compared to their non-parasitic relatives (Nickrent and Starr, 1994; Bromham et al., 2013; Wicke et al., 2016), particularly in the plastid genes, leading to the accumulation of more variable sites.
The parasitic lifestyle of Pedicularis likely explains the high discrimination capacity of standard DNA barcodes derived from plastid genes. Our findings further indicate that integrating AL and PPIS to predict discriminatory power, together with the unexpected strength of conserved genes under relaxed selection, provides a framework that is transferable beyond Pedicularis. To apply this to other groups: (1) Pre-screen candidate loci: calculate AL and PPIS, and prioritize loci within the target zone. (2) Identify key genes: perform selection pressure analysis (dN/dS) on the pre-screened loci, focusing on traditionally conserved genes exhibiting relaxed selection. This framework efficiently directs resources to the most promising markers.
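Step (1) of this framework can be illustrated with a minimal sketch that computes AL, the number of variable sites, PIS and PPIS for each candidate locus and ranks the loci by PPIS. It is not the pipeline used in this study. It assumes the standard definition of a parsimony-informative site (at least two character states, each present in at least two sequences) and, as a simplifying assumption, ignores gaps and ambiguity codes. The locus names and toy sequences are hypothetical, and no numeric "target zone" thresholds are encoded because none are given in this section.

```python
from collections import Counter

def site_stats(alignment):
    """Return AL, variable sites (VS), PIS and PPIS for aligned sequences.

    alignment: list of equal-length strings (one aligned sequence each).
    """
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    al = lengths.pop()
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(base.upper() for base in column)
        # Assumption: only unambiguous nucleotides are scored (gaps, N ignored).
        counts = {b: n for b, n in counts.items() if b in "ACGT"}
        if len(counts) >= 2:
            variable += 1
        # Informative: at least two states, each seen in two or more sequences.
        if sum(1 for n in counts.values() if n >= 2) >= 2:
            informative += 1
    ppis = informative / al if al else 0.0
    return {"AL": al, "VS": variable, "PIS": informative, "PPIS": ppis}

# Hypothetical toy alignments standing in for candidate loci.
loci = {
    "locus_1": ["ACGTAC", "ACGTAC", "ATGTAA", "ATGTAA"],
    "locus_2": ["ACGTAC", "ACGTAC", "ACGTAA", "ACGTAC"],
}

stats = {name: site_stats(aln) for name, aln in loci.items()}
# Rank loci from highest to lowest PPIS for pre-screening.
ranked = sorted(stats, key=lambda name: stats[name]["PPIS"], reverse=True)
for name in ranked:
    print(name, stats[name])
```

In the toy data, locus_2 has a variable site that is not parsimony-informative (a singleton), which is the distinction the text draws between merely adding variable sites and adding informative ones.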

Phylogenetic relationships within Pedicularis have been significantly clarified by plastome phylogenies in comparison with previous phylogenies based on several DNA regions (Yang et al., 2003; Ree, 2005; Yang and Wang, 2007; Tkach et al., 2014; Robart et al., 2015; Yu et al., 2015). The plastome phylogeny revealed short internodes along the main branches, contrasting with long terminal branches, which suggests an early rapid radiation in this genus. Unlike other species-rich groups that have undergone extensive interspecific hybridization or introgression (Fu et al., 2022; Zhang et al., 2023b; Liu et al., 2024b; Wang et al., 2024) and show relatively chaotic phylogenies, the monophyly of most Pedicularis species has received relatively strong support. However, given the widespread occurrence of natural hybridization across plant lineages and the fact that approximately 25% of angiosperm species are involved in hybridization and introgression (Mallet, 2005), the species discrimination rate of up to 80.21% achieved in this study (based on the plastid dataset) likely approaches the theoretical maximum resolution attainable through genetic markers. The residual unresolved taxa (19.79%) may reflect both inherent limitations of plastid data and potential inconsistencies in current taxonomic delineations.

4.2. Key constraints on species discrimination using DNA barcodes: interspecific chloroplast genome capture and ancestral incomplete lineage sorting

Pedicularis is a highly diversified group (Fig. 1) that has recently undergone rapid diversification in the Himalaya-Hengduan Mountains region, driven by orogenic uplifts and monsoon dynamics (Xing and Ree, 2017; Ding et al., 2020). Divergent floral trait assembly among sympatric Pedicularis species can effectively reduce reproductive interference (Eaton et al., 2012), which may keep interspecific gene flow and hybridization among co-flowering sympatric species at a low level. Field experiments in pollination biology demonstrated that mechanical isolation mediated by floral trait divergence is imperfect in Pedicularis (Huang and Shi, 2013; Liang et al., 2018), while post-pollination isolation can play a critical role among coexisting Pedicularis species with pollen exchange (Liang et al., 2018). Nevertheless, ancient hybridization/introgression events have been well documented in Pedicularis sect. Cyathophora (Yu et al., 2013; Li et al., 2021) and the P. siphonantha lineage (Yu et al., 2018; Liu et al., 2022). In sect. Cyathophora, cytoplasmic-nuclear genome discordance manifests through chloroplast capture events: Sichuan populations of P. cyathophylloides retain nrITS monophyly while bearing captured plastomes from an ancestral P. cyathophylla lineage (Yu et al., 2013). Such chloroplast genome capture events likely extend to some species that were not discriminated in this study, such as those in the P. giraldiana clade, the P. rhynchodonta clade, and the P. verticillata clade (Yu et al., 2015). The P. rex complex exemplifies even greater phylogenetic complexity: subsp. rex displays plastid paraphyly due to incomplete lineage sorting (ILS) during population expansions, whereas var. rockii exhibits polyphyly through ancient hybridization (Yu et al., 2013). These two mechanisms (ILS and chloroplast genome capture) likely explain the mosaic plastid phylogeny of P. rex observed in our analyses.

Notably, our data revealed a distinct geographical signature in hybridization/introgression patterns, where population clustering reflects spatial proximity rather than taxonomic affinities. This phenomenon is particularly evident in the phylogenetic discordance observed across multiple species (Mallet et al., 2016). For example, both the plastid phylogenies and the nrITS phylogeny separate the five samples of P. anas into two groups: three northern Sichuan samples and two southern Sichuan samples (Figs. S2–S8). The three northern Sichuan samples clustered with P. cheilanthifolia and P. giraldiana, whereas the two southern Sichuan samples clustered with P. rupicola. Similar patterns are observed in four samples of P. roylei (two from Sichuan and two from Xizang) and four samples of P. cheilanthifolia (two from Qinghai and two from Xinjiang) (Figs. S2–S8), demonstrating consistent biogeographic clusters with potential hybridization/introgression events.
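The way such geographic splits cause a species to fail tree-based discrimination can be sketched with a toy example. It assumes, as is common in tree-based barcoding, that a species counts as discriminated only when all of its samples form one exclusive clade in a rooted tree. The nested-tuple tree, the sample labels and the species codes A, B and C are hypothetical and do not represent the trees in Figs. S2–S8.

```python
def clade_leaf_sets(tree):
    """Return the leaf set of every clade in a rooted nested-tuple tree.

    Tips are strings; internal nodes are tuples of child nodes.
    """
    clades = []

    def visit(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        clades.append(leaves)
        return leaves

    visit(tree)
    return clades

def is_monophyletic(tree, sample_to_species, species):
    """True if the samples of `species` form exactly one clade of the tree."""
    samples = frozenset(s for s, sp in sample_to_species.items() if sp == species)
    return samples in clade_leaf_sets(tree)

# Hypothetical tree: northern and southern samples of species A each group
# with a different species, so species A is split by geography.
tree = (
    (("A_north1", "A_north2"), ("B_1", "B_2")),
    (("A_south1", "A_south2"), ("C_1", "C_2")),
)
sample_to_species = {
    "A_north1": "A", "A_north2": "A", "A_south1": "A", "A_south2": "A",
    "B_1": "B", "B_2": "B", "C_1": "C", "C_2": "C",
}

species_list = sorted(set(sample_to_species.values()))
discriminated = [sp for sp in species_list
                 if is_monophyletic(tree, sample_to_species, sp)]
print(discriminated, f"{len(discriminated)}/{len(species_list)} species")
```

In this toy tree, species B and C are discriminated while species A is not, even though each of its two geographic groups is itself a well-defined clade.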

4.3. DNA barcoding challenges in cryptic and recently diverged species

The taxonomic delimitation of recently diverged species is often complicated by their highly similar morphological traits, which can lead to significant challenges in accurate identification. Taxonomically, diagnostic traits of Pedicularis are mainly based on vegetative and reproductive characters, including root morphology, leaf architecture, phyllotaxy, inflorescence structure, floral tube dimensions, galea curvature, and calyx features. However, some key diagnostic traits (e.g., the degree of galea curvature) show minimal variation among closely related species, and they can be difficult to quantify in specific terms and to distinguish from those of close relatives. Consequently, species discrimination based on plastid DNA markers or other neutral DNA regions tends to fail or remain unresolved (Ross, 2014; Yu et al., 2022), especially considering the presence of cryptic species in Pedicularis (Yu et al., 2013, 2018; Wang et al., 2015; Liu et al., 2022). For example, within the P. siphonantha complex, integrative morphological, molecular, and biogeographic evidence indicates that a new species from the Jiaozi Mountain should be separated from P. milliana (Liu et al., 2022). In particular, herbarium specimens have lost key diagnostic traits for species identification, which could cause taxonomic confusion within this lineage (Yu et al., 2018; Liu et al., 2022). Furthermore, our analyses revealed that ten samples of P. davidii form a monophyletic clade based on the plastid datasets. In contrast, these samples split into two separate clades in the nrDNA phylogeny. Field observations provided additional insights, showing that the two clades correspond to high-altitude and low-altitude populations, respectively. This pattern highlights a striking morphological differentiation between high- and low-altitude populations, suggesting a cryptic species in the P. davidii lineage that has arisen through recent phylogenetic divergence and ecological adaptation.
These findings highlight the critical need for refined taxonomic frameworks in Pedicularis to reconcile plastid-based species discrimination with evolutionary realities.

Hybridization dynamics further complicate species boundaries, as evidenced by field documentation of introgressive populations exhibiting intermediate morphologies. For example, multiple hybrid populations exist between Pedicularis milliana and P. sigmoidea in the Yulong Snow Mountain region. High-altitude hybrids backcross with the ancestral lineage of P. milliana and inherit parental morphological traits, while low-altitude hybrids are in the process of speciation and display morphological differences from both P. milliana and P. sigmoidea (Liu et al., 2022). Undoubtedly, the stochastic inheritance and independent evolution of parental traits in hybrid populations complicate species identification. Although this phenomenon is common in highly diversified groups, which may also harbor undescribed species (Struck et al., 2018), the efficacy of plastid DNA barcodes in discriminating cryptic Pedicularis species that lack visually distinguishable morphological differences appears limited. The development of nuclear markers may provide a promising avenue for disentangling the complex evolutionary history within the genus.

5. Conclusions

This study highlights the efficacy of standard DNA barcodes (e.g., nrITS, matK, rbcL, and trnH-psbA) in species discrimination within the hemiparasitic genus Pedicularis, demonstrating performance comparable to plastome-scale barcodes. Key factors influencing species resolution included alignment length (AL) and the proportion of parsimony-informative sites (PPIS), with conserved genes under relaxed selection exhibiting higher discriminatory power. The study also revealed challenges to species discrimination posed by rapid diversification, hybridization, and incomplete lineage sorting in Pedicularis. Phylogenetic discordance between chloroplast and nuclear markers (e.g., nrITS) underscored the impact of historical introgression and geographic isolation, particularly in recently diverged lineages. Morphological convergence and cryptic speciation further complicated taxonomic delineation, as evidenced by divergent molecular clades within morphologically uniform species like P. davidii. These findings suggest that plastid DNA barcodes alone may approach their theoretical resolution limits (~80%) in Pedicularis, with residual uncertainties likely reflecting taxonomic inconsistencies or limitations of plastid data. Future studies should integrate nuclear markers to resolve hybridization-driven complexities and validate cryptic species. Enhanced taxonomic scrutiny, combined with genomic approaches, will be critical for refining species boundaries in this rapidly radiating genus.

Acknowledgements

This study was supported by the National Natural Science Foundation of China (32371700, 32071670 and 31870196), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB31000000), the Science and Technology Basic Resources Investigation Program of China (2021FY100200), Yunnan Revitalization Talent Support Program "Young Talent" and "Innovation Team" Projects (202405AS350019), the 14th Five-Year Plan of Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences (XTBG-1450101), the Key R & D program of Yunnan Province, China (202103AC100003) and the Key Basic Research program of Yunnan Province, China (202101BC070003). We are grateful to Jie Cai, Li-Na Dong, Lian-Ming Gao, Hua-Jie He, Jun He, Wei Jiang, Rong Li, Bin Liu, Cheng Liu, En-De Liu, Jie Liu, Min-Lu Liu, Lu Lu, Yang Luo, Hui Tang, Chun-Lei Xiang, Ji-Dong Ya, Qiu-Lin Yang, Xiu-Long Yang, Ting Zhang, and Shu-Dong Zhang for their help with fieldwork and for providing plant samples, and to Jing Yang and Zhi-Rong Zhang for their help and suggestions in the lab work. We also acknowledge the physical support from the Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, and the Information Center, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences.

Data availability statement

All DNA sequences generated in this article have been uploaded to the China National Center for Bioinformation (CNCB), and the GenBank accession numbers can be found in the supplementary material of this article.

CRediT authorship contribution statement

You Wu: Investigation, Data curation, Formal analysis, Visualization, Writing − original draft, Writing − review & editing. Rong Liu: Resources, Methodology, Investigation. Wei-Jia Wang: Resources, Investigation. De-Zhu Li: Resources, Methodology, Investigation. Kevin S. Burgess: Writing − review & editing, Supervision. Wen-Bin Yu: Methodology, Investigation, Visualization, Writing − review & editing, Supervision, Funding acquisition. Hong Wang: Investigation, Writing − review & editing, Supervision, Funding acquisition.

Declaration of competing interest

The authors De-Zhu Li and Hong Wang are Editors for Plant Diversity and were not involved in the editorial review or the decision to publish this article. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.pld.2025.09.005.

References
Adeniran, A.A., Fernandez-Santos, N.A., Rodriguez-Rojas, J.J., et al., 2019. Identification of phlebotomine sand flies (Diptera: Psychodidae) from leishmaniasis endemic areas in southeastern Mexico using DNA barcoding. Ecol. Evol., 9: 13543-13554. DOI:10.1002/ece3.5811
Bromham, L., Cowman, P.F., Lanfear, R., 2013. Parasitic plants have increased rates of molecular evolution across all three genomes. BMC Evol. Biol., 13: 126. DOI:10.1186/1471-2148-13-126
Bruni, I., De Mattia, F., Galimberti, A., et al., 2010. Identification of poisonous plants by DNA barcoding approach. Int. J. Leg. Med., 124: 595-603. DOI:10.1007/s00414-010-0447-3
Burgess, K.S., Fazekas, A.J., Kesanakurti, P.R., et al., 2011. Discriminating plant species in a local temperate flora using the rbcL+matK DNA barcode. Methods Ecol. Evol., 2: 333-340. DOI:10.1111/j.2041-210X.2011.00092.x
CBOL Plant Working Group, 2009. A DNA barcode for land plants. Proc. Natl. Acad. Sci. U.S.A., 106: 12794-12797. DOI:10.1073/pnas.0905845106
Chase, M.W., Cowan, R.S., Hollingsworth, P.M., et al., 2007. A proposal for a standardised protocol to barcode all land plants. Taxon, 56: 295-299. DOI:10.1002/tax.562004
Chen, C.J., Wu, Y., Li, J.W., et al., 2023. TBtools-Ⅱ: a "one for all, all for one" bioinformatics platform for biological big-data mining. Mol. Plant, 16: 1733-1742. DOI:10.1016/j.molp.2023.09.010
Chen, J., Zhao, J., Erickson, D.L., et al., 2015. Testing DNA barcodes in closely related species of Curcuma (Zingiberaceae) from Myanmar and China. Mol. Ecol. Resour., 15: 337-348. DOI:10.1111/1755-0998.12319
Cho, Y., Mower, J.P., Qiu, Y.L., et al., 2004. Mitochondrial substitution rates are extraordinarily elevated and variable in a genus of flowering plants. Proc. Natl. Acad. Sci. U.S.A., 101: 17741-17746. DOI:10.1073/pnas.0408302101
Decru, E., Moelants, T., De Gelas, K., et al., 2016. Taxonomic challenges in freshwater fishes: a mismatch between morphology and DNA barcoding in fish of the north-eastern part of the Congo basin. Mol. Ecol. Resour., 16: 342-352. DOI:10.1111/1755-0998.12445
Dierckxsens, N., Mardulyn, P., Smits, G., 2017. NOVOPlasty: De novo assembly of organelle genomes from whole genome data. Nucleic Acids Res., 45: e18. DOI:10.1093/nar/gkw1060
Ding, W.-N., Ree, R.H., Spicer, R.A., et al., 2020. Ancient orogenic and monsoon-driven assembly of the world's richest temperate alpine flora. Science, 369: 578-581. DOI:10.1126/science.abb4484
Doyle, J.J., Doyle, J.L., 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull., 19: 11-15.
Eaton, D.A.R., Fenster, C.B., Hereford, J., et al., 2012. Floral diversity and community structure in Pedicularis (Orobanchaceae). Ecology, 93: S182-S194.
Fazekas, A.J., Burgess, K.S., Kesanakurti, P.R., et al., 2008. Multiple multilocus DNA barcodes from the plastid genome discriminate plant species equally well. PLoS One, 3: e2802. DOI:10.1371/journal.pone.0002802
Fazekas, A.J., Kesanakurti, P.R., Burgess, K.S., et al., 2009. Are plant species inherently harder to discriminate than animal species using DNA barcoding markers?. Mol. Ecol. Resour., 9: 130-139. DOI:10.1111/j.1755-0998.2009.02652.x
Foottit, R.G., Maw, H.E.L., Von Dohlen, C.D., et al., 2008. Species identification of aphids (Insecta: Hemiptera: Aphididae) through DNA barcodes. Mol. Ecol. Resour., 8: 1189-1201. DOI:10.1111/j.1755-0998.2008.02297.x
Fu, C.-N., Mo, Z.-Q., Yang, J.-B., et al., 2022. Testing genome skimming for species discrimination in the large and taxonomically difficult genus Rhododendron. Mol. Ecol. Resour., 22: 404-414. DOI:10.1111/1755-0998.13479
Gogoi, B., Wann, S.B., Saikia, S.P., 2020. DNA barcodes for delineating Clerodendrum species of North East India. Sci. Rep., 10: 13490. DOI:10.1038/s41598-020-70405-3
Govender, A., Singh, S., Groeneveld, J., et al., 2022. Experimental validation of taxon-specific mini-barcode primers for metabarcoding of zooplankton. Ecol. Appl., 32: e02469. DOI:10.1002/eap.2469
Hollingsworth, P.M., Graham, S.W., Little, D.P., 2011. Choosing and using a plant DNA barcode. PLoS One, 6: e19254. DOI:10.1371/journal.pone.0019254
Hollingsworth, P.M., Li, D.-Z., van der Bank, M., et al., 2016. Telling plant species apart with DNA: from barcodes to genomes. Philos. Trans. R. Soc. B-Biol. Sci., 371: 20150338. DOI:10.1098/rstb.2015.0338
Huang, S.-Q., Shi, X.-Q., 2013. Floral isolation in Pedicularis: how do congeners with shared pollinators minimize reproductive interference?. New Phytol., 199: 858-865. DOI:10.1111/nph.12327
Jin, J.-J., Yu, W.-B., Yang, J.-B., et al., 2020. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol., 21: 241. DOI:10.1186/s13059-020-02154-5
Kane, N., Sveinsson, S., Dempewolf, H., et al., 2012. Ultra-barcoding in cacao (Theobroma spp.; Malvaceae) using whole chloroplast genomes and nuclear ribosomal DNA. Am. J. Bot., 99: 320-329. DOI:10.3732/ajb.1100570
Katoh, K., Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol., 30: 772-780. DOI:10.1093/molbev/mst010
Kearse, M., Moir, R., Wilson, A., et al., 2012. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics, 28: 1647-1649. DOI:10.1093/bioinformatics/bts199
Kerr, K.C.R., Birks, S.M., Kalyakin, M.V., et al., 2009. Filling the gap - COI barcode resolution in eastern Palearctic birds. Front. Zool., 6: 29. DOI:10.1186/1742-9994-6-29
Kress, W.J., Erickson, D.L., 2007. A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS One, 2: e508. DOI:10.1371/journal.pone.0000508
Kress, W.J., Wurdack, K.J., Zimmer, E.A., et al., 2005. Use of DNA barcodes to identify flowering plants. Proc. Natl. Acad. Sci. U.S.A., 102: 8369-8374. DOI:10.1073/pnas.0503123102
Kumar, S., Stecher, G., Li, M., et al., 2018. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol., 35: 1547-1549. DOI:10.1093/molbev/msy096
Le, D.-T., Zhang, Y.-Q., Xu, Y., et al., 2020. The utility of DNA barcodes to confirm the identification of palm collections in botanical gardens. PLoS One, 15: e0235569. DOI:10.1371/journal.pone.0235569
Lewin, H.A., Richards, S., Aiden, E.L., et al., 2022. The Earth BioGenome project 2020: starting the clock. Proc. Natl. Acad. Sci. U.S.A., 119: e2115635118. DOI:10.1073/pnas.2115635118
Li, D.-Z., Gao, L.-M., Li, H.-T., et al., 2011. Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. Proc. Natl. Acad. Sci. U.S.A., 108: 19641-19646. DOI:10.1073/pnas.1104551108
Li, H.-L., 1951. Evolution in the flowers of Pedicularis. Evolution, 5: 158-164. DOI:10.1111/j.1558-5646.1951.tb02771.x
Li, X., Yang, J.-B., Wang, H., et al., 2021. Plastid NDH pseudogenization and gene loss in a recently derived lineage from the largest hemiparasitic plant genus Pedicularis (Orobanchaceae). Plant Cell Physiol., 62: 971-984. DOI:10.1093/pcp/pcab074
Li, X., Yang, Y., Henry, R.J., et al., 2015. Plant DNA barcoding: from gene to genome. Biol. Rev., 90: 157-166. DOI:10.1111/brv.12104
Liang, H., Ren, Z.X., Tao, Z.B., et al., 2018. Impact of pre- and post-pollination barriers on pollen transfer and reproductive isolation among three sympatric Pedicularis (Orobanchaceae) species. Plant Biol., 20: 662-673. DOI:10.1111/plb.12833
Liu, J., Möller, M., Gao, L.-M., et al., 2011. DNA barcoding for the discrimination of Eurasian yews (Taxus L., Taxaceae) and the discovery of cryptic species. Mol. Ecol. Resour., 11: 89-100. DOI:10.1111/j.1755-0998.2010.02907.x
Liu, R., Wang, H., Yang, J.-B., et al., 2022. Cryptic species diversification of the Pedicularis siphonantha complex (Orobanchaceae) in the mountains of Southwest China since the Pliocene. Front. Plant Sci., 13: 811206. DOI:10.3389/fpls.2022.811206
Liu, R., Wang, W.-J., Wang, H., et al., 2024a. Plant species diversification in the Himalaya–Hengduan Mountains region: an example from an endemic lineage of Pedicularis (Orobanchaceae) in the role of floral specializations and rapid range expansions. Cladistics, 40: 636-652. DOI:10.1111/cla.12596
Liu, S.-Y., Yang, Y.-Y., Tian, Q., et al., 2024b. An integrative framework reveals widespread gene flow during the early radiation of oaks and relatives in Quercoideae (Fagaceae). J. Integr. Plant Biol., 67: 1119-1141.
Liu, Z.-F., Ci, X.-Q., Li, L., et al., 2017. DNA barcoding evaluation and implications for phylogenetic relationships in Lauraceae from China. PLoS One, 12: e0175788. DOI:10.1371/journal.pone.0175788
Lv, S.-Y., Ye, X.-Y., Li, Z.-H., et al., 2023. Testing complete plastomes and nuclear ribosomal DNA sequences for species identification in a taxonomically difficult bamboo genus Fargesia. Plant Divers., 45: 147-155. DOI:10.1016/j.pld.2022.04.002
Mallet, J., 2005. Hybridization as an invasion of the genome. Trends Ecol. Evol., 20: 229-237. DOI:10.1016/j.tree.2005.02.010
Mallet, J., Besansky, N., Hahn, M.W., 2016. How reticulated are species?. Bioessays, 38: 140-149. DOI:10.1002/bies.201500149
Meier, R., Shiyang, K., Vaidya, G., et al., 2006. DNA barcoding and taxonomy in Diptera: a tale of high intraspecific variability and low identification success. Syst. Biol., 55: 715-728. DOI:10.1080/10635150600969864
Nickrent, D.L., Starr, E.M., 1994. High rates of nucleotide substitution in nuclear small-subunit (18S) rDNA from holoparasitic flowering plants. J. Mol. Evol., 39: 62-70.
Nock, C.J., Waters, D.L.E., Edwards, M.A., et al., 2011. Chloroplast genome sequences from total DNA for plant identification. Plant Biotechnol. J., 9: 328-333. DOI:10.1111/j.1467-7652.2010.00558.x
Pang, X., Song, J., Zhu, Y., et al., 2011. Applying plant DNA barcodes for Rosaceae species identification. Cladistics, 27: 165-170. DOI:10.1111/j.1096-0031.2010.00328.x
Pond, S.L.K., Frost, S.D.W., Muse, S.V., 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics, 21: 676-679. DOI:10.1093/bioinformatics/bti079
POWO, 2025. Plants of the World Online. Facilitated by the Royal Botanic Gardens, Kew. Published on the Internet; https://powo.science.kew.org/. Retrieved 16 July 2025.
Qu, X.J., Moore, M.J., Li, D.Z., et al., 2019. PGA: a software package for rapid, accurate, and flexible batch annotation of plastomes. Plant Methods, 15: 50. DOI:10.1186/s13007-019-0435-7
Ree, R.H., 2005. Phylogeny and the evolution of floral diversity in Pedicularis (Orobanchaceae). Int. J. Plant Sci., 166: 595-613. DOI:10.1086/430191
Rieseberg, L.H., 1997. Hybrid origins of plant species. Annu. Rev. Ecol. Evol. Syst., 28: 359-389. DOI:10.1146/annurev.ecolsys.28.1.359
Robart, B.W., Gladys, C., Frank, T., et al., 2015. Phylogeny and biogeography of North American and Asian Pedicularis (Orobanchaceae). Syst. Bot., 40: 229-258. DOI:10.1600/036364415X686549
Robins, J.H., Tintinger, V., Aplin, K.P., et al., 2014. Phylogenetic species identification in Rattus highlights rapid radiation and morphological similarity of New Guinean species. PLoS One, 9: e98002. DOI:10.1371/journal.pone.0098002
Ross, H.A., 2014. The incidence of species-level paraphyly in animals: a re-assessment. Mol. Phylogenet. Evol., 76: 10-17. DOI:10.1016/j.ympev.2014.02.021
Roy, S., Tyagi, A., Shukla, V., et al., 2010. Universal plant DNA barcode loci may not work in complex groups: a case study with Indian Berberis species. PLoS One, 5: e13674. DOI:10.1371/journal.pone.0013674
Rozas, J., Ferrer-Mata, A., Sánchez-DelBarrio, J.C., et al., 2017. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol. Biol. Evol., 34: 3299-3302. DOI:10.1093/molbev/msx248
Sarmashghi, S., Bohmann, K., Gilbert, M.T.P., et al., 2019. Skmer: assembly-free and alignment-free sample identification using genome skims. Genome Biol., 20: 34. DOI:10.1186/s13059-019-1632-4
Simonsen, M., Mailund, T., Pedersen, C.N.S., 2008. Rapid Neighbour-Joining. In: Proceedings of the 8th Workshop in Algorithms in Bioinformatics (WABI), pp. 113-122.
Song, F., Li, T., Burgess, K.S., et al., 2020. Complete plastome sequencing resolves taxonomic relationships among species of Calligonum L. (Polygonaceae) in China. BMC Plant Biol., 20: 261. DOI:10.1186/s12870-020-02466-5
Stamatakis, A., 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30: 1312-1313. DOI:10.1093/bioinformatics/btu033
Su, N., Hodel, R.G.J., Wang, X., et al., 2023. Molecular phylogeny and inflorescence evolution of Prunus (Rosaceae) based on RAD-seq and genome skimming analyses. Plant Divers., 45: 397-408. DOI:10.1016/j.pld.2023.03.013
Tizard, J., Patel, S., Waugh, J., et al., 2019. DNA barcoding a unique avifauna: an important tool for evolution, systematics and conservation. BMC Evol. Biol., 19: 52. DOI:10.1186/s12862-019-1346-y
Tkach, N., Ree, R.H., Kuss, P., et al., 2014. High mountain origin, phylogenetics, evolution, and niche conservatism of arctic lineages in the hemiparasitic genus Pedicularis (Orobanchaceae). Mol. Phylogenet. Evol., 76: 75-92. DOI:10.1016/j.ympev.2014.03.004
Tsoong, P.-C., 1963. Scrophulariaceae (Pars Ⅱ). In: Chien, S.-S., Chun, W.-Y. (Eds.), Flora Reipublicae Popularis Sinicae, Vol. 68. Science Press, Beijing, pp. 1-378.
Velo-Antón, G., Henrique, M., Liz, A.V., et al., 2022. DNA barcode reference library for the West Sahara-Sahel reptiles. Sci. Data, 9: 459. DOI:10.1038/s41597-022-01582-1
Wang, H.-J., Li, W.-T., Liu, Y.-N., et al., 2015. Range-wide multilocus phylogenetic analyses of Pedicularis sect. Cyathophora (Orobanchaceae): implications for species delimitation and speciation. Taxon, 64: 959-974. DOI:10.12705/645.6
Wang, W., Wu, Y., Yan, Y., et al., 2010. DNA barcoding of the Lemnaceae, a family of aquatic monocots. BMC Plant Biol., 10: 205. DOI:10.1186/1471-2229-10-205
Wang, Y., Wu, X., Chen, Y., et al., 2024. Phylogenomic analyses revealed widely occurring hybridization events across Elsholtzieae (Lamiaceae). Mol. Phylogenet. Evol., 198: 108112. DOI:10.1016/j.ympev.2024.108112
Wang, H., Yu, W.-B., 2025. Illustrated Guide of Pedicularis in China. Science Press, Beijing.
Ward, R.D., Zemlak, T.S., Innes, B.H., et al., 2005. DNA barcoding Australia's fish species. Philos. Trans. R. Soc. B-Biol. Sci., 360: 1847-1857. DOI:10.1098/rstb.2005.1716
Wertheim, J.O., Murrell, B., Smith, M.D., et al., 2015. RELAX: detecting relaxed selection in a phylogenetic framework. Mol. Biol. Evol., 32: 820-832. DOI:10.1093/molbev/msu400
Wicke, S., Mueller, K.F., dePamphilis, C.W., et al., 2016. Mechanistic model of evolutionary rate variation en route to a nonphotosynthetic lifestyle in plants. Proc. Natl. Acad. Sci. U.S.A., 113: 9045-9050. DOI:10.1073/pnas.1607576113
Xing, Y., Ree, R.H., 2017. Uplift-driven diversification in the Hengduan Mountains, a temperate biodiversity hotspot. Proc. Natl. Acad. Sci. U.S.A., 114: E3444-E3451.
Xu, S.-Z., Li, Z.-Y., Jin, X.-H., 2018. DNA barcoding of invasive plants in China: a resource for identifying invasive plants. Mol. Ecol. Resour., 18: 128-136. DOI:10.1111/1755-0998.12715
Xu, S., Li, D., Li, J., et al., 2015. Evaluation of the DNA barcodes in Dendrobium (Orchidaceae) from Mainland Asia. PLoS One, 10: e0115168. DOI:10.1371/journal.pone.0115168
Yang, F.-S., Wang, X.-Q., 2007. Extensive length variation in the cpDNA trnT-trnF region of hemiparasitic Pedicularis and its phylogenetic implications. Plant Syst. Evol., 264: 251-264. DOI:10.1007/s00606-006-0510-1
Yang, F.S., Wang, X.Q., Hong, D.Y., 2003. Unexpected high divergence in nrDNA ITS and extensive parallelism in floral morphology of Pedicularis (Orobanchaceae). Plant Syst. Evol., 240: 91-105. DOI:10.1007/s00606-003-0005-2
Yang, J.-B., Tang, M., Li, H.-T., et al., 2013. Complete chloroplast genome of the genus Cymbidium: lights into the species identification, phylogenetic implications and population genetic analyses. BMC Evol. Biol., 13: 84. DOI:10.1186/1471-2148-13-84
Yao, L.-F., Shao, Z.-K., Li, N., et al., 2024. Genome-wide species delimitation and quantification of the extent of introgression in eriophyoid mite Epitrimerus sabinae complex (Acariformes: Eriophyoidea). Mol. Phylogenet. Evol., 201: 108220. DOI:10.1016/j.ympev.2024.108220
Yu, W.-B., Huang, P.-H., Li, D.-Z., et al., 2013. Incongruence between nuclear and chloroplast DNA phylogenies in Pedicularis section Cyathophora (Orobanchaceae). PLoS One, 8: e74828. DOI:10.1371/journal.pone.0074828
Yu, W.-B., Huang, P.-H., Ree, R.H., et al., 2011. DNA barcoding of Pedicularis L. (Orobanchaceae): evaluating four universal barcode loci in a large and hemiparasitic genus. J. Syst. Evol., 49: 425-437. DOI:10.1111/j.1759-6831.2011.00154.x
Yu, W.-B., Liu, M.-L., Wang, H., et al., 2015. Towards a comprehensive phylogeny of the large temperate genus Pedicularis (Orobanchaceae), with an emphasis on species from the Himalaya-Hengduan Mountains. BMC Plant Biol., 15: 176. DOI:10.1186/s12870-015-0547-9
Yu, W.-B., Wang, H., Liu, M.-L., et al., 2018. Phylogenetic approaches resolve taxonomical confusion in Pedicularis (Orobanchaceae): reinstatement of Pedicularis delavayi and discovering a new species Pedicularis milliana. PLoS One, 13: e0200372. DOI:10.1371/journal.pone.0200372
Yu, X.-Q., Jiang, Y.-Z., Folk, R.A., et al., 2022. Species discrimination in Schima (Theaceae): Next-generation super-barcodes meet evolutionary complexity. Mol. Ecol. Resour., 22: 3161-3175. DOI:10.1111/1755-0998.13683
Yuan, Q.-J., Zhang, B., Jiang, D., et al., 2015. Identification of species and materia medica within Angelica L. (Umbelliferae) based on phylogeny inferred from DNA barcodes. Mol. Ecol. Resour., 15: 358-371. DOI:10.1111/1755-0998.12296
Zhang, C.-Y., Wang, F.-Y., Yan, H.-F., et al., 2012. Testing DNA barcoding in closely related groups of Lysimachia L. (Myrsinaceae). Mol. Ecol. Resour., 12: 98-108. DOI:10.1111/j.1755-0998.2011.03076.x
Zhang, L., Huang, Y.-W., Huang, J.-L., et al., 2023a. DNA barcoding of Cymbidium by genome skimming: call for next-generation nuclear barcodes. Mol. Ecol. Resour., 23: 424-439. DOI:10.1111/1755-0998.13719
Zhang, Q., Folk, R.A., Mo, Z.-Q., et al., 2023b. Phylotranscriptomic analyses reveal deep gene tree discordance in Camellia (Theaceae). Mol. Phylogenet. Evol., 188: 107912. DOI:10.1016/j.ympev.2023.107912
Zhang, Q.-Y., Chen, Z., Sun, H., et al., 2024a. Intraspecific floral colour variation in three Pedicularis species. Plant Divers., 46: 274-279. DOI:10.1016/j.pld.2023.03.011
Zhang, R., Xu, B., Li, J., et al., 2020. Transit from autotrophism to heterotrophism: sequence variation and evolution of chloroplast genomes in Orobanchaceae species. Front. Genet., 11: 542017. DOI:10.3389/fgene.2020.542017
Zhang, Z., Liu, G., Li, M., 2024b. Phylotranscriptomic discordance is best explained by incomplete lineage sorting within Allium subgenus Cyathophora and thus hemiplasy accounts for interspecific trait transition. Plant Divers., 46: 28-38. DOI:10.1016/j.pld.2023.07.004
Zhu, H., Lei, W., Lai, Q., et al., 2024. Comparative analysis shows high level of lineage sorting in genomic regions with low recombination in the extended Picea likiangensis species complex. Plant Divers., 46: 547-550. DOI:10.1016/j.pld.2024.04.004
Zuo, Y.J., Chen, Z.J., Kondo, K., et al., 2011. DNA barcoding of Panax species. Planta Med., 77: 182-187. DOI:10.1055/s-0030-1250166