High-quality genome of <i>Oreocharis mileensis</i> (Gesneriaceae) provides insights into the adaptation and conservation of highly threatened species in karst region

Citation

Temur Asatulloev, Lei Cai, Ziyoviddin Yusupov, Kai-Hua Jia, Ren-Gang Zhang, Komiljon Sh. Tojibaev, Wei-Bang Sun. High-quality genome of Oreocharis mileensis (Gesneriaceae) provides insights into the adaptation and conservation of highly threatened species in karst region[J]. Plant Diversity, 2026, 48(2): 262-277.

High-quality genome of Oreocharis mileensis (Gesneriaceae) provides insights into the adaptation and conservation of highly threatened species in karst region

Temur Asatulloev^a,b,c^,1, Lei Cai^a^,*,1, Ziyoviddin Yusupov^c, Kai-Hua Jia^d, Ren-Gang Zhang^a,b, Komiljon Sh. Tojibaev^c^,**, Wei-Bang Sun^a^,***

a. Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations/State Key Laboratory of Plant Diversity and Specialty Crops, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, Yunnan, China;
b. University of Chinese Academy of Sciences, Beijing 100049, China;
c. Institute of Botany, Academy of Sciences, Tashkent 100125, Uzbekistan;
d. Shandong International Joint Laboratory of Agricultural Germplasm Resources Innovation, Institute of Crop Germplasm Resources, National Saline-Alkali Tolerant Crop Germplasm Resources Nursery (Dongying), Shandong Academy of Agricultural Sciences, Jinan 250100, Shandong, China

Received 5 June 2025; Received in revised form 3 November 2025; Accepted 7 November 2025; Available online 14 November 2025

Corresponding author: E-mail address: cailei@mail.kib.ac.cn (L. Cai);
E-mail address: ktojibaev@mail.ru (K.Sh. Tojibaev);
E-mail address: wbsun@mail.kib.ac.cn (W.-B. Sun).
Peer review under responsibility of Editorial Office of Plant Diversity.

^* Corresponding author.
^** Corresponding author.
^*** Corresponding author.
¹ These authors contributed equally to this work.

Abstract: Despite the ecological importance of karst ecosystems, genomic studies on the conservation and adaptation of karst plant species, particularly threatened wild species, remain scarce. Here, we investigate the genetic architecture, demographic history, and adaptive potential of Oreocharis mileensis, a threatened karst-endemic resurrection plant. We generated a high-quality, phased genome assembly of 3.99 Gb covering two haplotypes with a contig N50 of 124 Mb, revealing three lineage-specific whole-genome duplication events following the ancestral γ event. Population resequencing of 107 individuals across 10 localities uncovered strong population structure, low gene flow, and high genetic differentiation, indicating long-term isolation. Core populations exhibited elevated inbreeding, whereas peripheral populations showed reduced runs of homozygosity and fewer deleterious mutations, suggesting historical demographic events and possible purging effects. pRDA analyses on putatively adaptive loci revealed that genetic variation is primarily aligned with pre-existing population structure. Functional annotation of putatively adaptive SNPs identified genes associated with drought tolerance and rapid recovery after rewatering. Gradient Forest models revealed substantial genomic offset in almost all populations, highlighting their increased vulnerability under projected climate scenarios. Based on these findings, we propose conservation strategies that include delineating eight genetically informed management units and facilitating assisted gene flow among compatible populations. For highly isolated populations, ex situ conservation, habitat restoration, and cultivation for horticultural use are recommended to mitigate genetic erosion and enhance adaptive resilience. This study provides critical insights into how historical, demographic, and environmental factors shape genetic diversity and informs conservation efforts for plant species in fragile karst ecosystems.

Keywords: Oreocharis mileensis Isolation by environment Deleterious mutation Adaptive potential Karst region

1. Introduction

Plant species with narrow habitat ranges face heightened extinction risks from climate change, environmental shifts, and human activities (Bijlsma and Loeschcke, 2012; Ralls et al., 2018). Threatened plant species with small, fragmented populations often experience genetic drift, which diminishes their genetic variability and heightens differentiation as a result of bottleneck effects (Kéry et al., 2000; Haag et al., 2010). Such genetic divergence may trigger outbreeding depression, exacerbating threats to population survival (Ellstrand and Elam, 1993; Spielman et al., 2004; Frankham et al., 2011; Brys and Jacquemyn, 2016). Collectively, these factors diminish a species' ability to adapt to habitat degradation or climatic shifts (Rodger et al., 2021). Consequently, assessing genetic diversity, population structure, demographic history, fitness, and deleterious mutations is essential for conserving rare plant species (Aguilar et al., 2008; Hill, 2010; Hoban et al., 2013; Schäfer et al., 2020; Kardos et al., 2021; Kyriazis et al., 2021; Ducrettet et al., 2023).

One group of plants at critical risk of extinction are species endemic to karst ecosystems in China (Liu et al., 2024). Research that has assembled and annotated whole genomes have provided insights into the adaptive mechanisms, population genetic structure, and potential conservation strategies for karst-adapted and threatened plant species. For instance, a genomic study of Begonia masoniana (Chen et al., 2024) revealed that isolated populations exhibit limited gene flow, high inbreeding, and an accumulation of deleterious mutations. Similarly, in Urophysa species, bottlenecks, restricted gene flow, niche specialization, and inbreeding have been shown to threaten species, while physiological and transcriptomic analyses have identified positively selected genes associated with ion regulation and cell wall maintenance, which are key adaptations for survival in limestone habitats (Xie et al., 2026). Furthermore, research has also shown that Camellia rubituberculata adapted to karst environments through gene family expansion and positive selection on genes related to photosynthesis, phytohormone signaling, and calcium/ion transport, providing genomic evidence for adaptive evolution and population divergence between wild and cultivated groups (Yan et al., 2026). Nevertheless, research on the diversity, conservation, and adaptive evolution of karst-adapted plants remains limited (Feng et al., 2020; Cao et al., 2023).

Previous studies have examined the adaptive ability of karst adapted species in the genus Oreocharis. Analysis of the genomes of two species (O. esquirolii and O. maximowiczii) distributed in Guizhou Province, southeastern China, indicated that O. esquirolii is endangered due to diminished genetic diversity, pronounced inbreeding, and reduced ability to eliminate deleterious alleles (Peng et al., 2025). However, our understanding of the adaptive ability of other karst-adapted Oreocharis species remains limited. For example, we know little about the conservation status and capacity for adaptation in species such as Oreocharis mileensis. Originally described as Paraisometrum mileense (Weitzman et al., 1997), this species was reclassified under Oreocharis s.l. (Möller et al., 2011). O. mileensis is endemic to karst forests in southwest China (1100–2150 m), restricted to Yunnan, Guizhou, and Guangxi (Xu et al., 2009; Gao and Xu, 2011; Chen et al., 2014). This resurrection plant possesses high potential ornamental value and is extremely drought resistant (Chen et al., 2014). It is listed as Endangered (EN) in China's biodiversity red lists (Qin et al., 2017) and prioritized in national PSESP rescue plans (Sun et al., 2013, 2019a). The fragmented distribution and small populations of O. mileensis make the species highly susceptible to habitat destruction, climate change, and anthropogenic pressures (Chen et al., 2014; Liu et al., 2024). For instance, surveys indicate that there remain only 10 remnant populations/locations, one of which has been lost in Shui Jing Wan due to the construction of the highway. Low genetic diversity and environmental sensitivity further threaten its survival (Chen et al., 2014; Li et al., 2014).

While traditional conservation genetics relies on neutral markers (Allendorf, 2017), advancements like whole-genome resequencing (WGR) now enable genome-wide conservation analyses, improving the accuracy of genetic assessments (Abzhanov et al., 2008; Weber et al., 2013; Poelstra et al., 2014; Jing et al., 2023). These high-throughput methods facilitate the reconstruction of species' evolutionary histories (Leitwein et al., 2020), identification of putatively adaptive loci, and estimation of critical parameters such as effective population size (Ne), inbreeding, and mutation load (Hohenlohe et al., 2010; Nielsen et al., 2011; Lamichhaney et al., 2017). Such tools are invaluable for conserving plant species with extremely small populations (PSESP), a category of highly threatened flora demanding targeted interventions (Ma et al., 2013; Sun et al., 2019b, 2024; Yang et al., 2020). The PSESP approach combines ecological, propagation, and genetic data to design conservation strategies, as demonstrated in Acer, Dipteronia, Magnolia, and Rhododendron (Yang et al., 2019, 2024; Ma et al., 2021, 2022; Cai et al., 2024; Feng et al., 2024; Li et al., 2024).

This study employs conservation genomics, based on a de novo genome assembly and annotation, to evaluate the genetic diversity of Oreocharis mileensis, identify drivers of population decline, and assess the adaptive capacity of the species under future climates.

2. Materials and methods 2.1. Collection of plant material

Young leaves for whole-genome sequencing were obtained from a cultivated individual at Kunming Botanical Garden (origin: Shilin County, Yunnan). For transcriptome analysis, we collected roots, stems, leaves, and inflorescences from wild plants of the same Shilin population. All samples were flash-frozen in liquid nitrogen and maintained at −78.5 ℃. For population resequencing, fresh leaves were collected from 10 natural populations (5–20 m sampling intervals, adjusted for population size) and preserved in silica gel. Finally, 107 individuals of Oreocharis mileensis and two samples of O. hekouensis (as an outgroup) were selected for genetic diversity and phylogenetic analyses through whole genome resequencing (Table S1).

2.2. Genome sequencing

The ploidy level of Oreocharis mileensis is 2n = 34 (Tan et al., 2011). To estimate nuclear DNA content, we followed the flow cytometry protocol of Qiu et al. (2020). Leaf tissue was chopped into small fragments (1–2 cm) in 500 μl of CyStain UV Precise P extraction buffer (Sysmex) and filtered through a 50-μl mesh. A staining solution containing propidium iodide (4.29 μl), CyStain staining buffer (0.71 ml), and RNAse A (2.14 μl) was added to stain the nuclei. Samples were kept on ice before analysis on a BD LSRII H4760 flow cytometer (BD Biosciences) using the PI laser detector at 480 V, with a minimum of 2000 nuclei counted. Data was analyzed using FACSDiva v.8.0.1. Finally, genome size was estimated as 1.74 Gb.

Genomic DNA and transcriptome sequencing were performed on various platforms to ensure accuracy and efficiency. For DNA extraction, library construction, and PacBio sequencing, we first extracted genomic DNA from leaf samples using the CTAB method (Doyle and Doyle, 1987) and DNeasy Plant Mini Kit. DNA purity, quantity, and integrity were assessed using a NanoDrop One spectrophotometer, Qubit 3.0 Fluorometer, and Femto Pulse. DNA (8 μg) was fragmented using Megaruptor 3, purified with AMPure PB beads, and used to construct SMRT bell libraries with the Pacific Biosciences SMRT bell Express Template Prep Kit 2.0. Libraries were size-selected on the BluePippin^TM system (15 Kb insert size), primer-annealed, and bound to polymerase using the DNA/Polymerase Binding Kit. Sequencing was performed on the PacBio Sequel Ⅱ platform, generating HiFi reads.

For BGI-Next Generation DNA Library construction and sequencing, we used genomic DNA (1 μg) fragmented with a Covaris ultrasonicator, filtered (200–400 bp), PCR-amplified, purified, and circularized. Libraries were sequenced on the DNBSEQ platform, with raw reads filtered using fastp (v.0.21.0) to discard low-quality reads (Chen et al., 2018). For Hi-C DNA, genomic material was fragmented, blunted, phosphorylated at the 5′ ends, and dA-tailed at the 3′ ends. Adapters were ligated, biotin-labeled regions captured, and Hi-C DNA amplified via PCR. After quality control, quantification, and circularization, libraries were sequenced on the DNBSEQ-T7 platform using paired-end sequencing. Iso-Seq transcriptome analysis was based on full-length mRNA. Total RNA (500 ng) was diluted to 9 μL, reverse transcription primer added (Table S2), and samples underwent reverse transcription, amplification, purification with AMPure beads, and adapter ligation. Libraries were sequenced on the PromethION platform (Oxford Nanopore Technologies).

2.3. Genome assembly

We generated ~91 Gb HiFi data (~46×), ~288 Gb (~144×) DNBSEQ reads, ~222 Gb Hi-C reads, and ~15 Gb Iso-Seq reads (Tables S3–S6). Contigs were generated from HiFi and Hi-C reads using Hifiasm (v.0.19.8-r602; Cheng et al., 2021), with the haplotype assembly chosen for downstream analysis. Hi-C reads were aligned with Juicer, and chromosomes were assembled using 3D-DNA (Dudchenko et al., 2017). Manual corrections in Juicebox refined chromosome boundaries (Durand et al., 2017). Final scaffolds were generated with 100 bp gaps filled by quarTeT (Lin et al., 2023). Telomeres were extended by realigning HiFi reads and assembling contigs with Hifiasm. Genome polishing used NextPolish2 with HiFi and second-generation reads (Hu et al., 2023). Redundans (Pryszcz and Toni, 2016) was employed to eliminate redundant sequences, while low-coverage regions and rDNA were detected either manually or using Barrnap.

2.4. Assessment of genome assembly

Assembly completeness was evaluated using BUSCO (Simão et al., 2015) and LAI analyses. Sequencing reads were aligned with BWA and Minimap2 (Li, 2013, 2018), followed by removal of non-primary alignments. Genome coverage and mapping efficiency were then calculated from filtered alignments. Coverage depth was checked for a Poisson distribution expected for redundancy-free single- and multi-copy core genes (Fig. S1). HiFi and second-generation sequencing depth across GC content was evaluated for bias (Fig. S2). Minimap2 confirmed chromosome order consistency in collinearity analysis (Fig. S3). Telomeres, rDNA, tandem repeats, and other sequences were mapped to chromosomes (Fig. S4).

2.5. Genome annotation

EDTA identified transposable elements de novo (parameters: –sensitive 1 –anno 1), generating a TE library (Ou et al., 2019). RepeatMasker identified repetitive regions in the genome. A non-redundant set of 384,287 protein sequences from species like Antirrhinum majus, Boea hygrometrica, Coffea canephora, and Primulina huaijiensis was reference for annotation. Iso-Seq reads were aligned to the genome using Minimap2 (Li, 2018) and assembled with StringTie (Pertea et al., 2015), providing transcript evidence for annotation and model training. PASA pipeline was used to annotate gene structure (Haas et al., 2003), and full-length genes were identified by aligning transcripts with reference proteins. The gene prediction tools AUGUSTUS and SNAP underwent five iterative rounds of training and optimization utilizing the complete set of full-length genes (Korf, 2004; Stanke et al., 2008).

The MAKER2 pipeline performed ab initio prediction, integrating transcript and homologous protein evidence (Cantarel et al., 2008). Due to MAKER's lower accuracy (Cook et al., 2018), MAKER and PASA annotations were merged using EvidenceModeler (EVM) for consistent gene annotation (Haas et al., 2008). TE protein domains were identified with TEsorter and masked in EVM (Zhang et al., 2022). PASA updated EVM annotations by adding UTRs and alternative splicing information (Haas et al., 2003). Annotations with abnormal coding frames or short lengths (< 50 amino acids) were removed. tRNA was annotated with tRNAscan-SE (Lowe and Eddy, 1997), rRNA with Barrnap, and ncRNAs with RfamScan. Redundancy was removed, and BUSCO assessed the integrated protein annotations.

Protein-coding gene functions were annotated using (1) eggNOG-mapper for GO and KEGG annotations (Huerta-Cepas et al., 2017); (2) DIAMOND alignment with Swiss-Prot, TrEMBL, NR, and Arabidopsis thaliana databases (Buchfink and Huson, 2015), requiring > 30% sequence identity and E-value < 1e-5; and (3) InterProScan for domain analysis against PRINTS, Pfam, SMART, and CDD databases (Jones et al., 2014).

2.6. Gene family identification and phylogenetic analysis

To determine orthology and orthogroups, we retrieved genome data from 18 plant species from public databases. These species were selected to represent a broad phylogenetic range across Lamiales and related angiosperm lineages, allowing robust inference of orthogroups, gene family evolution, and paleopolyploidy events. Close relatives within Lamiales (e.g., Streptocarpus, Primulina, Boea and Oreocharis) provided resolution within the clade, while more distant eudicot species (e.g., Vitis, Coffea, Jasminum, Tectona) served as outgroups for tree rooting. OrthoFinder2 analyzed orthology and orthogroups (OGs) with "-M msa" (Emms and Kelly, 2019). Protein alignments from 1273 OGs (≥ 84.2% species with single-copy genes) were used to build an ML tree in IQ-TREE (Nguyen et al., 2015) under the JTT + F + R4 model (1000 bootstraps). A species tree was inferred with ASTRAL (Yin et al., 2019) using 4416 single-copy gene trees (≥ 70% species), differing slightly from the ML tree. ASTRAL's result was used for further analysis.

Divergence times were estimated with MCMCTree in PAML (Yang, 2007) using 199 non-missing, single-copy orthologous genes, calibrated with TimeTree. CAFÉ was used to identify gene family expansion, contraction, and rapid evolution using the time tree and 154,621 orthologous families (Hahn et al., 2005).

Proteins were aligned using DIAMOND (Buchfink et al., 2015), and then WGDI was used to analyze collinearity and calculate Ks values (Sun et al., 2022). Orthologous collinear blocks were identified using the Orthology Index (Zhang et al., 2024, 2025). Paleopolyploidy events and ploidy levels were determined via collinearity dot plots, with Ks values estimating event timing and frequency.

2.7. Genome mapping and SNP calling

Whole-genome resequencing generated approximately 10.95 billion reads, totaling 2.24 terabases of raw sequence data. The raw reads were trimmed for adapters and ensured for high quality (Chen et al., 2018). Further, they were mapped to the reference genome with BWA-MEM (Li, 2013), followed by the removal of duplicate reads via SAMtools (Li et al., 2009). To ensure data quality, only bases with Phred quality scores above 20 and mapping quality scores exceeding 30 were retained, yielding a final set of 206,951,631 high-confidence genomic sites (dataset 1). Initial variant calling with GATK identified 133,071,832 raw SNPs in Oreocharis mileensis. These variants were further filtered using VCFtools with stringent criteria (Danecek et al., 2011): sites with genotype quality < 20 or read depth < 5 were marked as missing; non-biallelic sites and non-SNP variants were excluded; SNPs with missing data rates exceeding 20% were removed; and variants with minor allele frequencies below 0.05 were discarded. This filtering process resulted in a refined dataset of 7,415,406 high-quality SNPs (dataset 2) suitable for downstream analyses.

2.8. Inference of population structure based on all loci, putatively adaptive loci, and neutral loci

We used PLINK 1.90b4.1 (Chang et al., 2015) to remove loci in linkage disequilibrium (–indep-pairwise 50 5 0.35), from dataset 2 and retained 752,569 SNPs (dataset 3, all loci). This dataset was further processed to determine environment associated SNPs. The environmental data, which included 19 bioclimatic variables at 30-s resolution, was downloaded from WorldClim v.2.1 database (Fick and Hijmans, 2017). Each environmental factor was extracted through the coordinates of sampling points. We selected the top climatic factors based on weighted R² importance (Fig. S5) using a machine learning gradient forest (GF) model in the R package gradientforest v.0.1–37 (Ellis et al., 2012) by modeling the relationship of climatic variables and 50,000 randomly selected SNPs with 500 regression trees. To reduce multicollinearity among environmental variables, we retained only those with an absolute Pearson correlation coefficient below 0.7, calculated using the R package corrplot v.0.92 (Friendly, 2002; Wei et al., 2017). From this analysis, we selected the four most informative and uncorrelated variables: Mean Temperature of Driest Quarter (BIO9), Mean Temperature of Warmest Quarter (BIO10), Precipitation of Wettest Month (BIO13), and Precipitation of the Driest Month (BIO14). We then applied BayeScEnv (Villemereuil and Gaggiotti, 2015) to detect SNPs associated with environmental factors. BayeScEnv is a univariate method for genotype–environment association. For this analysis, environmental variables were standardized by mean variance, and genotype data in codominant format (derived from dataset 3) were prepared using PGDSpider v.2.1.1.5 (Lischer and Excoffier, 2012). RDA, conducted with vegan v.2.6.2 (Dixon, 2003), was used with VCF files converted via PLINK v.1.9 (Purcell et al., 2007), with missing values replaced by the most frequent non-missing value (Forester et al., 2018). A 999th ANOVA assessed significance, with candidate SNPs identified at ±3 SD (p = 0.0027). SNPs determined by both methods were classified as putatively adaptive SNPs (dataset 4; 1737 SNPs), other SNPs were classified as neutral SNPs (dataset 5; 750,832 SNPs). Gene Ontology enrichment via eggNOG-mapper (Cantalapiedra et al., 2021) was conducted to determine those adaptive SNPs functions.

Population structure was then inferred using ADMIXTURE 1.3.0 with the optimal number of populations (K) determining the minimum cross-validation error. We performed PCA in R for each dataset and constructed a Maximum Likelihood (ML) phylogenetic tree using RAxML (Stamatakis, 2014) specifically for dataset 3. This dataset was chosen for tree construction as previous studies have shown that different tree-building methods produce similar results for comparable genomic datasets (Shen et al., 2024).

Because ADMIXTURE does not capture recent gene flow and generates highly structured populations (sometimes even between close populations), we utilized fineSTRUCTURE (Lawson et al., 2012) on dataset 2. It is a haplotype-based clustering method that identifies groups of individuals with shared ancestry through recombination-aware modeling. Phase files and recombination maps were provided as input, and coancestry matrices were generated using Chromopainter. The fineSTRUCTURE algorithm was then run using a minimum SNP threshold of 5000 per chromosome (-s1minsnps 5000), 10,000 MCMC iterations for burn-in (-s3iters 10,000), and 10,000 iterations for tree building (-s4iters 10,000). To support fineSTRUCTURE results, we used PopVAE, a variational autoencoder framework and unsupervised deep learning method tailored to population genomics. It projects individuals into a low-dimensional latent space, capturing both linear and nonlinear population structure. We used popvae.py with the following parameters: –seed 42 to ensure reproducibility and –patience 300 to prevent overfitting by using early stopping. The input file was dataset 2, containing representative individuals and high-quality biallelic SNPs. Output latent space coordinates were used to infer population clusters and visualize patterns of divergence and admixture.

2.9. Population genetic parameters and runs of homozygosity

We analyzed population genetic parameters using dataset 2 in VCFtools v.0.1.17 (Danecek et al., 2011), computing nucleotide diversity (π), pairwise genetic differentiation (F_ST), and Tajima's D statistics across 10 kb non-overlapping windows. For π calculations, we included both polymorphic and monomorphic sites. Additional metrics including observed heterozygosity (Ho), expected heterozygosity (He), and inbreeding coefficients (F_IS) were generated using VCFtools' –het function (Danecek et al., 2011). Linkage disequilibrium (LD) patterns were evaluated across populations using PopLDdecay v.3.4.1 (Zhang et al., 2018), which quantified the rate of LD decay. For inbreeding analysis, we detected runs of homozygosity (ROH) in dataset 2 using VCFtools v.0.1.15 (Danecek et al., 2011) with the –LROH parameter, considering only ROH segments exceeding 100 kb. We calculated the genomic inbreeding coefficient (FROH) as the fraction of the genome covered by these long ROH segments (≥ 100 kb) relative to the effective genome size (Yang et al., 2018).

2.10. Identification of deleterious mutations

We identified deleterious variants in Oreocharis mileensis using SIFT4G (Vaser et al., 2016) with plant orthologs from TrEMBL (Boeckmann et al., 2003). Coding SNPs in dataset 1 (low-frequency variants) were categorized as Deleterious (SIFT score < 0.05), Tolerated (SIFT score ≥ 0.05), or Synonymous. Low-confidence and "NA" sites were filtered prior to analysis (Sim et al., 2012).

2.11. Demographic history inference

We reconstructed the demographic history of Oreocharis mileensis through two complementary approaches. First, we applied the Pairwise Sequentially Markovian Coalescent (PSMC) model to analyze a consensus genome. The input data comprised BAM files (dataset 1) converted to 'psmcfa' format with depth thresholds of 10 × (minimum) and 100 × (maximum). PSMC analysis employed standard parameters: time intervals set to '4 + 25*2 + 4 + 6' and 30 optimization cycles (-N). For each sample, we analyzed 500 kb genomic segments with 10 bootstrap replicates. Second, we performed independent demographic inference using Stairway Plot 2 (Liu and Fu, 2020). The folded site frequency spectrum (SFS) was generated with ANGSD v.0.921 (Korneliussen et al., 2014) from dataset 1 using quality filters "-minMapQ 30 -minQ 20". Both analyses incorporated identical evolutionary parameters: a mutation rate (μ) of 3.7 × 10⁻⁹ substitutions/site/year (calculated as Ks/2T, where Ks = 0.12 represents the synonymous substitution rate and T = 16.2 million years reflects the divergence time between O. mileensis and Primulina huaijiensis; Fig. S6). We assumed a generation time of 3 years based on field observations of O. mileensis life history.

2.12. Isolation by distance (IBD), by environment (IBE) and drivers of genetic variation

To evaluate spatial genetic patterns, we examined relationships between genetic differentiation (F_ST/(1-F_ST)) and both geographic and environmental (BIO9, BIO10, BIO13, BIO14) distances. Geographic and environmental distance matrices were computed as Euclidean distances from coordinates. We performed IBD (environmental distance was controlled) and IBE (geographic distance was controlled) analyses using partial Mantel test with statistical significance assessed through 999 permutations. These analyses were conducted separately for putatively adaptive (dataset 4) and neutral (dataset 5) SNP sets. To evaluate the relative contributions of climate, geography, and population structure to genomic variation, we performed partial redundancy analyses (pRDA) on putatively adaptive loci and neutral loci. The genotype matrix (coded 0, 1, 2; missing data as 9) was imputed using most frequent genotypes and standardized prior to analysis. Those four selected climate variables, geographic coordinates (latitude, longitude), and ADMIXTURE-derived population structure were used for this analysis. We constructed a full model incorporating all three predictor sets, along with partial models assessing each component's unique contribution while conditioning on the others. Model significance was evaluated through 999 permutation tests, with variance partitioning used to separate shared and independent effects. All analyses were conducted in R using the vegan package (Dixon, 2003).

2.13. Gene flow, introgression and genetic offset

We utilized TreeMix (Pickrell and Pritchard, 2012) to infer population segregation and migration events, testing models with 1–15 migration edges (m) over 20 iterations. SNP dataset 3 (Rellstab et al., 2015) was used with "-global", "-se", and "-noss" parameters for this analysis. The optimum of migration was determined using OptM v.0.1.5, based on 99.8% variance explained and maximized Δm statistic.

To investigate historical introgression among populations, we applied Patterson's D-statistic and the f4 admixture ratio using Oreocharis hekouensis as the outgroup, following the approach of Patterson et al. (2012). These analyses were conducted using the Dtrios module from the Dsuite package (Malinsky et al., 2021), which allows for inclusion of sites with non-missing genotypes in the outgroup taxa. Statistical significance of the D values was assessed through block jackknife resampling, providing a robust measure of introgression signals.

The R package gradientforest v.0.1–37 was used to predict genomic vulnerability by GF model (Ellis et al., 2012). Dataset 4 of 1737 SNPs from Bayescenv and RDA was compared to 1737 randomly selected SNPs (dataset 3). SNPs with MAF > 10% were converted to population-level MAFs. GF, with 500 regression trees per SNP, modeled BIO9, BIO10, BIO13, and BIO14. GO was calculated as Euclidean distance between current (1970–2000) and future (2081–2100) climates under BCC-CSM2-MR and SSP126/SSP585 scenarios. Local, forward, and reverse offsets (Gougherty et al., 2021) were calculated using bioclimatic variables to assess vulnerability, gene flow potential, and preadaptation.

3. Results 3.1. Genome sequencing and assembly

The libraries sequenced on the PacBio Sequel Ⅱ platform obtained a total of 6.24 million reads with ~91.15 Gb HiFi data (~46×), and an average read length of 15 kb (the longest read was 47 kb, and N50 was 14 kb) (Table S1). A total of 1480 million reads (~221.97 Gb) of Hi-C sequencing data and 1918 million reads (~287.67 Gb, ~144×) of whole-genome short-read sequencing data were produced (Tables S4 and S5). A total of ~11 million reads (~14.65 Gb) of Iso-Seq sequencing data were generated using the ONT platform for third-generation transcriptome (Table S6). Based on PacBio HiFi sequencing data and Hi-C-assisted assembly, we obtained a haplotype-resolved genome.

Oreocharis mileensis genome assembly spans ~3.99 Gb, with exceptional continuity demonstrated by contig and scaffold N50 values of ~124 Mb each (Table S7). The phased assembly comprises two haplotypes: haplotype A (2.01 Gb; contig N50 123 Mb, scaffold N50 124 Mb) and haplotype B (1.98 Gb; contig/scaffold N50 124 Mb) (Tables S8 and S9). This assembly size shows strong agreement with independent k-mer-based estimates from FindGSE and Genomescope (Fig. S7). A total of 99.97% of the sequences from the two haplotypes were anchored onto 34 chromosomes (Fig. 1A and Table S10). Most of the chromosomes were assembled with complete telomeric sequences, and the number of gaps was only 1, indicating extremely high continuity (Table 1). The mitochondrial and chloroplast genomes were assembled into circular DNA molecules of 515,902 bp and 154,542 bp, respectively (Table S10). Evaluation of sequencing depth demonstrated comprehensive genome coverage, with strong mapping efficiency metrics (Table S11). The assembly demonstrated strong completeness with 1588 (98.3%) complete BUSCO groups out of 1614 groups, including 12 (0.7%) single-copy and 1576 (97.6%) duplicated genes. Few genes were fragmented (0.7%) or missing (1.0%) (Table S5; Figs. 1B, S2 and S3). A total of 56,834 genes and 130,476 transcripts were also assembled from the Iso-seq reads, with average lengths of 2606 and 2554, respectively (Table S12).

Fig. 1 Genomic features and evolution of Oreocharis mileensis. (A) The features of the haplotype-resolved genome assembly, (1) the distributions of pseudochromosomes, (2) class Ⅰ TE density, (3) class Ⅱ TE density, (4) protein-coding gene density, (5) proportion of tandem repeat, (6) GC content and (7) syntenic blocks. (B) Hi-C heatmap (100-kb bins; MAPQ ≥ 1) visualizing intra-haplotype interactions (color-scale: low[yellow]-high[red]). (C) Phylogenetic tree showing gene family contraction and expansion (pink: contracted; green: expanded; values at the nodes represent the time of differentiation and 95% CI).

Figure options

Table 1 Oreocharis mileensis genome assembly and annotation.

Parameter	Haplotype A	Haplotype B
Total assembly size (bp)	2,010,122,338	1,982,108,909
GC content (%)	37.93	37.91
Total number of contigs	29	17
Maximum contig length (bp)	158,257,637	156,577,511
Minimum contig length (bp)	57,505	71,877,888
Contig N50 (bp)	122,524,390	124,064,422
Contig N90 (bp)	87,302,949	92,624,260
Total number of scaffolds	28	17
Maximum scaffold length (bp)	158,257,637	156,577,511
Minimum scaffold length (bp)	57,505	71,877,888
Scaffold N50 (bp)	123,777,028	124,064,422
Scaffold N90 (bp)	93,537,943	92,624,260
Gap number	1	0
Complete BUSCOs	1586 (98.3%)	1585 (98.2%)
Complete single-copy BUSCOs	1305 (80.9%)	1301 (80.6%)
Complete and duplicated BUSCOs	281 (17.4%)	284 (17.6%)
Fragmented BUSCOs	11 (0.7%)	13 (0.8%)
Missing BUSCOs	17 (1.0%)	16 (1.0%)

Table options

3.2. Genome annotation

The Oreocharis mileensis genome contained 3.46 million repetitive sequences spanning ~3.16 Gb (79.05% of the genome). Long terminal repeats (LTRs) dominated these repeats, comprising ~2.90 Gb (72.69%), with Gypsy (41.84%) and Copia (18.98%) elements being the most abundant LTR families (Table S13). After the integration of annotations, 89,009 protein-coding genes, 114,466 transcripts and 114,466 coding sequences were identified, with mean lengths of 3719, 1727 and 1135, respectively (Table S14). In addition, 3111 rRNA, 2211 tRNA, and 7324 ncRNA sequences were identified (Table S15). Protein annotation completeness was evaluated using BUSCO, showing 98.5% core gene coverage (97.1% duplicated, 1.4% single-copy). Only 0.4% were fragmented and 1.1% missing (Table S7).

A total of 100,655 genes in both haplotypes were annotated using different software tools. The annotation results based on eggNOG-mapper, including GO (33,188, 37.29%), KEGG_KO (31,159, 35.01%), EC (12,998, 14.60%), KEGG_Pathway (19,457, 21.86%), eggNOG (62,202, 69.88%), and COG (66,381, 74.58%). The annotation results based on DIAMONG included Swiss-Prot (48,892, 54.93%), TrEMBL (76,802, 86.29%), NR (67,152, 75.44%) and Arabidopsis thaliana (58,924, 66.20%). The annotation results based on InterProScan included Pfam (55,121, 61.93%), SMART (21,007, 23.60%), Gene3D (45,672, 51.31%), and CDD (21,622, 24.29%) and others (Table S16).

3.3. Analysis of phylogeny and WGD

We used haplotype A of Oreocharis mileensis, consisting of 44,569 genes, for phylogeny and WGD events. OrthoFinder2 analysis of 19 genomes identified 65,638 genes grouped into 22,933 orthologous families, with O. mileensis showing significant gene family evolution (2519 expanded and 1539 contracted; Fig. S8). Phylogenetic reconstruction using 1273 conserved orthogroups revealed divergence times within Gesneriaceae: O. mileensis split from O. esquirolii and O. maximowiczii at ~8 (6.3–9.9) Mya, Oreocharis split from Primulina at ~14.7 (12.1–17.8) Mya and from Streptocarpus at ~19.4 (16.2–22.9) Mya, with all three genera diverging from Boea earlier at ~22.9 (19.3–26.8) Mya (Fig. 1C).

A total of 23,126 post-filter collinear gene pairs and 1224 post-filter collinear blocks were inferred within the Oreocharis mileensis genome (Table S17). Unlike most core eudicots, Vitis vinifera and Coffea canephora appear to have retained their diploid state following the ancestral γ hexaploidization, showing no evidence of lineage-specific whole-genome duplications (WGD). Therefore, these two species are used as references to investigate the relative ploidy levels and inferring the WGD events in other species. O. mileensis and the closely related species Primulina huaijiensis, and Streptocarpus rexii show an obvious 1:8 relationship in collinearity compared to the reference species (Coffea canephora), which may be the result of three lineage-specific tetraploidization events (Fig. S9). The synteny ratio between O. mileensis and O. esquirolii, O. mileensis and O. maximowiczii, O. mileensis and P. huaijiensis retained its diploid state, indicating that these species share all WGD events (only O. mileensis and P. huaijiensis synteny plot is provided in Fig. S6). The synteny ratio of O. mileensis/P. huaijiensis to Streptocarpus rexii is 2:2, indicating that each species may have experienced a lineage-specific tetraploidization event (a deduction that also matches the peaks of Ks values), meaning that they share the first two tetraploidization events (Fig. S10). When comparing O. mileensis with itself or with P. huaijiensis, there is first a recent (Ks ≈ 0.2) duplicated collinear block, which can be confirmed as a tetraploidization event (Fig. S11). Each chromosome roughly corresponds to two older collinear blocks (with Ks ≈ 0.8) and approximately four even older collinear blocks (with Ks ≈ 1.2), which correspond to two more ancient tetraploidization events.

3.4. Population structure and phylogeny

Whole-genome resequencing generated ~11.12 Gb (15.68× coverage) of high-quality data per sample, with 97.25% of filtered reads (Q20: 96.68%; Q30: 94.82%) mapping to the reference (Table S18). Population analyses using two SNP datasets (all and neutral loci) revealed optimal genetic structure at K = 8 through ADMIXTURE (Fig. S12A and B). The results from these two datasets consistently indicate a highly structured population, with similar patterns of genetic differentiation across the groups (Figs. 2A, S13A and S14A). DHB1 and DHB2 exhibited identical genetic compositions, suggesting they constitute a single population. Furthermore, two populations (XY and DHB) shared genetic profiles, with only a slight genetic admixture observed from an additional population (QT). Minor genetic mixing was also detected between two other populations (MXJ and YL), whereas the remaining populations had a significant genetic structure. For putatively adaptive loci, the best K was 7, merging DHB1, DHB2 and XY as the same component of ADMIXTURE and MXJ and YL as well (Fig. S15A). Due to this pronounced genetic differentiation, PCA was conducted and visualized for components 1 to 6 across all three datasets (Figs. 2B, S13B, S14B and S15B). The PCA results revealed a clear separation into four distinct clusters in PC1 and PC2 axes with 45.66%, 45.61%, and 44.57% of genome covariance for all loci, neutral loci, and putatively adaptive loci, respectively. These clusters were identified as follows: Cluster 1 – DHB1, DHB2, QT, and XY; Cluster 2 – GB, GS, MXJ, and YL; Cluster 3 – EJIA; and Cluster 4 – WS. Further analysis demonstrated that GB and GS were separated from each other in PC1 and PC3 in all datasets. Similarly, MXJ and YL were distinguished in PCA 1 and PCA 5 in the all and neutral loci datasets (Fig. S14B), while EJIA and WS were differentiated in PC 1 vs. PC 4, 5, and 6. The results of fineSTRUCTURE were consistent with these findings (Fig. S15C and D). The GS/GB/YL/MXJ, DHB/QT/XY, WS and EJIA populations formed separate clusters, however, co-ancestry within these clusters was found as weak, highlighting ancient not recent geneflow. popVAE result was similar to previous analyses, however, QT was separated from DHB and XY while GS was separated from GB/MXJ/YL populations (Fig. S15E).

Fig. 2 (A) Population structure, (B) PCA and (C) ML tree of Oreocharis mileensis using all loci.

Figure options

Maximum-likelihood phylogeny showed strong concordance with ADMIXTURE results, where populations sharing genetic ancestry clustered together phylogenetically (Fig. 2C). In brief, DHB, XY and QT, MXJ and YL, GB and GS were clustered together, while EJIA and WS were both distinct from all other populations.

3.5. Parameters of population genetics

Nucleotide diversity (π), Tajima's D, inbreeding coefficient (F_IS), observed heterozygosity (Ho), and expected heterozygosity (He) were calculated using VCFtools v.0.1.17 (Table S19 and Fig. S16). π values ranged from 0.000285 (GS) to 0.000422 (WS), indicating low diversity across populations. Tajima's D ranged from 0.0829 (MXJ) to 0.53 (WS), while F_IS values were high (0.59 in WS to 0.86 in GS). He values were almost consistent, ranging from 0.2538 (GS, XY) to 0.2560 (WS). Genetic differentiation (F_ST) ranged from 0.20 (MXJ-YL) to 0.77 (EJIA-GS), showing high differentiation (Fig. 3A and Table S20).

Fig. 3 (A) Genetic differentiation among nine populations of Oreocharis mileensis. (B) Fractions of runs of homozygosity (FROH) illustrating variation in inbreeding levels, ranked in decreasing order across populations. (C) Size-distribution of runs of homozygosity (ROH > 1 Mb [solid dots] and 100 kb–1 Mb [hollow dots]). (D) Genome-wide decay of linkage disequilibrium (LD). (E) Population-level genetic load, showing ratios of homozygous-derived deleterious mutations. Populations sharing same letters in (B) and (E) lack significant differences.

Figure options

We next assessed genome-wide runs of homozygosity (ROH), which reflects recent inbreeding through extended homozygous segments. Following the method outlined by Robinson et al. (2022), we estimated the number of generations since the most recent common ancestor (g) using the formula g = 100/(2*L), where L represents the mean ROH length in megabases (Mb). Inbreeding events were estimated to occur between 250 (GS) and 1322 (WS) generations ago (Fig. S17). FROH was highest in GS (median = 0.1038) and lowest in WS (median = 0.0008) (Fig. 3B). Populations with severe inbreeding had longer ROH (> 1 Mb), with GS showing the highest proportion (3.86% of the genome), while WS and XY had none (Fig. 3C and Table S21).

Linkage disequilibrium (LD) decay varied across populations, with MXJ showing the slowest decay and EJIA, QT, and WS the fastest (Fig. 3D). Based on SIFT prediction approach, we detected a total of 327,684 deleterious mutations, 624,179 tolerated mutations and 859,658 synonymous mutations (Table S22). Our results showed that the number of homozygous deleterious sites to total deleterious mutations in the MXJ and GS populations were significantly higher than in other populations, suggesting that these populations had a higher homozygous genetic load (Fig. 3E). YL and GB populations also had more significant genetic load compared to other populations.

3.6. IBD, IBE, drivers of genetic variation, and gene flow

We identified candidate genomic variants associated with climate adaptation. BayeScEnv analysis detected 6348 loci associated with four selected environmental variables, whereas RDA identified 5368 loci across five axes. To minimize false positives, 1737 loci overlapping in both analyses were classified as environment-associated (Table S23). Of those 1737 loci, 201 correlated with Mean Temperature of Driest Quarter, 1332 correlated with Mean Temperature of Warmest Quarter, 184 correlated with Precipitation of Wettest Month, and 189 correlated with Precipitation of the Driest Month. The divergence pattern was tested by IBD and IBE analyses using partial Mantel test. According to IBD and IBE analyses, geographic distance did not explain genetic variation (r = 0.173, p = 0.26 for neutral loci and r = 0.21, p = 0.13 for putatively adaptive loci), but environmental distance was significant (r = 0.65, p = 0.002 for neutral loci and r = 0.21, p = 0.01 for putatively adaptive loci; Fig. 4A and B).

Fig. 4 (A) Isolation-by-distance (partial Mantel test, controlled for environment) for neutral (blue) and adaptive loci (orange), with shaded 95% CIs. (B) Isolation-by-environment (partial Mantel test), controlling for geography, with identical locus coding. (C) Population relationships and inferred gene flow (TreeMix ML tree; arrow direction indicates migration; scale bar shows drift magnitude). (D) Dsuite heatmap showing introgression between populations.

Figure options

To assess the relative contributions of climate, geography, and population structure to genetic variation, we performed pRDA analyses on both putatively adaptive and neutral loci (Table S24). For putatively adaptive loci, the full model (climate + geography + structure) explained 87.87% of genomic variation, with population structure alone accounting for 87.18%, climate 72.02%, and geography 31.13% when controlling for other factors. The first two RDA axes captured 79.01% of constrained variation, and permutation tests confirmed all predictors were significant (p = 0.001). Neutral loci showed weaker overall associations, with the full model explaining 63.85% of variation. Population structure explained most variation (61.02%), while climate (40.48%) and geography (18.94%) had reduced but still significant effect (p = 0.01). TreeMix analysis identified three migration events (99.986% variance explained), with the optimal model (iteration 11) showing minimal residual error (Tables S25 and S26; Fig. S18). TreeMix indicated limited and weak gene flow or admixture among O. mileensis populations (Figs. 4C and S19). Three weak gene flow events were predicted: between YL and GB, XY and the ancestors of YL, MXJ, GB, and GS, and between GS and the ancestors of WS and EJIA. WS and EJIA were genetically distinct from other populations.

The D-statistics analysis revealed distinct patterns of gene flow for EJIA and WS, indicating that each has undergone introgression with different genetic clusters (Fig. 4D). EJIA showed significant but minor introgression with GB (p < 0.001, Z > 3, Table 2). Also, gene flows from YL to GB (p < 0.001, Z > 3), and from GB to MXJ (p < 0.001, Z > 3), were determined in f branch. In addition, WS exhibited gene flow signals with the DHB/QT/XY cluster. We found significant but modest introgression between WS and QT, with WS likely receiving gene flow from QT or a QT-related lineage. In addition, comparisons such as XY-DHB-QT showed significant but minor introgression.

Table 2 D-statistics and F4-ratios from Dsuite for trios in this study.

P1	P2	P3	Dstatistic	Z-score	p-value	f4-ratio	BBAA	ABBA	BABA
DHB	QT	WS	0.025949	5.65135	1.59193e-08	0.0252922	110,964	6190.46	5877.32
XY	DHB	QT	0.0873944	14.87	2.3e-16	0.0694322	23605.6	19327.6	16220.8
GS	GB	EJIA	0.0387814	5.64186	1.68228e-08	0.0132275	105,298	10655.5	9859.84
GS	GB	MXJ	0.0670171	6.67424	2.48518e-11	0.0467408	26292.6	20163.2	17630.4
MXJ	YL	GB	0.113714	6.41758	1.38455e-10	0.0819861	40064.3	16791.3	13362.4
P1, population 1; P2, population 2; P3, Population 3.

Table options

3.7. Genomic vulnerability to climate change

GF modeling showed higher genomic vulnerability for all populations (Fig. 5A–D). Putatively adaptive SNPs exhibited higher GO than reference SNPs under both scenarios (ssp585 and ssp126), indicating greater sensitivity to climate change. Analysis of local, forward, and reverse offsets using an RGB model indicated that northeastern regions showed were more vulnerable despite potential migration (Fig. 5E and F). However, for other populations, local offset was higher than forward and reverse offsets suggesting urgent need for assisted gene flow or habitat restoration. Functional annotation of outlier SNPs predicted functions for 276 genes (p < 0.05; Table S27). These genes were assigned to categories, including 25 biological processes, such as ion balance maintenance, oxidative damage reduction, signal transduction, and metabolism. Further, Cell Protection & Repair were related to stress tolerance and Cell Expansion & Hydration Recovery, Metabolic Reactivation and Hormonal & Transcriptional Reactivation were related to rapid growth after hydration.

Fig. 5 Genetic offset projections for 2100 under SSP585 and SSP126 scenarios (A, B) using reference SNPs and (C, D) adaptive SNPs (higher values indicate higher genetic offset). (E, F) RGB visualization of multidimensional offsets (red: local, green: forward, blue: reverse) under (E) SSP585 and (F) SSP126 (brightness reflects higher offset while darkness shows lower). Lower panels show bivariate distributions of E and F.

Figure options

3.8. Demographic history of Oreocharis mileensis

Because PSMC is primarily effective at reconstructing ancient demographic histories and Stairway Plot is particularly effective at resolving recent demographic changes, we decided to use both approaches. PSMC analysis identified two severe population declines, with continuous reductions in effective population size (Ne) and no recovery (Fig. 6A). The first decline occurred ~0.6 Mya during the Mid–Pleistocene Transition, reducing Ne to ~15.4 × 10⁴. After a small stable period, Ne declined (only one population had a recovery followed by a sharp decline) again between 0.67 and 0.2 Mya. Stairway Plot analysis revealed a continuous decline in Ne across all populations, punctuated by some sharper reductions (Figs. 6B and S20). Notably, the XY population exhibited a stable Ne trend ~100,000 years ago (Fig. S20), though this anomaly may reflect bias rather than true demographic stability. The first major decline (~0.86 Mya), reducing Ne to 1.08 × 10⁶, aligned with the Mid–Pleistocene Transition, a period of intensified glacial cycles and climatic instability (Fig. 6B). The second decline (~50,000 years ago) reduced Ne to ~1.42 × 10⁵, aligning with the Late Pleistocene and Last Glacial Maximum. Further declines occurred ~6000 years ago (Ne ~ 29,600) at the end of the Holocene Climate Optimum and ~3000 years ago (Ne ~9400) during the Late Holocene Neoglacial period. Finally, Ne dropped to ~950 by 114 years ago, coinciding with the early Industrial Era and human-driven environmental changes.

Fig. 6 (A) Demographic history of Oreocharis mileensis inferred by PSMC. Severe reduction of effective population during Middle Pleistocene is highlighted with green dashed lines. (B) Demographic history reconstruction using folded site frequency spectra revealed sharp declines in effective population size (Ne with 95% confidence intervals shown as light green bounds (Stairway Plot v.2). Green dashed lines highlight periods of rapid population contraction.

Figure options

4. Discussion 4.1. High-quality genome assembly of Oreocharis mileensis advances conservation genomics and evolution

Conservation genomics depends on high-quality genetic data and whole-genome sequencing provides unparalleled resolution for studying evolutionary processes and population dynamics (Barrick and Lenski, 2013; Jing et al., 2023; Song et al., 2023). Despite its potential, application remains limited for many plant groups, particularly karst-adapted species (Feng et al., 2020; Cao et al., 2023; Peng et al., 2025). In this study, we assembled the genome of Oreocharis mileensis, a resurrection plant endemic to karst forests in southern China. The high-contiguity assembly exhibits excellent gene completeness and sequencing coverage, establishing a foundation for conservation prioritization. The Oreocharis assembly also provides a reliable high-quality reference genome for the study of the drought resistance and resurrection processes of Oreocharis species in karst regions in the near future.

Our assembly shows that the Oreocharis mileensis genome (1.9 Gb) was larger than that of both O. esquirolii (1.24 Gb) and O. maximowiczii (1.78 Gb) (Peng et al., 2025). Although O. mileensis had fewer genes (44,569) than O. esquirolii (47,637), our structural analysis of genomes showed that O. mileensis has evolved more genes and lost fewer genes than relatives within the clade. Research has shown that species with more gene gains than losses frequently exhibit improved capacities in biological processes important to key adaptations (Jacquemin et al., 2014). We have yet to determine whether gains and losses in O. mileensis genes are adaptive.

The genus Oreocharis is likely to have experienced three lineage-specific tetraploidization events, with the divergence time between Oreocharis spp./Primulina huanjiangensis and Streptocarpus rexii (22.9–16.2 Mya) being consistent with the most recent genome duplication event shared by plants in the Didymocarpoid subtribe (24.2–20.6 Mya) (Feng et al., 2020).

4.2. Inbreeding and possible selective purging shape diversity in Oreocharis mileensis populations

We found that Oreocharis mileensis populations have long ROHs, suggesting that these populations are vulnerable due to recent inbreeding and substantial genetic load. ROH patterns directly reflect increased homozygosity (Kirin et al., 2010), while the resulting accumulation of homozygous deleterious mutations creates a compounding fitness burden (Bortoluzzi et al., 2020; Kyriazis et al., 2021). Thus, our findings are consistent with previous studies that indicate small populations face elevated extinction risks from inbreeding depression, where restricted mate availability increases homozygosity of deleterious alleles (Bortoluzzi et al., 2020). We also found that nucleotide diversity is lower in O. mileensis than in other threatened herbaceous plant species (including O. esquirolii and O. maximowiczii), confirming that increased homozygosity has reduced genetic diversity in these populations (Lin et al., 2024; Peng et al., 2025) (Table S28). Several factors have led to frequent inbreeding within populations of O. mileensis. Specifically, inbreeding is more common when populations are isolated, located in independent limestone hills and/or fragmented habitats. Our ROH analysis identified four core O. mileensis populations that show signs of recent inbreeding (i.e., GS, YL, GB, and MXJ). For example, the GS population exhibited high ROH, a high inbreeding coefficient (F_IS ≈ 0.86), a significant burden of homozygous deleterious mutations, and notably low genetic diversity, consistent with inbreeding events occurring approximately 250 generations ago.

Interestingly, O. mileensis populations on the periphery of karst forests (e.g., QT) displayed a unique genetic profile: a lower proportion of ROH, fewer homozygous deleterious mutations, and reduced nucleotide diversity (π = 0.000312), yet a high F_IS value (≈0.85). This pattern may result from a combination of reduced recent inbreeding, possible genetic purging of deleterious mutations, historical demographic events, and selective pressures (Cvijović et al., 2018; Grossen et al., 2020). The absence of long ROH segments in QT suggests that recombination may have fragmented these regions, rendering them undetectable by ROH analysis. The EJIA population exhibited similar genetic indicators, further supporting this interpretation.

Other peripheral populations (WS, DHB, and XY) demonstrated higher genetic diversity but similarly lower ROH, and fewer homozygous deleterious mutations. The reduced burden of deleterious mutations in these populations may reflect the possible effective purging of harmful alleles over generations, potentially driven by natural selection. These peripheral populations also occupy basal positions in the phylogeny, suggesting they diverged earlier and potentially experienced habitat fragmentation and population isolation before the core (GB/GS/MXJ/YL) populations. Such early fragmentation may have allowed more time for possible purifying selection and recombination to act, reducing the detectability of long ROH segments today.

4.3. Drivers of genetic variation and their role in formation of the extremely small and isolated populations

Partial redundancy analysis (pRDA) revealed striking differences in the drivers of genetic variation between adaptive and neutral loci. For putatively adaptive loci, the variation was overwhelmingly aligned with pre-existing population structure (87.18% of explained variation). This strongly suggests that historical demographic processes such as long-term environmental isolation, genetic drift in small populations, and barriers to gene flow have been the primary architects of the current distribution of adaptive genetic diversity, exceeding the direct influence of contemporary climate (72.02%) or geographic distance (31.13%). These findings align with studies in other plant systems (Yang et al., 2025), where neutral processes creating structure often provide the substrate upon which selection acts. For neutral loci, the overall model fit was weaker, but the pattern held the largest proportion of explained variation (61.02%) was associated with population structure. This indicates that the same historical demographic forces (isolation and drift) have also been the principal drivers of neutral genetic divergence. The significant but smaller contributions of climate and geography likely reflect isolation-by-environment and isolation-by-distance (although IBD was not significant by partial Mantel test, pRDA analysis found its effect as well) processes operating within this overarching demographic history (Mantel et al., 2003; Sexton et al., 2014). Therefore, our results demonstrate that the complex demographic history of the populations was likely driven by the complex karst landscape.

PSMC analysis identified two severe declines in Oreocharis mileensis: the first (~1–0.6 Mya) during the Mid–Pleistocene Transition reduced Ne to ~15.4 × 10⁴, and the second (0.67–0.2 Mya) further decreased Ne. Unlike PSMC, Stairway Plot revealed continuous Ne decline, with more reductions tied to major climatic and anthropogenic events: the Mid–Pleistocene Transition (~0.86 Mya), Late Pleistocene, Holocene Climate Optimum, Neoglacial period, and early the Industrial Era leading to the lowest Ne of ~950. This temporal complementarity between methods was expected, as PSMC provides robust estimates of ancient demographic changes while Stairway Plot offers finer resolution of recent population dynamics.

TreeMix results indicated that gene flow was minimal between two geographically close populations of Oreocharis mileensis (YL and GB), confirming these populations are genetically isolated regardless of distance. Dsuite analyses indicated that all O. mileensis populations experienced minor introgression. We also found that the source of introgression into the DHB/QT/XY cluster was the WS population, while the source of introgression into the GB/GS/MXJ/YL cluster was the EJIA population. These two populations (WS and EJIA) appear to have diverged first from the common ancestor, with subsequent radiation of the remaining populations occurring long ago. Additionally, the absence of recent gene flow and the effects of genetic drift have led to highly structured populations. This is supported by positive Tajima's D values across populations (Table S19), which suggest recent population contractions, reduced genetic diversity, and the formation of small, isolated populations (Tajima, 1989).

These results show that limited gene flow, historical demography, past environmental fragmentation, and recent anthropogenic pressures have together shaped the deeply structured, small, and isolated populations seen in Oreocharis mileensis today.

4.4. Local adaptation alleles as drivers of climate change-induced genomic vulnerability

Populations within the same species often exhibit differential responses to climate change due to local adaptation to heterogeneous environmental conditions (O'Neil et al., 2014; Fitzpatrick and Keller, 2015; Cortés, 2017). Understanding these patterns of local adaptation is crucial for predicting how species may respond to future climatic shifts. In this study, we quantified the GO by calculating the distance between current and future climate conditions and integrating intraspecific variation into the GF model. We found all populations O. mileensis exhibited significantly higher GO under both future climate scenarios. Our study also showed that under the ssp126 scenario, core populations (GB, GS, MXJ and YL) face significant adaptive risks, while under both climate change scenarios (ssp126 and ssp585) the local offsets were higher than forward and reverse offsets, suggesting that assisted migration or gene flow may to some extent enable adaptation to future climate change.

Shorter generation times could theoretically enhance adaptive capacity by accelerating allele frequency shifts in response to selection (Ojeda et al., 2016; Campbell-Staton et al., 2017). However, the persistence of this advantage depends on the rate and magnitude of climate change relative to the species' evolutionary potential. If climatic shifts outpace the speed of adaptive evolution, even rapidly reproducing species may face significant genomic vulnerability, as suggested by the high GO values in our study. Furthermore, while herbaceous plants may adapt more quickly than trees, their shorter lifespans and smaller effective population sizes could also increase stochasticity in adaptive trajectories (Willi et al., 2006), potentially leading to divergent outcomes from model projections. In addition, demographic costs of selection, such as elevated mortality or reproductive failure can outpace adaptive responses, causing populations to go extinction before achieving evolutionary rescue (Gomulkiewicz and Holt, 1995). Thus, while our GO models provide critical insights into near-term vulnerability, the dynamic interplay between generation time, selection, and genetic architecture warrants caution when extrapolating results over longer timescales. Future studies could integrate demographic modeling with genomic offset frameworks to better resolve how generation time mediates climate adaptation in herbaceous systems.

The physiology of resurrection plants like Oreocharis mileensis enables them to survive during severe desiccation and to regenerate when a sufficient level of water is in the environment (Li et al., 2014). Our search for adaptive loci indicated that putatively adapted SNPs are mainly located in genes responsible for ion balance maintenance, oxidative damage reduction, signal transduction, and metabolism, as well as cell protection and repair. These categories of genes correspond to complex survival strategies used by resurrection plants during dehydrated states caused by the absence of seasonal rainfall (Dinakar and Bartels, 2013). We assume that genes in these categories play key roles in the adaptation of O. mileensis to climate change. We also found that some putatively adaptive SNPs are located in genes related to cell expansion and hydration recovery, as well as metabolic reactivation and hormonal & transcriptional reactivation, which are important for rapid growth of the plant during the presence of rainfall.

4.5. Implications for conservation planning and management practices

Like many karst-dwelling species in Yunnan, Oreocharis mileensis is strongly affected by human activity (Chen et al., 2014). Our field surveys revealed that overgrazing by goats affects several populations of O. mileensis, including the DHB population, adjacent populations located within a nature reserve, and a population in the in situ conservation site established by the local government department (GS). Thus, effective conservation guidelines and management strategies must be developed based on the establishment of well-defined management units (MUs) (Barbosa et al., 2018; Yang et al., 2022; Cai et al., 2024). Our findings lead us to recommend designating eight MUs of O. mileensis. Populations that have experienced inbreeding require genetic rescue through assisted gene flow (Pavlova et al., 2017). In the case of O. mileensis, GS, GB, MXJ and YL populations have low nucleotide diversity, high inbreeding, and high genetic load. These populations appear to be the core populations over the entire geographical distribution of Oreocharis mileensis. Further, genomic offset revealed that these populations may experience higher allele frequency change due to climate change. Gene flow among populations was almost absent, even between geographically close populations. Considering the presence of homozygous deleterious mutations, we recommend facilitating assisted gene flow between MXJ (fewer shared deleterious mutations) or YL (higher genetic diversity) and GB (Fig. S21). Due to high genetic divergence (F_ST = 0.407), assisted gene flow between GB and YL should be conducted under controlled conditions. Our findings also indicate that the GB population can serve as a pollen source to rescue GS. The geographic and environmental proximity of these populations may enhance the likelihood of success (Frankham et al., 2011), although experimental trials are needed to evaluate outbreeding depression risks given their even greater divergence (F_ST = 0.485). Unfortunately, other peripheral populations (i.e., DHB, QT and XY) are isolated from the core populations (F_ST at least 0.637); thus, intercrossing these populations may be risky and lead to outbreeding depression. We therefore do not recommend intercrossing between peripheral and core populations. The QT population is located in the edge of the overall distribution of O. mileensis. The genetic make-up of XY and QT populations is similar and share fewer shared homozygous deleterious mutations. Therefore, we recommend conserving QT population and strengthening its genetic makeup using assisted gene flow from XY population to allow it to adapt to climate change. However, this also should be conducted under control as mentioned above.

The peripheral EJIA and WS populations are both isolated from all other populations and from each other as well. Due to the high risk of outbreeding depression, assisted gene flow should not be attempted here. For these populations, we first recommend protecting and restoring their natural habitats to mitigate external threats such as habitat degradation, overgrazing, or human encroachment. However, this may not be sufficient to maintain their resilience (Belovsky et al., 1994). Thus, genetic material should be preserved through collecting seeds and storing them in germplasm bank and cultivating live plants as ornamental in botanical gardens (Breman et al., 2021).

Acknowledgments

This work was equally supported by the National Key R&D Program of China (Grant No. 2024YFF1307400) and Yunnan Key R&D Program (Grant No. 202403AC100028), Yunnan Fundamental Research Projects (Grant No. 202301AT070318) and Yunnan Revitalization Talent Support Program "Young Talent" Projiect. We thank Dr. Yang Liu, Dr. Shi-Wei Guo, Dr. Ji-Dong Ya, Prof. Zhi-Kun Wu, Prof. Xin-Xiang Bai, Prof. Pan Li, Mr. Sheng-Hu Tang, Mr. Jian Xu, Mr. Bo Pan and Mr. De-Ming He, for their assistance in collecting materials and providing species distribution information, and we thank Dr. Feng-Mao Yang, Dr. Yang Liu, Dr. Bishal Gurung, Dr. Yu-Hang Chang and Ms. Na–Na Peng for their help in analyzing and providing data.

CRediT authorship contribution statement

Temur Asatulloev: Writing – original draft, Visualization, Methodology, Investigation, Formal analysis, Data curation. Lei Cai: Writing – original draft, Visualization, Resources, Investigation, Funding acquisition, Formal analysis. Ziyoviddin Yusupov: Writing – review & editing. Kai-Hua Jia: Formal analysis. Ren-Gang Zhang: Formal analysis. Komiljon Sh. Tojibaev: Supervision, Writing – review & editing. Wei-Bang Sun: Supervision, Resources, Project administration, Funding acquisition, Conceptualization, Writing – review & editing.

Data availability statement

The data that supports the findings of this study are available in the supplementary material of this article and whole genome sequences and annotations are located under PRJCA037867 bioproject (https://ngdc.cncb.ac.cn/gsa/s/h9annOxq).

Declaration of competing interest

The authors Weibang Sun and Komiljon Sh. Tojibaev are editors for Plant Diversity and were not involved in the editorial review or the decision to publish this article. The other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.pld.2025.11.001.

References

Abzhanov, A., Extavour, C.G., Groover, A., et al., 2008. Are we there yet? Tracking the development of new model systems. Trends Genet., 24: 353-360. DOI:10.1016/j.tig.2008.04.002

Aguilar, R., Quesada, M., Ashworth, L., et al., 2008. Genetic consequences of habitat fragmentation in plant populations: susceptible signals in plant traits and methodological approaches. Mol. Ecol., 17: 5177-5188. DOI:10.1111/j.1365-294X.2008.03971.x

Allendorf, F.W., 2017. Genetics and the conservation of natural populations: allozymes to genomes. In: Wiley Online Library.

Barbosa, S., Mestre, F., White, T.A., et al., 2018. Integrative approaches to guide conservation decisions: using genomics to define conservation units and functional corridors. Mol. Ecol., 27: 3452-3465. DOI:10.1111/mec.14806

Barrick, J.E., Lenski, R.E., 2013. Genome dynamics during experimental evolution. Nat. Rev. Genet., 14: 827-839. DOI:10.1038/nrg3564

Belovsky, G.E., Bissonette, J.A., Dueser, R.D., et al., 1994. Management of small populations: concepts affecting the recovery of endangered species. Wildl. Soc. Bull., 22: 307-316.

Bijlsma, R., Loeschcke, V., 2012. Genetic erosion impedes adaptive responses to stressful environments. Evol. Appl., 5: 117-129. DOI:10.1111/j.1752-4571.2011.00214.x

Boeckmann, B., Bairoch, A., Apweiler, R., et al., 2003. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31: 365-370. DOI:10.1093/nar/gkg095

Bortoluzzi, C., Bosse, M., Derks, M.F., et al., 2020. The type of bottleneck matters: insights into the deleterious variation landscape of small managed populations. Evol. Appl., 13: 330-341. DOI:10.1111/eva.12872

Breman, E., Ballesteros, D., Castillo-Lorenzo, et al., 2021. Plant diversity conservation challenges and prospects—the perspective of botanic gardens and the Millennium Seed Bank. Plants, 10: 2371. DOI:10.3390/plants10112371

Brys, R., Jacquemyn, H., 2016. Severe outbreeding and inbreeding depression maintain mating system differentiation in Epipactis (Orchidaceae). J. Evol. Biol., 29: 352-359. DOI:10.1111/jeb.12787

Buchfink, B., Xie, C., Huson, D.H., 2015. Fast and sensitive protein alignment using DIAMOND. Nat. Methods, 12: 59-60. DOI:10.1038/nmeth.3176

Cai, L., Liu, D., Yang, F., et al., 2024. The chromosome-scale genome of Magnolia sinica (Magnoliaceae. provides insights into the conservation of plant species with extremely small populations (PSESP). GigaScience, 13: giad110. DOI:10.1093/gigascience/giad110

Campbell-Staton, S.C., Cheviron, Z.A., Rochette, N., et al., 2017. Winter storms drive rapid phenotypic, regulatory, and genomic shifts in the green anole lizard. Science, 357: 495-498. DOI:10.1126/science.aam5512

Cantalapiedra, C.P., Hernández-Plaza, A., Letunic, I., et al., 2021. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol., 38: 5825-5829. DOI:10.1093/molbev/msab293

Cantarel, B.L., Korf, I., Robb, S.M., et al., 2008. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res., 18: 188-196. DOI:10.1101/gr.6743907

Cao, Y., Almeida-Silva, F., Zhang, W.P., et al., 2023. Genomic insights into adaptation to karst limestone and incipient speciation in East Asian Platycarya spp. (Juglandaceae). Mol. Biol. Evol., 40: msad121. DOI:10.1093/molbev/msad121

Chang, C.C., Chow, C.C., Tellier, L.C., et al., 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience, 4: s13742-13015-10047-13748.

Chen, W.H., Shui, Y.M., Yang, J.B., et al., 2014. Taxonomic status, phylogenetic affinities and genetic diversity of a presumed extinct genus, Paraisometrum WT Wang (Gesneriaceae. from the karst regions of Southwest China. PLoS One, 9: e107967. DOI:10.1371/journal.pone.0107967

Chen, S., Zhou, Y., Chen, Y., et al., 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34: i884-i890. DOI:10.1093/bioinformatics/bty560

Chen, Y., Dong, L., Yi, H., et al., 2024. Genomic divergence and mutation load in the Begonia masoniana complex from limestone karsts. Plant Divers., 46: 575-584. DOI:10.3390/biomimetics9090575

Cheng, H., Concepcion, G.T., Feng, X., et al., 2021. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods, 18: 170-175. DOI:10.1038/s41592-020-01056-5

Cook, D., Espejo Valle-Inclan, J., Pajoro, A., et al., 2018. Long Read Annotation (LoReAn.: automated eukaryotic genome annotation based on long-read cDNA sequencing. Plant Physiol., 177: 848-2018.

Cook, D., Espejo Valle-Inclan, J., Pajoro, A., et al., 2018. Long Read Annotation (LoReAn. : automated eukaryotic genome annotation based on long-read cDNA sequencing. Plant Physiol. 177, 848–2018.

Cvijović, I., Good, B.H., Desai, M.M., 2018. The effect of strong purifying selection on genetic diversity. Genetics, 209: 1235-1278. DOI:10.1534/genetics.118.301058

Danecek, P., Auton, A., Abecasis, G., et al., 2011. The variant call format and VCFtools. Bioinformatics, 27: 2156-2158. DOI:10.1093/bioinformatics/btr330

Dinakar, C., Bartels, D., 2013. Desiccation tolerance in resurrection plants: new insights from transcriptome, proteome and metabolome analysis. Front. Plant Sci., 4: 482.

Dixon, P., 2003. VEGAN, a package of R functions for community ecology. J. Veg. Sci., 14: 927-930. DOI:10.1111/j.1654-1103.2003.tb02228.x

Doyle, J.J., Doyle, J.L., 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull, 19: 11-15.

Ducrettet, J., Maurice, S., Imbert, E., 2023. How much do we know and how much do we care about genetic diversity of threatened plants? A case study from the French flora. Bot. Lett., 170: 110-118. DOI:10.1080/23818107.2022.2125902

Dudchenko, O., Batra, S.S., Omer, A.D., et al., 2017. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science, 356: 92-95. DOI:10.1126/science.aal3327

Durand, N.C., Shamim, M.S., Machol, I., et al., 2017. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst., 3: 95-98.

Ellis, N., Smith, S.J., Pitcher, C.R., 2012. Gradient forests: calculating importance gradients on physical predictors. Ecology, 93: 156-168. DOI:10.1890/11-0252.1

Ellstrand, N.C., Elam, D.R., 1993. Population genetic consequences of small population size: implications for plant conservation. Annu. Rev. Ecol. Syst., 24: 217-242. DOI:10.1146/annurev.es.24.110193.001245

Emms, D.M., Kelly, S., 2019. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol., 20: 238. DOI:10.1186/s13059-019-1832-y

Feng, C., Wang, J., Wu, L., et al., 2020. The genome of a cave plant, Primulina huaijiensis, provides insights into adaptation to limestone karst habitats. New Phytol., 227: 1249-1263. DOI:10.1111/nph.16588

Feng, Y., Comes, H.P., Chen, J., et al., 2024. Genome sequences and population genomics provide insights into the demographic history, inbreeding, and mutation load of two 'living fossil' tree species of Dipteronia. Plant J., 117: 177-192. DOI:10.1111/tpj.16486

Fick, S.E., Hijmans, R.J., 2017. Worldclim 2: new 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol., 37: 4302-4315. DOI:10.1002/joc.5086

Fitzpatrick, M.C., Keller, S.R., 2015. Ecological genomics meets community-level modelling of biodiversity: mapping the genomic landscape of current and future environmental adaptation. Ecol. Lett., 18: 1-16.

Forester, B.R., Lasky, J.R., Wagner, H.H., et al., 2018. Comparing methods for detecting multilocus adaptation with multivariate genotype–environment associations. Mol. Ecol., 27: 2215-2233. DOI:10.1111/mec.14584

Frankham, R., Ballou, J.D., Eldridge, et al., 2011. Predicting the probability of outbreeding depression. Conserv. Biol., 25: 465-475. DOI:10.1111/j.1523-1739.2011.01662.x

Friendly, M., 2002. Corrgrams: exploratory displays for correlation matrices. Am. Statistician, 56: 316-324. DOI:10.1198/000313002533

Gao, Q., Xu, W., 2011. Paraisometrum W.T. Wang, a newly recorded genus of Gesneriaceae from Guizhou, China. Acta Bot. Boreali Occident. Sin., 31: 858-860.

Gomulkiewicz, R., Holt, R.D., 1995. When does evolution by natural selection prevent extinction?. Evolution, 49: 201-207. DOI:10.1111/j.1558-5646.1995.tb05971.x

Gougherty, A.V., Keller, S.R., Fitzpatrick, M.C., 2021. Maladaptation, migration and extirpation fuel climate change risk in a forest tree species. Nat. Clim. Change, 11: 166-171. DOI:10.1038/s41558-020-00968-6

Grossen, C., Guillaume, F., Keller, L.F., et al., 2020. Purging of highly deleterious mutations through severe bottlenecks in Alpine ibex. Nat. Commun., 11: 1001. DOI:10.1038/s41467-020-14803-1

Haag, T., Santos, A.S., Sana, D.A., et al., 2010. The effect of habitat fragmentation on the genetic structure of a top predator: loss of diversity and high differentiation among remnant populations of Atlantic Forest jaguars (Panthera onca). Mol. Ecol., 19: 4906-4921. DOI:10.1111/j.1365-294X.2010.04856.x

Haas, B.J., Delcher, A.L., Mount, et al., 2003. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res., 31: 5654-5666. DOI:10.1093/nar/gkg770

Haas, B.J., Salzberg, S.L., Zhu, W., et al., 2008. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol., 9: R7. DOI:10.1186/gb-2008-9-1-r7

Hahn, M.W., Bie, T.D., Stajich, et al., 2005. Estimating the tempo and mode of gene family evolution from comparative genomic data. Genome Res., 15: 1153-1160. DOI:10.1101/gr.3567505

Hill, W.G., 2010. Understanding and using quantitative genetic variation. Philos. Trans. R. Soc. B-Biol. Sci., 365: 73-85. DOI:10.1098/rstb.2009.0203

Hoban, S.M., Hauffe, H.C., Pérez-Espona, S., et al., 2013. Bringing genetic diversity to the forefront of conservation policy and management. Conserv. Genet. Resour., 5: 593-598. DOI:10.1007/s12686-013-9859-y

Hohenlohe, P.A., Bassham, S., Etter, P.D., et al., 2010. Population genomics of parallel adaptation in three spine stickleback using sequenced RAD tags. PLoS Genetics, 6: e1000862. DOI:10.1371/journal.pgen.1000862

Hu, J., Wang, Z., Liang, F., et al., 2023. NextPolish2: a repeat-aware polishing tool for genomes assembled using HiFi long reads. GPB, 22: qzad009.

Huerta-Cepas, J., Forslund, K., Coelho, L.P., et al., 2017. Fast genome-wide functional annotation through orthology assignment by eggNOG-Mapper. Mol. Biol. Evol., 34: 2115-2122. DOI:10.1093/molbev/msx148

Jacquemin, J., Ammiraju, J.S., Haberer, G., et al., 2014. Fifteen million years of evolution in the Oryza genus shows extensive gene family expansion. Mol. Plant, 7: 642-656. DOI:10.1093/mp/sst149

Jing, Z., Cheng, K., Shu, H., et al., 2023. Whole genome resequencing approach for conservation biology of endangered plants. Biodivers. Sci., 31: 22679. DOI:10.17520/biods.2022679

Jones, P., Binns, D., Chang, H.Y., et al., 2014. InterProScan 5: genome-scale protein function classification. Bioinformatics, 30: 1236-1240. DOI:10.1093/bioinformatics/btu031

Kardos, M., Armstrong, E.E., Fitzpatrick, S.W., et al., 2021. The crucial role of genome-wide genetic variation in conservation. Proc. Natl. Acad. Sci. U.S.A., 118: e2104642118. DOI:10.1073/pnas.2104642118

Kéry, M., Matthies, D., Spillmann, H.H., 2000. Reduced fecundity and offspring performance in small populations of the declining grassland plants Primula veris and Gentiana lutea. J. Ecol., 88: 17-30. DOI:10.1046/j.1365-2745.2000.00422.x

Kirin, M., McQuillan, R., Franklin, C.S., et al., 2010. Genomic runs of homozygosity record population history and consanguinity. PLoS One, 5: e13996. DOI:10.1371/journal.pone.0013996

Korf, I., 2004. Gene finding in novel genomes. BMC Bioinf., 5: 59. DOI:10.1186/1471-2105-5-59

Korneliussen, T.S., Albrechtsen, A., Nielsen, R., 2014. ANGSD: analysis of next generation sequencing data. BMC Bioinformatics, 15: 1-13.

Kyriazis, C.C., Wayne, R.K., Lohmueller, K.E., 2021. Strongly deleterious mutations are a primary determinant of extinction risk due to inbreeding depression. Evol. Lett., 5: 33-47. DOI:10.1002/evl3.209

Lamichhaney, S., Fuentes-Pardo, A.P., Rafati, N., et al., 2017. Parallel adaptive evolution of geographically distant herring populations on both sides of the North Atlantic Ocean. Proc. Natl. Acad. Sci. U.S.A., 114: E3452-E3461.

Lawson, D.J., Hellenthal, G., Myers, S., et al., 2012. Inference of population structure using dense haplotype data. PLoS Genetics, 8: e1002453. DOI:10.1371/journal.pgen.1002453

Leitwein, M., Duranton, M., Rougemont, Q., et al., 2020. Using haplotype information for conservation genomics. Trends Ecol. Evol., 35: 245-258. DOI:10.1016/j.tree.2019.10.012

Li, A., Wang, D., Yu, B., et al., 2014. Maintenance or collapse: responses of extraplastidic membrane lipid composition to desiccation in the resurrection plant Paraisometrum mileense. PLoS One, 9: e103430. DOI:10.1371/journal.pone.0103430

Li, H., 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv preprint arXiv:1303.3997.

Li, H., 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34: 3094-3100. DOI:10.1093/bioinformatics/bty191

Li, H., Handsaker, B., Wysoker, A., et al., 2009. The sequence alignment/map format and SAMtools. Bioinformatics, 25: 2078-2079. DOI:10.1093/bioinformatics/btp352

Li, X., Jiang, L.S., Deng, H.N., et al., 2024. High-resolution genome assembly and population genetic study of the endangered maple Acer pentaphyllum (Sapindaceae): implications for conservation strategies. Hortic. Res., 11: uhae357.

Lin, L., Cai, L., Huang, H., et al., 2024. Transcriptome data reveals the conservation genetics of Cypripedium forrestii, a plant species with extremely small populations endemic to Yunnan, China. Front. Plant Sci., 15: 1303625. DOI:10.3389/fpls.2024.1303625

Lin, Y., Ye, C., Li, X., et al., 2023. quarTeT: a telomere-to-telomere toolkit for gap-free genome assembly and centromeric repeat identification. Hortic. Res., 10: uhac127. DOI:10.1093/hr/uhad127

Lischer, H.E., Excoffier, L., 2012. PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs. Bioinformatics, 28: 298-299. DOI:10.1093/bioinformatics/btr642

Liu, X., Fu, Y.-X., 2020. Stairway Plot 2: demographic history inference with folded SNP frequency spectra. Genome Biol., 21: 280. DOI:10.1007/s12250-020-00244-z

Liu, Y., Tan, Y.L., Li, Y.M., et al., 2024. Conservation and threatened status of plant species with extremely small populations in the karst region of southeastern Yunnan, China. Front. Plant Sci., 15: 1520363. DOI:10.3389/fpls.2024.1520363

Lowe, T.M., Eddy, S.R., 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res., 25: 955-964. DOI:10.1093/nar/25.5.955

Ma, H., Liu, Y.B., Liu, D.T., et al., 2021. Chromosome-level genome assembly and population genetic analysis of a critically endangered Rhododendron provide insights into its conservation. Plant J., 107: 1533-1545. DOI:10.1111/tpj.15399

Ma, Y.P., Liu, D., Wariss, H.M., et al., 2022. Demographic history and identification of threats revealed by population genomic analysis provide insights into conservation for an endangered maple. Mol. Ecol., 31: 767-779. DOI:10.1111/mec.16289

Ma, Y.P., Chen, G., Grumbine, R.E., et al., 2013. Conserving plant species with extremely small populations (PSESP) in China. Biodivers. Conserv., 22: 803-809. DOI:10.1007/s10531-013-0434-3

Malinsky, M., Matschiner, M., Svardal, H., 2021. Dsuite-Fast D-statistics and related admixture evidence from VCF files. Mol. Ecol. Resour., 21: 584-595. DOI:10.1111/1755-0998.13265

Manel, S., Schwartz, M.K., Luikart, G., et al., 2003. Landscape genetics: combining landscape ecology and population genetics. Trends Ecol. Evol., 18: 189-197. DOI:10.1016/S0169-5347(03)00008-9

Möller, M., Middleton, D.J., Nishii, K., et al., 2011. A new delineation for Oreocharis incorporating an additional ten genera of Chinese Gesneriaceae. Phytotaxa, 23: 1-36. DOI:10.11646/phytotaxa.23.1.1

Nguyen, L.T., Schmidt, H.A., Haeseler, A.V., 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol., 32: 268-274. DOI:10.1093/molbev/msu300

Nielsen, R., Paul, J.S., Albrechtsen, A., et al., 2011. Genotype and SNP calling from next-generation sequencing data. Nat. Rev. Genet., 12: 443-451. DOI:10.1038/nrg2986

Ojeda, F., van der Niet, T., Malan, M.C., et al., 2016. Strong signature of selection in seeder populations but not in resprouters of the fynbos heath Erica coccinea (Ericaceae). Bot. J. Linn. Soc., 181: 115-126. DOI:10.1111/boj.12395

O'Neil, S.T., Dzurisin, J.D., Williams, et al., 2014. Gene expression in closely related species mirrors local adaptation: consequences for responses to a warming world. Mol. Ecol., 23: 2686-2698. DOI:10.1111/mec.12773

Ou, S., Su, W., Liao, Y., et al., 2019. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol., 20: 275. DOI:10.1186/s13059-019-1905-y

Patterson, N., Moorjani, P., Luo, Y., et al., 2012. Ancient admixture in human history. Genetics, 192: 1065-1093. DOI:10.1534/genetics.112.145037

Pavlova, A., Beheregaray, L.B., Coleman, R., et al., 2017. Severe consequences of habitat fragmentation on genetic diversity of an endangered Australian freshwater fish: a call for assisted gene flow. Evol. Appl., 10: 531-550. DOI:10.1111/eva.12484

Peng, N.N., Yang, L.H., Shi, X.Z., et al., 2025. Genomic and population genomic analyses reveal contrasting diversity and demographic histories in a critically endangered and a widespread Oreocharis species. Plant Divers., 47: 746-758. DOI:10.1016/j.pld.2025.06.006

Pertea, M., Pertea, G.M., Antonescu, C.M., et al., 2015. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol., 33: 290-295. DOI:10.1038/nbt.3122

Pickrell, J., Pritchard, J., 2012. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genetics, 8: e1002967. DOI:10.1371/journal.pgen.1002967

Poelstra, J.W., Vijay, N., Bossu, C.M., et al., 2014. The genomic landscape underlying phenotypic integrity in the face of gene flow in crows. Science, 344: 1410-1414. DOI:10.1126/science.1253226

Pryszcz, L.P., Toni, G., 2016. Redundans: an assembly pipeline for highly heterozygous genomes. Nucleic Acids Res., 44: e113. DOI:10.1093/nar/gkw294

Purcell, S., Neale, B., Todd-Brown, K., et al., 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet., 81: 559-575. DOI:10.1086/519795

Qin, H., Yang, Y., Dong, S., et al., 2017. Threatened species list of China's higher plants. Biodivers. Sci., 25: 696-744. DOI:10.17520/biods.2017144

Qiu, Y., Hamernick, S., Ortiz, J.B., et al., 2020. DNA content and ploidy estimation of Festuca ovina accessions by flow cytometry. Crop Sci., 60: 2757-2767. DOI:10.1002/csc2.20229

Ralls, K., Ballou, J.D., Dudash, M.R., et al., 2018. Call for a paradigm shift in the genetic management of fragmented populations. Conserv. Lett., 11: e12412. DOI:10.1111/conl.12412

Rellstab, C., Gugerli, F., Eckert, A.J., et al., 2015. A practical guide to environmental association analysis in landscape genomics. Mol. Ecol., 24: 4348-4370. DOI:10.1111/mec.13322

Robinson, J.A., Kyriazis, C.C., Nigenda-Morales, S.F., et al., 2022. The critically endangered vaquita is not doomed to extinction by inbreeding depression. Science, 376: 635-639. DOI:10.1126/science.abm1742

Rodger, Y.S., Pavlova, A., Sinclair, S., et al., 2021. Evolutionary history and genetic connectivity across highly fragmented populations of an endangered daisy. Heredity, 126: 846-858. DOI:10.1038/s41437-021-00413-0

Schäfer, D., Vincent, H., Fischer, M., et al., 2020. The importance of genetic diversity for the translocation of eight threatened plant species into the wild. Glob. Ecol. Conserv., 24: e01240.

Sexton, J.P., Hangartner, S.B., Hoffmann, A.A., 2014. Genetic isolation by environment or distance: which pattern of gene flow is most common?. Evolution, 68: 1-15. DOI:10.1111/evo.12258

Shen, Y., Tao, L., Zhang, R., et al., 2024. Genomic insights into endangerment and conservation of the garlic-fruit tree (Malania oleifera., a plant species with extremely small populations. GigaScience, 13: giae070. DOI:10.1093/gigascience/giae070

Sim, N.L., Kumar, P., Hu, J., et al., 2012. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res., 40: W452-W457. DOI:10.1093/nar/gks539

Simão, F.A., Waterhouse, R.M., Ioannidis, P., et al., 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 31: 3210. DOI:10.1093/bioinformatics/btv351

Song, B., Ning, W., Wei, D., et al., 2023. Plant genome resequencing and population genomics: current status and future prospects. Mol. Plant, 16: 1252-1268. DOI:10.1016/j.molp.2023.07.009

Spielman, D., Brook, B.W., Frankham, R., 2004. Most species are not driven to extinction before genetic factors impact them. Proc. Natl. Acad. Sci. U.S.A., 101: 15261-15264. DOI:10.1073/pnas.0403809101

Stamatakis, A., 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30: 1312-1313. DOI:10.1093/bioinformatics/btu033

Sun, P., Jiao, B., Yang, Y., et al., 2022. WGDI: a user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol. Plant, 15: 1841-1851. DOI:10.1016/j.molp.2022.10.018

Sun, W.B., 2013. Conservation of Plant Species with Extremely Small Populations (PSESP) in Yunnan Province-Practice and Exploration. Yunnan Science and Technology Press, Kunming.

Sun, W.B., Yang, J., Dao, Z.L., 2019a. Study and Conservation of Plant Species with Extremely Small Populations (PSESP) in Yunnan Province. Science Press, China. Beijing.

Sun, W.B., Ma, Y.P., Blackmore, S., 2019b. How a new conservation action concept has accelerated plant conservation in China. Trends Plant Sci., 24: 4-6. DOI:10.1016/j.tplants.2018.10.009

Sun, W.B., Ma, Y.P., Corlett, R.T., 2024. Plant species with extremely small populations conservation program: achieving Kunming–Montreal global biodiversity targets. Trends Plant Sci., 29: 827-830. DOI:10.1016/j.tplants.2024.05.007

Tajima, F., 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics, 123: 585-595. DOI:10.1093/genetics/123.3.585

Tan, Y., Wang, Z., Sui, X.Y., et al., 2011. The systematic placement of the monotypic genus Paraisometrum (Gesneriaceae) based on molecular and cytological data. Plant Divers. Resour., 33: 465-476.

Vaser, R., Adusumalli, S., Leng, S.N., et al., 2016. SIFT missense predictions for genomes. Nat. Protoc., 11: 1-9. DOI:10.1038/nprot.2015.123

Villemereuil, P., Gaggiotti, O.E., 2015. A new FST-based method to uncover local adaptation using environmental variables. Methods Ecol. Evol., 6: 1248-1258. DOI:10.1111/2041-210X.12418

Weber, J.N., Peterson, B.K., Hoekstra, H.E., 2013. Discrete genetic modules are responsible for complex burrow evolution in Peromyscus mice. Nature, 493: 402-405. DOI:10.1038/nature11816

Wei, T., Simko, V., Levy, M., et al., 2017. Package 'corrplot'. Statistician, 56: e24.

Weitzman, A.L., Skog, L.E., Wang, W.T., et al., 1997. New taxa, new combinations, and notes on Chinese Gesneriaceae. Novon, 7: 423-435. DOI:10.2307/3391777

Willi, Y., Van Buskirk, J., Hoffmann, A.A., 2006. Limits to the adaptive potential of small populations. Annu. Rev. Ecol. Evol. Syst., 37: 433-458. DOI:10.1146/annurev.ecolsys.37.091305.110145

Xie, D.F., Zhang, Y.Y., Cai, J., et al., 2026. Speciation, endangerment and adaptation in limestone rocky environment of Urophysa (Ranunculaceae). Plant Divers, 48: 231-245. DOI:10.1016/j.pld.2025.09.006

Xu, W., Pan, B., Huang, Y., et al., 2009. Paraisometrum WT Wang, a newly recorded genus of Gesneriaceae from Guangxi, China. Guihaia, 29: 581-583.

Yan, C., An, M.T., Tang, M., et al., 2026. Chromosome-level genome assembly and population genomics analysis of Camellia rubituberculata provide insights into adaptation to karst habitats. Plant Divers, 48: 246-261. DOI:10.1016/j.pld.2025.09.004

Yang, F.M., Cai, L., Dao, Z.L., et al., 2022. Genomic data reveals population genetic and demographic history of Magnolia fistulosa (Magnoliaceae., a Plant Species with Extremely Small Populations in Yunnan Province, China. Front. Plant Sci., 13: 811312. DOI:10.3389/fpls.2022.811312

Yang, J., Cai, L., Liu, D.T., et al., 2020. China's conservation program on Plant Species with Extremely Small Populations (PSESP): progress and perspectives. Biol. Conserv., 244: 108535. DOI:10.1016/j.biocon.2020.108535

Yang, J., Wariss, H.M., Tao, L., et al., 2019. De novo genome assembly of the endangered Acer yangbiense, a plant species with extremely small populations endemic to Yunnan Province, China. GigaScience, 8: giz085. DOI:10.1093/gigascience/giz085

Yang, Y., Ma, T., Wang, Z., et al., 2018. Genomic effects of population collapse in a critically endangered ironwood tree Ostrya rehderiana. Nat. Commun., 9: 5449. DOI:10.1038/s41467-018-07913-4

Yang, Y.Z., Sun, P.W., Ke, C.Y., et al., 2025. Towards climate-resilient conservation: integrating genetics and environmental factors in determining adaptive units of a xeric shrub. Glob. Ecol. Conserv., 57: e03417.

Yang, Z., 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol., 24: 1586-1591. DOI:10.1093/molbev/msm088

Yang, Z., Liang, L., Xiang, W., et al., 2024. Conservation genomics provides insights into genetic resilience and adaptation of the endangered Chinese hazelnut, Corylus chinensis. Plant Divers., 46: 294-308. DOI:10.1016/j.pld.2024.03.006

Yin, J., Zhang, C., Mirarab, S., 2019. ASTRAL-MP: scaling ASTRAL to very large datasets using randomization and parallelization. Bioinformatics, 35: 3961-3969. DOI:10.1093/bioinformatics/btz211

Zhang, C., Dong, S.S., Xu, J.Y., et al., 2018. PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics, 35: 1786-1788.

Zhang, R., Li, G., Wang, X.L., et al., 2022. TEsorter: an accurate and fast method to classify LTR-retrotransposons in plant genomes. Hortic. Res., 9: uhac017. DOI:10.1093/hr/uhac017

Zhang, R.G., Shang, H.Y., Milne, R.I., et al., 2025. SOI: robust identification of orthologous synteny with the Orthology Index and broad applications in evolutionary genomics. Nucleic Acids Res., 53: gkaf320. DOI:10.1093/nar/gkaf320

Zhang, R.G., Shang, H.Y., Zhou, M.J., et al., 2024. Robust identification of orthologous synteny with the Orthology Index and its applications in reconstructing the evolutionary history of plant genomes. bioRxiv, 202408.

Article Information

Article history

Workspace