b. Research Center for Biodiversity and Nature Conservation, Guizhou University, Guiyang 550025, China;
c. College of Forestry, Jiangxi Agricultural University, Nanchang 330045, China;
d. Jiangxi Provincial Key Laboratory of Conservation Biology, Jiangxi Agricultural University, Nanchang 330045, China
The genus Camellia, a major source of woody oils, ornamental flowers, and beverage ingredients, stands as one of the world's most important sources of commercial woody plants (Xia et al., 2017). China is a center of Camellia diversity, with 97 out of the approximately 120 recorded Camellia species listed in the Flora of China and mainly distributed in the south of the Yangtze River, of which 76 are endemic (Min, 2007). Guizhou, China, serves as a significant region for the distribution of Camellia species with a total of 52 species and 6 varieties, of which 21 species are endemic to the region, including C. luteoflora, C. huana, and C. tachangensis (Min, 2007). Owing to its unique karst landform and climatic conditions, Guizhou is also one of China's top three tea-producing regions, renowned for exceptional varieties such as Duyun Maojian, Meitan Turquoise Bud, and Pu'an Black Tea, and oil tea planting area of approximately 2460 km2 in 2023 (Guo et al., 2024). The abundant Camellia plant resources in Guizhou are close relatives or primitive ancestors of cultivated tea trees, with unique genes and traits—including stress resistance (e.g., cold, drought, and pest/disease tolerance) and distinctive flavor compounds (e.g., high amino acid content and specialized aroma components). These Camellia plant resources not only provide germplasm support for the development of Guizhou's tea industry but also offer valuable materials for studying the evolution and speciation of Camellia species in unique geological (e.g., karst) and climatic conditions.
Karst regions are considered as 'natural laboratories' and are now common sites for the study of species diversity and adaptive evolution (Clements et al., 2006; Gao et al., 2015; Feng et al., 2020). The inherent complexity and diversity of karst landscapes often result in island-like species distributions (Chen et al., 2024). This isolation, coupled with notable environmental selection pressures (including low soil water content, high calcium levels, infertile soils, and drought stress), plays a key role in driving the adaptive evolution of physiological and morphological traits, potentially increasing speciation rates through reduced gene flow (Nie et al., 2011; Hao et al., 2015; Monro et al., 2018). Consequently, unraveling karst speciation and adaptive mechanisms has consistently been a focus of evolutionary biology (Wallace 1858; Turanchik and Kane, 1979). Karst ecosystems impose severe selective pressures. Although recent advances integrating whole-genome data with environmental records have begun to elucidate the molecular adaptation mechanisms in some plants within these ecosystems (Feng et al., 2020; Cao et al., 2023; Zhou et al., 2023; Wen et al., 2024). However, few studies have specifically adaptations of Camellia spp. to karst environments. For example, Lu et al. (2024), who established a chromosome-scale assembly of Camellia limonia and revealed lineage-specific expansions in calcium regulatory modules (e.g., calmodulin signaling, Ca2+ transporters, and secondary metabolite pathways) that are mechanistically linked to calcium tolerance, providing an important foundation for studying karst adaptation within the genus. Moreover, to date, the majority of Camellia genomic studies have focused on gene function for Camellia species distributed in non-karstic habitats (e.g., C. oleifera, C. lanceoleosa, C. granthamiana, C. chekiangoleosa, C. sinensis var. sinensis LJ43, and C. sinensis TGY), and the specific genomic underpinnings of karst adaptation in Camellia species remain largely unexplored (Wang et al., 2020; Zhang et al., 2021a; Lin et al., 2022; Gong et al., 2022; Shen et al., 2022a).
Camellia rubituberculata is one of the most typical and distinctive species in section Tuberculata of the genus Camellia. It is endemic to the karst region of Qianxi'nan, Guizhou, China, and exhibits unique morphological traits such as wrinkled pericarp and red flower (Fig. 1A–F) (Chang and Ren, 1991; Min, 2007). To date, only two wild populations of C. rubituberculata (SEC and LT) are currently documented (Fig. 1A, blue dot and green dot), occupying 0.49 km2 of karst limestone terrain with sandstone-shale intermixing (Fig. 1B and C) (Deng et al., 2011). However, during our field surveys in 2023 and 2024, we found that due to its proximity to villages, this species faces persistent threats from logging and habitat degradation. With its population continuously declining, protective measures for its native habitats should be implemented—even though the species has not yet been listed on the IUCN Red List. With its high fruit oil content and karst adaptability (Ran et al., 2024), this species is a valuable woody oil crop and a critical resource for combating rocky desertification. Over the past decade, this species has garnered significant research interest among scientists, who have conducted studies on its habitat characteristics, seed germination, population dynamics, classification, chloroplast genome and selective breeding (Deng et al., 2011; Dai et al., 2016; Yang et al., 2018; Xiao et al., 2022; Ran et al., 2024). Notably, the Guizhou Academy of Forestry (LKY) (Fig. 1A, red dot) has conducted a series of cultivation, domestication, and elite variety selection trials in non-karst environments with well-managed conditions (Dai et al., 2016). Scientifically, as a karst-endemic Camellia species, this taxon demonstrates remarkable adaptive traits to limestone habitats with seasonal drought including: (1) specialized root architecture for enhanced water/nutrient acquisition, (2) leathery leaf and fruit tuberculate and thick-walled morphology, and (3) physiological adaptations to calcium-rich, alkaline soils. These characteristics make it serve as an excellent model organism for exploring woody plant adaptation evolution in karst evolution.
|
| Fig. 1 Geographic distribution and images of sequenced Camellia rubituberculata. (A) Geographic distribution of the only three known populations of C. rubituberculata. LT (blue dot): a wild population located in Longtou village, Xingren County, Qianxi'nan Prefecture, Guizhou; SEC (green dot): a wild population located in Shi'e'chang village, Qinglong County, Qianxi'nan Prefecture, Guizhou; LKY (red dot): a cultivated population located at Guizhou Academy of Forestry. (B) Image of a C. rubituberculata plant growing on karstic limestone mountains. (C) Image of a C. rubituberculata plant. (D) Image of three fruits of C. rubituberculata. (E) Image of the pink flower of C. rubituberculata. (F) Image of C. rubituberculata seeds. |
Clarifying the mechanisms underpinning adaptability and genetic diversity in woody plants inhabiting challenging environments, such as karst environments is critical for developing effective molecular breeding strategies and conservation plans (Whitlock and Lotterhos, 2015; Sang et al., 2022). In this context, genome-wide and population genetics approaches have emerged as fundamental tools for deciphering the genomic basis of adaptation and evolutionary processes (Ellegren, 2014; Evans et al., 2014; Chen et al., 2019; Xiang et al., 2024). A key evolutionary mechanism shaping plant genomes is whole-genome duplication (WGD), which is experienced by almost all angiosperms and is often accompanied by hybridization events (Cui et al., 2006; Soltis et al., 2009; Jiao et al., 2011). Following WGD or hybridization, duplicated genes can fuel genetic innovation, increase diversity and potentially provide new environmental adaptation that drive species differentiation (Hegarty and Hiscock, 2008; Van de Peer et al., 2009; Jiao et al., 2011). The investigation of these events holds significant theoretical and practical value for plant biology, species conservation, and genetic breeding. Moreover, WGD events frequently lead to the retention of duplicated genes with specific molecular functions, causing expansions in gene families that equip plants to withstand diverse environmental stresses (Ellegren, 2014; Ren et al., 2018; Wu et al., 2020). The genus Camellia presents a compelling case study for these phenomena. The frequent hybridization and polyploidization within the genus complicate classification and phylogenetic studies but simultaneously make Camellia species ideal models for investigating plant genome size variation and evolution. Despite this potential, the genetic and genomic research background for Camellia, particularly wild Camellia species, remains remarkably limited. Current genomic resources are predominantly confined to high-value cultivars (e.g., C. sinensis (DASZ), C. sinensis var. sinensis (LJ43), C. sinensis (TGY), C. oleifera, and C. chekiangoleosa) (Zhang et al., 2020, 2021; Wang et al., 2020; Lin et al., 2022; Shen et al., 2022a), leaving the vast majority of wild species genomically unexplored. Given its status within Camellia as a narrowly distributed, karyologically typical 'karst stone plant', C. rubituberculata has emerged as an exceptional model for probing evolutionary adaptation and genetic diversity in karst habitats using these genomic approaches.
To directly address the critical research gap in understanding Camellia adaptation in karst environments and leverage the unique attributes of C. rubituberculata, a chromosome-level genome for this species was assembled. Then, using this genomic resource, comparative genomics and phylogenetic analyses were conducted to elucidate the phylogenetic position and chromosome structure of C. rubituberculata and identify WGD events. Furthermore, population genomic approaches were employed to analyze the genetic mechanisms underlying its adaptation to the water-scarce and calcified soils inherent to karst habitats. This study significantly expands the whole-genome data available for wild Camellia species and provides a crucial molecular perspective for understanding the adaptive mechanisms essential for survival in karst environments. Ultimately, the genomic resources and insights generated herein lay a vital foundation for the conservation of the valuable genetic resources of C. rubituberculata and could inform future breeding efforts for Camellia species suited to challenging soils.
2. Materials and methods 2.1. Sample collectionAn individual was selected from the LTpopulation (Longtou village) from the LT population (Longtou village, Qianxi'nan prefecture, Guizhou Province, China, 105°18′ E, 25°35′ N, 1423 m) for whole-genome sequencing (WGS) and assembly. This individual was chosen on the basis of its optimal physiological condition and sufficient tissue availability for high-molecular-weight DNA extraction. Fresh leaves and flowers were collected from five additional individuals for transcriptome sequencing for genome annotation. A total of 77 individuals from two wild populations (LT: 30 samples, SEC: 30 samples) and one cultivated population (LKY: 17 samples, as control group) were sampled, and fresh leaves (2 g per individual) were collected for whole-genome resequencing. All of the sampled individuals were healthy adult plants, and a minimum spacing of 30 m between sampled individuals within each population was ensured to minimize kinship. Tissue samples for transcriptome sequencing were collected from 15 individuals (5 per population) subsampled from the cohort of 77 resequenced individuals. The sampling strategy was consistent with that for the genome resequencing samples.
2.2. Library construction and genome sequencingDNA was extracted from fresh leaves using the CTAB method and sheared into ~350 bp fragments via focused ultrasonication (Doyle and Doyle, 1987; Larguinho et al., 2010). PE sequencing libraries were prepared following standard Illumina protocols. All libraries were subjected to NovaSeq 6000 sequencing (Illumina). For long-read sequencing, high-molecular-weight (HMW) genomic DNA was isolated using the DNeasy Mini Kit (Qiagen), followed by shearing and size selection to construct 15 kb SMRTbell libraries. These libraries were sequenced on the PacBio Sequel Ⅱ platform (Pacific Biosciences) using single-molecule real-time (SMRT) sequencing technology. Hi-C libraries were prepared by chromatin cross-linking followed by fragmentation into ~350 bp fragments using focused ultrasonication (Covaris), after which paired-end sequencing (2 × 150 bp) was performed on the Illumina NovaSeq 6000 platform to capture the three-dimensional chromatin architecture.
2.3. Chromosome-level genome assemblyGenome assembly was initiated with a survey of genome characteristics via K-mer profiling using Jellyfish (v.2.3.1) (Marçais and Kingsford, 2011) and GenomeScope2 (Ranallo-Benavidez et al., 2020). PacBio HiFi reads were error-corrected and de novo assembled using hifiasm (v.0.16.1-r375) (Cheng et al., 2021). Redundant sequences were removed from the primary assembly using purge_dups (Guan et al., 2020). Clean Hi-C reads were aligned to the purged assembly using HICUP (v.0.8.2) (Wingett et al., 2015). Chromosome-level assembly was performed with ALLHiC (v.0.9.8) using Hi-C data for chromosomal anchoring (Zhang et al., 2019b). Scaffold/contig chromosomal positioning and orientation were guided by Hi-C interaction frequencies using Juicebox (v.1.11.08) (Robinson et al., 2018). Assembly completeness and sequence uniformity were assessed by realigning short-insert PE reads to the reference genome using BWA (Li and Durbin, 2009). The final genome quality was assessed using BUSCO (v.5.2.2) (Manni et al., 2021), and the LTR Assembly Index (LAI) was calculated with LTR_retriever (v.2.9.0) (Ou et al., 2018).
2.4. Genome prediction and annotationNonredundant LTR-RT libraries were generated using LTRharvest (v.1.6.2) (Ellinghaus et al., 2008), LTR_Finder (v.1.07) (Xu and Wang, 2007) and LTR_retriever (v.2.9.0) (Ou and Jiang, 2018) with default parameters. MITEs were annotated using MITE-Hunter (v.1.0.0) (Han and Wessler, 2010). An integrated TE annotation pipeline combining de novo predictions from RepeatModeler (v.2.0.2a), LTR_retriever, and MITE-Hunter was implemented. The genome-wide repetitive element content within the consolidated TE library was characterized via RepeatMasker (v.4.1.2) (Tarailo-Graovac and Chen, 2009).
Protein-coding gene annotation employed homology-based, transcriptome-based, and ab initio approaches. Homology-based predictions were generated by aligning the Camellia sinensis and C. sinensis var. sinensis proteomes to the C. rubituberculata genome using MMseqs2 (Steinegger and Söding, 2017), followed by gene structure annotation of homologous loci using GeMoMa (v.1.6.1) (Keilwagen et al., 2016). The RNA-seq reads were aligned to the reference genome using HISAT2 (v.2.2.1) (Kim et al., 2015), and transcript abundances were quantified by StringTie (v.2.2.1) (Pertea et al., 2015). Spliced transcripts were mapped to the genome using PASA (v.2.5.2) to improve the gene models (Haas et al., 2003). Augustus (v.3.2.2) (Stanke et al., 2004), GlimmerHMM (v.3.0.4c) (Majoros et al., 2004) and GeneMark-ET (v.1) (Besemer and Borodovsky, 2005) were employed concurrently for ab initio prediction. PASA-H-set models were trained using Augustus (v.3.2.2) and GlimmerHMM (v.3.0.4c). EvidenceModeler (v.1.1.1) (Haas et al., 2008) was used to produce a consensus gene set. Finally, alternative splicing events and untranslated regions were annotated using the PASA-up data and SQLite.
The functional annotation of the predicted proteins was performed by searching against the NR, Swiss-Prot, and eggNOG databases using Diamond (v.2.0.13) (Buchfink et al., 2021). GO and KEGG annotations were performed via KOBAS (v.3.0) (Bu et al., 2021). Transfer RNAs (tRNAs) were identified using tRNAscan-SE (v.2.0.9) (Chan et al., 2021). Ribosomal RNAs (rRNAs) and other noncoding RNAs (ncRNAs) were annotated with searches against the Rfam database (v.14.6) (http://rfam.xfam.org/) using Infernal (v.1.1.4) (Kalvari et al., 2018).
2.5. Comparative genome analysisThe phylogenetic relationships and evolutionary trajectories of Camellia rubituberculata were investigated through comparative genomic analysis with 14 other angiosperms, including published genomes of six Camellia species from non-karstic habitats and eight representative specialized angiosperms (Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera, Citrus sinensis, Camellia sinensis, C. sinensis var. sinensis, C. oleifera, C. granthamiana, C. chekiangoleosa, C. lanceoleosa, Actinidia chinensis, Amborella trichopoda, Diospyros kaki, and Oryza sativa). The protein-coding gene sets were processed to retain the longest isoform per gene and filter out proteins shorter than 50 amino acids. Homologies were identified via an all-against-all BLASTP search (E ≤ 1e-7, coverage ≥30%) (Camacho et al., 2009). The BLAST hits for different proteins were merged using Solar (Yu et al., 2006). Orthologous groups were delineated from these homologs using OrthoMCL (I = 1.5) (Li et al., 2003). A Venn diagram was created to show shared orthologous groups (duplicated genes were counted as one) among C. rubituberculata, C. sinensis var. sinensis, C. oleifera, C. granthamiana, C. chekiangoleosa, C. sinensis, and C. lanceoleosa.
Single-copy orthologs were aligned via MUSCLE (Edgar, 2004). A concatenated superalignment matrix was constructed for phylogenetic inference. The maximum likelihood (ML) phylogeny was reconstructed using RAxML (v.8.2.12) under the PROTGAMMAAUTO model, with branch support assessed from 100 bootstrap replicates (Stamatakis, 2014). A Bayesian inference (BI) tree was constructed with the Jones + I + G + F model (2 runs, 100000 gens; 25% burn-in) using MrBayes (v.3.2.7a) (Ronquist et al., 2012). Divergence times were estimated using MCMCTree (PAML) (burn-in = 10000; samples = 100000; freq = 2) (Yang, 2007), calibrated with five fossil time points obtained from the TimeTree database: Populus trichocarpa/Camellia sinensis (108 million years ago (Mya), CI: 102.0–112.5 Mya), Amborella trichopoda/Oryza sativa (196 Mya, CI: 179.9–205.0 Mya), Diospyros kaki/O. sativa (160 Mya, CI: 142.1–163.5 Mya), Camellia oleifera/C. sinensis (14.5 Mya, CI: 12.5–16.4 Mya) and D. kaki/Actinidia chinensis (92 Mya, CI: 85.6–104.9 Mya) (Hedges et al., 2006). Ka/Ks ratios for orthologs were calculated using KaKs_Calculator (v.2.0), with Vitis vinifera used as the reference, on the basis of MUSCLE protein alignments and corresponding codon alignments generated by PAL2NAL (Zhang, 2022). Synteny analysis within the C. rubituberculata genome and microsynteny comparisons of C. rubituberculata, A. chinensis, and V. vinifera were performed using MCScanX (Wang et al., 2012). The collinearity of C. rubituberculata, C. sinensis var. sinensis, and C. oleifera was assessed using JCVI (Tang et al., 2008).
Gene family evolution was modeled using CAFÉ under an ML birth–death process to infer ancestral gene family sizes and identify significant expansions/contractions (family-wide/Viterbi p < 0.05) (De Bie et al., 2006). Tandemly duplicated genes in Camellia rubituberculata were identified by screening for collinear paralogs using MCScanX and BLASTP (Camacho et al., 2009). Positively selected genes were identified using branch-site likelihood ratio tests (LTRs) in PAML, with C. rubituberculata designated the foreground branch and the other six Camellia species designated the background branch (Yang et al., 2007). This test compares nested codon substitution models: (1) the null model (ω fixed at 1.0: fix_omega = 1, omega = 1) is constrained to sites under purifying selection (ω < 1) or neutral selective pressure (ω = 1); (2) the alternative model (ω estimated: fix_omega = 0, initial omega = 1.5) allows sites under positive selection (ω > 1) on the foreground branch. Maximum log-likelihoods were computed for both models. Statistical significance was assessed via χ2 tests of twice the log-likelihood difference (LRT), with false discovery rate (FDR) correction for multiple testing.
2.6. Population resequencing and variation detectionGenomic DNA was isolated from population samples using the CTAB method and sheared into ~350 bp fragments using focused ultrasonication (Covaris). The sequencing libraries were prepared using the NEBNext Ultra Ⅱ DNA Library Prep Kit and sequenced (PE150) on the Illumina NovaSeq platform. High-quality PE reads were aligned to the Camellia rubituberculata genome using BWA (Li and Durbin, 2009). SNPs were called using the Bayesian model implemented in SAMtools (Li et al., 2009). Initial SNP calls were filtered using VCFtools (v.0.1.16) to retain high-quality SNPs with the following three parameters: minimum depth per site (--minDP) ≥ 3, maximum missing data rate (--max-missing) ≤ 0.1, and minor allele frequency (--maf) ≥ 0.05 (Danecek et al., 2011; Ma et al., 2021). Filtered SNPs were annotated using ANNOVAR (Wang et al., 2010).
2.7. Population structure and genetic diversity analysisPopulation genetic analyses were conducted using the filtered SNP dataset (44,769,635 SNPs). A genetic distance matrix was constructed using TreeBeST (v.1.9.2), and a neighbor-joining (NJ) phylogenetic tree was constructed (Li et al., 2006). Principal component analysis (PCA) was performed using GCTA, and the first 2 principal components (PCs) were visualized (Yang et al., 2011). The population structure was inferred using ADMIXTURE (v.1.23) with a dataset comprising 44,769,635 genome-wide SNPs. No specific filtering for intergenic or near-neutral SNPs was applied, because the primary goal was to capture overall population structure from a genome-wide perspective. K values ranged from 2 to 8, and the analysis was repeated five times per K with cross-validation (Alexander et al., 2009; Yang et al., 2022). The optimal K was determined based on the cross-validation error. Population differentiation was evaluated by calculating the Fst value in 100-kb nonoverlapping sliding windows using VCFtools (v.0.1.16) (Danecek et al., 2011). Nucleotide diversity (π) was estimated in 100-kb sliding windows by VCFtools (v.0.1.16). Linkage disequilibrium (LD), measured as r2, was calculated for SNP pairs within a 500 kb window using PopLDdecay (v.3.4.0) (Zhang et al., 2019a). SMC++ (v.1.15.4) was employed to infer the history of the effective population size (Ne) of C. rubituberculata (Li and Durbin, 2011; Fumagalli et al., 2013).
2.8. Identification of selection sweep signaturesTo detect genomic regions that might be under selection in response to karst habitat adaptation, we performed a selective sweep analysis comparing the two wild populations (LT, SEC) from karst habitats to a cultivated population (LKY) from a non-karst environment as a reference. Genetic diversity parameters (Fst and π) were calculated, and regions within the top 5% of both Fst values and ln(π_WILD/π_LKY) values were considered putative selective sweep regions. Genes located within these overlapping regions were identified as candidate selected genes. Functional enrichment analysis of these candidate genes for KEGG pathways and GO terms was performed. To address potential false positives in the Fst/π-based selection scans (e.g., confounding effects from genetic isolation), we performed complementary XP-CLR analysis (Chen et al., 2010) with 100-kb sliding windows (step 10 kb). Protein–protein interaction (PPI) networks for candidate genes were predicted using the STRING database (v.11.5) (Szklarczyk et al., 2019).
2.9. RNA sequencing and analysisTissue-specific transcriptome analysis was conducted by constructing fifteen strand-specific cDNA libraries. Polyadenylated mRNA was enriched, fragmented, reverse-transcribed, and prepared into Illumina-compatible libraries using the NEBNext Ultra Directional RNA Library Prep Kit. After Illumina HiSeq 4000 paired-end sequencing (2 × 150 bp), gene expression was quantified as fragments per kilobase million (FPKM; Trapnell et al., 2010). The RNA-seq reads were aligned to the reference genome via HISAT2 (v.2.2.1). Differential gene expression analysis was performed using edgeR (v.3.18.1), which employs TMM normalization and quasi-likelihood F-tests (FDR < 0.05) (Robinson et al., 2010).
3. Results 3.1. Chromosome-scale genome assembly and annotation of Camellia rubituberculataA preassembly genomic survey involving k-mer frequency distribution analysis (k = 17) of 154.5 gigabases (Gb) of raw Illumina short reads revealed a genome size of ~2.56 Gb with a heterozygosity rate of 1.21% and 80.56% repetitive content, establishing baseline metrics for de novo genome construction (Fig. S1 and Table S1). The genome construction integrated 112.04 Gb of PacBio HiFi long reads (45 × coverage) and 278.74 Gb of Hi-C paired-end reads (111 × coverage) (Table S2). The HiFi-based primary assembly produced a genome of 2.51 Gb (contig N50 = 9.52 Mb), demonstrating exceptional continuity (Tables S3 and S4). Chromosome-scale scaffolding via ALLHiC generated 15 pseudomolecules (2.50 Gb final assembly, scaffold N50 = 168.34 megabases (Mb) with 97.5% chromatin anchoring efficiency, spanning 123.24–232.41 Mb per chromosome (Fig. 2A; Tables 1 and S5–S7). Benchmarking Universal Single-Copy Orthologous (BUSCO) analysis revealed that the genome covered 1500 of 1614 complete single-copy orthologous genes (92.9% coverage). Among these, 1275 genes (79.0%) were present in single-copy form, and 181 genes (13.9%) were present in multiple copies (Tables 1 and S8). The genome assembly demonstrated moderate quality in terms of the integrity and continuity of long terminal repeat (LTR) regions, as evaluated by the LTR assembly index (LAI = 16.71), indicating its suitability for downstream evolutionary or functional analyses of LTR retrotransposons (Table 1). These integrated benchmarking analyses validated the assembly quality of the C. rubituberculata genome as meeting reference-grade standards.
|
| Fig. 2 Heatmap of chromosome interactions and the Camellia rubituberculata genome. (A) Heatmap showing the interaction frequencies among 15 pseudo-chromosomes. Deeper color indicates a higher frequency of interaction. (B) Landscape of the C. rubituberculata genome structure (circular representation of the genome characteristics), comprising 15 pseudo-chromosomes. a: Pseudo-chromosomes of C. rubituberculata; b: GC density; c: repetitive element density; d: model genes; e: syntenic blocks. |
| Assembly feature | Statistic |
| Genome size (bp) | 2,503,244,753 |
| Number of N50 contigs | 44 |
| N50 contigs (bp) | 8,904,318 |
| Number of N50 scaffolds | 7 |
| N50 scaffolds (bp) | 168,340,502 |
| Chromosome-scale scaffolds (bp) | 2,440,767,607 |
| Sequences anchored to chromosomes (%) | 97.50 |
| Complete BUSCOs | 1500 (92.9 %) |
| GC content of the genome (%) | 38.42 |
| Number of genes | 55,302 |
| Average gene length (bp) | 6638 |
| Average CDS length (bp) | 1390 |
| Average exon length (bp) | 330 |
| Average transcript length (bp) | 1506 |
| Average exons per transcript | 4.6 |
| Repeat sequences (bp) | 1,406,371,433 |
| Percentage of repeat sequences (%) | 56.18 |
| LTR assembly index | 16.71 |
The Camellia rubituberculata genome harbors 1.41 Gb of repetitive elements (56% genome-wide), with LTR retrotransposons dominating the repeat landscape (51.27%). These retroelements are principally classified into the Gypsy (37.21%) and Copia (5.05%) superfamilies, which represent canonical LTR evolutionary lineages (Fig. 2B; Tables 1 and S9). The C. rubituberculata genome encodes 55,302 protein-coding loci, with transcript models averaging 6638 bp in length and 4.6 exons per transcript (Table 1 and S10). These predicted genes were cross-referenced with functional annotations in five databases, and more than 50,266 of the genes (90.89%) matched orthologs in public protein databases (Fig. S2 and Table S11). Noncoding RNAs, including tRNAs, rRNAs, miRNAs, and snRNAs, were also annotated (Table S12).
3.2. Genome evolutionPhylogenomic analyses revealed 44,320 gene families, with 344 single-copy orthologs. On the basis of these results, genome-level phylogenetic trees were constructed using Bayesian and maximum likelihood methods, with Amborella trichopoda as the outgroup, and divergence times were estimated using these results. The two trees were topologically consistent (bootstrap > 90%, posterior probability > 0.95) at key branch nodes, and both demonstrated that Camellia spp. clustered in one branch and were closer to Actinidia chinensis and Diospyros kaki (Figs. 3A, S3 and S4). To explore the WGD experienced by C. rubituberculata, the density distributions of Ks for orthologous gene pairs and paralogous gene pairs between C. rubituberculata and A. chinensis, C. oleifera, and Vitis vinifera were compared. The results revealed two distinct peaks in the density distribution of Ks, suggesting that C. rubituberculata experienced two WGDs. The first peak, at ~1.3–1.5 in the genomes of C. rubituberculata, A. chinensis, C. oleifera, and Vitis vinifera (Fig. 3B), represents the ancient γ-WGT event shared by all core eudicots. The estimated divergence time was ~115 Mya (Figs. 3A and S5). The second peak, at ~0.3–0.5 in the genomes of C. rubituberculata, A. chinensis and C. oleifera (Fig. 3B), indicates that these species shared a common WGD, with an estimated divergence time of ~86 Mya (Figs. 3A and S5). Visualization of microsynteny revealed approximately two copies of each syntenic block from V. vinifera in C. rubituberculata and A. chinensis (Fig. 3E), suggesting that C. rubituberculata and A. chinensis underwent one round of WGD after the γ-WGT event. This finding is consistent with the density distribution of Ks and the phylogenetic results. The divergence time calibration indicated that C. rubituberculata diverged from Camellia spp. before 14.9 Mya, earlier than other Camellia species (Figs. 3A and S5).
|
| Fig. 3 Evolutionary and collinearity analysis of the Camellia rubituberculata genome. (A) Left: Maximum-likelihood phylogenetic tree, divergence time, and gene family expansion/contraction tree for 15 angiosperms. All nodes exhibit ≥ 95% bootstrap support. The black circles denote fossil-calibrated divergence time nodes. Red (green) numerals quantify branch-specific gene family expansions (contractions), respectively. Right: Orthologous gene families clustered across 15 angiosperms. Gray: 4532 angiosperm-shared orthologs; gray + blue: 5271 eudicot-shared orthologs; gray + blue + green: 6188 Ericales-shared orthologs; gray + blue + green + orange: 8479 Camellia-shared orthologs; white: the remaining genes; black: species-specific multicopy genes. (B) Upper right: Ks for paralogous gene pairs in C. rubituberculata and Actinidia chinensis, C. oleifera, and Vitis vinifera. Bottom left: Ks for orthologous gene pairs between C. rubituberculata and A. chinensis, C. oleifera, and V. vinifera. (C) Interspecific collinearity across C. rubituberculata (Crub), C. oleifera (Cole), and C. sinensis var. sinensis (Csvs). The numbers indicate the pseudo-chromosomes for the three species. (D) KEGG enrichment analysis of 404 expanded gene families. (E) Microsynteny among C. rubituberculata, A. chinensis and V. vinifera. |
To gain a deeper understanding of the evolution of Camellia rubituberculata, the collinearity among C. rubituberculata (Crub), C. oleifera (Cole), and C. sinensis var. sinensis (LJ43) (Csvs) was examined. The results revealed strong collinearity, with chromosomes corresponding to each other in most regions (Figs. 3C and S6). This finding indicates the absence of large-scale structural variation among the species after Camellia divergence, despite the different morphological and physiological characteristics of the chromosomes.
The most recent common ancestor (MRCA) included 44,311 gene families. A comparison of Camellia rubituberculata with the MRCA revealed expansion in 404 gene families (5522 genes) and contraction in 131 gene families (420 genes) in C. rubituberculata (Figs. 3A and S7). KEGG enrichment analysis of the 404 expanded gene families revealed "photosynthesis" (ko00195) and "spliceosome" (ko03040) as significantly enriched (p < 0.05) (Fig. 3D and Table S13), suggesting that the rapid increase in genes involved in photosynthesis and spliceosome function may have facilitated C. rubituberculata adaptation to karst habitats.
3.3. Specific tandem duplication and positive selection within the Camellia rubituberculata genomeTo investigate the specific genetic changes in Camellia rubituberculata that contribute to adaptations to karst habitats, the gene family variations of seven Camellia species, namely, C. rubituberculata, C. oleifera, C. lanceoleosa, C. granthamiana, C. chekiangoleosa, C. sinensis var. sinensis (LJ43), and C. sinensis (TGY), were compared. A total of 1073 gene families specific to C. rubituberculata were identified (Fig. S8). KEGG enrichment analysis revealed that the spliceosome (ko03040) and arginine and proline metabolism (ko00330) pathways were among the 20 significantly enriched pathways (p < 0.05) (Fig. S9 and Table S14). Specific genes within these enriched pathways, particularly those involved in alternative splicing regulation (spliceosome) and stress-responsive metabolism (arginine and proline), suggest that the genomic basis for the adaptation of C. rubituberculata to habitat heterogeneity may involve increased posttranscriptional regulation in response to environmental stresses, such as drought, and metabolic adjustments for growth under stress.
A total of 3493 tandem duplicated genes were identified in Camellia rubituberculata. KEGG enrichment analysis revealed that these genes were enriched primarily in functions related to homologous recombination (ko03440), plant–pathogen interaction (ko04626), pyrimidine metabolism (ko00240), galactose metabolism (ko00052), ABC transporters (ko02010), diterpenoid biosynthesis (ko00904), and glycerophospholipid metabolism (ko00564) (Fig. S10 and Table S15). Broadly, these functions are associated with DNA repair, plant–environment interactions, and hormone synthesis and metabolism. Subsequently, 493 candidate genes under positive selection (q value < 0.05) were identified (Table S16). KEGG enrichment analysis revealed significant enrichment in metabolic (ko01100) and glycosylphosphatidylinositol (GPI)-anchor biosynthesis (ko00563) pathways (Fig. S11 and Table S17). These findings indicate that the genes under positive selection in C. rubituberculata are involved in processes such as plant metabolism and biosynthesis. These results suggest that the gene families of C. rubituberculata participate in plant–environment interactions, helping the plant cope with stresses by regulating its own metabolism, permitting a geographic distribution that differs from those of other Camellia species.
3.4. Genetic diversity and population structure of Camellia rubituberculataA total of 77 Camellia rubituberculata individuals from three populations (LT: 30 samples, SEC: 30 samples, and LKY: 17 samples) were sampled for whole-genome resequencing (Fig. 1A and Table S18). The average mapping rate was 98.94%, and the average depth was 11.1× (Table S19). A total of 143,083,498 raw single-nucleotide polymorphisms (SNPs) were obtained, which were further screened and filtered to yield 44,769,635 high-quality SNPs. These included 37,776,916 intergenic SNPs, 680,962 nonsynonymous SNPs, 434,662 synonymous SNPs, and 4,127,295 intronic SNPs, which were subsequently used for the calculation of genetic diversity parameters and the identification of candidate genes on the basis of the results of the calculations (Table S20). The SNPs from the three populations were subjected to principal component analysis (PCA) and neighbor-joining evolutionary tree analysis, which divided the population into two groups: S (SEC) and L (LT). The LKY population clustered with the SEC population, indicating a relatively close kinship relationship (Fig. 4A and B). These results were further supported by phylogenetic and population structure analyses of the three populations (Figs. 4C and S12). The population differentiation coefficient (Fst) values were low or moderate (0.012–0.092) and exhibited high variation among the three populations (Fig. 4D and Table S21). The SEC population presented the highest nucleotide diversity (π) (π = 4.886 × 10−3), followed by the LKY population (π = 4.827 × 10−3) and the LT population (π = 4.317 × 10−3). Overall, the genetic diversity of the C. rubituberculata populations was 4.67 × 10−3 (Fig. 4D and Table S21). Linkage disequilibrium (LD) analyses revealed that the LD declined fastest for r2 = 0.15. The SEC population exhibited the fastest LD decline (23.3 kb), followed by the LKY population (76.4 kb), with the LT population (76.7 kb) exhibiting the slowest decline (Fig. 4E and Table S21).
|
| Fig. 4 Population structure and genetic diversity of Camellia rubituberculata. (A) PCA plot of the first three eigenvectors. (B) Phylogenetic tree of 77 individuals. (C) Population genetic structure of 77 individuals from three populations. (D) π and Fst among the three populations. (E) LD decay for the three populations. (F) Demographic history of the three populations. LG: last glaciation (11–70 Kya), PG: penultimate glaciation (130–300 Kya), and NG: Naynayxungla glaciation (500–780 Kya). |
The effective population size (Ne) of Camellia rubituberculata from millions of years ago to the past 100 years was computed, demonstrating that the three populations (LT, SEC, and LKY) followed consistent population trajectories (Fig. 4F). The population history dynamics model showed that the two wild populations (LT and SEC) underwent a sharp contraction from ~600 thousand years ago (Kya) to ~200 Kya (Naynayxungla glaciation (NG) period), followed by a slow decline to a minimum from ~200 Kya to ~10 Kya (penultimate glaciation (PG)-last glaciation (LG) periods). The populations began to gradually increase during the glacial period, attaining a maximum Ne value ~5000 years ago. However, the population declined sharply over time thereafter, reaching the current population size ~1000 years ago.
3.5. Genome-wide selective sweep signals among the three groupsThe selective scanning results, in combination with the Fst and π ratios of the top 5%, revealed differential gene selection on populations in different environments. A total of 1754 candidate environmentally selected genes were identified in the LT population (LKY vs. LT), and 1036 candidate genes were identified in the SEC population (LKY vs. SEC) (Fig. 4A and C; Tables S22 and S25). A total of 76 genes were present in both groups (Fig. S13), and all these genes were annotated using multiple gene databases (Tables S23 and S26). To address potential confounding effects from demographic history, XP-CLR analysis was implemented as a complementary approach for detecting selective sweeps. A total of 11,389 candidate environmentally selected genes in LT populations (LKY vs. LT) we identified (Tables S28 and S29), and 11,391 candidate environmentally selected genes in SEC populations (LKY vs. SEC) (Tables S30 and S31). XP-CLR was validated for 42.9% (753/1754) of the Fst/π-selected genes and 33.3% (345/1036) of the SEC-selected genes in LT. The KEGG enrichment results revealed that both gene groups presented significantly similar functional pathway annotations, including annotations related to plant hormone signal transduction, purine metabolism, and secondary metabolite biosynthesis (Fig. 5B and D; Tables S24 and S27). Small auxin-upregulated RNA (SAUR) and brassinosteroid-signaling kinase (BSK) proteins, which are involved in plant hormone signal transduction, along with nucleoside diphosphate kinase (NDPK)1/2/3, which are involved in purine metabolism, were among the genes enriched in hormone signal transduction and metabolic processes. Loci identified as positively selected through genome-wide selective sweep scanning were functionally implicated in mediating the adaptive evolution of Camellia rubituberculata to karst heterogeneity, particularly through allelic differentiation at key regulatory nodes. Furthermore, PPI network topology analysis was performed to delineate hub regulatory modules under divergent environmental selection pressures. These findings revealed that MYB transcription factors, which are associated with the response to abiotic stresses, are also subject to positive selection of emergent protein–protein interactions (Fig. S14).
|
| Fig. 5 Population genetic differences and signatures of selective sweeps. (A) Fst and π-based analysis of positive gene selection in the Longtou (LT) and Guizhou Academy of Forestry (LKY) populations. (B) KEGG enrichment analysis of selected genes in the Longtou (LT) and LKY populations. (C) Fst and π-based analysis of positive gene selection in the Shi'e'chang (SEC) and LKY populations. (D) KEGG enrichment analysis of selected genes in the SEC and LKY populations. |
To ensure robust detection of differentially expressed genes (DEGs), high-depth RNA sequencing was performed on the leaf transcriptomes from each population. Following stringent quality control and adapter trimming, the final clean data yields reached 34.50 Gb (LT), 32.50 Gb (SEC), and 32.90 Gb (LKY), with all samples exhibiting Q30 scores of > 90% and unambiguous mapping rates to the reference genome (Table S32). This high-coverage sequencing (> 30 Gb per population) provided statistically powered resolution for quantitative gene expression analysis. A comparison of the SEC and LT populations with the LKY population identified 2422 DEGs (Fig. S15). The Gene Ontology (GO) terms that were enriched among these DEGs included a substantial number of terms associated with calcium ion homeostasis (Figs. S16 and S17). For instance, the term "calcium ion binding" (GO: 0005509) was detected, which is crucial as calcium ion binding plays a fundamental role in various biological processes. Another important term was "intracellular calcium ion homeostasis" (GO: 0006874), which is vital for maintaining the proper calcium ion levels within cells. "Calcium ion homeostasis" (GO: 0055074) was also among the identified terms, highlighting the overall significance of calcium ion balance in the biological system. Moreover, "calcium ion transmembrane transport" (GO: 0070588), which is essential for the movement of calcium ions across cell membranes and is closely related to many physiological functions, was identified (Fig. 6A and B).
|
| Fig. 6 Transcriptome analysis of three populations of C. rubituberculata. (A): GO enrichment analysis of DEGs in LKY vs. LT. (B): GO enrichment analysis of DEGs in LKY vs. SEC. (C): KEGG enrichment analysis of DEGs in LKY vs. LT. (D): KEGG enrichment analysis of DEGs in LKY vs. SEC. (E): Expression differences in CDPKF, NCL, SAU22 and SAU23 in leaves. Green represents individuals cultivated at Guizhou Forestry College (LKY), orange represents wild individuals from Longtou village (LT), and purple represents wild individuals from Shi'e'chang village (SEC). (*: 0.01 ≤ P ≤ 0.05; **: 0.001 ≤ P ≤ 0.01; ***: P ≤ 0.001). |
The KEGG enrichment results revealed specific pathways related to important biological functions, particularly photosynthesis and plant hormone signal transduction. Specifically, the ko00196, "photosynthesis-antenna proteins" pathway was identified. Photosynthesis is a fundamental process in plants, and antenna proteins play a key role in capturing light energy during photosynthesis. Another relevant pathway was ko04075, "plant hormone signal transduction". Plant hormones are essential for regulating plant growth, development, and responses to environmental stimuli, and the signal transduction of these hormones is a complex and crucial process (Fig. 6C and D). Which has unique environmental characteristics, such as nutrient-poor soil and high rock exposure. The expression of many genes associated with calcium ion homeostasis, photosynthesis and plant hormone signal transduction was significantly altered in the karst-growing Camellia rubituberculata. The expression of multiple auxin-responsive proteins also significantly changed. These alterations may help C. rubituberculata better cope with these challenging environmental factors, thus enhancing its suitability for the karst habitat. For example, the expression of genes related to photosystem I, photosystem Ⅱ, NCL, auxin-induced and calcium-dependent genes, such as PSAF, PSBR, SAUR22, CDPKF, and NCL, was significantly downregulated in the karst populations (Figs. 6E and S16). These changes were likely affected by the karst environment and helped these populations to adapt to their challenging environment.
4. DiscussionThis study provides the first report of a chromosome-level genome assembly of Camellia rubituberculata, a critically endangered species endemic to Guizhou Province. The genome was scaffolded into 15 pseudo-chromosomes (2.50 Gb), representing 97.5% of the assembled sequences, with a contig N50 = 9.52 Mb and scaffold N50 = 168.34 Mb. Assembly quality metrics demonstrated exceptional completeness (92.9% complete BUSCO) and structural integrity (LAI = 16.71), surpassing the recommended thresholds for reference-grade genomes (LAI > 15; BUSCO > 90%). This micro endemic species is restricted to only two known wild populations in the karst mountains of Qianxi'nan Prefecture. Its extremely small population size and fragile habitat make in situ conservation and genetic resource preservation an urgent priority. However, the molecular-level understanding of this species remains limited to preliminary plastid genome studies, hindering deeper investigations into its population genetics, adaptive potential, and conservation strategies. The construction of this high-quality, chromosome-level reference genome addresses a critical gap in the core genetic resources for C. rubituberculata. This study provides indispensable foundational data to elucidate the mechanisms of this species' endangerment, reveal the genetic basis of karst adaptation, and guide effective conservation. This chromosome-level assembly also provides a robust foundation for comparative phylogenomic analyses and functional characterization of karst-adaptive loci.
Genomic studies have demonstrated that WGD, gene family expansion, and positive selection processes are pivotal in promoting species adaptation to extreme environments (Feng et al., 2020; Wu et al., 2020; Xiang et al., 2024). Feng et al. (2020) demonstrated that Primulina huaijiensis (Gesneriaceae), a karst-endemic angiosperm, underwent two WGD events during its evolutionary history, with changes in WRKY transcription factors underlying adaptation to karst-specific stresses (e.g., osmotic imbalance and drought). Marsdenia tenacissima, a medicinal species endemic to calcareous karst ecosystems in Southwest China, exhibits calcium-responsive transcriptomic signatures associated with edaphic adaptation. Further, Zhou et al. (2023) deciphered the genomic basis of the limestone habitat specialization of M. tenacissima through genome-wide studies, revealing ion-transport regulators with key roles under calcium-enriched substrate conditions. Cao et al. (2023) reported a chromosome-level genome analysis of Platycarya longipes and P. strobilacea in combination with resequencing of 207 individuals and identified two TPC1 paralogs linked to calcium adaptation. This calcium influx channel gene, previously reported to be under selection in other karst herbs, suggests convergent evolution of high-calcium stress responses. Wen et al. (2024) constructed a chromosome-scale genome assembly for Rhododendron bailiense, revealing expanded ANK and CAP gene families functionally enriched in calcium ion transport. Comparative genomic analyses revealed evolutionary innovations in calcium homeostasis regulation underlying its adaptation to calcium-enriched lithospheric stress. While genomic advances have elucidated the adaptations of some plants to karst ecosystems, the karst adaptations of Camellia species remain underexplored. WGD provides original genetic material during speciation and induces unique functions and traits in species (Hou et al., 2016). For example, a genomic study of the cave plant Primulina huaijiensis indicated that the amplification of the WRKY gene family, which was retained after a WGD, may be a significant factor contributing to its adaptation to karst habitats (Feng et al., 2020). A study on Trachypithecus leucocephalus revealed that all positively selected genes were associated with mineral ion binding, greatly contributing to the survival of limestone langurs in karst habitats (Que et al., 2021). In this study, we found that C. rubituberculata underwent two WGDs, which produced genetic variation and likely contributed to the evolution of specific physiological traits that laid the foundation for its adaptive capacity. Previous studies have shown that genes retained by the common ancestor of angiosperms after the γ-WGT event were associated with water acquisition and salt stress tolerance during the Cretaceous drought period (Alix et al., 2017; Chen et al., 2020). It is possible that some angiosperms can adapt to harsh habitats because of this genetic foundation provided by the ancient γ-WGT event (Shen et al., 2022a). As an angiosperm, C. rubituberculata also underwent this γ-WGT event, and its expanded gene families are significantly enriched in pathways related to plant growth, such as photosynthesis pathways. This suggests that the γ-WGT event equipped C. rubituberculata with abundant genetic material to manage environmental changes. Moreover, the functional annotation of the numerous specific and positively selected gene families in the genome of C. rubituberculata could explain its adaptation. The genomic features associated with post differentiation karst adaptation of C. rubituberculata facilitated adaptation to karstic habitats during environmental acclimatization, a trait that is not common among Camellia species. This is one of the key factors driving the specialization of sect. Tuberculata. C. rubituberculata experienced two WGD events, leading to significant gene retention and enrichment, most notably within the spliceosome pathway, among the pathways that are significantly enriched in its expanded and unique gene families. This enrichment suggests enhanced alternative splicing capacity, which we propose may have facilitated adaptive responses to karst drought stress by enabling refined regulation of key stress signaling genes (Yisilam et al., 2025). For example, HAB1 (a PP2C family member and core ABA pathway regulator) undergoes functional alternative splicing that modulates drought tolerance by influencing stomatal closure (Zhang et al., 2009; Raghavendra et al., 2010; Murata et al., 2015; Zhan et al., 2015; Shi et al., 2025). WGD-driven spliceosome expansion in C. rubituberculata could optimize HAB1 splicing dynamics, thereby potentially enhancing ABA-mediated stomatal control for survival in arid karst habitats.
Southwestern China's abundant wild tea tree populations can provide a wealth of genetic resources and may play a crucial role in enhancing the growth characteristics and adaptability of trees (Evans et al., 2015; Willi et al., 2020; Shen et al., 2022b; Li et al., 2024). However, as aforementioned, Camellia rubituberculata is currently sparsely distributed. Historical population dynamics clearly indicate that the species has faced significant challenges with population bottlenecks and balanced selection as a result of dramatic population declines over thousands of years due to various pressures and disturbances such as habitat fragmentation, climate change, pathogen loads, competitive pressures, and logging by local people. Therefore, addressing population expansion and environmental adaptation mechanisms is of the utmost importance for the conservation of C. rubituberculata. Through our investigations and results we found that C. rubituberculata is able to survive in karst habitats where Camellia species cannot, demonstrating its unique adaptive capacity. This regional survival under strong selective pressure highlights its remarkable evolutionary resilience. Although only two wild populations have been found, this is most likely due to a drastic population decline caused by anthropogenic disturbance rather than a rejection of the environment by the species. The results of population genetic analyses indicate low nucleotide diversity in LT populations, which may be a result of its limited habitat. As mentioned in the Introduction, one contributing factor to the small population size of C. rubituberculata may be the intense selection imposed by its genus-atypical specialization to karst ecosystems, which is distinct from the non-karst habitats occupied by most Camellia congeners. Understanding these adaptation mechanisms is thus critical for conservation. We identified the adaptive traits of C. rubituberculata through genome-level analysis. These genomic signatures align with empirical reports that karst habitats impose compound stress from prolonged drought and nutrient deficiency. The numerous aforementioned enriched functional pathways, such as phytohormone transduction, are significant because of the functional enrichment of genes in this species subjected to environmental selection. The changes in the expression of genes related to these terms and pathways may be one of the strategies that provide favorable conditions for C. rubituberculata to adapt to the karst habitat. Furthermore, many proteins related to the environmental stress response, including SAUR, BSK, NCL, CDPK and NDPK1/2/3, are involved in each pathway's response. CDPKs are widely expressed in all kinds of plant tissues (Hong et al., 1996; Ye et al., 2009; Monaghan et al., 2014; Simeunovic et al., 2016) and function in both Ca2+ sensing and response (Harper et al., 2004). Some CDPKs, such as calcium-dependent protein kinase 1-like, have been lost in C. rubituberculata, which may be the direct cause of the downregulation of CDPKF expression. Although the downward trend of this expression did not reach statistical significance in the samples from the SEC, a consistent decrease was observed. A similar situation occurs for sodium/calcium exchanger (NCL)-associated proteins. Cytosolic Ca2+ homeostasis governs key processes such as auxin transport, circadian regulation, photoperiodic flowering, and stress adaptation through conserved Ca2+-CBL-CIPK signaling modules (Sze et al., 2000; Sanders et al., 2002; Spalding and Harper, 2011). The activity of NCL is influenced by Ca2+ in the cytoplasm. Moreover, NCL can regulate Na+ and Ca2+ homeostasis in plants, and it also affects auxin signal transduction indirectly through the regulation of Ca2+ (Li et al., 2015). One notable genomic adaptation in C. rubituberculata is the loss of the gene encoding the sodium/calcium exchange protein NCL-like isoform X1, which is implicated in Ca2+ transport. This finding suggests that persistently high concentrations of dissolved calcium, a defining selective pressure in karst environments (Gu et al., 2017), may drive the directional streamlining of ion transporter gene families. Within this context, the loss of functionally redundant members, such as NCL-like isoform X1, could represent an adaptive trade-off, reducing the metabolic costs associated with maintaining nonessential pathways under strong directional selection (Kuo et al., 2009). We hypothesize that during prolonged exposure to exogenous calcium ions in karst ecosystems, the loss of the NCL-like gene was selectively fixed because its function was compensated for by other, potentially more efficient, calcium transport systems (e.g., PMCA4) (Wu et al., 2002). This discovery aligns with the broader principle of 'genome streamlining' in adaptation to extreme or stable environments, where gene loss, particularly the loss of genes with redundant functions, is favored to optimize resource allocation and energetic efficiency (Giovannoni et al., 2005). In support of this observation, comparative expression profiling revealed a downward trend in the expression of this specific protein in karst-derived samples. Auxin plays a pivotal role in plant growth and development (Casanova-Sáez et al., 2021). Members of the SAUR gene family, which is considered the largest family of auxin early response genes in plants, can interact with PP2C (Hagen and Guilfoyle, 2002; Spartz et al., 2014). SAUR proteins play crucial roles in plant growth, developmental processes, and abiotic stress responses. For example, in Arabidopsis thaliana, the mediation of ABA signal transduction by AtSAUR32 is vital for drought stress acclimatization (He et al., 2021), whereas in Triticum aestivum, the overexpression of TaSAUR75 leads to root lengthening and enhanced survival under salt and drought stresses (Guo et al., 2018). The enrichment of functional genes such as SAURs among environmentally selected genes in the two wild populations of C. rubituberculata may have significantly aided its adaptation to the arid karst habitats. More SAURs were found in C. rubituberculata than in their predecessors living in non-karst regions. Specifically, certain genes, such as the auxin-responsive protein SAUR23-like, auxin-responsive protein SAUR21-like, and auxin-induced protein 15A, have undergone duplication. Moreover, a comparison of the gene expression profiles of young leaves revealed downregulated expression of genes related to the auxin-responsive protein SAUR23-like, the auxin-responsive protein SAUR21-like, and auxin-induced protein 15A in the samples of C. rubituberculata living in the karst environment.
Population structure analysis revealed a strikingly close genetic relationship between the cultivated population (LKY) and the wild SEC population, closer than its relationship to the wild LT population. This finding provides strong genetic evidence regarding the likely provenance of cultivated Camellia rubituberculata individuals. The high genetic similarity strongly suggests that the cultivated individuals originated from, or share a very recent common ancestry with, the SEC wild population, rather than the LT population. This environmental adaptation was corroborated by genomic signatures of selective sweeps, which highlighted functional evolution in genes associated with plant signal transduction. Notably, significant alterations were observed in signaling pathways involving calcium-dependent protein kinases (CDPKs) and NCL-associated proteins. Interestingly, these changes in signal transduction are very similar to the genetic regulatory mechanisms observed during plant adaptation to karst environments, suggesting that plants may employ conserved genetic strategies to cope with diverse environmental pressures. This discovery not only elucidates key molecular mechanisms underlying plant adaptive evolution but also provides new insights into the evolutionary dynamics of species in heterogeneous habitats.
Additionally, MYB transcription factors subject to environmental selection in wild populations were identified. MYB transcription factors are essential for plant growth, aiding plants in adapting to biotic and abiotic stresses such as infection (Shen et al., 2017; Gu et al., 2020; Hawku et al., 2022), high salinity (Wang et al., 2021; Du et al., 2022), cold (Dong et al., 2021), and heat (Lv et al., 2021; Wu et al., 2021). The evolution of these genes at the population level may have facilitated Camellia rubituberculata adaptation to karst areas. In summary, C. rubituberculata has been subjected to environmental selection pressures during population evolution and has developed numerous mechanisms to adapt to harsh karst habitats. These mechanisms interact with each other to help C. rubituberculata tolerate the environmental challenges presented by the karst habitat during plant growth and development. Characterizing the karst adaptation mechanisms of C. rubituberculata is a significant milestone in the development and utilization of Camellia spp. These results provide important insights and a foundation for future molecular breeding studies and the development of C. rubituberculata and other Camellia species.
5. ConclusionsBy generating high-quality reference genome and population genetics data for Camellia rubituberculata, this work advances the characterization of genetic diversity among endemic wild tea tree resources in Southwest China's karst region. The factors contributing to the adaptation of C. rubituberculata in karst habitats have been preliminarily identified, providing a solid foundation for future research. The findings of this study could guide the subsequent screening of wild tea tree germplasm resources and research on the adaptive evolution of karst species.
AcknowledgementsThis work was supported by the National Natural Science Foundation of China (32400179, 31960043, 32360101), the National Key R&D Program of China (2024YFF1307400), and the 2024 Guizhou Science and Technology Innovation Talent Team Construction Project: Wildlife Innovation Team of the Forestry College of Guizhou University (Qian ke he ren cai CXTD [2025] 053).
Data availability statement
The genome sequences and assemblies of Camellia rubituberculata (Illumina reads, PacBio long reads and Hi-C reads) have been deposited in the National Genomics Data Center (https://ngdc.cncb.ac.cn) under BioProject PRJCA031800. The whole-genome resequencing data have been deposited in the National Genomics Data Center (https://ngdc.cncb.ac.cn) under BioProject PRJCA031802.
CRediT authorship contribution statement
Chao Yan: Writing - Original Draft, Data Curation, Investigation. Ming-tai An: Supervision. Ming Tang: Writing - Review & Editing. Xin-xiang Bai: Writing - Review & Editing. Xu Xiao: Visualization. Zhao-hui Ran: Investigation. Zhi Li: Conceptualization, Methodology, Project administration, Funding acquisition, Investigation.
Author's statement
All authors have read the final draft and agree to the publication, ensuring that all individuals deserving authorship have been appropriately acknowledged in the authorship contribution statement.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.pld.2025.09.004.
Alexander, D.H., Novembre, J., Lange, K., 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res., 19: 1655-1664. DOI:10.1101/gr.094052.109 |
Alix, K., Gérard, P.R., Schwarzacher, T., et al., 2017. Polyploidy and interspecific hybridization: partners for adaptation, speciation and evolution in plants. Ann. Bot., 120: 183-194. DOI:10.1093/aob/mcx079 |
Besemer, J., Borodovsky, M., 2005. GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res., 33: W451-W454. DOI:10.1093/nar/gki487 |
Bu, D.C., Luo, H.T., Huo, P.P., et al., 2021. KOBAS-i: intelligent prioritization and exploratory visualization of biological functions for gene enrichment analysis. Nucleic Acids Res., 49: W317-W325. DOI:10.1093/nar/gkab447 |
Buchfink, B., Reuter, K., Drost, H.G., 2021. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods, 18: 366-368. DOI:10.1038/s41592-021-01101-x |
Camacho, C., Coulouris, G., Avagyan, V., et al., 2009. BLAST+: architecture and applications. BMC Bioinformatics, 10: 421. DOI:10.1186/1471-2105-10-421 |
Cao, Y., Almeida-Silva, F., Zhang, W.P., et al., 2023. Genomic insights into adaptation to karst limestone and incipient speciation in East Asian Platycarya spp. (Juglandaceae). Mol. Biol. Evol., 40: msad121. DOI:10.1093/molbev/msad121 |
Casanova-Sáez, R., Mateo-Bonmatí, E., Ljung, K., 2021. Auxin metabolism in plants. Cold Spring Harb Perspect Biol., 13: a039867. DOI:10.1101/cshperspect.a039867 |
Chan, P.P., Lin, B.Y., Mak, A.J., et al., 2021. tRNAscan-SE 2.0: improved detection and functional classification of transfer RNA genes. Nucleic Acids Res., 49: 9077-9096. DOI:10.1093/nar/gkab688 |
Chang, H.T., Ren, S.X., 1991. Classification on the section Tuberculata of Camellia. Acta Scientiarum Naturaliun Universitatis Sunyatseni, 30: 86-91. |
Chen, H., Patterson, N., Reich, D., 2010. Population differentiation as a test for selective sweeps. Genome Res., 20: 393-402. DOI:10.1101/gr.100545.109 |
Chen, J.D., Zheng, C., Ma, J.Q., et al., 2020. The chromosome-scale genome reveals the evolution and diversification after the recent tetraploidization event in tea plant. Hortic. Res., 7: 63. DOI:10.1038/s41438-020-0288-2 |
Chen, J.H., Hao, Z.D., Guang, X.M., et al., 2019. Liriodendron genome sheds light on angiosperm phylogeny and species-pair differentiation. Nat. Plants, 5: 18-25. DOI:10.1038/s41477-018-0323-6 |
Chen, Y.Q., Dong, L.N., Yi, H.Q., et al., 2024. Genomic divergence and mutation load in the Begonia masoniana complex from limestone karsts. Plant Divers., 46: 575-584. DOI:10.1016/j.pld.2024.04.001 |
Cheng, H.Y., Concepcion, G.T., Feng, X.W., et al., 2021. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods, 18: 170-175. DOI:10.1038/s41592-020-01056-5 |
Clements, R., Sodhi, N.S., Schilthuizen, M., et al., 2006. Limestone karsts of Southeast Asia: imperiled arks of biodiversity. Bioscience, 56: 733-742. DOI:10.1641/0006-3568(2006)56[733:LKOSAI]2.0.CO;2 |
Cui, L.Y., Wall, K.P., Leebens-Mack, J.H., et al., 2006. Widespread genome duplications throughout the history of flowering plants. Genome Res., 16: 738-749. DOI:10.1101/gr.4825606 |
Dai, X.Y., Yuan, C.J., Lin, Z.X., et al., 2016. Study on the cuttings propagation technology of Camellia rubituberculata Chang. Seed, 35: 124-127. DOI:10.16590/j.cnki.1001-4705.2016.08.124 |
Danecek, P., Auton, A., Abecasis, G., et al., 2011. The variant call format and VCFtools. Bioinformatics, 27: 2156-2158. DOI:10.1093/bioinformatics/btr330 |
De Bie, T., Cristianini, N., Demuth, J.P., et al., 2006. CAFE: a computational tool for the study of gene family evolution. Bioinformatics, 22: 1269-1271. DOI:10.1093/bioinformatics/btl097 |
Deng, Z.Z., Pan, D.Q., Liu, Z.B., et al., 2011. Preliminary Study on Camellia rubituberculata Chang et Yu in Guizhou Province. Guizhou For. Sci. Tech., 39: 39-42. DOI:10.1007/s12404-011-0108-2 |
Dong, J., Cao, L., Zhang, X.Y., et al., 2021. An R2R3-MYB transcription factor RmMYB108 responds to chilling stress of Rosa multiflora and conferred cold tolerance of Arabidopsis. Front. Plant Sci., 12: 696919. DOI:10.3389/fpls.2021.696919 |
Doyle, J.J., Doyle, J.L., 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochem. Bull., 19: 11-15. DOI:10.1016/0031-9422(80)85004-7 |
Du, B.Y., Liu, H., Dong, K.T., et al., 2022. Over-expression of an R2R3 MYB gene, MdMYB108L, enhances tolerance to salt stress in transgenic plants. Int. J. Mol. Sci., 23: 9428. DOI:10.3390/ijms23169428 |
Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32: 1792-1797. DOI:10.1093/nar/gkh340 |
Ellegren, H., 2014. Genome sequencing and population genomics in non-model organisms. Trends Ecol. Evol., 29: 51-63. DOI:10.1016/j.tree.2013.09.008 |
Ellinghaus, D., Kurtz, S., Willhoeft, U., 2008. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics, 9: 1-14. DOI:10.1186/1471-2105-9-18 |
Evans, L.M., Slavov, G.T., Rodgers-Melnick, E., et al., 2014. Population genomics of Populus trichocarpa identifies signatures of selection and adaptive trait associations. Nat. Genet., 46: 1089-1096. DOI:10.1038/ng.3075 |
Evans, S.N., Hening, A., Schreiber, S.J., 2015. Protected polymorphisms and evolutionary stability of patch-selection strategies in stochastic environments. Math. Biol., 71: 325-359. DOI:10.1007/s00285-014-0824-5 |
Feng, C., Wang, J., Wu, L.Q., et al., 2020. The genome of a cave plant, Primulina huaijiensis, provides insights into adaptation to limestone karst habitats. New Phytol., 227: 1249-1263. DOI:10.1111/nph.16588 |
Fumagalli, M., Vieira, F.G., Korneliussen, T.S., et al., 2013. Quantifying population genetic differentiation from next-generation sequencing data. Genetics, 195: 979-992. DOI:10.1534/genetics.113.154740 |
Gao, Y., Ai, B., Kong, H.H., et al., 2015. Geographical pattern of isolation and diversification in karst habitat islands: a case study in the Primulina eburnea complex. J. Biogeogr., 42: 2131-2144. DOI:10.1111/jbi.12576 |
Giovannoni, S.J., Tripp, H.J., Givan, S., 2005. Genome streamlining in a cosmopolitan oceanic bacterium. Science, 309: 1242-1245. DOI:10.1126/science.1114057 |
Gong, W.F., Xiao, S.X., Wang, L.K., et al., 2022. Chromosome-level genome of Camellia lanceoleosa provides a valuable resource for understanding genome evolution and self-incompatibility. Plant J., 110: 881-898. DOI:10.1111/tpj.15739 |
Gu, K.D., Zhang, Q.Y., Yu, J.Q., et al., 2020. R2R3-MYB transcription factor MdMYB73 confers increased resistance to the fungal pathogen Botryosphaeria dothidea in apples via the salicylic acid pathway. J. Agric. Food Chem., 69: 447-458. DOI:10.1021/acs.jafc.0c06740 |
Gu, X., Ma, T., Wu, Y., et al., 2017. Hydrogeochemical characteristics of groundwater in the karst region, Southwest China. Procedia Earth Planet. Sci., 17: 245-248. DOI:10.1016/j.proeps.2016.12.045 |
Guan, D.F., McCarthy, S.A., Wood, J., et al., 2020. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics, 36: 2896-2898. DOI:10.1093/bioinformatics/btaa025 |
Guo, J., Luo, J., Zhou, Y., et al., 2024. Active components and skin care properties of tea seed oil from Camellia sinensis. Bioresources, 19: 7166-7182. DOI:10.15376/biores.19.4.7166-7182 |
Guo, Y., Jiang, Q.Y., Hu, Z., et al., 2018. Function of the auxin-responsive gene TaSAUR75 under salt and drought stress. Crop J., 6: 181-190. DOI:10.1016/j.cj.2017.08.005 |
Haas, B.J., Delcher, A.L., Mount, S.M., et al., 2003. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res., 31: 5654-5666. DOI:10.1093/nar/gkg770 |
Haas, B.J., Salzberg, S.L., Zhu, W., et al., 2008. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol., 9: 1-22. DOI:10.1186/gb-2008-9-1-r7 |
Hagen, G., Guilfoyle, T., 2002. Auxin-responsive gene expression: genes, promoters and regulatory factors. Plant Mol. Biol., 49: 373-385. DOI:10.1023/A:1015207114117 |
Han, Y.J., Wessler, S.R., 2010. MITE-Hunter: a program for discovering miniature inverted-repeat transposable elements from genomic sequences. Nucleic Acids Res., 38: e199. DOI:10.1093/nar/gkq862 |
Hao, Z., Kuang, Y.W., Kang, M., 2015. Untangling the influence of phylogeny, soil and climate on leaf element concentrations in a biodiversity hotspot. Funct. Ecol., 29: 165-176. DOI:10.1111/1365-2435.12344 |
Harper, J.F., Breton, G., Harmon, A., 2004. Decoding Ca2+ signals through plant protein kinases. Annu. Rev. Plant Biol., 55: 263-288. DOI:10.1146/annurev.arplant.55.031903.141627 |
Hawku, M.D., He, F.X., Bai, X.X., et al., 2022. A R2R3 MYB transcription factor, TaMYB391, is positively involved in wheat resistance to Puccinia striiformis f. sp. tritici. Int. J. Mol. Sci., 23: 14070. DOI:10.3390/ijms232214070 |
He, Y.J., Liu, Y., Li, M.Z., et al., 2021. The Arabidopsis SMALL AUXIN UP RNA32 protein regulates ABA-mediated responses to drought stress. Front. Plant Sci., 12: 625493. DOI:10.3389/fpls.2021.625493 |
Hedges, S.B., Dudley, J., Kumar, S., 2006. TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics, 22: 2971-2972. DOI:10.1093/bioinformatics/btl505 |
Hegarty, M.J., Hiscock, S.J., 2008. Genomic clues to the evolutionary success of polyploid plants. Curr. Biol., 18: R435-R444. DOI:10.1016/j.cub.2008.03.043 |
Hong, Y., Takano, M., Liu, C.M., et al., 1996. Expression of three members of the calcium-dependent protein kinase gene family in Arabidopsis thaliana. Plant Mol. Biol., 30: 1259-1275. DOI:10.1007/BF00019557 |
Hou, J., Ye, N., Dong, Z.Y., et al., 2016. Major chromosomal rearrangements distinguish willow and poplar after the ancestral "salicoid" genome duplication. Genome Biol. Evol., 8: 1868-1875. DOI:10.1093/gbe/evw127 |
Jiao, Y.N., Wickett, N.J., Ayyampalayam, S., et al., 2011. Ancestral polyploidy in seed plants and angiosperms. Nature, 473: 97-100. DOI:10.1038/nature09916 |
Kalvari, I., Argasinska, J., Quinones-Olvera, N., et al., 2018. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Res., 46: D335-D342. DOI:10.1093/nar/gkx1038 |
Keilwagen, J., Wenk, M., Erickson, J.L., et al., 2016. Using intron position conservation for homology-based gene prediction. Nucleic Acids Res., 44: e89. DOI:10.1093/nar/gkw092 |
Kim, D., Langmead, B., Salzberg, S.L., 2015. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods, 12: 357-360. DOI:10.1038/nmeth.3317 |
Kuo, C.H., Moran, N.A., Ochman, H., 2009. The consequences of genetic drift for bacterial genome complexity. Genome Res., 19: 1450-1454. DOI:10.1101/gr.091785.109 |
Larguinho, M., Santos, H.M., Doria, G., et al., 2010. Development of a fast and efficient ultrasonic-based strategy for DNA fragmentation. Talanta, 81: 881-886. DOI:10.1016/j.talanta.2010.01.032 |
Li, H., Coghlan, A., Ruan, J., 2006. TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res., 34: D572-D580. DOI:10.1093/nar/gkj118 |
Li, H., Durbin, R., 2009. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25: 1754-1760. DOI:10.1093/bioinformatics/btp324 |
Li, H., Durbin, R., 2011. Inference of human population history from individual whole-genome sequences. Nature, 475: 493-496. DOI:10.1038/nature10231 |
Li, H., Handsaker, B., Wysoker, A., et al., 2009. The Sequence alignment/map format and SAMtools. Bioinformatics, 25: 2078-2079. DOI:10.1093/bioinformatics/btp352 |
Li, L., Stoeckert Jr, C.J., Roos, D.S., 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res., 13: 2178-2189. DOI:10.1101/gr.1224503 |
Li, M.M., Meegahakumbura, M.K., Wambulwa, M.C., et al., 2024. Genetic analyses of ancient tea trees provide insights into the breeding history and dissemination of Chinese Assam tea (Camellia sinensis var. assamica). Plant Divers., 46: 229-237. DOI:10.1016/j.pld.2023.06.002 |
Li, P.H., Zhang, G.Y., Gonzales, N., et al., 2015. Ca2+-regulated and diurnal rhythm-regulated Na+/Ca2+ exchanger AtNCL affects flowering time and auxin signaling in Arabidopsis. Plant Cell Environ., 39: 377-392. DOI:10.1111/pce.12620 |
Lin, P., Wang, K.L., Wang, Y.P., et al., 2022. The genome of oil-Camellia and population genomics analysis provide insights into seed oil domestication. Genome Biol., 23: 14. DOI:10.1186/s13059-021-02599-2 |
Lu, Y.Y., Liang, H.M., Liao, J.L., et al., 2024. Chromosome-scale assembly and analysis of yellow Camellia (Camellia limonia) genome reveal plant adaptation mechanism and flavonoid biosynthesis in karst region. Glob. Ecol. Conserv., 56: e03296. DOI:10.1016/j.gecco.2024.e03296 |
Lv, K.W., Wei, H.R., Liu, G.F., 2021. A R2R3-MYB transcription factor gene, BpMYB123, regulates BpLEA14 to improve drought tolerance in Betula platyphylla. Front. Plant Sci., 12: 791390. DOI:10.3389/fpls.2021.791390 |
Ma, Z., Zhang, Y., Wu, L., et al., 2021. High-quality genome assembly and resequencing of modern cotton cultivars provide resources for crop improvement. Nat. Genet., 53: 1385-1391. DOI:10.1038/s41588-021-00910-2 |
Majoros, W.H., Pertea, M., Salzberg, S.L., 2004. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics, 20: 2878-2879. DOI:10.1093/bioinformatics/bth315 |
Manni, M., Berkeley, M.R., Seppey, M., et al., 2021. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol., 38: 4647-4654. DOI:10.1093/molbev/msab199 |
Marçais, G., Kingsford, C., 2011. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 27: 764-770. DOI:10.1093/bioinformatics/btr011 |
Min, T.L., 2007. Theaceae. In: Wu, Z.Y., Raven, P.H., Bartholomew, B. (Eds.), Flora of China. Science Press & Missouri Botanical Garden Press, Beijing & St. Louis, pp. 367–412.
|
Monaghan, J., Matschi, S., Shorinola, O., et al., 2014. The calcium-dependent protein kinase CPK28 buffers plant immunity and regulates BIK1 turn over. Cell Host Microbe, 16: 605-615. DOI:10.1016/j.chom.2014.10.007 |
Monro, A.K., Bystriakova, N., Fu, L.F., et al., 2018. Discovery of a diverse cave flora in China. PLoS One, 13: e0190801. DOI:10.1371/journal.pone.0190801 |
Murata, Y., Mori, I.C., Munemasa, S., 2015. Diverse stomatal signaling and the signal integration mechanism. Annu. Rev. Plant Biol., 66: 369-392. DOI:10.1146/annurev-arplant-043014-114707 |
Nie, Y.P., Chen, H.S., Wang, K.L., et al., 2011. Seasonal water use patterns of woody species growing on the continuous dolostone outcrops and nearby thin soils in subtropical China. Plant Soil, 34: 399-412. DOI:10.1007/s11104-010-0653-2 |
Ou, S.J., Chen, J.F., Jiang, N., 2018. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res., 46: e126. DOI:10.1093/nar/gky730 |
Ou, S.J., Jiang, N., 2018. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol., 176: 1410-1422. DOI:10.1104/pp.17.01310 |
Pertea, M., Pertea, G.M., Antonescu, C.M., et al., 2015. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol., 33: 290-295. DOI:10.1038/nbt.3122 |
Que, T.C., Wang, H.F., Yang, W.F., et al., 2021. The reference genome and transcriptome of the limestone langur, Trachypithecus leucocephalus, reveal expansion of genes related to alkali tolerance. BMC Biology, 19: 67. DOI:10.1186/s12915-021-00998-2 |
Raghavendra, A.S., Gonugunta, V.K., Christmann, A., et al., 2010. ABA perception and signalling. Trends Plant Sci., 15: 395-401. DOI:10.1016/j.tplants.2010.04.006 |
Ran, Z.H., Li, Z., Xiao, X., et al., 2024. Extensive targeted metabolomics analysis reveals the identification of major metabolites, antioxidants, and disease-resistant active pharmaceutical components in Camellia tuberculata (Camellia L.) seeds. Sci. Rep., 14: 8709. DOI:10.1038/s41598-024-58725-0 |
Ranallo-Benavidez, T.R., Jaron, K.S., Schatz, M.C., 2020. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun., 11: 1432. DOI:10.1038/s41467-020-14998-3 |
Ren, R., Wang, H.F., Guo, C.C., et al., 2018. Widespread whole genome duplications contribute to genome complexity and species diversity in angiosperms. Mol. Plant, 11: 414-428. DOI:10.1016/j.molp.2018.01.002 |
Robinson, J.T., Turner, D., Durand, N.C., et al., 2018. Juicebox. js provides a cloud-based visualization system for Hi-C data. Cell Syst., 6: 256-258. DOI:10.1016/j.cels.2018.01.001 |
Robinson, M.D., McCarthy, D.J., Smyth, G.K., 2010. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26: 139-140. DOI:10.1093/bioinformatics/btp616 |
Ronquist, F., Teslenko, M., van der Mark, P., et al., 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol., 61: 539-542. DOI:10.1093/sysbio/sys029 |
Sanders, D., Pelloux, J., Brownlee, C., et al., 2002. Calcium at the crossroads of signaling. Plant Cell, 14: S401-S417. DOI:10.1105/tpc.002899 |
Sang, Y.P., Long, Z.Q., Dan, X.M., et al., 2022. Genomic insights into local adaptation and future climate-induced vulnerability of a keystone forest tree in East Asia. Nat. Commun., 13: 6541. DOI:10.1038/s41467-022-34206-8 |
Shen, T.F., Huang, B., Xu, M., et al., 2022a. The reference genome of Camellia chekiangoleosa provides insights into Camellia evolution and tea oil biosynthesis. Hortic. Res., 9: uhab083. DOI:10.1093/hr/uhab083 |
Shen, X.J., Guo, X.W., Guo, X., et al., 2017. PacMYBA, a sweet cherry R2R3-MYB transcription factor, is a positive regulator of salt stress tolerance and pathogen resistance. Plant Physiol. Biochem., 112: 302-311. DOI:10.1016/j.plaphy.2017.01.015 |
Shen, Y.F., Xia, H., Tu, Z.H., et al., 2022b. Genetic divergence and local adaptation of Liriodendron driven by heterogeneous environments. Mol. Ecol., 31: 916-933. DOI:10.1111/mec.16271 |
Shi, M.M., Liang, P., Luo, Z.L., et al., 2025. Genome compaction underlies the molecular adaptation of bay cedar (Suriana maritima) to the extreme habitat on the tropical coral islands. Plant Divers., 47: 337-340. DOI:10.1016/j.pld.2025.01.002 |
Simeunovic, A., Mair, A., Wurzinger, B., et al., 2016. Know where your clients are: subcellular localization and targets of calcium-dependent protein kinases. J. Exp. Bot., 67: 3855-3872. DOI:10.1093/jxb/erw157 |
Soltis, D.E., Albert, V.A., Leebens-Mack, J., et al., 2009. Polyploidy and angiosperm diversification. Am. J. Bot., 96: 336-348. DOI:10.3732/ajb.0800079 |
Spalding, E.P., Harper, J.F., 2011. The ins and outs of cellular Ca2+ transport. Curr. Opin. Plant Biol., 14: 715-720. DOI:10.1016/j.pbi.2011.08.001 |
Spartz, A.K., Ren, H., Park, M.Y., et al., 2014. SAUR inhibition of PP2C-D phosphatases activates plasma membrane H+-ATPases to promote cell expansion in Arabidopsis. Plant Cell, 26: 2129-2142. DOI:10.1105/tpc.114.126037 |
Stamatakis, A., 2014. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30: 1312-1313. DOI:10.1093/bioinformatics/btu033 |
Stanke, M., Steinkamp, R., Waack, S., et al., 2004. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res., 32: W309-W312. DOI:10.1093/nar/gkh379 |
Steinegger, M., Söding, J., 2017. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol., 35: 1026-1028. DOI:10.1038/nbt.3988 |
Sze, H., Liang, F., Hwanng, I., et al., 2000. Diversity and regulation of plant Ca2+ pumps: insights from expression in yeast. Annu. Rev. Plant Physiol. Plant Mol. Biol., 51: 433-462. DOI:10.1146/annurev.arplant.51.1.433 |
Szklarczyk, D., Gable, A.L., Lyon, D., et al., 2019. STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res., 47: D607-D613. DOI:10.1093/nar/gky1131 |
Tang, H.B., Bowers, J.E., Wang, X.Y., et al., 2008. Synteny and collinearity in plant genomes. Science, 320: 486-488. DOI:10.1126/science.1153917 |
Tarailo-Graovac, M., Chen, N.S., 2009. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protocol. Bioinformat., 25: 4.10.1-4.10.14. DOI:10.1002/0471250953.bi0410s25 |
Turanchik, E.J., Kane, T.C., 1979. Ecological genetics of the cave beetle Neaphaenops tellkampfii (Coleoptera: Carabidae). Oecologia, 44: 63-67. DOI:10.1007/BF00346399 |
Van de Peer, Y., Maere, S., Meyer, A., 2009. The evolutionary significance of ancient genome duplications. Nat. Rev. Genet., 10: 725-732. DOI:10.1038/nrg2600 |
Wallace, A.R., 1858. On the tendency of varieties to departinitely from the original type. Zoology, 3: 53-62. |
Wang, K., Li, M.Y., Hakonarson, H., 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res., 38: e164. DOI:10.1093/nar/gkq603 |
Wang, S.S., Shi, M.Y., Zhang, Y., et al., 2021. FvMYB24, a strawberry R2R3-MYB transcription factor, improved salt stress tolerance in transgenic Arabidopsis. Biochem. Biophys. Res. Commun., 569: 93-99. DOI:10.1016/j.bbrc.2021.06.085 |
Wang, X.C., Feng, H., Chang, Y.X., et al., 2020. Population sequencing enhances understanding of tea plant evolution. Nat. Commun., 11: 4447. DOI:10.1038/s41467-020-18228-8 |
Wang, Y.P., Tang, H.B., Debarry, J.D., et al., 2012. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res., 40: e49. DOI:10.1093/nar/gkr1293 |
Wen, S.L., Cai, X.W., Yang, K., et al., 2024. Chromosome-level genome assembly of a rare karst-growing Rhododendron species provides insights into its evolution and environmental adaptation. J. Syst. Evol., 63: 245-267. DOI:10.1111/jse.13130 |
Whitlock, M.C., Lotterhos, K.E., 2015. Reliable detection of loci responsible for local adaptation: inference of a null model through trimming the distribution of FST. Am. Nat., 186(S1): S24-S36. DOI:10.1086/682949 |
Willi, Y., Fracassetti, M., Bachmann, O., et al., 2020. Demographic processes linked to genetic diversity and positive selection across a species' range. Plant Commun., 1: 100111. DOI:10.1016/j.xplc.2020.100111 |
Wingett, S., Ewels, P., Furlan-Magaril, M., et al., 2015. HiCUP: pipeline for mapping and processing Hi-C data. F1000Research, 4: 1310. DOI:10.12688/f1000research.7334.1 |
Wu, S.D., Han, B.C., Jiao, Y.N., 2020. Genetic contribution of paleopolyploidy to adaptive evolution in angiosperms. Mol. Plant, 13: 59-71. DOI:10.1016/j.molp.2019.10.012 |
Wu, Z., Li, T., Liu, X.Y., et al., 2021. A novel R2R3-MYB transcription factor LlMYB305 from Lilium longiflorum plays a positive role in thermotolerance via activating heat-protective genes. Environ. Exp. Bot., 184: 104399. DOI:10.1016/j.envexpbot.2021.104399 |
Wu, Z.Y., Liang, F., Hong, B.M., et al., 2002. An endoplasmic reticulum-bound Ca2+/Mn2+ pump, ECA1, supports plant growth and confers tolerance to Mn2+ stress. Plant Physiol., 130: 128-137. DOI:10.1104/pp.004440 |
Xia, E.H., Zhang, H.B., Sheng, J., et al., 2017. The tea tree genome provides insights into tea flavor and independent evolution of caffeine biosynthesis. Mol. Plant, 10: 866-877. DOI:10.1016/j.molp.2017.04.002 |
Xiang, X.D., Zhou, X.L., Zi, H.L., et al., 2024. Populus cathayana genome and population resequencing provide insights into its evolution and adaptation. Hortic. Res., 11: uhad255. DOI:10.1093/hr/uhad255 |
Xiao, X., Zhang, Z.H., Lu, J.T., et al., 2022. Analysis on chloroplast genome characteristics and codon usage biases of Camellia rubituberculata. Seed, 41: 19-26. DOI:10.16590/j.cnki.1001-4705.2022.12.019 |
Xu, Z., Wang, H., 2007. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res., 35: W265-W268. DOI:10.1093/nar/gkm286 |
Yang, J., Lee, S.H., Goddard, M.E., et al., 2011. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet., 88: 76-82. DOI:10.1016/j.ajhg.2010.11.011 |
Yang, Z.M., Tian, S.L., Li, X.K., et al., 2022. Multi-omics provides new insights into the domestication and improvement of dark jute (Corchorus olitorius). Plant J., 112: 812-829. DOI:10.1111/tpj.15983 |
Yang, Z.H., 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol., 24: 1586-1591. DOI:10.1093/molbev/msm088 |
Yang, Z.H., Chu, J.P., Yang, C.R., et al., 2018. Experimental study on the breeding for shoots seedlings grafting of Camellia rubituberculata. Guizhou For. Sci. Tech., 46: 21-24. |
Ye, S.F., Wang, L., Xie, W.B., et al., 2009. Expression profile of calcium-dependent protein kinase (CDPKs) genes during the whole lifespan and under phytohormone treatment conditions in rice (Oryza sativa L. ssp. indica). Plant Mol. Biol., 70: 311-325. DOI:10.1007/s11103-009-9475-0 |
Yisilam, G., Zheng, E., Li, C., et al., 2025. The chromosome-scale genome of black wolfberry (Lycium ruthenicum) provides useful genomic resources for identifying genes related to anthocyanin biosynthesis and disease resistance. Plant Divers., 47: 201-213. DOI:10.1016/j.pld.2025.01.001 |
Yu, X.J., Zheng, H.K., Wang, J., et al., 2006. Detecting lineage-specific adaptive evolution of brain-expressed genes in human using rhesus macaque as outgroup. Genomics, 88: 745-751. DOI:10.1016/j.ygeno.2006.05.008 |
Zhan, X.Q., Qian, B.L., Cao, F.Q., et al., 2015. An Arabidopsis PWI and RRM motif-containing protein is critical for pre-mRNA splicing and ABA responses. Nat. Commun., 6: 8139. DOI:10.1038/ncomms9139 |
Zhang, C., Dong, S.S., Xu, J.Y., et al., 2019a. PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics, 35: 1786-1788. DOI:10.1093/bioinformatics/bty875 |
Zhang, W.Y., Zhang, Y.J., Qiu, H.J., et al., 2020. Genome assembly of wild tea tree DASZ reveals pedigree and selection history of tea varieties. Nat. Commun., 11: 3719. DOI:10.1038/s41467-020-17498-6 |
Zhang, X.T., Chen, S., Shi, L.Q., et al., 2021. Haplotype-resolved genome assembly provides insights into evolutionary history of the tea plant Camellia sinensis. Nat. Genet., 53: 1250-1259. DOI:10.1038/s41588-021-00895-y |
Zhang, X.T., Zhang, S.C., Zhao, Q., et al., 2019b. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants, 5: 833-845. DOI:10.1038/s41477-019-0487-8 |
Zhang, Z., 2022. KaKs_Calculator 3.0: calculating selective pressure on coding and non-coding sequences. Genom. Proteom. Bioinform., 20: 536-540. DOI:10.1016/j.gpb.2021.12.002 |
Zhang, Z.F., Huang, Y.Q., Mo, L., et al., 2009. Photosynthesis light response characteristics of four limestone plants in karst area. J. Northwest A. F. Univ., 24: 44-48. |
Zhou, Y.L., Fan, W., Zhang, H.Y., et al., 2023. Marsdenia tenacissima genome reveals calcium adaptation and tenacissoside biosynthesis. Plant J., 113: 1146-1159. DOI:10.1111/tpj.16081 |



