b. Institute of Tree Development and Genome Editing, Beijing Forestry University, Beijing 100083, China;
c. Yunnan Key Laboratory for Integrative Conservation of Plant Species with Extremely Small Populations, State Key Laboratory of Plant Diversity and Specialty Crops, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming 650201, China;
d. University of the Chinese Academy of Sciences, Beijing 100049, China;
e. Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Synthetic Biology, Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
Some species, such as gingko trees or dawn redwood trees, have persisted for millions of years and appear unchanged compared with fossil species. Darwin used the phrase 'living fossils' to describe these 'anomalous forms that endured to the present day' (Werth and Shear, 2014). A few studies have demonstrated that despite the lack of morphological change in these living fossils, their genomes have evolved extensively and continue to evolve. Understanding how the genome evolves is important for the conservation of these relict species. East Asia harbors the highest richness of fossil genera such as Cercidiphyllum (Zhu et al., 2020), Euptelea (Cao et al., 2016), Davidia (Chen et al., 2020; Ren et al., 2024), Ginkgo (Zhao et al., 2019), and Tetracentron (Liu et al., 2020; Li et al., 2021). In contrast to their wide distribution during the Tertiary, extant populations of these Tertiary relict species are often small and isolated.
Recent population genomic studies have shown that geographic and climatic fluctuations have a major effect on the demographic history and divergence of relict species (Zhao et al., 2019; Zhu et al., 2020; Ren et al., 2024). Most relict species have undergone multiple cycles of population expansion and contraction corresponding to climate change (Cao et al., 2016; Zhao et al., 2019; Ren et al., 2024). Glaciation has driven intraspecific divergence of relict species in different refugia (Qiu et al., 2011; Mao and Liu, 2012). However, diverged lineages often came into contact and hybridized following postglacial range expansions (Ren et al., 2024). Although some relict species and populations repeatedly recovered from bottlenecks during the late Tertiary/Quaternary periods (Zhao et al., 2019; Feng et al., 2024), some species or lineages within the same species continued to decline and may be on the way to extinction (Li et al., 2018; Feng et al., 2024).
Climate change has been considered a major driver of species range shifts and biodiversity loss (Scheffers et al., 2016; Meng et al., 2021). To understand the potential for climate adaptation of relict species in the future, it is important to explore the existing genetic variation of species. Species accumulate various variants during evolution. Among these, deleterious variants can impede a species' survival and reproduction, like reducing its ability to adapt to the environment or reproduce effectively (Agrawal and Whitlock, 2012; Bertorelle et al., 2022). Studies demonstrate that species or populations with different dynamic histories may have different levels of deleterious variants (Yang et al., 2018; Ma et al., 2021; Ochoa and Gibbs, 2021; Kleinman-Ruiz et al., 2022; Ren et al., 2024). In contrast, adaptive variants enhance a species' fitness, enabling it to better cope with environmental changes. Some adaptive variations have been driven by positive selection. Previous studies of population genomics have identified many positively selected genes (Zhao et al., 2019; Zhu et al., 2020; Ren et al., 2024; Xu et al., 2024). For example, signatures of positive selection were detected in genes involved in the abiotic and biotic stress resistance of ginkgo trees (Zhao et al., 2019).
Adaptive variants form the genetic foundation of local adaptation. For species having long generational times, local adaptation represents the most crucial approach to deal with climate change (Dauphin et al., 2020; Gougherty et al., 2021; Wang et al., 2023). Studying local adaptation and identifying related adaptive variants can help us better understand how plants adapt to different environments through genetic variation. Therefore, there is a growing number of studies on local adaptation (Savolainen et al., 2013; Sang et al., 2022; Wang et al., 2023; Zhang et al., 2023). These studies have revealed the genetic mechanisms by which plants adapt to environmental changes. For example, a study that showed precipitation plays an important role in the local adaptation of Populus koreana identified several genes related to climate adaptation, including CRL1, and HSP60-3A (Sang et al., 2022). Another study demonstrated that the genetic variation of Pterocarya macroptera was mainly shaped by temperature-related variables, and SNPs related to genes involved in chemical defense and gene regulation may play a role in adaptation to the environment (Wang et al., 2023). Stress response and growth-related genes were found to be involved in the local adaptation of Cercidiphyllum japonicum (Zhu et al., 2020). Relict plants have historically experienced climate oscillations, carrying rich genetic information related to environmental changes, which can provide abundant material for local adaptation studies.
In addition, predicting genomic vulnerability is of great significance. Research on Pterocarya macroptera (Wang et al., 2023) shows that marginal populations have high genomic vulnerability, indicating a high risk of extinction. For Populus koreana (Sang et al., 2022) and kiwifruit (Zhang et al., 2023), such prediction identified vulnerable populations. A study of the Swiss Alps suggested that species with long generation times may have difficulty keeping up with the rapid climate change occurring in high mountain areas (Dauphin et al., 2020). Analyses of genomic vulnerability also provides guidance for conservation strategies, reveals adaptation mechanisms, and is vital for maintaining biodiversity in the context of climate change.
Tetracentron sinense is a representative of widespread living fossil trees in East Asia (Sun et al., 2014; Li et al., 2018). T. sinense, along with Trochodendron aralioides, are the only two remaining relict species of the family Trochodendraceae. They belong to the order Trochodendrales, which is one of the four early diverged eudicot lineages (Ruhfel et al., 2014; Chanderbali et al., 2022). Although T. sinense is currently restricted to East Asia, it occurred extensively across the Northern Hemisphere during the Eocene and Miocene according to the fossil record (Grimsson et al., 2008; Manchester et al., 2018). T. sinense currently has an island-like distribution, as it only occurs in a few restricted areas of the East Asian mainland; it is classified as an endangered species in China and listed in Appendix Ⅲ of the Convention of International Trade of Endangered Species (Sun et al., 2014; Liu et al., 2020). As a Tertiary relict with a rich fossil record, T. sinense is important for the study of ancient flora and early trait evolution in the core eudicots. The high-quality whole genome of T. sinense has been sequenced three times by different groups (Liu et al., 2020; Li et al., 2021; Chanderbali et al., 2022). According to our study, the genome size of the sequenced T. sinense individual is estimated to be ~1.12 Gb, the assembly scaffold N50 is 45 Mb and the chromosome number was 2n = 48 (Liu et al., 2020). Analysis of genome sequences of T. sinense and some eudicots has shed light on the origin and evolution of core eudicots. Phylogeographical studies of T. sinense have shown that an early divergence occurred between southwestern China (represented by populations from Yunnan Province) and central subtropical China early in the Neogene, and this was followed by the divergence of several other populations in central subtropical China (Liu et al., 2020); multiple refugia have been identified in both southwestern China and central subtropical China (Sun et al., 2014). Following the publication of the high-quality reference genome of T. sinense, Liu et al. (2020) and Li et al. (2021) performed population genomic analyses on T. sinense and revealed its demographic history. However, these studies, as well as others based on inter-simple sequence repeats (ISSRs) and chloroplast spacer regions, have generated numerous uncertainties relating to patterns of genetic divergence and the responses of populations to Pleistocene climate change. In these previous studies, the number of molecular markers used and sample sizes were small. In addition, no previous studies have investigated the inbreeding, genetic load, adaptive evolution, and genomic vulnerability to future climate change of T. sinense, which are important for clarifying the evolutionary history of T. sinense.
In this study, we used re-sequenced genomes to clarify the evolutionary history and adaptive potential of the 'living fossil' Tetracentron sinense. We first identified the major lineages of T. sinense based on both nuclear and chloroplast genomic data and genetic admixture between different lineages. Next, we inferred the demographic history of the major lineages of T. sinense. We also estimated inbreeding using the runs of homozygosity (ROH), deleterious mutations and the role of positive selection of different lineages. Finally, we investigated local adaptation and genomic vulnerability of T. sinense. Our findings provide insights into the evolutionary history and conservation of T. sinense and enhance our understanding of the evolution of living fossil species.
2. Materials and methods 2.1. Sample collection and whole genome resequencingWe previously sampled 55 individuals from six populations across the range of Tetracentron sinense for whole-genome resequencing (Liu et al., 2020). Li et al. (2021) also studied the evolutionary history of T. sinense using whole-genome resequencing data from 49 individuals. Together with another study (Sun et al., 2014), our findings have suggested that T. sinense in Yunnan Province represents the oldest diverged clade of T. sinense, but the sample sizes of these studies were relatively small. Here, we collected 96 individuals from 10 populations across the entire range of T. sinense in Yunnan Province (Fig. 1 and Table S1). Eight to ten individuals were sampled for each population. Fresh leaves were collected and preserved in silica gel. Genomic DNA was extracted using the Tiangen Isolation/Extraction/Purification Kit (Tiangen Biotech (Beijing) Co., Ltd.). Illumina libraries with insert sizes of 300–500 bp were constructed for each accession according to the manufacturer's specifications (Illumina). Genomic libraries were then sequenced on an Illumina HiSeq platform in Beijing Ori-Gene Science and Technology Co., Ltd.
![]() |
Fig. 1 Population genomics of Tetracentron sinense. (A) Geographic distribution of samples. Colors in pie charts refer to genetic groups identified by ADMIXTURE (K = 6). The size indicates heterozygosity rate in different areas. (B) Population genetic structure identified by ADMIXTURE. (C) PCA plots of genetic variation in T. sinense, with the proportion of the variance explained as 39.0% for PC1, 17.7% for PC2, and 9.1% for PC3. (D) A neighbor-joining phylogenetic tree of T. sinense based on nuclear SNPs from whole-genome resequencing data. |
Raw data generated by this study were combined with that from previous studies (accession numbers are SRR11566769–SRR11566842 and SRR22404222–SRR22404273) (Liu et al., 2020; Li et al., 2021). FASTP v.0.19.3 (Chen et al., 2018) was used for quality control and adapter trimming to obtain high-quality clean reads. Clean reads were mapped to the Tetracentron sinense reference genome (Liu et al., 2020) using BWA-MEM v.2.1 (Li, 2013). Duplicates were marked with sambamba v.0.7.1 (Tarasov et al., 2015). In the subsequent analyses, we excluded bases with quality scores below 20, reads with mapping quality scores lower than 30, and sites with depths lower than 300 or higher than 2000. In addition, we excluded non-chromosomal sequences. FreeBayes v.1.3.6 (Garrison and Marth, 2012) was used to conduct SNP and genotype calling. The raw SNP set was filtered using VCFTOOLS version v.0.1.17 (Danecek et al., 2011). Non-SNP sites, non-biallelic sites, sites with a missing rate over 20%, and sites with minor allele frequency (MAF) less than 0.05 were removed. Finally, we obtained a total of 57, 976, 365 SNPs, which were used to perform ROH and SIFT deleterious mutation analysis. SNPs with a minor allele frequency < 0.05 were also removed, which left 349, 901 SNPs for linkage disequilibrium (LD) analysis. LD analysis was performed by PopLDdecay v.3.4.0 (Zhang et al., 2019). Linkage sites were filtered using PLINK v.1.07 (Chang et al., 2015). LD pruning data were used to perform the downstream analysis, such as PCA, admixture, and TreeMix analysis.
2.3. Population structure and gene flowAncestry estimation was inferred using ADMIXTURE v.1.3.0 (Alexander et al., 2009) with different values of K (K = 1 to 20). The optimal value of K was determined by the value of the cross-validation error (CV), and optimal values of K corresponded to lower CV values. Principal component analysis (PCA) was obtained using the R package based on genetic relationship matrix files which were generated by the GCTA v.1.94.1 (Yang et al., 2011). Phylogenetic relationships among individuals were reconstructed using the neighbor-joining method in MEGA v.11 (Tamura et al., 2021) based on both nuclear whole-genome SNPs and chloroplast genes. Branch support was estimated from 1000 bootstrap replicates. DnaSP v.6 (Rozas et al., 2017) was used to construct a haplotype network.
Both TreeMix v.1.13 (Pickrell and Pritchard, 2012) and Dsuite software packages (Malinsky et al., 2021) were used to estimate gene flow between populations. TreeMix v.1.13 was used on nuclear SNPs with the number of migration events ranging from one to 15, with 20 iterations for each migration event. The output files from TreeMix were input into optm v.0.1.5 to determine the optimal number of migration edges. The Dsuite package (Malinsky et al., 2021) was used to calculate branch-specific estimates of gene flow. First, Dtrios was used to calculate f4-ratio. Then, f-branch statistical values for each branch of the ML tree were estimated with the Fbranch program. The f-branch statistical results were visualized by dtools.py script.
2.4. Estimation of population genetic parametersPopulation genetic parameters, including nucleotide diversity π and the Watterson estimator (θw), and Fst (fixation index) were calculated using the thetaStat command of the ANGSD v.0.921 (Korneliussen et al., 2014) over non-overlapping 20-kb windows.
2.5. Frequency of runs of homozygosity (FROH) and deleterious mutation detectionTo examine inbreeding, runs of homozygosity (ROH) were identified using VCFTOOLS v.0.1.17 (Danecek et al., 2011) with default parameters. ROH longer than 100 kb were retained. The frequency of runs of homozygosity (FROH: sum of ROH > 100 kb/genome effective length) was then calculated.
We next investigated the genetic load of Tetracentron sinense lineages. We used SIFT (Sim et al., 2012) to predict the deleterious variants of different lineages or populations. To avoid reference bias, ancestral chromosome sequences (Liu et al., 2020) were used to construct a SIFT prediction database. SIFT (Sim et al., 2012) was used to conduct searches for proteins homologous to the query protein in the TrEMBL plant database (Boeckmann et al., 2003). These were aligned to chosen sequences, and the scores were computed for particular positions. Protein-coding variants were predicted to be deleterious (sift score < 0.05), tolerated (sift score ≥ 0.05), or synonymous. Deleterious variants both in the heterozygous state and homozygous state were calculated. Subsequently, we conducted a gene ontology (GO) analysis to better understand the detected deleterious alleles. GO enrichment analyses were conducted for the candidate genes, and we focused on those genes with significant GO terms (p < 0.05).
2.6. Demographic historyWe restricted our demographic analysis to neutral sites. Ancestral sequences were constructed using IQ-tree v.1.6.12 (Lam-Tung et al., 2015). The unfolded site frequency spectrum (SFS) was constructed using the realSFS command of ANGSD v.0.921 (Korneliussen et al., 2014). Based on the unfolded SFS, the demographic history of the six lineages was reconstructed using a stairway plot v.2 (Liu and Fu, 2020) with 200 bootstrap iterations. The estimated generation time and mutation rate were set to 15 years and 3.25e-9 per site per year, respectively, as in our previous study (Liu et al., 2020).
2.7. Detection of positive selectionSWEEPFINDER2 (DeGiorgio et al., 2016) was used to search for signs of selective sweeps in different lineages. By calculating CLR statistics, this program identifies regions that show significant deviation from the neutral SFS. SWEEPFINDER2 (DeGiorgio et al., 2016) was run using a grid size of 2 kb. We only considered the top 1% of CLR with significant selection signals, and neighbor sweep targets with scores exceeding the top 1% threshold were merged into the putative sweep regions. We performed GO enrichment analysis on the candidate genes using goseq v.1.58.0 (Young et al., 2010).
To address the limitation of using only GO enrichment IDs and to identify specific key genes related to lineage differentiation and adaptation, we conducted additional analyses focusing on one lineage of Tetracentron sinense, namely the YNWEST lineage, which is predominantly found in the Hengduan Mountains. We extracted gene sequences included in the top 20 GO terms and subjected them to BLASTP alignments against the Arabidopsis thaliana database (www.arabidopsis.org) to identify homologous genes in A. thaliana. We found that there are functional studies on 20 genes in A. thaliana. To explore the role of these 20 genes in the differentiation of the YNWEST lineage from other lineages, we calculated the Fst values of each gene between the YNWEST lineage and other lineages. We also calculated the Fst value of the entire genome between the YNWEST lineage and other lineages as a background reference. Genes with Fst values in the top 5% of the genome's Fst value distribution were maintained. Finally, we examined the SNP loci of these genes and identified missense variants.
2.8. Identification of environment-associated adaptive variantsEnvironmental data, which included 19 bioclimatic variables at 2.5-min resolution (5 km), were downloaded from WorldClim v.2.1 database (Fick and Hijmans, 2017). To select the top climatic factors, we first performed gradient forest (GF) analysis using the R package gradientForest v.0.1–37 (Ellis et al., 2012) based on 19 climatic variables and SNPs. GF models apply a non-parametric regression algorithm tree for uncovering non-linear associations among spatial, environmental, and allelic variables. We set 500 regression tree replicates, keeping other parameters at defaults. Top factors were selected based on accuracy and weighted R2 importance. To avoid multicollinearity, we calculated Pearson correlation in corrplot v.0.92 (Friendly, 2002), retaining variables with |correlation coefficient| < 0.6. Finally, five of the most important and uncorrelated factors were used for subsequent analyses, including mean diurnal range (BIO2), max temperature of the warmest month (BIO5), temperature annual range (BIO7), precipitation seasonality (BIO15), and precipitation of the coldest quarter (BIO19).
We used two different genotype–environment association (GEA) approaches to identify environment-associated SNPs. First, we used a univariate latent-factor linear mixed model (LFMM) implemented in the R package LEA v.3.3.2 (Frichot and François, 2015) to search for associations between allele frequencies and the selected five environmental variables. The input genetic data underwent conversion via the ped2lfmm function. For the missing SNP data, the impute method was applied for filling. Subsequently, we ran ten independent MCMC runs using 5000 iterations as burn-in followed by 10, 000 iterations. The pvalues function within the LEA package was employed to adjust the p-values of SNP loci for a false discovery rate (FDR) control. Finally, SNPs with q-values less than 0.001 were chosen to represent the gene–environment association.
Second, we used a multivariate Redundancy Analysis (RDA) implemented in the R package vegan v.2.5–7 (Oksanen et al., 2013) to identify variants showing an especially strong relationship with multivariate environmental axes. Prior to conducting the analyses, we converted the original VCF file into a CSV file by using PLINK v.1.07 (Chang et al., 2015). We employed the "anova.cca" function to conduct 999 permutation tests, which aimed to evaluate the significance of the influence of environmental variables on species distribution. Additionally, we utilized the "outliers" function to identify SNPs related to local adaptation. Those SNPs falling within the ±3 SD cutoff (with a two-tailed p-value of 0.0027) were considered significant, environmentally-associated variants. To identify gene functions of candidate SNPs obtained from LFMM and RDA, we employed a Gene Ontology enrichment analysis using goseq v.1.58.0 (Young et al., 2010). To further identify key environmentally-associated genes related to their adaptation and differentiation, all genes identified by LFMM and RDA were subjected to BLASTP against the proteins of Arabidopsis thaliana on the website www.arabidopsis.org. Then, we randomly selected 20 genes that were homologous to functionally studied A. thaliana genes. We calculated the Fst values of each gene between any two of the DLJ, XCB, and SNJ populations (representative populations) and also calculated the Fst value of the entire genome between these populations as a background reference. Genes with Fst values in the top 5% of the genome's Fst value distribution were maintained.
2.9. Assessment of genomic vulnerability with climate changeTo predict the genomic susceptibility to future climate change, genomic offset was assessed in a GF analysis implemented in the R package gradientforest v.0.1–37 (Ellis et al., 2012). Genomic offset was determined by the Euclidean distance of genomic compositions between the current (1970–2000) climates and the future projected (2081–2100) climates. To explore possible future climate conditions, we employed the widely used global climate models (CNRM-ESM2-1) and two different emission scenarios of the shared socioeconomic pathway (SSP126 and SSP585) that represent mild and extreme future carbon emissions, respectively. The 867 SNPs associated with the environment as detected by both LFMM and RDA were designated as the candidate SNP dataset. Additionally, 1000 randomly chosen SNPs were noted as a reference SNP dataset to match the quantity of the candidate SNP dataset. With 2000 regression trees per SNP, the GF model was applied to construct a function for the five most crucial environmental factors (BIO2, BIO5, BIO7, BIO15, BIO19).
3. Results 3.1. Population structure and gene flowWe first performed whole-genome resequencing on 96 individuals from 10 populations throughout the distribution of Tetracentron sinense in Yunnan Province (Fig. 1A and Table S1). Raw data generated by this study were combined with 104 publicly available genomes of T. sinense from 19 populations (Liu et al., 2020; Li et al., 2021). We generated an average of 20.48-fold coverage depth (Table S2). Detailed information about the population genomic analysis, such as mapping rate and mapping quality, can be found in Table S2.
Our analysis of population structure and gene flow of Tetracentron sinense revealed complex genetic relationships, divergence patterns, and gene flow among different lineages and populations. ADMIXTURE analysis of all T. sinense samples split all Yunnan populations except the XCB population from other populations (at K = 2) (Fig. 1B). Populations of T. sinense outside Yunnan Province (central subtropical China) were split into three lineages (at K = 4) that we here refer to as the eastern lineage (EAST: HB-esx, HB-bk, SNJ, SX-nxg, SX, and SX-nxhd), the northern lineage (NORTH: SC-dyxl, SC-xl, GS-wxq, and TS), and the southwestern lineage (SWEST: XCB, GZ-bj, CQ, SC-mb, SC, and SC-yad). The EAST lineage had relatively no genetic mixture from other populations, whereas the NORTH and SWEST lineages showed signs of genetic mixture. Thirty-three individuals of the SWEST lineage and NORTH lineage were of mixed ancestry, which represent possible hybrids between individuals assigned to the NORTH lineage, SWEST lineage, and EAST lineage. The SC-mb, SC, and SC-yad populations from the SWEST lineage and SC-dyxl from the NORTH lineage appeared to be admixed groups, with 16%, 34%, 45%, and 87% of their ancestry attributed to the NORTH lineage, respectively. In the NORTH lineage, GS-wxq and TS populations were admixed with the EAST lineage.
The optimal value for K was 6 according to the values of the cross-validation error of the data (Fig. S1). At K = 6, the Yunnan populations except XCB were separated into the YNWEST lineage (including all northwestern populations), YC lineage (including all the populations from central Yunnan), and YSEAST lineage (including the southeastern populations) which is consistent with the results of analysis based on only Yunnan samples (Fig. S2–S4). The YNWEST lineage and YSEAST lineage were genetically pure, whereas the YC lineage was genetically mixed with 4%–37% of its ancestry attributed to the YNWEST lineage (for the ZBS, TC, and BHL populations) and 8%–13% of its ancestry attributed to the YSE lineage (for the JD population).
Consistent with the ADMIXTURE results, principal component analysis (PCA) revealed six major clusters corresponding to the EAST, NORTH, SWEST, YSEAST, YNWEST, and YC lineages (Fig. 1C). A neighbor-joining (NJ) tree largely consistent with the results of the structure analysis and PCA revealed that the EAST, NORTH, SWEST, and YSEAST lineages formed four major clades, and the YNW and YC lineages formed a clade (Fig. 1D). The phylogenetic tree indicated that the Yunnan YNWEST, YC, and YSEAST lineages were clustered, and this group was the first to diverge (comprising the basal clade) and was sister to the other three lineages. The other three lineages were clustered; the lineage (SWEST lineage) containing the Yunnan population XCB was sister to the other two lineages.
Phylogenetic analysis based on whole chloroplast genome sequences of all 200 accessions, representing the nonrecombinant and maternally inherited genome, recovered four major clades (Fig. 2A). The first major clade included two subclades, one comprising individuals from the NORTH lineage, the other, all individuals from the EAST lineage. The second clade consisted of some individuals from the NORTH lineage. The third clade consisted exclusively of individuals from the SWEST lineage, and the fourth clade consisted of individuals from the YSEAST lineage and JD population of the YC lineage. All individuals from YNWEST and some from YC were scattered at the base of the tree as a polytomy. Further network construction revealed four major groups and possible hybridization of the NORTH lineage and YC lineage (Fig. 2B), a pattern similar to that revealed by the phylogenetic analysis.
![]() |
Fig. 2 Genetic relationships based on chloroplast genome sequences. (A) Maximum likelihood phylogeny of all accessions based on whole-chloroplast gene sequences. (B) Network of 19 chloroplast haplotypes identified from whole-chloroplast genes for Tetracentron sinense. The sizes of the circles are approximately proportional to the number of individuals assigned to the respective haplotype. |
Structure analysis of nuclear SNPs, haplotype network, and phylogenetic analysis of chloroplast genes indicated that there was gene flow between some Tetracentron sinense lineages/populations. Treemix analysis revealed gene flow between populations of the NORTH lineage and SWEST lineage or EAST lineage, consistent with that found in structure analysis (Fig. 3A). The results of the Dsuite analysis also revealed gene flow between the NORTH lineage and SWEST lineage with an admixture proportion of 37.7% (Fig. 3B). Gene flow between the YSEAST lineage and the SWEST lineage was detected by fbranch analysis and TreeMix analysis. Gene flow between the YC lineage and YSEAST lineage was detected by the fbranch analysis, TreeMix analysis and structure analysis.
![]() |
Fig. 3 Detection of gene flow between Tetracentron sinense populations. (A) Gene flow detected by TreeMix. The direction of gene flow is indicated by an arrow. The horizontal scale bar at the bottom shows the 10-fold standard error of the entries in the sample covariance matrix. (B) Gene flow detected by Fbranch. The red arrows correspond to the simulated gene flow. |
The average genetic diversity of Tetracentron sinense was 1.15 ± 0.09 × 10−2 for the average number of pairwise nucleotide differences (π) and 2.81 ± 1.46 × 10−2 for Watterson's estimator (θW). Three lineages from Yunnan Province had lower genetic diversity with π ranging from 6.79 × 10−3 to 7.39 × 10−3 and θW ranging from 7.24 × 10−3 to 9.10 × 10−3 compared with the three lineages outside of Yunnan with π above 11.67 × 10−3 and θW above 14.75 × 10−3 (Fig. 4A). Populations from Yunnan Province (except XCB) also had lower genetic diversity than other populations. The Fst values between lineages varied greatly from 0.081 to 0.343 (Fig. 4B). The lineage differentiation between Yunnan lineages (YSEAST, YC, and YNWEST) and central subtropical lineages (EAST, NORTH, and SWEST) was higher than that within Yunnan lineages or central subtropical lineages. Two Yunnan lineages, YC and YNWEST, showed the lowest differentiation with an Fst value of 0.081. The Fst values indicate that significant divergence has occurred among different lineages.
![]() |
Fig. 4 Population genetic parameters. (A) Genetic diversity (π) of Tetracentron sinense populations at the population, lineage, and species levels. The horizontal broken lines represent the average level of diversity for the species. Only the genetic diversity of populations with more than five individuals is shown. (B) Matrix of pairwise Fst values of six lineages of T. sinense. (C) Distribution of FROH and (D) homozygous deleterious mutations in six lineages of T. sinense. Lineages with the same letter are not significantly different. The line in the center of the box represents the median values, the edges of the box represent the first and third quartiles, and the whiskers above and below the box show the range of values. |
FROH indicated inbreeding significantly higher in Yunnan lineages than in other lineages (Fig. 4C), suggesting that levels of recent inbreeding were higher in Yunnan lineages than in other lineages.
To investigate the genetic load, we estimated different kinds of mutations using SIFT (Sim et al., 2012). The software classified mutations into three categories: synonymous, tolerated, and deleterious. A total of 1, 048, 575 mutations were identified by SIFT analysis. After excluding mutations in the 3′UTR, 5′UTR, and stop codons and filtering low-confidence deleterious mutations, a total of 118, 434 sites, 567, 390 sites, and 120, 623 sites were predicted to be synonymous, tolerated, and deleterious, respectively. Both the heterozygous-derived and homozygous-derived mutations of individuals from Yunnan Province were lower than individuals from NORTH and EAST lineages (Figs. 4D and S5). Although the number of heterozygous-derived mutations of individuals from the SWEST, NORTH, and EAST lineages was similar, the number of homozygous-derived mutations of individuals from the SWEST lineage was significantly lower than that of individuals from the NORTH and EAST lineages (Figs. 4D, S5 and S6).
The number of genes associated with deleterious mutations obtained was 11, 208. GO annotation of these deleterious mutations indicated that many were enriched for annotated functions involved in defense (Table S3). For example, one of the top enriched GO terms was transmembrane receptor protein serine/threonine kinase (GO: 0004675), which plays important roles in plant defense response. This GO term contains gene Tesin01G0048000, Tesin02G0032800, Tesin02G0104200, Tesin02G0121300 and Tesin16G0002900, which are homologous to the well-known plant immunity and defense genes PBL19 (AT5G47070), EFR (AT5G20480), LIK1 (AT3G14840), FLS2 (AT5G46330) and BIR1 (AT5G48380), respectively. GO annotation also identified terms"response to water (GO: 0009415), " "response to freezing (GO: 0050826), " "response to UV (GO: 0009411), " and "response to copper ion (GO: 0046688)." Moreover, our analysis detected an enrichment of genes related to key developmental processes such as "root hair initiation" (GO: 0048766), "leaf development" (GO: 0048366), "stamen development" (GO: 0048443), "sepal formation" (GO: 0048453), "pollen-pistil interaction" (GO: 0009875), and "fruit development" (GO: 0010154). For example, several genes were found to be enriched, including Tesin04G0152700 and Tesin02G0072300 (homologous to the floral homeotic genes AP1 [AT1G69120] and AP3 [AT3G54340], respectively), Tesin02G0154200 (homologous to the anther dehiscence gene DAF [AT5G05280]), Tesin18G0066100 (homologous to the pollen development gene AGL65 [AT1G18750]), and Tesin02G0206100 (homologous to the flowering-related gene PIE1 [AT3G12810]).
3.4. Demographic historyWe used the stairway plot method to infer the demographic history of the six lineages identified by the above structure analysis. We found that all populations declined abruptly or experienced bottlenecks during glaciation periods, although the demographic histories of different lineages varied. For each of the three lineages outside of Yunnan Province (EAST, NORTH, and SWEST), the earliest bottleneck event occurred at ~2 million years ago (Mya), a second occurred at ~1 Mya, and a third occurred at ~0.3 Mya in the EAST lineage and at ~20 thousand years ago (Kya) in the SWEST lineage and 30 Kya in the NORTH (Fig. 5). The second bottleneck and the third bottleneck in the EAST lineage coincided with two periods of extensive glaciation during the mid-Pleistocene, the Xixiabangma Glaciation (XG, 1.17–0.8 Mya) and the Naynayxungla Glaciation (NG, 0.78–0.5 Mya). The third bottleneck in the SWEST and NORTH lineages corresponded to the Last Glacial Maximum (LGM, ~20 Kya). After the three bottlenecks, the effective population size (Ne) of the EAST lineage recovered and then remained stable until 8 Kya, at which point it began to decline; after the third bottleneck, the Ne of the NORTH and SWEST lineages increased significantly and then remained fairly constant until the present.
![]() |
Fig. 5 Demographic history of the six major lineages of Tetracentron sinense. Stairway plot showing historical changes in effective population size (y-axis) for six lineages with a generation time of 15 years. Lighter and darker colors indicate 95% and 75% confidence intervals of the shaded areas, respectively. |
For the Yunnan lineages, the earliest bottleneck event occurred at ~1–2 Mya, around the same time as the Xixiabangma Glaciation. The second bottleneck occurred in the YNWEST lineage at ~0.3 Mya, around the same time as the Penultimate Glaciation (0.30–0.13 Mya). After these bottlenecks, the Ne of the three lineages recovered rapidly; they remained stable thereafter for a long time except that the YC lineage underwent another bottleneck and a rapid recovery. The bottleneck in the YC lineage also corresponded to the LGM. Finally, the Ne of YC was shown to have continuously declined from 10 Kya (for YSEAST, near 11 Kya) to the present and the start of population decrease coincided with the Marine Isotope Stage 2 YD (MIS 2 YD, 12.2–10.5 Kya) of the Last Glaciation. After the Ne of the YNWEST lineage decreased sharply at 10 Kya, it experienced another bottleneck at ~7 Kya (Holocene Glaciation); it then increased significantly and remained fairly constant until the present.
3.5. Genomic signals of positive selectionTo investigate the role of positive selection during the evolution of different lineages, we scanned for selective sweep regions and selected regions. SweepFinder2 detected sweep selection signals in 302, 280, 234, 465, 304, and 370 genes in EAST, NORTH, SWEST, YSEAST, YC, and YNWEST lineages, respectively. We further performed Gene Ontology (GO) enrichment analysis of these genes. In each lineage, we obtained a significant number of enriched terms (p < 0.05). In the EAST lineage, there were 176 such terms; 153 in the NORTH, 182 in the SWEST, 195 in the YSEAST, 218 in the YC, and 160 in the YNWEST. These enriched terms covered a wide range of biological processes. Many were related to defense responses, such as those represented by GO: 0009870, GO: 0006950, GO: 0006952, GO: 0009607, and GO: 0002215. In the NORTH lineage, terms related to response to humidity (GO: 0009270) and temperature stimulus (GO: 0009266) were also detected. In the SWEST lineage, epigenetic regulation-related terms (GO: 0045814, GO: 0040029) were enriched. The regulation of seedling development (GO: 1900140) was noted, as well as the response to light stimulus (GO: 0071482) in the YC lineage. Additionally, in the YNWEST lineage, terms associated with fruit and floral development (GO: 0010450, GO: 0010451, GO: 0010154, GO: 0048480) significantly enriched (Table S4).
To investigate the key genes potentially involved in the differentiation and adaptation of different lineages, we take YNWEST as an example. We blasted all genes in the 20 top GO terms in Arabidopsis thaliana and found 20 were functionally studied. Among the 20 genes, the Fst values of nine (Tesin01G0134500, Tesin01G0225200, Tesin06G0066800, Tesin10G0039500, Tesin08G0017200, Tesin11G0106400, Tesin13G0073100, Tesin21G0018600, and Tesin21G0018700) were in the top 5% of the genome's Fst value distribution (Fig. S7 and Table S5). These genes are homologous to DL1 (AT5G42080), MAP65-1 (AT5G55230), ABCG11 (AT1G17840), LAC1 (AT1G18140), CRT1a (AT1G56340), EIN3 (AT3G20770), NEK6 (AT3G44200), DL2 (AT5G42081), and BOB (AT5G53400). The mutations of these genes are shown in Table S6.
We assessed GO terms shared between any two Tetracentron sinense lineages. The number of shared GO terms varied widely, ranging from a minimum of 2 to a maximum of 49 (Table S7). Notably, the YNWEST and YC groups exhibited the highest degree of overlap, with 49 shared GO terms. Following closely, the NORTH and SWEST groups shared 26 GO terms, and the NORTH and EAST groups had 25 in common. Detailed analysis of the shared GO terms of these paired groups uncovered a complex pattern. We observed that some of the GO terms were constituted entirely of the same genes. However, other GO terms were composed solely of different genes. Additionally, there were GO terms that consist of a combination of some shared genes and some different genes. For example, consider the GO term 0003976. Both the YNWEST and YC groups were associated with this particular term. The YNWEST group encompassed 12 genes relevant to this GO term, and the YC group comprised 11. Out of these, five genes were common to both, while the remaining genes differ. These results strongly imply that different groups may have either chosen the same gene but with variant haplotypes, capitalizing on subtle genetic differences, or alternatively, they may have resorted to different genes within the same functional pathway to achieve similar physiological outcomes.
Further, we measured the degree of overlap between loci harboring deleterious mutations (we only used sites with SIFT scores of 0) and adaptive loci. Across different groups, we discovered that between 13.3% and 33.6% of the adaptive loci coincide with the loci with deleterious mutations. Significantly, in most instances, the frequency of alleles with these deleterious mutations was relatively low.
3.6. Identification of genomic variants associated with local climate adaptationFive environmental factors, namely BIO2, BIO5, BIO7, BIO15, and BIO19, were identified to be important environmental factors (Figs. S8 and S9). Among them, two are related to precipitation and three are related to temperature. We used two genotype–environment association (GEA) approaches (LFMM and RDA) to identify potential genomic variants associated with these environmental variables. LFMM detected a total of 6529 SNPs associated with these variables, with 1565 corresponding to BIO2, 1398 to BIO5, 2116 to BIO7, 63 to BIO15, and 1387 to BIO19 (Table S8). RDA identified 4691 loci related to these environmental factors, specifically BIO2 (1745), BIO5 (850), BIO7 (836), BIO15 (1060), and BIO19 (180) (Table S8). To eliminate false positives, the 867 SNPs detected by both LFMM and RDA were designated as environment-associated SNPs (Table S8). These 867 SNPs are associated with 219 genes. Gene Ontology enrichment analysis of these genes identified a total of 245 Gene Ontology categories (Table S9), all with a p-value less than 0.05. These Gene Ontology terms included those related to the response to freezing (GO: 0050826), the regulation of long-day photoperiodism (GO: 0048586), and embryo development (GO: 0009793).
To further identify key environmentally associated genes related to their adaptation and differentiation, we performed BLASTP searches of all genes identified by LFMM and RDA against the protein database of Arabidopsis thaliana available on the website www.arabidopsis.org. We found that more than 80 genes were homologous to functionally studied A. thaliana genes. The comparison revealed that four genes (Tesin01G0227200, Tesin04G0059400, Tesin06G0106600, and Tesin15G0088700, homologous to UPL5, ECT1, ABA1, and HAB1, respectively) of the 20 randomly selected genes had Fst values significantly higher than the overall genomic Fst values between certain populations, with their Fst values falling within the top 5% of the genomic Fst value distribution (Fig. S10 and Table S10). In addition, Tesin06G0129800 (homologous to CIB1) also exhibited high Fst between specific populations.
3.7. Genomic offset prediction for future climate changeTo evaluate the vulnerability of the population to climate change in the period from 2081 to 2100, we utilized the GF method to calculate the genomic offset by employing the global climate model CNRM-ESM2-1 under two different emission scenarios (SSP126 and SSP585), which represent mild and extreme future carbon emissions, respectively. A higher genomic offset indicates that greater genetic changes are required for adaptation to the changing climate (Fitzpatrick and Keller, 2015). The results showed that most populations outside Yunnan have a higher genomic offset than those within Yunnan (except for DLJ in Yunnan, which also had a high genomic offset) (Figs. 6 and S11). In particular, the genomic offsets of the SC and SC-yad populations from SWEST lineage were extremely high. As expected, for most populations, genomic offsets increased under more severe climate change scenarios, with higher emissions leading to increased overall genomic offsets. However, interestingly, we identified two populations, SC and SC-yad, which exhibited the opposite trend, with their genomic offsets being lower under SSP585 than under SSP126. Compared with all SNPs (reference), we found that adaptive SNPs (candidates) exhibited higher genomic offset under climate change, meaning that adaptive variants were more sensitive to climate change.
![]() |
Fig. 6 Prediction of genetic offset to future climate change based on five environmental variables for (A, C) all SNPs and (B, D) GEA SNPs. (A) and (B) reflect scenario SSP126 2081–2100; (C) and (D) reflect scenario SSP585 2081–2100. |
Tetracentron sinense is a Tertiary relict species. According to the results of our previous study, T. sinense diverged from Trochodendron aralioides approximately 31 Mya (Liu et al., 2020). The evolutionary patterns of T. sinense after this divergence event remain unclear. Different lineages of T. sinense have been identified in previous studies (Sun et al., 2014; Li et al., 2018, 2021; Liu et al., 2020). However, due to limitations in available genetic markers or insufficient sampling, much controversy remains. Here, structure, PCA, and phylogenetic analyses revealed that T. sinense can be clustered into six genetically distinct lineages: EAST, NORTH, SWEST, YNWEST, YC, and YSEAST. Both nuclear and chloroplast datasets supported significant population differentiation among the six lineages. The NJ tree based on nuclear SNPs revealed that the Yunnan lineages from southwestern China (YNWEST, YC, YSEAST) were closely related and that the lineages from central subtropical China (EAST, NORTH, SWEST) were also closely related (Fig. 1D). A previous study has shown that divergence within T. sinense may have been initiated by global cooling in the late Miocene, and the haplotypes in Yunnan Province were the earliest-diverging haplotypes (Sun et al., 2014). Our chloroplast DNA tree indicated that the Yunnan YNWEST + YC clade diverged first (Fig. 2A), consistent with findings from an NJ tree of whole-genome SNPs (Liu et al., 2020).
Global climate change for millions of years, especially during the Pliocene to Quaternary period, was the primary driver of population fluctuations and the divergence of many relict species (Zhao et al., 2019; Zhu et al., 2020; Feng et al., 2024; Ren et al., 2024). Climate change also contributed to changes in the size and differentiation of Tetracentron sinense populations. Our study showed that population bottlenecks occurred in different lineages T. sinense during various glaciation periods. For example, bottlenecks occurred in non-Yunnan lineages during the Xixiabangma Glaciation, and in multiple lineages during the Penultimate Glaciation and around the LGM. The dramatic change in global climate partly contributes to the current, relatively isolated, and narrow distribution of T. sinense. Differences in the extent and timing of bottlenecks among the six lineages can indeed be related to their geographical locations and the presence of refugia. T. sinense was proposed to contract into several refugia during the Pleistocene climate oscillations (Sun et al., 2014; Liu et al., 2020). During the glaciation period, the sharp decline in Ne may lead to lower genetic diversity, higher genetic drift, and inbreeding in T. sinense. In Yunnan lineages, these issues were particularly evident. Two Yunnan lineages have seen continuous Ne declines since the MIS 2 YD of the Last Glaciation and one Yunnan lineage experienced a bottleneck but recovered rapidly. Hence, it is possible that genetic drift and inbreeding further promoted lineage divergence in this species.
Although divergence between Tetracentron sinense lineages was observed, multiple hybridization events between some lineages were also detected. Hybridization, common in plants, can introduce new genetic materials and enhance genetic diversity (Henderson and Salt, 2017). In T. sinense, the genetic mixing between the NORTH and SWEST lineages, along with the genetic admixture in the YC lineage from Yunnan, indicates that hybridization has influenced the species' current genetic structure. Our supposition that hybridization during glaciation cycles likely contributed to the current genetic structure of T. sinense is in line with the refugia and post-glacial range expansion theory (Hewitt, 2000). The discovery of hybridization in T. sinense, similar to the dove tree (Ren et al., 2024), implies it may be common among relict species. Beyond these two species, hybridization occurs in some other relict species (Arrigo et al., 2016; Zhao et al., 2019; Ren et al., 2024). For instance, the ginkgo tree, a well-known relict species, has also shown signs of genetic exchange and potential hybridization events. This hybridization has not only affected the genetic makeup but also potentially influenced the adaptability and distribution of ginkgo populations over time (Zhao et al., 2019). Understanding hybridization patterns in these species can offer insights into their evolutionary history and conservation. For relict species, hybridization can be both a threat, potentially causing genetic swamping, and an opportunity, introducing beneficial genetic variation for environmental adaptation (Arrigo et al., 2016).
4.2. Insights from deleterious variant and positive selection sitesWhole-genome data indicated that Tetracentron sinense has a higher average within-species genetic diversity than that of most endangered species, and even higher than that of some unendangered species (Jing et al., 2023). We found that genetic diversity was lower in three lineages from Yunnan Province (southwestern China) than in three lineages from central subtropical China. We also found that FROH was higher in Yunnan lineages than in other lineages. Together, these findings suggest that inbreeding and bottlenecks may have contributed to the low genetic diversity of Yunnan lineages. Generally, inbreeding leads to an accumulation of deleterious homozygous mutations. However, low levels of deleterious variation were observed in Yunnan lineages. Many studies have shown that long-term small population size is associated with reduced overall diversity and deleterious variation due to the long-term purging of deleterious mutations (Ochoa and Gibbs, 2021; Kleinman-Ruiz et al., 2022; Xie et al., 2022; Robinson et al., 2023). Simulations with a stairway plot indicated Yunnan lineages had a consistently smaller Ne than other lineages from 1 Mya. Hence, the low deleterious variation in the Yunnan lineages was presumably due to the long-term purging of deleterious mutations in Yunnan lineages.
We identified 120, 623 sites deleterious sites in Tetracentron sinense. The identification of deleterious alleles through GO enrichment analysis provides valuable insights into the functional implication of these mutations. Overall, our GO enrichment analysis provides a comprehensive overview of the functional impact of deleterious mutations in the context of plant defense and development.
Our identification of several genes (Tesin01G0048000, Tesin02G0032800, Tesin02G0104200, Tesin02G0121300, and Tesin16G0002900) homologous to well-known plant immunity genes such as PBL19, EFR, LIK1, FLS2, and BIR1 underscores the importance of these pathways in maintaining plant health. Homologues of EFR, LIK1, and BIR1 have occurred at high frequency. Mutations in these genes may compromise the plant's ability to mount effective immune responses, potentially leading to increased susceptibility to pathogens. For example, EFR recognizes the elf18 peptide from bacterial EF-Tu (PAMP) and activates defense responses (Zipfel et al., 2006). LIK1 interacts with CERK1 to regulate plant immune responses to chitin (a fungal cell wall component) (Le et al., 2014). BIR1 modulates brassinosteroid signaling pathways and may also have indirect effects on immune responses (Guzman-Benito et al., 2019).
The enrichment of genes related to key developmental processes such as root hair initiation, leaf, stamen, and fruit development further underscores the broad functional impact of deleterious mutations. Here, we identified several homologues to genes critical to flower development, including AP1, AP3, DAF, AGL65, and PIE1. Mutations in these genes could lead to abnormal flower structures, impaired reproductive success, and altered growth patterns, ultimately affecting the plant's overall fitness. Specifically, mutations to AP1 and AP3 may disrupt specification of petal and stamen identities (Fatima et al., 2024); mutations to DAF may alter anther dehiscence and pollen release (Peng et al., 2013); mutations to AGL65 may affect pollen development (Adamczyk and Fernandez, 2009); and mutations to PIE1 may change flowering time regulation (Choi et al., 2009).
After identifying the deleterious sites and genes in Tetracentron sinense and their potential impacts on species adaptability, our research further explored positive evolutionary forces that may have acted within the species to counteract these negative effects. We detected sweep selection signals across different lineages of T. sinense, identifying regions of the genome where positive selection may have occurred. These signals were associated with a diverse array of enriched gene ontology (GO) terms, covering a wide spectrum of biological processes. Taking the YNWEST lineage as an example, which is predominantly found in the Hengduan Mountains, we conducted in-depth analyses to identify specific key genes related to the lineage's differentiation and adaptation to its unique environment. We found nine genes that showed high Fst values (Table S5). Interestingly, some of these genes have homologs that are known to be involved in stress responses, particularly to cold and high-temperature conditions. For instance, Tesin21G0018700 is homologous to BOB1, a gene that is induced by high temperatures and has been shown to be important for plant development and thermotolerance (Perez et al., 2009). Another example is Tesin01G0225200, which is homologous to MAP65. The protein encoded by MAP65 is essential for plant cellular dynamics, enhancing microtubule polymerization, promoting nucleation, and stabilizing microtubules against cold treatment (Kim et al., 2024). Additionally, Tesin06G0066800 is homologous to ABGC11, a gene required for the transport of cutin to the extracellular matrix (Bird et al., 2007). The mutation sites in these genes may represent potential targets of positive selection (Table S6). Given that the YNWEST lineage is situated in the Hengduan Mountains, an area characterized by a large diurnal temperature difference, it is plausible that these genes, along with their functional mutations, have contributed to the adaptation of T. sinense to such an environment. However, it is crucial to emphasize that the findings regarding T. sinense are still preliminary and require further investigation and validation before any definitive conclusions can be drawn.
Across different groups, we discovered that between 13.3% and 33.6% of the adaptive loci coincide with the loci of deleterious mutations. Significantly, in most instances, the frequency of alleles with these deleterious mutations was relatively low. This finding implies that, overall, loci undergoing positive selection are largely separate from those with a high genetic load. The observed overlap between adaptive loci and loci with deleterious mutations is quite intriguing. There are several potential explanations for this overlap. For one, certain mutations that were initially harmful may, due to environmental shifts or alterations in the genetic background, be recruited to serve an adaptive purpose. Additionally, genetic hitchhiking could be at play. In this process, during positive selection, a neutral or even deleterious mutation located near a beneficial mutation gets carried along, resulting in the co-occurrence of deleterious and adaptive loci. This analysis of the overlap between high-load and adaptive loci provides valuable insights into the complex evolutionary forces acting on the genome.
4.3. Local adaptation and genome vulnerabilityUnderstanding the genetic basis of adaptation and determining the adaptive ability of species to future climate conditions are crucial (Fitzpatrick and Keller, 2015; Dauphin et al., 2020). In our study, both LFMM and RDA identified a total of 867 environment-associated SNPs corresponding to 219 genes. These adaptive loci were correlated with temperature or precipitation variables, indicating the importance of these variables in determining local adaptation. Studies have demonstrated that temperature-related and precipitation-related factors are also important in accounting for the adaptive variation of other relict or non-relict species such as Populus koreana (Sang et al., 2022), Pterocarya macroptera (Wang et al., 2023), Cephalotaxus oliveri Mast (Wang et al., 2016), Metrosideros polymorpha Gaudich (Izuno et al., 2017), and kiwifruit (Zhang et al., 2023). Precipitation is an important factor in the local adaptation of Tetracentron sinense, because these plants are generally distributed along mountain streams and water-rich slopes (Sun et al., 2014).
Of the 219 genes carrying environment-associated SNPs, we identified several candidates whose known or predicted functions may contribute to local adaptation and potentially explain their survival across glacial periods. For example, UPL5, an E3 ubiquitin ligase for leaf senescence (Miao et al., 2010) associated with precipitation seasonality, may adjust leaf senescence timing according to precipitation through allelic variation to optimize survival in changing environments. ECT1 (involved in phosphatidylethanolamine synthesis and stomatal movement) (Negi et al., 2023) contains one SNP significantly associated with it, suggesting that allelic variation in this gene may improve plant response to diurnal temperature via allelic shifts for better gas and water regulation. Additionally, CIB1 (related to CRY2-mediated flowering) (Liu et al., 2018) shows an association with mean diurnal range and precipitation of the coldest quarter, indicating a role in adjusting reproductive timing under fluctuating thermal and wet conditions. These representative examples highlight how environment-associated genes may underlie the adaptive divergence of T. sinense across heterogeneous habitats, thereby aiding its persistence through historical climatic fluctuations.
Genomic offset has been used to predict the genome vulnerability of many species or populations (Chen et al., 2022; Sang et al., 2022; Wang et al., 2023; Zhang et al., 2023; Shen et al., 2024). We found that the genomic offsets of the DLJ population in Yunnan and SC and SC-yad populations from the SWEST lineage are extremely high, implying that these populations are potentially at higher risk of in situ extinction under future climate changes. As expected, genomic offset of most populations was shown to increase in more severe climate change scenarios, indicating that extreme future climate change will cause severe genomic vulnerability to Tetracentron sinense in most populations. Similar results have been found in other species (Sang et al., 2022; Shen et al., 2024). Interestingly, we found two populations, SC and SC-yad, that exhibited the opposite trend. Specifically, their genomic offsets were lower in more severe climate change scenarios, indicating their potentially better ability to cope with extreme environmental changes. Furthermore, genetic offsets estimated using outlier SNPs showed a pattern of increased genomic offset, suggesting that adaptive loci are more vulnerable to climate changes.
4.4. Conservation implicationsAlthough Tetracentron sinense had high within-species genetic diversity, the level of within-population/within-lineage genetic diversity was low, especially for populations/lineages from Yunnan Province. Our analysis of demographic history showed that the population size of some lineages (e.g., two from Yunnan Province) has continued to decrease since 10 Kya. Hence, in terms of genetic diversity and demographic history, T. sinense, especially T. sinense in Yunnan Province, requires further conservation attention. Furthermore, genomic offset analysis has indicated populations outside of Yunnan Province are more vulnerable to future climate change. Two populations especially in need of conservation and management include the SC and SC-yad populations, not only because of their high genomic offset to future mild climate change but also because they are potentially better equipped to cope with extreme environmental changes.
AcknowledgementsWe thank Wen-Hua Yang for technical assistance. This work was supported by the National Natural Science Foundation of China (no. 32570426), the Key Basic Research Program of Yunnan Province, China (202101BC070003), the Fundamental Research Funds for the Central Universities (QNTD202502), and the STI 2030—Major Program (2022ZD0401605-2).
CRediT authorship contribution statement
Zhao-Yang Jing: Writing – original draft, Formal analysis. Ren-Gang Zhang: Writing – original draft, Software, Formal analysis. Yang Liu: Formal analysis. Ke-Guang Cheng: Formal analysis. De-Tuan Liu: Formal analysis. Heng Shu: Formal analysis. Jiali Kong: Formal analysis. Zhong-Hua Liu: Formal analysis. Yong-Peng Ma: Writing – review & editing, Funding acquisition. Ping-Li Liu: Writing – review & editing, Writing – original draft, Supervision, Conceptualization.
Data availability
We are submitting the raw sequence data of genome sequencing to National Genomics Data Center. The accession number is PRJNA1184379.
Declaration of competing interest
The authors declare that they have no competing interests.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.pld.2025.05.008.
Adamczyk, B.J., Fernandez, D.E., 2009. MIKC* MADS-domain heterodimers are required for pollen maturation and tube growth in Arabidopsis. Plant Physiol., 149: 1713-1723. DOI:10.1104/pp.109.135806 |
Agrawal, A.F., Whitlock, M.C., 2012. Mutation load: the fitness of individuals in populations where deleterious alleles are abundant. Annu. Rev. Ecol. Evol., 43: 115-135. DOI:10.1146/annurev-ecolsys-110411-160257 |
Alexander, D.H., Novembre, J., Lange, K., 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res., 19: 1655-1664. DOI:10.1101/gr.094052.109 |
Arrigo, N., Betrisey, S., Graf, L., et al., 2016. Hybridization as a threat in climate relict Nuphar pumila (Nymphaeaceae). Biodivers. Conserv., 25: 1863-1877. DOI:10.1007/s10531-016-1165-z |
Bertorelle, G., Raffini, F., Bosse, M., et al., 2022. Genetic load: genomic estimates and applications in non-model animals. Nat. Rev. Genet., 23: 492-503. DOI:10.1038/s41576-022-00448-x |
Bird, D., Beisson, F., Brigham, A., et al., 2007. Characterization of Arabidopsis ABCG11 ⁄ WBC11, an ATP binding cassette (ABC) transporter that is required for cuticular lipid secretion. Plant J., 52: 485-498. DOI:10.1111/j.1365-313X.2007.03252.x |
Boeckmann, B., Bairoch, A., Apweiler, R., et al., 2003. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31: 365-370. DOI:10.1093/nar/gkg095 |
Cao, Y.-N., Comes, H.P., Sakaguchi, S., et al., 2016. Evolution of East Asia's arcto-tertiary relict Euptelea (Eupteleaceae) shaped by late Neogene vicariance and quaternary climate change. BMC Evol. Biol., 16: 1-17. DOI:10.1186/s12862-015-0575-y |
Chanderbali, A.S., Jin, L., Xu, Q., et al., 2022. Buxus and Tetracentron genomes help resolve eudicot genome history. Nat. Commun., 13: 643. DOI:10.1038/s41467-022-28312-w |
Chang, C.C., Chow, C.C., Tellier, L.C.A.M., et al., 2015. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience, 4: 7. |
Chen, S., Zhou, Y., Chen, Y., et al., 2018. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34: 884-890. DOI:10.1364/ao.57.000884 |
Chen, Y., Ma, T., Zhang, L., et al., 2020. Genomic analyses of a "living fossil": the endangered dove-tree. Mol. Ecol. Resour., 20: 756-769. DOI:10.1111/1755-0998.13138 |
Chen, Y., Jiang, Z., Fan, P., et al., 2022. The combination of genomic offset and niche modelling provides insights into climate change-driven vulnerability. Nat. Commun., 13: 4821. DOI:10.1038/s41467-022-32546-z |
Choi, J., Hyun, Y., Kang, M.-J., et al., 2009. Resetting and regulation of Flowering Locus C expression during Arabidopsis reproductive development. Plant J., 57: 918-931. DOI:10.1111/j.1365-313X.2008.03776.x |
Danecek, P., Auton, A., Abecasis, G., et al., 2011. The variant call format and VCFtools. Bioinformatics, 27: 2156-2158. DOI:10.1093/bioinformatics/btr330 |
Dauphin, B., Rellstab, C., Schmid, M., et al., 2020. Genomic vulnerability to rapid climate warming in a tree species with a long generation time. Glob. Change Biol., 27: 1181-1195. |
DeGiorgio, M., Huber, C.D., Hubisz, M.J., et al., 2016. SWEEPFINDER2: increased sensitivity, robustness and flexibility. Bioinformatics, 32: 1895-1897. DOI:10.1093/bioinformatics/btw051 |
Ellis, N., Smith, S.J., Pitcher, C.R., 2012. Gradient forests: calculating importance gradients on physical predictors. Ecology, 93: 156-168. DOI:10.1890/11-0252.1 |
Fatima, M., Ma, X., Zhang, J., et al., 2024. Genome-wide analysis of MADS-box genes and their expression patterns in unisexual flower development in dioecious spinach. Sci. Rep., 14: 18635. DOI:10.1038/s41598-024-68965-9 |
Feng, Y., Comes, H.P., Chen, J., et al., 2024. Genome sequences and population genomics provide insights into the demographic history, inbreeding, and mutation load of two 'living fossil' tree species of Dipteronia. Plant J., 117: 177-192. DOI:10.1111/tpj.16486 |
Fick, S.E., Hijmans, R.J., 2017. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol., 37: 4302-4315. DOI:10.1002/joc.5086 |
Fitzpatrick, M.C., Keller, S.R., 2015. Ecological genomics meets community-level modelling of biodiversity: mapping the genomic landscape of current and future environmental adaptation. Ecol. Lett., 18: 1-16. DOI:10.1111/ele.12376 |
Frichot, E., Francois, O., 2015. LEA: an R package for landscape and ecological association studies. Methods Ecol. Evol., 6: 925-929. DOI:10.1111/2041-210X.12382 |
Friendly, M., 2002. Corrgrams: exploratory displays for correlation matrices. Am. Statistician, 56: 316-324. DOI:10.1198/000313002533 |
Garrison, E., Marth, G., 2012. Haplotype-based variant detection from short-read sequencing. arXiv.1207.3907v2.
|
Grimsson, F., Denk, T., Zetter, R., 2008. Pollen, fruits, and leaves of Tetracentron (Trochodendraceae) from the Cainozoic of Iceland and western North America and their palaeobiogeographic implications. Grana, 47: 1-14. DOI:10.1080/00173130701873081 |
Gougherty, A.V., Keller, S.R., Fitzpatrick, M.C., 2021. Maladaptation, migration and extirpation fuel climate change risk in a forest tree species. Nat. Clim. Change, 11: 166-171. DOI:10.1038/s41558-020-00968-6 |
Guzman-Benito, I., Donaire, L., Amorim-Silva, V., 2019. The immune repressor BIR1 contributes to antiviral defense and undergoes transcriptional and post-transcriptional regulation during viral infections. New Phytol., 224: 421-438. DOI:10.1111/nph.15931 |
Henderson, I.R., Salt, D.E., 2017. Natural genetic variation and hybridization in plants. J. Exp. Bot., 68: 5415-5417. DOI:10.1093/jxb/erx377 |
Hewitt, G., 2000. The genetic legacy of the Quaternary ice ages. Nature, 405: 907-913. DOI:10.1038/35016000 |
Izuno, A., Kitayama, K., Onoda, Y., et al., 2017. The population genomic signature of environmental association and gene flow in an ecologically divergent tree species Metrosideros polymorpha (Myrtaceae). Mol. Ecol., 26: 1515-1532. DOI:10.1111/mec.14016 |
Jing, Z., Cheng, K., Shu, H., et al., 2023. Whole genome resequencing approach for conservation biology of endangered plants. Biodivers. Sci., 31: 172-183. |
Kim, P., Mahboob, S., Nguyen, H.T., et al., 2024. Characterization of soybean events with enhanced expression of the microtubule-associated protein 65-1 (MAP65-1). Mol. Plant Microbe Interact., 37: 62-71. DOI:10.1094/mpmi-09-23-0134-r |
Kleinman-Ruiz, D., Lucena-Perez, M., Villanueva, B., et al., 2022. Purging of deleterious burden in the endangered Iberian lynx. Proc. Natl. Acad. Sci. U.S.A., 119: e2110614119. DOI:10.1073/pnas.2110614119 |
Korneliussen, T.S., Albrechtsen, A., Nielsen, R., 2014. ANGSD: analysis of next generation sequencing data. BMC Bioinformatics, 15: 356. DOI:10.1186/s12859-014-0356-4 |
Lam-Tung, N., Schmidt, H.A., von Haeseler, A., et al., 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol., 32: 268-274. DOI:10.1093/molbev/msu300 |
Le, M.H., Cao, Y., Zhang, X.C., 2014. LIK1, A CERK1-interacting kinase, regulates plant immune responses in Arabidopsis. PLoS One, 9: e102245. DOI:10.1371/journal.pone.0102245 |
Li, H., 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv 1–3.
|
Li, M., Yang, Y., Xu, R., et al., 2021. A chromosome-level genome assembly for the tertiary relict plant Tetracentron sinense Oliv. (Trochodendraceae). Mol. Ecol. Resour., 21: 1186-1199. DOI:10.1111/1755-0998.13334 |
Li, S., Gan, X., Han, H., et al., 2018. Low within-population genetic diversity and high genetic differentiation among populations of the endangered plant Tetracentron sinense Oliver revealed by inter-simple sequence repeat analysis. Ann. For. Sci., 75: 74. DOI:10.1007/s13595-018-0752-4 |
Liu, P.-L., Zhang, X., Mao, J.-F., et al., 2020. The Tetracentron genome provides insight into the early evolution of eudicots and the formation of vessel elements. Genome Biol., 21: 291. DOI:10.1186/s13059-020-02198-7 |
Liu, X., Fu, Y.-X., 2020. Stairway Plot 2: demographic history inference with folded SNP frequency spectra. Genome Biol., 21: 280. DOI:10.1007/s12250-020-00244-z |
Liu, Y., Li, X., Ma, D., et al., 2018. CIB1 and CO interact to mediate CRY2-dependent regulation of flowering. EMBO Rep., 19: e45762. DOI:10.15252/embr.201845762 |
Ma, H., Liu, Y., Liu, D., et al., 2021. Chromosome-level genome assembly and population genetic analysis of a critically endangered rhododendron provide insights into its conservation. Plant J., 107: 1533-1545. DOI:10.1111/tpj.15399 |
Malinsky, M., Matschiner, M., Svardal, H., 2021. Dsuite - fast-statistics and related admixture evidence from VCF files. Mol. Ecol. Resour., 21: 584-595. DOI:10.1111/1755-0998.13265 |
Manchester, S.R., Pigg, K.B., Kvacek, Z., et al., 2018. Newly recognized diversity in Trochodendraceae from the Eocene of western north America. Int. J. Plant Sci., 179: 663-676. DOI:10.1086/699282 |
Mao, K., Liu, J., 2012. Current relicts' more dynamic in history than previously thought. New Phytol., 196: 329-331. DOI:10.1111/j.1469-8137.2012.04305.x |
Meng, H., Gao, X., Song, Y., et al., 2021. Biodiversity arks in the anthropocene. Regional Sustainability, 2: 109-115. DOI:10.1016/j.regsus.2021.03.001 |
Miao, Y., Zentgraf, U., 2010. A HECT E3 ubiquitin ligase negatively regulates Arabidopsis leaf senescence through degradation of the transcription factor WRKY53. Plant J., 63: 179-188. DOI:10.1111/j.1365-313X.2010.04233.x |
Negi, J., Obata, T., Nishimura, S., et al., 2023. PECT1, a rate-limiting enzyme in phosphatidylethanolamine biosynthesis, is involved in the regulation of stomatal movement in Arabidopsis. Plant J., 115: 563-576. DOI:10.1111/tpj.16245 |
Ochoa, A., Gibbs, H.L., 2021. Genomic signatures of inbreeding and mutation load in a threatened rattlesnake. Mol. Ecol., 30: 5454-5469. DOI:10.1111/mec.16147 |
Oksanen, J., Blanchet, F.G., Kindt, R., et al., 2013. Package 'vegan': community ccology package. R package version 2.3–0. |
Peng, Y.-J., Shih, C.-F., Yang, J.-Y., et al., 2013. A RING-Type E3 ligase controls anther dehiscence by activating the jasmonate biosynthetic pathway gene DEFECTIVE IN ANTHER DEHISCENCE1 in Arabidopsis. Plant J., 74: 310-327. DOI:10.1111/tpj.12122 |
Perez, D.E., Hoyer, J.S., Johnson, A.I., et al., 2009. BOBBER1 is a noncanonical Arabidopsis small heat shock protein required for both development and thermotolerance. Plant Physiol., 151: 241-252. DOI:10.1104/pp.109.142125 |
Pickrell, J.K., Pritchard, J.K., 2012. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genetics, 8: 1-17. |
Qiu, Y.-X., Fu, C.-X., Comes, H.P., 2011. Plant molecular phylogeography in China and adjacent regions: tracing the genetic imprints of Quaternary climate and environmental change in the world's most diverse temperate flora. Mol. Phylogenet. Evol., 59: 225-244. DOI:10.1016/j.ympev.2011.01.012 |
Ren, Y., Zhang, L., Yang, X., et al., 2024. Cryptic divergences and repeated hybridizations within the endangered "living fossil" dove tree (Davidia involucrata) revealed by whole genome resequencing. Plant Divers., 46: 169-180. DOI:10.1016/j.pld.2024.02.004 |
Robinson, J., Kyriazis, C.C., Yuan, S.C., et al., 2023. Deleterious variation in natural populations and implications for conservation genetics. Annu. Rev. Anim. Biosci., 11: 93-114. DOI:10.1146/annurev-animal-080522-093311 |
Rozas, J., Ferrer-Mata, A., Carlos, Sanchez-DelBarrio, et al., 2017. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol. Biol. Evol., 34: 3299-3302. DOI:10.1093/molbev/msx248 |
Ruhfel, B.R., Gitzendanner, M.A., Soltis, P.S., et al., 2014. From algae to angiosperms-inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes. BMC Evol. Biol., 14: 1-27. DOI:10.1186/1471-2148-14-1 |
Sang, Y., Long, Z., Dan, X., et al., 2022. Genomic insights into local adaptation and future climate-induced vulnerability of a keystone forest tree in East Asia. Nat. Commun., 13: 6541. DOI:10.1038/s41467-022-34206-8 |
Savolainen, O., Lascoux, M., Merila, J., 2013. Ecological genomics of local adaptation. Nat. Rev. Genet., 14: 807-820. DOI:10.1038/nrg3522 |
Scheffers, B.R., De, Meester L., Bridge, T.C.L., et al., 2016. The broad footprint of climate change from genes to biomes to people. Science, 354: 6313. |
Shen, Y., Tao, L., Zhang, R., et al., 2024. Genomic insights into endangerment and conservation of the garlic-fruit tree (Malania oleifera), a plant species with extremely small populations. GigaScience, 2: giae070. |
Sim, N.-L., Kumar, P., Hu, J., et al., 2012. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res., 40: W452-W457. DOI:10.1093/nar/gks539 |
Sun, Y., Moore, M.J., Yue, L., et al., 2014. Chloroplast phylogeography of the East Asian arcto-tertiary relict Tetracentron sinense (Trochodendraceae). J. Biogeogr., 41: 1721-1732. DOI:10.1111/jbi.12323 |
Tamura, K., Stecher, G., Kumar, S., 2021. MEGA11 molecular evolutionary genetics analysis version 11. Mol. Biol. Evol., 38: 3022-3027. DOI:10.1093/molbev/msab120 |
Tarasov, A., Vilella, A.J., Cuppen, E., et al., 2015. Sambamba: fast processing of NGS alignment formats. Bioinformatics, 31: 2032-2034. DOI:10.1093/bioinformatics/btv098 |
Wang, T.-R., Meng, H.-H., Wang, N., et al., 2023. Adaptive divergence and genetic vulnerability of relict species under climate change: a case study of Pterocarya macroptera. Ann. Bot., 132: 241-254. DOI:10.1093/aob/mcad083 |
Wang, T., Wang, Z., Xia, F., et al., 2016. Local adaptation to temperature and precipitation in naturally fragmented populations of Cephalotaxus oliveri, an endangered conifer endemic to China. Sci. Rep., 6: 25031. DOI:10.1038/srep25031 |
Werth, A.J., Shear, W.A., 2014. The evolutionary truth about living fossils. Am. Sci., 102: 434-443. DOI:10.1511/2014.111.434 |
Xie, H.-X., Liang, X.-X., Chen, Z.-Q., et al., 2022. Ancient demographics determine the effectiveness of genetic purging in endangered lizards. Mol. Biol. Evol., 39: 1-14. DOI:10.1504/ijmlo.2022.10046495 |
Xu, W.-Q., Ren, C.-Q., Zhang, X.-Y., et al., 2024. Genome sequences and population genomics reveal climatic adaptation and genomic divergence between two closely related sweetgum species. Plant J., 118: 1372-1387. DOI:10.1111/tpj.16675 |
Yang, J., Lee, S.H., Goddard, M.E., et al., 2011. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet., 88: 76-82. DOI:10.5815/ijeme.2011.06.12 |
Yang, Y., Ma, T., Wang, Z., et al., 2018. Genomic effects of population collapse in a critically endangered ironwood tree Ostrya rehderiana. Nat. Commun., 9: 5449. DOI:10.1038/s41467-018-07913-4 |
Young, M.D., Wakefield, M.J., Smyth, G.K., et al., 2010. Gene ontology analysis for RNA-seq: accounting for selection bias. Genome Biol., 11: R14. DOI:10.1186/gb-2010-11-2-r14 |
Zhang, C., Dong, S.-S., Xu, J.-Y., et al., 2019. PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files. Bioinformatics, 35: 1786-1788. DOI:10.1093/bioinformatics/bty875 |
Zhang, X., Guo, R., Shen, R., et al., 2023. The genomic and epigenetic footprint of local adaptation to variable climates in kiwifruit. Hortic. Res., 10: uhad031. DOI:10.1093/hr/uhad031 |
Zhao, Y.-P., Fan, G., Yin, P.-P., et al., 2019. Resequencing 545 ginkgo genomes across the world reveals the evolutionary history of the living fossil. Nat. Commun., 10: 4201. DOI:10.1038/s41467-019-12133-5 |
Zhu, S., Chen, J., Zhao, J., et al., 2020. Genomic insights on the contribution of balancing selection and local adaptation to the long-term survival of a widespread living fossil tree, Cercidiphyllum japonicum. New Phytol., 228: 1674-1689. DOI:10.1111/nph.16798 |
Zipfel, C., Kunze, G., Chinchilla, D., et al., 2006. Perception of the bacterial PAMP EF-Tu by the receptor EFR restricts Agrobacterium-mediated transformation. Cell, 125: 749-760. DOI:10.1016/j.cell.2006.03.037 |