Genome analyses provide insights into <i>Engelhardia</i>'s adaptation to East Asia summer monsoon

Citation

Min Li, Jing-Jing Wu, Ren-Ping Su, Ou-Yan Fang, Xiang Cai, Pei-Han Huang, Xiao-Yang Gao, Xin-Xing Fu, Xiao-Hui Ma, Lin-Yue He, Yi-Gang Song, Guo-Xiong Hu, Shi-Shun Zhou, Yun-Hong Tan, Yves Van de Peer, Jie Li, Sheng-Dan Wu, Hong-Hu Meng. Genome analyses provide insights into Engelhardia's adaptation to East Asia summer monsoon[J]. Plant Diversity, 2025, 47(5): 718-732.

Genome analyses provide insights into Engelhardia's adaptation to East Asia summer monsoon

Min Li^a,k^,l, Jing-Jing Wu^b^,l, Ren-Ping Su^a,k, Ou-Yan Fang^a,k, Xiang Cai^a,k, Pei-Han Huang^a,k, Xiao-Yang Gao^c, Xin-Xing Fu^d, Xiao-Hui Ma^b, Lin-Yue He^b, Yi-Gang Song^e, Guo-Xiong Hu^f, Shi-Shun Zhou^j, Yun-Hong Tan^j, Yves Van de Peer^g,h,i^,*, Jie Li^a^,**, Sheng-Dan Wu^b^,***, Hong-Hu Meng^a,j,l^,****

a. Plant Phylogenetics and Conservation Group, Center for Integrative Conservation, and Yunnan Key Laboratory for Conservation of Tropical Rainforests and Asian Elephants, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla 666303, Yunnan, China;
b. State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystems, College of Ecology, Lanzhou University, Lanzhou 730000, Gansu, China;
c. CAS Key Laboratory of Tropical Plant Resources and Sustainable Use, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Science, Mengla 666303, Yunnan, China;
d. College of Life Sciences, Northwest Normal University, Lanzhou 730070, Gansu, China;
e. Eastern China Conservation Centre for Wild Endangered Plant Resources, Shanghai Chenshan Botanical Garden, Shanghai 201602, China;
f. College of Life Sciences, Guizhou University, Guiyang 550025, Guizhou, China;
g. VIB-UGent Center for Plant Systems Biology & Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium;
h. Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0028, South Africa;
i. College of Horticulture, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing 210095, Jiangsu, China;
j. Southeast Asia Biodiversity Research Institute, Chinese Academy of Sciences, Nay Pyi Taw 05282, Myanmar;
k. University of Chinese Academy of Sciences, Beijing 100049, China;
l. Yunnan International Joint Laboratory for the Conservation and Utilization of Tropical Timber Tree Species, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla 666303, Yunnan, China

Received 14 July 2025; Received in revised form 18 July 2025; Accepted 21 July 2025; Available online 25 July 2025

Corresponding author: E-mail address: yvpee@psb.vib-ugent.be (Y. Van de Peer);
E-mail address: jieli@xtbg.ac.cn (J. Li);
E-mail address: wusd@lzu.edu.cn (S.-D. Wu);
E-mail address: menghonghu@xtbg.ac.cn (H.-H. Meng).
Peer review under responsibility of Editorial Office of Plant Diversity.

^* Corresponding author. VIB-UGent Center for Plant Systems Biology & Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent 9052, Belgium.
^** Corresponding author.
^*** Corresponding author.
^**** Corresponding author. Plant Phylogenetics and Conservation Group, Center for Integrative Conservation, and Yunnan Key Laboratory for Conservation of Tropical Rainforests and Asian Elephants, Xishuangbanna Tropical Botanical Garden, Chinese Academy of Sciences, Mengla 666303, Yunnan, China.
¹ These authors contributed equally to this study.

Abstract: Genetic information has been instrumental in elucidating the relationship between the East Asian Summer Monsoon (EASM) and subtropical evergreen broad-leaved forests (EBLFs). However, how the genomic insights of EBLFs' species correspond to environmental shifts induced by the EASM remains limited. In this study, we investigated the adaptive mechanisms of evergreen Engelhardia species in response to the EASM through genome sequencing and comparative genomic analyses from the de novo genome assemblies of five closely related Engelhardia taxa and one Rhoiptelea species. Our findings revealed that the divergence of evergreen trees from their sister deciduous species is closely associated with the onset and intensification of the EASM. This genomic transition may have coincided with a significant expansion of the terpene synthase (TPS) gene family in E. fenzelii, driven by four distinct modes of gene duplication. This expansion enhances the biosynthesis of terpene volatiles, providing a defensive mechanism against potential herbivory in EASM affected environments. We also identified a shared whole-genome duplication (WGD) event across Engelhardia, along with substantial differences in transposable element (TE) composition and activity, which contributed to genome size variation between E. fenzelii and E. roxburghiana. In addition, demographic analyses revealed a continuous population decline over the past 10 million years, further exacerbated by recent human disturbance, underscoring the conservation urgency for these species. These results not only provide preliminary insights into the complex evolutionary dynamics within the Engelhardia genus from genomic insights (e.g., the intricate relationships between genomic variations, environmental changes, and adaptive responses driven by significant climatic events such as the EASM), but also provides valuable insights into the conservation significance of EBLFs.

Keywords: Engelhardia Comparative genomics Genome evolution EBLFs TPS EASM

1. Introduction

The persistence of existing species largely depends on their adaptive mechanisms to both paleoenvironmental and contemporary changes (Hoffmann and Sgrò, 2011; Svenning et al., 2015), including anthropogenic disturbances (Marques et al., 2019). Currently, genomic studies provide insights into how these adaptive mechanisms respond to environmental changes and demographic history (Siepielski et al., 2017; Razgour et al., 2019). A prime issue of this interplay is observed in the onset and intensification of East Asian subtropical evergreen broad-leaved forests (EBLFs) in relation to the onset and intensification of the East Asian summer monsoon (EASM) systems (Meng et al., 2025).

The Asian monsoon is a crucial component of the global climate system, impacting over half of the world's population across South and East Asia, from tropical to temperate regions (Jiang et al., 2017). Specifically, the EASM system, characterized by high summer precipitation and cool winter temperatures, has profoundly influenced the evolution of EBLFs and their associated biota (Li et al., 2022; Wan et al., 2023). The adaptation of species in EBLFs can not only reveal the evolutionary potential and distribution but also benefit biodiversity conservation under global climate change (Chen et al., 2023; Lin et al., 2023; Wei et al., 2024; Wang et al., 2025). Consequently, the genetic and genomic footprints of species are likely preserved, as these organisms respond to environmental changes over long timescales. With the increase in EASM precipitation, previously arid regions of Asia shifted to humid environments, facilitating the transition from deciduous to evergreen forests (Sun and Wang, 2005; Li et al., 2022; Meng et al., 2022b; Qin et al., 2023). As a quintessential biome in East Asia, EBLFs display high biodiversity and endemism, rendering them essential for ecosystem services that benefit humanity. They are recognized as major ecological contributors and direct successors of Cenozoic flora (Hai et al., 2022). The emergence of EBLFs and the historical development of the EASM have garnered considerable attention in ecology and biogeography, with an increasing body of research elucidating the connections between EASM variability and EBLFs' development (Hai et al., 2022). Current findings suggest a link between the diversification timelines of certain EBLFs' taxa and the geological epochs of EASM, offering insights into the assembly dynamics and historical development of these forests. Building on this foundation, various taxa within EBLFs have been employed to explore evolutionary trajectories and genomic responses to environmental change.

Nevertheless, the relationship between genomic variation and biosynthetic responses to EASM development remains unclear. To address this gap, we focus on Engelhardia (Juglandaceae), a tropical–subtropical Asian tree genus that provides an ideal model for exploring genomic responses to EASM-driven environmental change. Engelhardia is widely distributed from Southeast Asia to subtropical China, spanning a broad climatic gradient from tropical to monsoonal subtropical regions, reflecting long-term adaptive divergence under varying hydrothermal conditions (Fig. 1A and B) (Meng et al., 2022b; Zhang et al., 2025). The genus includes both evergreen species (e.g., Engelhardia fenzelii, E. roxburghiana) and deciduous species (e.g., Engelhardia spicata var. spicata, var. aceriflora, var. colebrookeana), representing divergent ecological strategies and leaf habits. Paleobotanical and phylogeographic studies suggest that Engelhardia originated in tropical Southeast Asia during the early Eocene, with fossil evidence indicating tropical affinities in the Indochina Peninsula. A major biogeographic and ecological shift occurred near the Oligocene–Miocene boundary, coinciding with global cooling and the onset of the EASM. These changes likely drove the transition of tropical elements into subtropical forms, facilitating the northward expansion of Engelhardia into East Asia. In particular, E. fenzelii, a typical evergreen species dominant in EASM-influenced evergreen broad-leaved forests (EBLFs), underwent a pronounced population expansion during the late Pleistocene, in response to increasing monsoonal precipitation. In contrast, the deciduous taxa of the E. spicata complex are more common in tropical South and Southeast Asia, where seasonal drought may have favored the evolution of periodic leaf shedding. The contrasting ecological adaptations and climate sensitivities between evergreen and deciduous Engelhardia lineages offer a valuable framework for investigating how genomic variation underpins biosynthetic and physiological responses to EASM-induced environmental heterogeneity (Meng et al., 2022b). In this study, we conducted chromosome-level genome assemblies of the aforementioned taxa, alongside one outgroup (Rhoiptelea chiliantha), to discern the genomic patterns of Engelhardia and their adaptations during the development of the EASM through comparative genomics. This study represents a pioneering attempt to trace the adaptive mechanisms underlying EBLFs' adaptation to EASM from a genomic perspective, offering new insights for future interdisciplinary studies focused on the assembly dynamics and evolutionary trajectories of EBLFs. Investigating the genomic footprints of adaptation to the EASM and the underlying adaptive mechanisms is crucial for understanding of the origin and diversification of EBLFs. Simultaneously, disentangling the demographic responses of key lineages to long-term monsoonal dynamics is essential not only for reconstructing lineage history, but also for informing effective biodiversity conservation under ongoing climate change.

Fig. 1 Overview of Engelhardia species diversity and genomic characterization of the six sequenced genomes. A. Map showing the distribution of five major clades of Engelhardia around four bioregions (Indochina, Sundaic, Philippines, and Wallacea), indicated by red triangles. Bioregional divisions follow Meng and Song (2023). B. Asian rainforests inhabited by Engelhardia species. C–H. Genomic feature distributions in five Engelhardia taxa and Rhoiptelea chiliantha genomes. The tracks from outer to inner circles indicate the following: assembled 16 chromosomes (Chr1–Chr16); gene density; GC content; Gypsy coverage; Copia coverage; LTR coverage.

Figure options

2. Methods and materials 2.1. Sample collection and genome sequencing

Plant samples, including root, stem, leaf, male flower, female flower, and fruit tissues, from Engelhardia roxburghiana, E. spicata var. spicata, and E. spicata var. aceriflora, E. spicata var. colebrookeana, E. fenzelii, and Rhoiptelea chiliantha were collected from six locations in China. Details of species, collection sites, and geographic coordinates are provided in Table S1. All samples were immediately stored in liquid nitrogen for preservation. For genome sequencing, genomic DNA was extracted using MGIEasy Genomic DNA Kit (MGI, Shenzhen, China) following the manufacturer's instructions. Illumina libraries with an insert size of 350 bp were prepared on a PCR-free DNBSEQ platform (BGI, Wuhan, China) and sequenced on a paired-end (PE) 150-bp format system. High-quality high-fidelity (HiFi) sequencing was carried out on PacBio Sequel Ⅱ platform with Sage ELF libraries, sheared with 13–16 kilobases (kb) fragments using Megaruptor3 (Diagenode, Liège, Belgium). A high-throughput chromosome conformation capture (Hi-C) sequencing library was constructed using the NEBNext Ultra Ⅱ DNA library Prep Kit (New England Biolabs, England). Fragments between 150 and 500 bp were sequenced on an Illumina platform in a PE 150-bp format. Detailed sequencing information on the six taxa is provided in Tables S2–S7. All genomic sequences were generated using BGI (Wuhan, China).

2.2. Estimation of genome size and heterozygosity

The genome size of the five Engelhardia taxa was estimated using two methods: flow cytometry (Mickael et al., 2018) and K-mer analysis (Hesse, 2023). The DNA content of five taxa was measured on a flow cytometer (Cyflow Ploidy Analyser, Partec, Munster, Germany), using Zea mays L. as internal standard. Genome size was calculated using the following formula: (mean DNA content at G1 peak of five taxa/mean DNA content at G1 peak of Zea mays) × Zea mays genome size (2.18 Gb) (Doležel and Bartoš, 2005). The results of the flow cytometry analysis are presented in Table S8. The genome size assessment for the six taxa was performed in GenomeScope v.2.0 (Vurture et al., 2017; Ranallo-Benavidez et al., 2020) based on K-mer statistics using Illumina short reads. The 43 K-mer frequency (Li et al., 2025) of short reads were used to construct K-mer depth distributions by Jellyfish v.2.3.0 (Table S9) (Marcais and Kingsford, 2011).

2.3. De novo genome assembly and chromosome construction

Adaptors and low-quality reads in the raw Illumina short-read data were filtered using SOAPnuke v.2.1.6 (Chen et al., 2018) with default parameters. High-quality HiFi reads were used to de novo assemble the contigs with Hifiasm v.0.15.4-r347 (Cheng et al., 2021). To remove redundant haplotigs and obtain a haploid representation of the genome, we applied purge_haplotigs v.1.1.1 (Roach et al., 2018) to the initial assembly before further processing. Subsequently, three rounds of contig polishing were conducted using NextPolish v.1.4.1 (Hu et al., 2020). To assess assembly accuracy, clean Hi-C reads were aligned to the polished contigs using BWA v.0.7.12-r1039 (http://bio-bwa.sourceforge.net/) to obtain uniquely mapped read pairs. Valid interaction pairs were extracted using HiC-Pro v.3.0 (Servant et al., 2015). These data were used to scaffold the contigs into chromosome-level assemblies with 3D-DNA v.180114 (Dudchenko et al., 2017), which clustered, sorted, and oriented the contigs based on Hi-C contact information. Anchoring efficiency was assessed by measuring the proportion of contigs integrated into pseudo-chromosomes relative to the total assembled genome size (Table S11). Given the relatively lower anchoring rates observed in E. roxburghiana and E. fenzelii, we further evaluated allelic redundancy and structural accuracy. To assess redundancy, we used KAT v.2.4.2 (Mapleson et al., 2017) to compare K-mer frequencies between the HiFi reads and the final assemblies. Spectra–cn plots (Fig. S3) showed that Rhoiptelea chiliantha had strong single-copy peaks and minimal missing data, indicating high assembly completeness; E. spicata var. aceriflora and E. spicata var. colebrookeana exhibited well-balanced red and black signals at the heterozygous peak, indicating minimal redundancy; In contrast, E. roxburghiana, E. fenzelii, and E. spicata var. spicata exhibited red-over-black heterozygous peaks, suggesting localized redundancy likely due to unresolved allelic variation. Similar patterns have been reported in other plant genomes with high heterozygosity or repetitive content (e.g., Manihot esculenta, Musa spp.; Huang et al., 2023; Landi et al., 2023), where redundancy is often confined to non-core genomic regions. Based on these observations, we infer that the local redundancy detected in our assemblies may also result from incomplete resolution of allelic variants in highly heterozygous or repetitive regions, rather than from systematic assembly errors. We also conducted genome-wide synteny analysis using MCScanX (Wang et al., 2012), comparing E. roxburghiana and E. fenzelii against the high-anchoring reference E. spicata var. colebrookeana. The observed one-to-one collinearity patterns confirmed structural consistency and the absence of large-scale redundant scaffolds (Fig. S4). Assembly completeness was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO) v.3.0.2 (Simao et al., 2015) with the embryophyta_odb10 database.

2.4. Repeat sequence identification, quantification, and statistics

To improve the accuracy of repetitive sequence identification and classification, both de novo prediction and homology-based annotation methods were employed for genome-wide detection. First, de novo repeat libraries were constructed for the genomes of the six taxa using RepeatModeler v.2.0.2 (Flynn et al., 2020) and TEclass v.2.1.3 (Abrusan et al., 2009), followed by the removal of redundant or potentially contaminated sequences. Subsequently, the de novo repetitive sequence databases (RepBase database, https://ngdc.cncb.ac.cn/databasecommons/database/id/5923), along with the publicly available eukaryotic genome repetitive sequence database, were used in combination with RepeatMasker v.4.1.0 (Tarailo-Graovac and Chen, 2009) to classify repetitive sequences in each genome into different categories. Additionally, based on the repetitive sequence-related protein database, RepeatProteinMask was applied to identify repetitive sequences at the protein level in the genomes. To improve classification accuracy, tandem repeats were identified using Tandem Repeats Finder v.4.0.9 (Benson, 1999). Long terminal repeats (LTRs) in the four genomes were annotated using LTR_FINDER v.1.0.6 (Xu and Wang, 2007) and LTRharvest (Ellinghaus et al., 2008), with false positives filtered by LTR_retriever v.2.8 (Ou and Jiang, 2018). All identified repetitive sequences were then integrated, annotated, and subjected to statistical analysis for each species.

To conduct a more refined statistical analysis of transposon sequences annotated in different genomic families, RepeatMasker-annotated repeat results (.out file) were further processed using the Perl script tool 'One code to find them all' (Bailly-Bechet, 2014). This tool analyzes transposon sequence copies based on information such as name, position, and sequence orientation from transposon hit records. It integrates multiple hits corresponding to the same transposon copy and performs reclassification and statistical analysis based on this consolidated information. This approach is particularly useful for extracting information on long-transposon sequence structures. The analysis process begins by running the build_dictionary.pl script to construct a name dictionary for identifying LTR retrotransposons, followed by running the one_code_to_find_them_all.pl script in "strict mode" ('-strict'). This script applies the "80-80-80" criteria for transposon identification (Wicker et al., 2007, Lynch and Conery, 2000) (i.e., copies with sequences longer than 80 bp and 80% sequence similarity to the reference sequence are defined as valid transposon copies). Redundant statistical information was filtered out, and the final output included statistical files for each genomic scaffold, detailing the annotated transposons and tandem repeat sequences, including their lengths, copy numbers, and classifications. To examine the effect of TE expansions on genome size variation, we correlated the total amount of DNA in TEs vs. assembly size using a linear regression model (Pearson correlation coefficient).

2.5. Evaluation of TE activity

In this study, an overview of the historical dynamics of transposons was constructed by calculating the sequence divergence rate of TEs and statistically analyzing the distribution of this rate. Subsequently, the activity of TEs was inferred based on the overall characteristics of the TE sequence profile. To begin with, the Kimura distance between TE sequences and their corresponding consensus sequences was calculated to evaluate the divergence rate of TEs. Specifically, during the repeat annotation process using RepeatMasker v.4.0.7, the parameter setting '-a' was added to generate the alignment files (.align) of the TEs in the species genome and their respective consensus sequences. Then, the Perl script 'calcDivergenceFromAlign.pl' (http://www.repeatmasker.org) provided with RepeatMasker v.4.0.7 was utilized to analyze the.align files. By running this script under the parameter '-noCpGMod', a statistical file (.divsum) containing the Kimura distances of the sequence alignment was generated. Next, statistical analysis of the divergence rate of genomic repeat sequences was performed. For the.divsum file of the overall repeat sequences in the species genome, the built-in script 'createRepeatLandscape.pl' ('-g Total number of base pairs of the species genome') of the software was executed to construct the overall distribution representing the divergence rate of repeat sequences. The distribution trends of the transposon profiles obtained from the analysis were categorized into three types to infer the activity of transposons.

2.6. Genome annotation

Three independent approaches were used for protein-coding gene prediction in a repeat-masked genome: ab initio prediction, homology-based inference, and RNA-seq-based annotation. In detail, GeMoMa v.1.6.4 (Keilwagen et al., 2019) was used to align the homologous peptides from five closely related species with high-quality published genomes (Carya illinoinensis, Juglans regia, Platycarya strobilacea, Arabidopsis thaliana, and Vitis vinifera) to the repeat-masked assembly of six taxa to obtain the gene structure information. Meanwhile, Augustus v.3.3.1 (Stanke et al., 2008), GeneScan (Burge and Karlin, 1997), and GlimmerHMM (Majoros et al., 2004) were run with default parameters for ab initio gene prediction based on the training set. For RNA-seq-based gene prediction, filtered mRNA-seq reads were aligned to the reference genomes of the six taxa using HISAT2 v.2.0.4 (Kim et al., 2015) with default parameters. The transcripts were generated using Trinity v.2.1.1 (Haas et al., 2013) and open reading frames (ORFs) were predicted using PASA v.2.3.3 (Haas et al., 2003). Finally, comprehensive gene sets obtained by different strategies were integrated by EVidenceModeler (EVM) v.1.1.1 (Haas et al., 2008). To evaluate the structural accuracy of gene annotations, we assessed transcriptomic support using gffcompare v.0.11.2 (Pertea and Pertea, 2020). Transcript assemblies were generated from RNA-seq data using StringTie v.2.1.4 (Kovaka et al., 2019, Pertea et al., 2015), and aligned against genome annotations. The proportion of gene models exhibiting exact structural matches to transcript evidence (classified as '=') was calculated to assess annotation completeness and reliability. To examine potential haplotypic redundancy caused by residual heterozygosity or mis-phasing, we performed intra-genomic synteny analyses using MCScanX. Collinear gene pairs were identified, and synonymous substitution rates (Ks) were calculated using KaKs_Calculator v.2.0 (Wang et al., 2010). We considered collinear gene pairs with Ks < 0.05 as potential allelic duplicates. The proportion of such pairs was used as an estimate of possible phasing-related artifacts in gene prediction. Functional annotation of genes across the six taxa was performed following the approach described by Ma et al. (2023). Protein sequences were aligned against commonly used databases, and Gene Ontology (GO) terms and KEGG pathways (Kanehisa et al., 2016) were assigned accordingly.

2.7. Phylogeny and gene family analysis

To estimate the phylogenetic relationships and divergence times among the various species within Engelhardia, we collected the protein-coding gene sequences of four other sequenced plant species in eudicots (Morella rubra, Platycarya strobilacea, Juglans regia, and Carya cathayensis). Detailed genome information and sources are listed in Table S10. Only the longest transcript of each gene was used for subsequent analyses. We used OrthoFinder v.2.4.0 (Emms and Kelly, 2015) to cluster the protein sequences of five Engelhardia taxa and the other five species to putative gene families (or orthogroups) based on sequence similarity with default parameters. In total, 1052 strict single-copy gene families were obtained. MAFFT v.7.471 (Katoh and Standley, 2013) was used to perform multiple sequence alignments of the protein sequences in each single-copy gene family. Then, the protein sequence alignments were converted into corresponding DNA sequence alignments using PAL2NAL v.1.4 (Suyama et al., 2006). We used both maximum likelihood and coalescent-based methods for phylogenetic analyses. For maximum likelihood tree reconstruction, the DNA sequence alignments of single-copy gene families were concatenated into a super-gene matrix, and then model selection and tree inference were automatically performed with IQ-TREE v.2.1.2 (Nguyen et al., 2015). For coalescent-based phylogenetic analysis, trees for each gene family were constructed using IQTREE and subsequently summarized using ASTRAL-Ⅲ v.5.7.5 (Mirarab and Warnow, 2015). The divergence time estimation of the 10 taxa was performed using the MCMCtree module in the PAML package v.4.9 (Yang et al., 2007). We used two calibration points obtained from the TimeTree database (http://timetree.org/), including Morella rubra (28.5–84.4 million years ago (Mya)) (Liu et al., 2015) and Rhoiptelea chiliantha (66.0–96.6 Mya) (Ding et al., 2023).

To investigate the variations in the number of gene families among different species, CAFÉ v.4.2 (De Bie et al., 2006) was employed to conduct the analysis of gene family expansion and contraction. First, the time-calibrated phylogenetic tree generated by MCMCTREE was taken as the input file. Based on the stochastic evolutionary model of the CAFÉ software, analysis of the gain and loss of gene families in all 10 species during the phylogenetic process was carried out. Subsequently, the expanded gene families of the five Engelhardia taxa in the result file were extracted and the pep files corresponding to each gene therein were retrieved. Then, GO (Gene Ontology) and KEGG enrichment analyses were performed using KOBAS (http://kobas.cbi.pku.edu.cn/), using the Juglans regia set as background. The results were visualized using R.

2.8. Genome duplication and intergenomic comparison for Engelhardia

To investigate genome evolution in Engelhardia, we first identified paralogous gene pairs and syntenic blocks by performing reciprocal BLASTp v.2.7.1 (https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) searches (E-value = 1e−5) within and between genomes. The results were processed using WGDI (Sun et al., 2022) with parameters "-d, -icl, -ks, -bi, -bk, -kp" to generate intra-genomic collinearity dot plots with synonymous substitution rates (Ks). The Ka and Ks values of gene pairs were further estimated using the NG (Nei–Gojobori) model implemented in the yn00 program of PAML to infer potential whole-genome duplication (WGD) events and their timing. Cross-species comparisons were conducted with grapevine (Vitis vinifera), which has only undergone the ancestral eudicot-specific hexaploidization, as a reference. Collinearity maps between Engelhardia species and grapevine were constructed using JCVI (MCScanX Python version) to validate and visualize WGD signals. Finally, the timing of putative WGD events was calculated using the formula T = Ks / 2r, where T represents the estimated duplication time and r is the nucleotide substitution rate. To classify gene duplication modes, we applied DupGen_finder (Qiao et al., 2019) under default settings. Duplications were categorized into WGD, tandem duplication (TD), proximal duplication (PD, distance < 10 genes), transposed duplication (TRD), and dispersed duplication (DSD). For TRD detection, Rhoiptelea chiliantha was used as the outgroup, and cross-chromosomal transposed duplicates were identified by combining self- and outgroup-based BLASTp results. Each gene was assigned to a single duplication type following the priority WGD > TD > PD > TRD > DSD. Selection pressure on duplicated genes was assessed by calculating Ka, Ks, and Ka/Ks ratios using the YN model in KaKs_Calculator v.2.0.

For functional characterization, gene sets from each duplication category were annotated against Pfam, GO, KEGG, SwissProt, and NR databases. GO and KEGG enrichment analyses were then performed to evaluate functional divergence among duplication modes, with multiple testing correction using the Benjamini–Hochberg method.

2.9. Identification of key gene members involved in terpene synthesis

The TPS homologous genes in Arabidopsis thaliana were identified by searching the TAIR database (Garcia-Hernandez et al., 2002). Subsequently, the pep and cds files of these genes were extracted. The pep files of each gene family were used as inputs for constructing a database using the makeblastdb. Protein sequence alignment was performed using BLASTp with an E-value cutoff of 1e-10 to identify homologous genes from five Engelhardia taxa within the constructed database. Additionally, two seed files corresponding to TPS genes, Terpene_synthase (PF01397) and Terpene_synthase_C (PF03936) were downloaded from the Pfam domain database (Mistry et al., 2021). HMMER (Potter et al., 2018) was employed to search for TPS gene family sequences across different species, facilitating the preliminary retrieval of TPS protein sequences. Following retrieval, the protein sequences were filtered to select homologous sequences with high similarity (E-value 1e-10). Sequences less than 200 amino acids in length, alternative splicing isoforms, and unusually long sequences were excluded using custom scripts. Subsequently, the candidate genes from each species were aligned with the corresponding Arabidopsis thaliana genes to construct phylogenetic trees using IQ-TREE software. After manually removing outliers, the final dataset consisted of genes from each gene family across the species under study. In total, 167 candidate TPS protein sequences from the five sequenced taxa were identified.

2.10. Identification of differentially expressed genes and transcription factors

Transcriptome sequencing was conducted on flowers, fruits, stems, leaves, and roots of five Engelhardia taxa. For each tissue, FastQC was used to assess raw sequencing quality, and fastp was employed to trim and filter low-quality reads. Clean reads were mapped to the reference genome using HISAT2 v.2.0.4. StringTie v.2.1.4 was then used to calculate gene expression levels. TPM (Transcripts Per Kilobase Million) values were used to characterize and visualize gene expression. For differential expression analysis, raw (unnormalized) read counts were used as input to DESeq2 v.3.1.1 (Love et al., 2014) in accordance with its standard pipeline. The R package was used to detect the differentially expressed genes among different tissues. Among them, the candidate genes with at least a two-fold difference in expression levels and an FDR < 0.05 were identified as the differentially expressed genes. After determining the differentially expressed genes in various tissues, the protein sequences of each gene were submitted to the online PlantTFDB database (http://planttfdb.gao-lab.org/) to predict the differentially expressed transcription factors within the differentially expressed genes. Finally, BLASTp was used to identify homologs of the differentially expressed transcription factors in Engelhardia by comparing them against the Arabidopsis thaliana protein database.

2.11. Demographic history analyses

We used PSMC v.0.6.5-r67 (Li and Durbin, 2011) to infer changes in effective population size (Ne) over time for the reference genomes of the five taxa. To ensure a high-confidence consensus sequence, the Pairwise Sequentially Markovian Coalescent (PSMC) parameters were set as follows: base quality ≥ 50, minimum mapping quality ≥ 10, minimum depth of one-third, and maximum depth of two times the average genome coverage (Bai et al., 2018). The analysis used the following options: "-N50" for the number of cycles of the algorithm, "-t15" as the upper limit for the most recent common ancestor, "-r5" for the initial h/q value, and "-p 4 + 25·2 + 4 + 6" atomic intervals. The reconstructed demographic history was plotted using a mutation rate of 1.952 × 10⁻⁹ substitutions per site per generation and a generation time of 30 years (Bai et al., 2018; Ding et al., 2023). In addition, we integrated historical collection sites of Engelhardia species with recent field resampling sites to generate a comparative map of past distribution records and present distributions.

3. Results 3.1. De novo genome assembly and annotation

In this study, we used three sequencing strategies to de novo assemble high-quality, chromosome-level genomes for six species, with ~88.80–138.14 Gb Illumina PE short reads, ~32.36–57.73 Gb HiFi reads, and ~124.27–140.28 Gb Hi-C reads (Tables S2–S7). FCM of five Engelhardia taxa estimated genome sizes at 800–870 Mb, while K-mer gave broader estimates (720–1130 Mb) and additional insights into heterozygosity and repeat content. For Rhoiptelea chiliantha, only K-mer was used, yielding 381.74 Mb; K-mer confirmed all taxa are diploid (2n=32) (Tables S8 and S9; Fig. S1). After completing contig-level primary assembly, polishing with short reads, and further scaffolding assisted by Hi-C (see Materials and Methods), we obtained genome assemblies ranging from 414.56 Mb to 985.85 Mb. Specifically, in the Engelhardia genus, evergreen species such as E. roxburghiana (985.85 Mb, contig N50 = 16.99 Mb) and E. fenzelii (940.40 Mb, contig N50 = 26.16 Mb) exhibited relatively larger genome sizes than the deciduous species: E. spicata var. spicata (730.68 Mb, contig N50 = 47.37 Mb), E. spicata var. colebrookeana (772.59 Mb, contig N50 = 48.55 Mb), and E. spicata var. aceriflora (763.23 Mb, contig N50 = 47.05 Mb) (Table S11). On average, 95.18% of the sequences from the six genomes were properly anchored and assembled into 16 pseudo-chromosomes (Table S11; Figs. 1C–H and S2). The high contiguity and completeness of the assemblies were supported by BUSCO assessments, with 96.8%–98.9% of the 1614 BUSCO genes detected (Table S12), and by LTR Assembly Index (LAI) scores, where all assemblies reached the reference quality threshold (LAI ≥10; Table S13).

The combined use of de novo prediction and homology-based evidence indicated that the repetitive sequences constituted between 46.17% and 70.01% (191.44–681.45 Mb) of the six assembled genomes. Among these, 73.50–510.91 Mb (17.72%–50.16%) of the sequences were identified as transposable elements (TEs). Long terminal repeat (LTR) retrotransposons and DNA transposable elements accounted for 13.05%–37.25% and 4.66%–13.97% of the genomes, respectively (Tables S14–S19). The repeat-masked genomes was utilized to predict gene structures by integrating the results from de novo, homology-based, and transcriptome-based predictions (see Materials and Methods). We annotated more than 31, 000 genes for the three deciduous taxa (E. spicata var. spicata: 31, 524; E. spicata var. colebrookeana: 31, 400; E. spicata var. aceriflora: 31, 388), which is approximately 20, 000 genes fewer than the two evergreen species (E. roxburghiana: 52, 344; E. fenzelii: 53, 427) (Table S20). To evaluate gene model reliability, we used gffcompare and found that 63.65%–69.23% of annotated genes showed exact matches with transcript assemblies across the five Engelhardia taxa (Table S21). Self-synteny analyses further revealed limited haplotypic redundancy, with only ~10%–12% of collinear gene pairs showing Ks < 0.05 (Table S22), suggesting that allelic duplication or phasing artifacts had minimal impact on overall annotation quality. On average, 84.15% of the predicted genes were successfully assigned functional annotations using information available in public databases, including Pfam, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), SwissProt, and NR (Table S23).

3.2. TEs, not WGD, drive the genome size variation between evergreen and deciduous species

We further explored the driving factors behind the larger genome size of the two evergreen species compared to their deciduous congeners (Fig. 2A). Whole-genome duplication (WGD) events are common in plants and are often recognized as contributing to the increase in genome size (Soltis and Soltis, 2016; Ren et al., 2018; Qiao et al., 2022; Feng et al., 2024). The distribution of synonymous substitution rates (Ks) among paralogous gene pairs in the five Engelhardia taxa showed similar patterns, with two distinct peaks at Ks values of ~1.5 and ~0.38, indicating two WGD events (Fig. 3C). The Ks peak (~1.5) likely reflects the ancient gamma whole-genome triplication event shared among all eudicots (Jiao et al., 2012). The most recent peak at ~0.38 suggests a relatively recent WGD event shared by all Engelhardia taxa, and these inferences are further supported by intra- and inter-genomic synteny (Fig. S5). Therefore, WGD events are unlikely to be the primary cause of genome size expansion in E. roxburghiana and E. fenzelii, as both evergreen and deciduous species have experienced similar polyploidy histories.

Fig. 2 TE composition and genome size dynamics across Engelhardia taxa. A. For Engelhardia, genome size of two evergreen species and three deciduous taxa, genome contents divided into six major genomic components are shown by bar charts on the right, along with the relative abundance of annotated known TEs. B. Heatmap represents genome coverage of the 11 major TEs types. C. Correlation of repeat content with genome size (Pearson Correlation Coefficient, n = 5, R = 0.95, two-sided p = 1.647e-08, slope = 0.99), an increase of repeat fraction with genome size, and increase in total TEs, DNA, LINE, and LTR TE classes with genome size. D. Divergence landscapes of TEs that contribute most to intrafamilial GS variation in the five Engelhardia taxa. The y-axis represents the genome coverage of various types of TE, and the x-axis represents Kimura substitution level % (TE divergence based on the Kimura distance of TE copies to the corresponding consensus sequences) from 0 to 50.

Figure options

Fig. 3 Genome evolution and phylogenomics of Engelhardia species. A. A phylogenetic tree was constructed using the genomes of nine Juglandaceae taxa (Rhoiptelea chiliantha, Engelhardia fenzelii, E. roxburghiana, E. spicata var. colebrookeana, E. spicata var. spicata, E. spicata var. aceriflora, Carya cathayensis, Platycarya strobilacea, Juglans regia) and one Myricaceae species (Morella rubra) as an outgroup. The analyses included phylogenetic analysis, gene family expansions/contractions analyses, and divergence time estimation. Green and pink represent the number of gene family contractions and expansion occurrences, respectively. The divergence times (Mya) are denoted by the numbers adjacent to the nodes. B. Venn diagram showing the shared and unique gene families among the five Engelhardia taxa. C. Ks distribution of intra-genomic syntenic genes in selected genomes.

Figure options

Having excluded WGD as the primary driver of genome expansion in Engelhardia, we next examined the contribution of transposable elements (TEs) to genome size variation among species. Although the relative proportion of TEs was broadly similar across the five genomes, with each comprising approximately 50% TE content (Tables S14–S18 and Fig. 2B), the total amount of TE-derived sequence differed markedly (Pearson's test corrected by phylogeny; r = 0.99, p = 0.0045, Fig. 2C). Specifically, the evergreen species E. roxburghiana and E. fenzelii contained 510.91 Mb and 447.62 Mb of TE content, respectively, whereas the deciduous E. spicata complex harbored only ~398.00 Mb. This difference was particularly evident in the DNA TEs. In the evergreen species, the total length of DNA transposons exceeded 100 Mb, nearly twice that of the deciduous species, which contained ~50 Mb (Tables S14–S18). To further investigate the evolutionary dynamics of these elements, we analyzed their divergence profiles based on Kimura substitution levels. Notably, the DNA_CMC-EnSpm superfamily in evergreen species exhibited a characteristic bimodal distribution, with peaks corresponding to both ancient and recent transpositional activity. This pattern suggests long-term or episodic accumulation of DNA_CMC-EnSpm superfamily. In contrast, the deciduous species showed a unimodal, bell-shaped distribution centered at intermediate divergence levels, indicating that DNA_CMC-EnSpm superfamily were historically active but have experienced limited recent mobilization (Fig. 2D). These results indicate that although TE proportions are relatively consistent across Engelhardia species, the differential accumulation of DNA TEs, particularly those belonging to the DNA_CMC-EnSpm superfamily, has substantially contributed to genome expansion in the evergreen lineages.

3.3. Gene family evolution and gene duplication

Apart from a larger genome size, the genomes of both evergreen Engelhardia species also contained notably more genes. To investigate the genetic diversity and evolutionary history of the five taxa of Engelhardia, we reconstructed a time-calibrated phylogeny using 1052 one-to-one single-copy orthologous genes identified from nine Juglandaceae taxa and Morella rubra as the outgroup (Table S24 and Fig. 3A). The resulting tree revealed that Engelhardia diverged from other Juglandaceae lineages approximately 67.80 Mya. Based on the sampled Engelhardia species, the split between evergreen and deciduous lineages was estimated to have occurred around 52.22 Mya, followed by the divergence of the two evergreen species (E. fenzelii and E. roxburghiana) at approximately 25.00 Mya. In contrast, within the deciduous clade, E. spicata var. colebrookeana diverged from other lineages around 16.48 Mya, and E. spicata var. spicata and E. spicata var. aceriflora separated approximately 9.64 Mya (Fig. 3A).

Gene family evolution analysis revealed that following their divergence, evergreen species underwent extensive gene family expansion, with 3538 and 3344 expanded families detected in Engelhardia roxburghiana and E. fenzelii, respectively—markedly higher than those observed in deciduous lineages (E. spicata var. spicata: 768, E. spicata var. colebrookeana: 714, E. spicata var. aceriflora: 745) (Table S24 and Fig. 3B). To elucidate the functional significance of these expansions, we performed GO enrichment analyses across different evolutionary nodes. At the crown node of Engelhardia (i.e., the most recent common ancestor of all five taxa), expanded gene families were strongly enriched in chitin-related pathways, including chitin catabolic process, chitin binding, chitinase activity, and extracellular region (Fisher's exact test, Benjamini–Hochberg FDR correction, adjusted p < 0.05; Table S25 and Fig. S6). After the divergence of evergreen and deciduous lineages, the two evergreen species independently acquired further gene family expansions associated with abiotic stress responses (e.g., response to salt stress, oxidative stress, unfolded protein binding), cellular signaling, and membrane trafficking processes (e.g., plasmodesmata, protein transport) (Fisher's exact test, Benjamini–Hochberg FDR correction, adjusted p < 0.05; Table S26 and Fig. S7). In contrast, the three deciduous taxa exhibited shared enrichment of gene families related to signal transduction, protein phosphorylation, membrane-associated functions, and hormone-mediated pathways, including cell surface receptor signaling, serine/threonine kinase activity, and response to auxin (Fisher's exact test, Benjamini–Hochberg FDR correction, adjusted p < 0.05; Table S27 and Fig. S8). These findings indicate that the evolution of evergreen and deciduous habits in Engelhardia was accompanied by distinct patterns of gene family expansion, reflecting lineage-specific adaptations to differing ecological strategies and environmental pressures.

To further explore the genomic basis underlying the larger gene repertoires observed in evergreen species, we analyzed gene duplication modes using DupGen_finder (Figs. 4A, S9 and S10). The two evergreen species exhibited distinct duplication patterns: Engelhardia roxburghiana retained a higher proportion of whole-genome duplicates (WGD, 57.71%), whereas E. fenzelii showed a greater prevalence of dispersed duplications (DSD, 34.59%) (Figs. 4B, S9 and S10). These patterns were reflected in gene family expansions (EGFs), with 24.77% of expansions in E. roxburghiana attributed to WGD, while 47.46% in E. fenzelii resulted from DSD (Figs. 4C and S11). The difference was particularly pronounced in the terpene synthase (TPS) gene family, which is essential for secondary metabolism and chemical defense. E. fenzelii exhibited significantly more TPS-related duplications derived from DSD, WGD, tandem (TD), and proximal duplication (PD) events than E. roxburghiana (Fisher's exact test, Benjamini–Hochberg FDR correction, adjusted p < 0.05; Tables S28 and S29; Figs. 4D and S11). This high frequency of gene duplication, likely underlies the elevated production of volatile terpenoids in E. fenzelii, providing a potential adaptive advantage in humid, evergreen environments with sustained biotic stress.

Fig. 4 Gene duplication and evolution of Engelhardia fenzelii. A. The Ka/Ks ratio distributions of gene pairs derived from different modes of duplication. In the boxplot representations, points are outliers; the center line is the median; the lower and upper hinges correspond to the first and third quartiles (25th and 75th percentiles); whiskers extend to the minimum (left whiskers) and maximum (right whiskers) estimates located within the 1.5× interquartile range (IQR) of the first and third quartiles, respectively. Gaussian kernel estimates of Ka/Ks of different duplicated groups are shown as violins. B. The proportion of each type of gene among all identified duplicated genes in E. fenzelii. C. Venn diagram shows the possible logical relations between members of expanded gene family and modes of duplication (WGD, TD, PD, TRD, and DSD). D. Functional enrichment analysis of genes overlapping between expanded gene families (EGFs) and five modes of duplication (WGD, TD, PD, TRD, and DSD). The enriched KEGG terms with corrected p value < 0.001 are presented. The color of circles represents the statistical significance of enriched KEGG terms. The size of the circles represents the number of genes in a KEGG term. For all annotated genes their KEGG terms are provided as background information. 'P adjust' is the Benjamini–Hochberg false discovery rate (FDR) adjusted P value.

Figure options

3.4. Expansion of TPS genes in terpene synthesis

To investigate the functional implications of TPS gene expansion in Engelhardia fenzelii, we conducted phylogenetic and gene expression analyses. Among all Engelhardia taxa, E. fenzelii harbored the highest number of TPS genes (50). These TPS genes clustered into six subfamilies (TPS-a, -b, -g, -e, -f and -c), with TPS-a being especially expanded in E. fenzelii. The TPS-b subfamily in Engelhardia retains duplicates derived from at least one whole-genome duplication (WGD) event, forming two branches (TPS-b1 and TPS-b2), while TPS-g diversified into TPS-g1 and TPS-g2 (Fig. 5A and B). Compared to other species, E. fenzelii exhibited a dominance of TPS-a genes, highlighting a possible functional specialization. Gene expression profiling across five tissues revealed that TPS genes involved in volatile and terpenoid backbone biosynthesis were preferentially expressed in leaves and flowers, where herbivore pressure is likely strongest (Table S30 and Fig. S12). This tissue-specific expression pattern underscores the functional specialization of TPS genes in E. fenzelii, which may contribute to adaptation to environmental conditions shaped by monsoon-influenced habitats.

Fig. 5 Identifying TPS gene family members and elucidating the impact of climate change on terpene synthesis in Engelhardia fenzelii. A. Overview of copy numbers of major gene families associated with terpene synthase in five Engelhardia taxa. B. Maximum likelihood phylogenetic tree constructed using protein sequences of five Engelhardia taxa and Arabidopsis thaliana. C. A phylogenetic tree was constructed using the genomes of five Engelhardia taxa and R. chiliantha as an outgroup. Divergence times (Mya) are denoted by the numbers adjacent to the nodes. Climatic sequence of events including a global average δ¹⁸O curve (right–hand axis) derived from benthic foraminifera, which mirrors the major global temperature trends from Oligocene to Pleistocene; the red line denotes the global temperature trends (from Zachos et al., 2008; Hansen et al., 2013; Westerhold et al., 2020; Wu et al., 2022). D. Geographic distributions of E. roxburghiana and E. fenzelii were collected from 171 and 91 populations in the tropical Indochina Peninsula and subtropical China, respectively. The red arrow denotes the wind direction of the present-day Asian summer monsoons (Licht et al., 2014). Purple dots denote sample localities of E. roxburghiana and blue triangles denote sample localities of E. fenzelii. (a) and (b): Habitats of E. roxburghiana and E. fenzelii.

Figure options

3.5. Changes in population size over time

We employed the pairwise sequentially Markovian coalescent (PSMC) method to reconstruct historical changes in effective population size (Ne) for five Engelhardia taxa (Fig. 6A). The PSMC results indicated that approximately 5 Mya, all species exhibited a trend of population expansion, likely driven by environmental changes or habitat expansion. Around 1 Mya, the Ne of all Engelhardia species reached their peak. Among them, E. spicata var. aceriflora had the highest Ne, followed by E. roxburghiana, E. spicata var. spicata and E. spicata var. colebrookeana showed similar Ne values, both lower than the preceding two taxa, while E. fenzelii had the smallest effective population size among the five taxa. During approximately 0.3–1 Mya, all species underwent a period of rapid population decline, which may be attributed to climatic fluctuations during the Middle Pleistocene. Following this period, from 0.03 to 0.3 Mya, the population sizes of most species declined at a slower rate, eventually stabilizing in recent times. During this period, E. roxburghiana and E. spicata var. spicata exhibited brief phases of slight population growth; however, these were followed by a continued decline before reaching a stable state. This dynamic history of population expansion and contraction highlights the impact of environmental and climatic events on the evolutionary trajectory of Engelhardia.

Fig. 6 Effective population sizes and geographical distribution of revisited localities and historical collection records of Engelhardia. A. Effective population sizes of the five Engelhardia taxa were estimated using PSMC. B. Comparison of historical distribution points and sampling points for field investigations. The yellow dot represents the geographical distribution of revisited localities; the blue triangle represents historical collection records of Engelhardia.

Figure options

It is worth noting that although PSMC is powerful for reconstructing deep population history, its resolution decreases substantially for more recent periods, such as the Holocene. To investigate more recent demographic and distributional changes, we therefore combined historical herbarium records and modern field data for a comparative analysis (Tables S31 and S32; Fig. 6B). A total of 378 collection sites were identified from more than 1000 historical records of five Engelhardia taxa, which span a range from south of the Yangtze River (China) to Indonesia. These were integrated with 217 contemporary distribution points obtained from recent field surveys. A comparative analysis of historical and contemporary occurrence records revealed that approximately 46% of historically recorded distribution sites no longer harbor Engelhardia populations. The dynamic population history and distribution analysis of Engelhardia revealed significant fluctuations driven by environmental and climatic events, with a persistent decline in population size over time.

4. Discussion 4.1. Genome evolution and adaptation to the EASM

Phylogenetic and taxonomic studies of the genus Engelhardia have been hindered by its complex evolutionary history and unresolved species boundaries (Meng et al., 2022a). However, recent advances in genome sequencing and computational technologies have provided new insights into the ambiguous phylogeny of this species complex. Our phylogenetic analysis, based on 1052 single-copy orthologous genes, identified three distinct lineages within the E. spicata complex (E. spicata var. spicata, E. spicata var. colebrookeana, and E. spicata var. aceriflora) with 100% bootstrap support (Fig. 3A), thereby establishing a robust framework for further understanding the phylogeny of this group.

With the phylogenetic framework established, our results provide new insights into the divergence timing within Engelhardia, particularly between the deciduous and evergreen clades, and between the two closely related species E. fenzelii and E. roxburghiana. This divergence reflects their temporal and spatial evolutionary history, tracing their expansion from the tropical Indochina Peninsula to subtropical China, which coincided with the intensification of the EASM at the Oligocene–Miocene boundary (Meng et al., 2022b). However, the adaptive mechanisms remain elusive, while the evolutionary timing and geographic distribution of these species appear to align with significant climatic changes, specifically the EASM.

To explore this, we conducted gene family evolution analyses, which revealed distinct lineage-specific expansions following the divergence of evergreen and deciduous Engelhardia species. Prior to this divergence, gene family expansions at the crown node of Engelhardia (representing the most recent common ancestor of all five taxa) were significantly enriched in chitin-related functions, including chitin binding, chitinase activity, and extracellular localization. These pathways are widely associated with antifungal defense, suggesting that enhanced pathogen resistance might have been a shared ancestral trait that could have facilitated the initial ecological success of the genus (Vaghela et al., 2022). After the divergence of evergreen and deciduous lineages, the two evergreen species, E. fenzelii and E. roxburghiana, independently acquired gene family expansions related to abiotic stress responses, membrane trafficking, and unfolded protein binding-functional traits potentially advantageous for maintaining evergreen foliage in humid environments. Evergreen species retain their leaves year-round, requiring sustained physiological homeostasis under continuous exposure to environmental stressors such as high moisture, oxidative stress, and pathogen pressure. Enhanced membrane trafficking and protein-folding capacity may help maintain cellular integrity and signaling under such conditions (Travers et al., 2000; Wang et al., 2020), while robust stress response pathways enable long-term leaf maintenance and photosynthetic efficiency (Chang et al., 2021). In contrast, the three deciduous taxa exhibited convergent expansions in gene families associated with signal transduction, kinase activity, and auxin-mediated pathways. These functions may be essential for coordinating seasonal leaf senescence and regrowth, allowing plants to minimize metabolic costs during unfavorable periods (e.g., dry or cold seasons) and rapidly resume growth when conditions improve (Edwards et al., 2017). Signal transduction and hormonal regulation, particularly auxin signaling, could play central roles in modulating bud dormancy, flowering time, and growth rhythms, facilitating phenological plasticity and synchronization with monsoonal rainfall cycles (Liu et al., 2013; Liang et al., 2022). This strategy supports efficient resource allocation and environmental responsiveness in fluctuating seasonal habitats.

Building on this functional divergence, Engelhardia fenzelii exhibited a marked, lineage-specific expansion of the TPS gene family, which may enhance the biosynthesis of volatile terpenoids. Compared to its sister species E. roxburghiana, which is broadly distributed across South and Southeast Asia and southern China, E. fenzelii is narrowly restricted to the humid, monsoon-dominated regions of southeastern China (Zhejiang, Fujian, Jiangxi, and southeastern Hunan) (Zhang et al., 2020). These regions are characterized by high annual precipitation, extended wet seasons, and mild winters due to the EASM (Xu et al., 2017; Ye et al., 2022). Such climatic conditions promote year-round microbial activity and persistent herbivore pressure, especially for evergreen species that retain foliage throughout the year (Zvereva and Kozlov, 2014; Xu et al., 2017). Under these conditions, the constitutive production of volatile terpenoids offers continuous chemical defense by deterring herbivores and inhibiting microbial colonization (Mumm et al., 2008; Zulak and Bohlmann, 2010; Celedon and Bohlmann, 2019). Therefore, the expansion of TPS genes in E. fenzelii might may represent an adaptive response to elevated biotic stress in monsoonal evergreen habitats, suggesting a possible role for EASM-driven environmental selection in shaping lineage-specific metabolic strategies and genomic differentiation in Engelhardia. Our prior studies show that phylogeographic and paleobotanical evidence further supports a close association between the historical divergence of E. fenzelii and the EASM. Divergence between E. fenzelii and other Engelhardia lineages is estimated to have occurred during the Oligocene–Miocene transition, potentially reflecting historical episodes of local adaptation and allopatric divergence potentially shaped by spatially and temporally variable monsoonal regimes (Meng et al., 2022b).

Our analyses also highlight the role of transposable elements (TEs) in shaping genome architecture in Engelhardia. Variation in the activity of different TE types across Engelhardia species may contribute to functional gene diversification, thereby enhancing genetic diversity (Wang et al., 2021). Analyses of the TE landscape support this evolutionary dynamic: E. fenzelii and E. roxburghiana display a 'bimodal' distribution pattern of TE activity, indicating at least two distinct waves of TE proliferation. In contrast, species within the E. spicata complex show a bell-shaped distribution, suggesting that while the majority of TEs in these genomes were active in the past, their activity has recently declined. This discrepancy likely reflects species-specific regulatory mechanisms that govern TE dynamics over evolutionary timescales, ultimately shaping genomic stability and genomic trajectories (Quesneville et al., 2005).

By integrating prior paleoclimate evidence, biogeographic patterns, and current comparative genomics, our study shows that the EASM not only catalyzed the divergence of Engelhardia lineages, but also repeatedly drove ecological and genomic adaptations. E. fenzelii, in particular, exemplifies how regional climatic shifts can leave lasting genomic signatures, offering a valuable framework for studying climate-driven adaptation in subtropical forests.

4.2. Conservation implication

As a characteristic biome of East Asia, evergreen broadleaf forests (EBLFs) are distinguished by their high biodiversity and endemism, making them vital for ecosystem services that benefit humanity. These ecosystems are recognized as significant remnants of Cenozoic flora (Zhao et al., 2025). Understanding how biodiversity within EBLFs responds to environmental changes—including climate variation and human activities, provides critical insights for conservation management and the development of effective conservation policies and strategies. Our research indicates that subtropical China has served as a refugium during past climate oscillations and is likely to continue fulfilling this role in the future, as evidenced by the biogeographic distributions of Engelhardia roxburghiana and E. fenzelii across the Indochina Peninsula and subtropical China bioregions (Meng et al., 2022b).

Historical refugia have shielded species from cooling events associated with past climate change; however, current and future refugia must now safeguard species from rising temperatures. It is concerning that the distribution ranges of EBLFs, particularly those in subtropical China, have been significantly affected by extensive land use over millennia. Changes in land use have fragmented these natural ecosystems, resulting in serious consequences for biodiversity. The interaction between climate change and human activity on biodiversity highlights the urgent need for vigilant monitoring in the Anthropocene (Meng et al., 2021).

In our study, PSMC plots revealed a sharp decline in the effective population sizes of five Engelhardia taxa between 2.0 and 1.0 Mya (Fig. 6A), indicating a potential threat to their long-term viability. Field investigations demonstrated that many historically recorded sites have been transformed into artificial economic forests, such as oil palm, rubber, and cocoa plantations (Zhang et al., 2023). By integrating genomic evidence with field surveys, we highlighted a consistent trend of population reduction in Engelhardia species, as reflected in both historical genetic analyses and current distribution records. The onset of the Anthropocene, characterized by human activities and climate change, poses significant threats to biodiversity and the ecosystems worldwide (Meng et al., 2021). The current distribution of EBLFs suggests favorable conditions and potentially suitable habitats for these plant species amid the anticipated climate changes (Meng et al., 2022b).

In this context, it is essential to evaluate the responses of EBLFs to human activities in order to determine the necessity of establishing potential buffer zones. Understanding how EBLFs adapt to environmental changes, including historical climate conditions (such as adaptations related to the development of the East Asian monsoon system) and demographic history, is critical for ensuring their long-term survival throughout the evolutionary process. Furthermore, identifying and designating targeted conservation areas within the distribution ranges of EBLFs represents an effective strategy to address the challenges posed by a rapidly changing global environment.

In summary, our findings and corresponding analyses tentatively link the genomic characteristics of Engelhardia with the evolutionary history of the East Asian monsoon systems, illustrating potential adaptive mechanisms. Preliminary genomic analyses provide insights into how EBLFs may have responded to past changes in monsoon patterns (Fig. 5A–D), as well as the impacts of human activity on effective population sizes throughout their demographic history (Fig. 6B). Investigating other taxa will further test proposed adaptive mechanisms may connect the genomic characteristics of EBLFs to the historical changes of East Asian monsoons.

5. Conclusions

This study presents a comprehensive analysis of the genomic evolution of Engelhardia and its potential adaptation to the East Asian summer monsoon (EASM). High-quality genome assemblies revealed significant whole-genome duplication events, an expansion of transposable elements (TEs), and adaptive changes within gene families. Notably, both E. fenzelii and E. roxburghiana displayed unique TE-driven genomic expansions. In particular, E. fenzelii showed a significant enrichment of the terpene synthase (TPS) gene family, which may be associated with ecological resilience and the production of secondary metabolites. Phylogenetic analysis confirmed the monophyletic relationships among these species and their divergence in correlation with the intensification of the EASM. Additionally, the observed declines in population size underscore the climate-driven impacts on species distribution. These findings provide insights into the genomic adaptations that coincide with the development of the EASM, offering valuable insights into biodiversity conservation efforts in East Asian ecosystems, particularly within evergreen broadleaf forests (EBLFs).

Acknowledgements

We thank the Supercomputing Center of Lanzhou University for providing the research cyberinfrastructure resources and service. This work was supported by the National Natural Science Foundation of China (No. 42171063); Southeast Asia Biodiversity Research Institute, Chinese Academy of Sciences (No. Y4ZK111B01); the Special Fund for Scientific Research of Shanghai Landscaping & City Appearance Administrative Bureau (G242414, G242416); the "Yunnan Revitalization Talent Support Program" in Yunnan Province (XDYC-QNRC-2022-0028); Yunnan Revitalization Talent Support Program "Innovation Team" Project (202405AS350019); the CAS "Light of West China" Program; the 14th Five-Year Plan of Xishuangbanna Tropical Botanical Garden, Chinese Academy Sciences (XTBG-1450303); the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (No. 833522); and Ghent University (Methusalem funding, BOF.MET.2021.0005.01).

CRediT authorship contribution statement

Min Li: Writing – review & editing, Writing – original draft, Visualization, Software, Methodology, Formal analysis, Data curation. Jing-Jing Wu: Software, Methodology, Formal analysis. Ren-Ping Su: Writing – original draft, Visualization. Ou-Yan Fang: Writing – original draft. Xiang Cai: Writing – original draft. Pei-Han Huang: Writing – original draft. Xiao-Yang Gao: Writing – original draft, Software. Xin-Xing Fu: Writing – original draft, Software. Xiao-Hui Ma: Software. Lin-Yue He: Software. Yi-Gang Song: Writing – original draft. Guo-Xiong Hu: Writing – original draft. Shi-Shun Zhou: Resources. Yun-Hong Tan: Resources. Yves Van de Peer: Writing – original draft. Jie Li: Supervision. Sheng-Dan Wu: Writing – original draft, Supervision, Software, Methodology, Conceptualization. Hong-Hu Meng: Writing – review & editing, Writing – original draft, Visualization, Resources, Project administration, Investigation, Funding acquisition, Conceptualization.

Data accessibility statement

All raw reads sequences (short reads, long reads, and Hi-C reads) of six taxa used in this study have been submitted to GenBank of NCBI (https://www.ncbi.nlm.nih.gov/) under the BioProject accession number PRJNA1240687. The assembly and annotation files can be available from https://figshare.com/s/a7774877ace16c528d6e. Source data are provided within this paper.

Declaration of competing interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.pld.2025.07.003.

References

Abrusán, G., Grundmann, N., DeMester, L., et al., 2009. TEclass-a tool for automated classification of unknown eukaryotic transposable elements. Bioinformatics, 25: 1329-1330. DOI:10.1093/bioinformatics/btp084

Bai, W.N., Yan, P.C., Zhang, B.W., et al., 2018. Demographically idiosyncratic responses to climate change and rapid Pleistocene diversification of the walnut genus Juglans (Juglandaceae) revealed by whole-genome sequences. New Phytol., 217: 1726-1736. DOI:10.1111/nph.14917

Bailly-Bechet, M., Haudry, A., Lerat, E., 2014. "One code to find them all": a perl tool to conveniently parse RepeatMasker output files. Mob. DNA, 5: 13. DOI:10.1186/1759-8753-5-13

Benson, G., 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res., 27: 573-580. DOI:10.1093/nar/27.2.573

Burge, C., Karlin, S., 1997. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol., 268: 78-94. DOI:10.1006/jmbi.1997.0951

Celedon, J.M., Bohlmann, J., 2019. Oleoresin defenses in conifers: chemical diversity, terpene synthases and limitations of oleoresin defense under climate change. New Phytol., 224: 1444-1463. DOI:10.1111/nph.15984

Chang, C.Y., Bräutigam, K., Hüner, N.P.A., et al., 2021. Champions of winter survival: cold acclimation and molecular regulation of cold hardiness in evergreen conifers. New Phytol., 229: 675-691. DOI:10.1111/nph.16904

Chen, Y., Chen, Y., Shi, C., et al., 2018. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience, 7: 1-6.

Chen, Z., Zhou, Z., Guo, Z.M., et al., 2023. Historical development of karst evergreen broadleaved forests in East Asia has shaped the evolution of a hemiparasitic genus Brandisia (Orobanchaceae). Plant Divers., 45: 501-512. DOI:10.1016/j.pld.2023.03.005

Cheng, H., Concepcion, G.T., Feng, X., et al., 2021. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods, 18: 170-175. DOI:10.1038/s41592-020-01056-5

De Bie, T., Cristianini, N., Demuth, J.P., et al., 2006. CAFE: a computational tool for the study of gene family evolution. Bioinformatics, 22: 1269-1271. DOI:10.1093/bioinformatics/btl097

Ding, Y.M., Pang, X.X., Cao, Y., et al., 2023. Genome structure-based Juglandaceae phylogenies contradict alignment-based phylogenies and substitution rates vary with DNA repair genes. Nat. Commun., 14: 617. DOI:10.1038/s41467-023-36247-z

Doležel, J., Bartoš, J., 2005. Plant DNA flow cytometry and estimation of nuclear genome size. Ann. Bot., 95: 99-110. DOI:10.1093/aob/mci005

Dudchenko, O., Batra, S.S., Omer, A.D., et al., 2017. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science, 356: 92-95. DOI:10.1126/science.aal3327

Edwards, E.J., Chatelet, D.S., Chen, B.C., et al., 2017. Convergence, consilience, and the evolution of temperate deciduous forests. Am. Nat., 190: S87-S104. DOI:10.1086/692627

Ellinghaus, D., Kurtz, S., Willhoeft, U., 2008. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics, 9: 18. DOI:10.1186/1471-2105-9-18

Emms, D.M., Kelly, S., 2015. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol., 16: 157. DOI:10.1186/s13059-015-0721-2

Feng, X., Chen, Q., Wu, W., et al., 2024. Genomic evidence for rediploidization and adaptive evolution following the whole - genome triplication. Nat. Commun., 15: 1635. DOI:10.1038/s41467-024-46080-7

Flynn, J.M., Hubley, R., Goubert, C., et al., 2020. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. U.S.A., 117: 9451-9457. DOI:10.1073/pnas.1921046117

Garcia-Hernandez, M., Berardini, T.Z., Chen, G.H., et al., 2002. TAIR: a resource for integrated Arabidopsis data. Funct. Integr. Genomics, 2: 239-253. DOI:10.1007/s10142-002-0077-z

Haas, B.J., Delcher, A.L., Mount, S.M., et al., 2003. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res., 31: 5654-5666. DOI:10.1093/nar/gkg770

Haas, B.J., Papanicolaou, A., Yassour, M., et al., 2013. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc., 8: 1494-1512. DOI:10.1038/nprot.2013.084

Haas, B.J., Salzberg, S.L., Zhu, W., et al., 2008. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol., 9: R7. DOI:10.1186/gb-2008-9-1-r7

Hai, L., Li, X.Q., Zhang, J.B., et al., 2022. Assembly dynamics of East Asian subtropical evergreen broadleaved forests: New insights from the dominant Fagaceae trees. J. Integr. Plant Biol., 64: 2126-2134. DOI:10.1111/jipb.13361

Hansen, J., Sato, M., Russell, G., et al., 2013. Climate sensitivity, sea level and atmospheric carbon dioxide. Philos. Trans. A Math. Phys. Eng. Sci., 371: 20120294. DOI:10.1098/rsta.2012.0294

Hesse, U., 2023. K-mer-based genome size estimation in theory and practice. Methods Mol. Biol., 2672: 79-113. DOI:10.1007/978-1-0716-3226-0_4

Hoffmann, A.A., Sgrò, C.M., 2011. Climate change and evolutionary adaptation. Nature, 470: 479-485. DOI:10.1038/nature09670

Hu, J., Fan, J., Sun, Z., et al., 2020. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics, 36: 2253-2255. DOI:10.1093/bioinformatics/btz891

Huang, H.R., Liu, X., Arshad, R., et al., 2023. Telomere-to-telomere haplotype-resolved reference genome reveals subgenome divergence and disease resistance in triploid Cavendish banana. Hortic. Res., 10: uhad153. DOI:10.1093/hr/uhad153

Jiao, Y., Leebens-Mack, J., Ayyampalayam, S., et al., 2012. A genome triplication associated with early diversification of the core eudicots. Genome Biol., 13: R3. DOI:10.1186/gb-2012-13-1-r3

Jiang, H., Wan, S., Ma, X., et al., 2017. End-member modeling of the grain-size record of Sikouzi fine sediments in Ningxia (China) and implications for temperature control of Neogene evolution of East Asian winter monsoon. PLoS One, 12: e0186153. DOI:10.1371/journal.pone.0186153

Kanehisa, M., Sato, Y., Kawashima, M., et al., 2016. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res., 44: D457-D462. DOI:10.1093/nar/gkv1070

Katoh, K., Standley, D.M., 2013. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol., 30: 772-780. DOI:10.1093/molbev/mst010

Keilwagen, J., Hartung, F., Grau, J., 2019. GeMoMa: Homology-Based Gene Prediction Utilizing Intron Position Conservation and RNA-seq Data. Methods Mol. Biol., 1962: 161-177. DOI:10.1007/978-1-4939-9173-0_9

Kim, D., Langmead, B., Salzberg, S.L., 2015. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods, 12: 357-360. DOI:10.1038/nmeth.3317

Kovaka, S., Zimin, A.V., Pertea, G.M., et al., 2019. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol., 20: 278. DOI:10.1186/s13059-019-1910-1

Landi, M., Shah, T., Falquet, L., et al., 2023. Haplotype-resolved genome of heterozygous African cassava cultivar TMEB117 (Manihot esculenta). Sci. Data, 10: 887. DOI:10.1038/s41597-023-02800-0

Licht, A., van Cappelle, M., Abels, H.A., et al., 2014. Asian monsoons in a late Eocene greenhouse world. Nature, 513: 501-506. DOI:10.1038/nature13704

Li, H., Durbin, R., 2011. Inference of human population history from individual whole-genome sequences. Nature, 475: 493-496. DOI:10.1038/nature10231

Li, X.Q., Xiang, X.G., Jabbour, F., et al., 2022. Biotic colonization of subtropical East Asian caves through time. Proc. Natl. Acad. Sci. U.S.A., 119: e2207199119. DOI:10.1073/pnas.2207199119

Li, M., Su, R.P., Cai, X., et al., 2025. Is it time to abandon the flow cytometry in the estimation of genome size when the K-mer analysis is available? A case from Engelhardia species. Genomics Commun., 2: e013.

Liang, Q., Lin, X., Liu, J., et al., 2022. Genome-wide identification of MAPKK and MAPKKK gene family members and transcriptional profiling analysis during bud dormancy in pear (Pyrus x bretschneideri). Plants, 11: 1731. DOI:10.3390/plants11131731

Lin, L., Jiang, X.L., Guo, K.Q., et al., 2023. Climate change impacts the distribution of Quercus section Cyclobalanopsis (Fagaceae), a keystone lineage in East Asian evergreen broadleaved forests. Plant Divers., 45: 552-568. DOI:10.1016/j.pld.2023.03.014

Liu, L., Jin, X., Chen, N., et al., 2015. Phylogeny of Morella rubra and its relatives (Myricaceae) and genetic resources of Chinese bayberry using RAD sequencing. PLoS One, 10: e0139840. DOI:10.1371/journal.pone.0139840

Liu, X., Zhang, H., Zhao, Y., et al., 2013. Auxin controls seed dormancy through stimulation of abscisic acid signaling by inducing ARF-mediated ABI3 activation in Arabidopsis. Proc. Natl. Acad. Sci. U.S.A., 110: 15485-15490. DOI:10.1073/pnas.1304651110

Love, M.I., Huber, W., Anders, S., 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol., 15: 550. DOI:10.1186/s13059-014-0550-8

Lynch, M., Conery, J.S., 2000. The evolutionary fate and consequences of duplicate genes. Science, 290: 1151-1155. DOI:10.1126/science.290.5494.1151

Ma, X., Ru, D., Morales-Briones, D.F., et al., 2023. Genome sequence and salinity adaptation of the desert shrub Nitraria sibirica (Nitrariaceae, Sapindales). DNA Res., 30: dsad011. DOI:10.1093/dnares/dsad011

Majoros, W.H., Pertea, M., Salzberg, S.L., 2004. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics, 20: 2878-2879. DOI:10.1093/bioinformatics/bth315

Mapleson, D., Accinelli, G.G., Kettleborough, G., et al., 2017. KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics, 33: 574-576. DOI:10.1093/bioinformatics/btw663

Marcais, G., Kingsford, C., 2011. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 27: 764-770. DOI:10.1093/bioinformatics/btr011

Marques, A., Martins, I.S., Kastner, T., et al., 2019. Increasing impacts of land use on biodiversity and carbon sequestration driven by population and economic growth. Nat. Ecol. Evol., 3: 628-637. DOI:10.1038/s41559-019-0824-3

Meng, H.H., Gao, X.Y., Song, Y.G., et al., 2021. Biodiversity arks in the Anthropocene. Reg. Sustain., 2: 109-115.

Meng, H.H., Zhang, C.Y., Low, S.L., et al., 2022a. Two new species from Sulawesi and Borneo facilitate phylogeny and taxonomic revision of Engelhardia (Juglandaceae). Plant Divers., 44: 552-564. DOI:10.1016/j.pld.2022.08.003

Meng, H.H., Zhang, C.Y., Song, Y.G., et al., 2022b. Opening a door to the spatiotemporal history of plants from the tropical Indochina Peninsula to subtropical China. Mol. Phylogenet. Evol., 171: 107458. DOI:10.1016/j.ympev.2022.107458

Meng, H.H., Song, Y.G., 2023. Biogeographic patterns in Southeast Asia: Retrospectives and perspectives. Biodivers. Conserv., 31: 23261.

Meng, H.H., Song, Y.G., Hu, G.X., et al., 2025. Evolution of East Asian subtropical evergreen broad-leaved forests: When and how?. J. Syst. Evol.. DOI:10.111/jse.70001

Mickael, B., Creig, B.S., Sonja, S.Y., 2018. Flow cytometry as tool in plant sciences, with emphasis on genome size and ploidy level assessment. Genet. Appl., 2: 1-12.

Mirarab, S., Warnow, T., 2015. ASTRAL-Ⅱ: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics, 31: i44-i52. DOI:10.1093/bioinformatics/btv234

Mistry, J., Chuguransky, S., Williams, L., et al., 2021. Pfam: The protein families database in 2021. Nucleic Acids Res., 49: D412-D419. DOI:10.1093/nar/gkaa913

Mumm, R., Posthumus, M.A., Dicke, M., 2008. Significance of terpenoids in induced indirect plant defence against herbivorous arthropods. Plant Cell Environ., 31: 575-585. DOI:10.1111/j.1365-3040.2008.01783.x

Nguyen, L.T., Schmidt, H.A., von Haeseler, A., et al., 2015. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol., 32: 268-274. DOI:10.1093/molbev/msu300

Ou, S., Jiang, N., 2018. LTR_retriever: A highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol., 176: 1410-1422. DOI:10.1104/pp.17.01310

Pertea, G., Pertea, M., 2020. GFF utilities: GffRead and GffCompare. F1000Res, 9: 304. DOI:10.12688/f1000research.23297.1

Pertea, M., Pertea, G.M., Antonescu, C.M., et al., 2015. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol., 33: 290-295. DOI:10.1038/nbt.3122

Potter, S.C., Luciani, A., Eddy, S.R., et al., 2018. HMMER web server: 2018 update. Nucleic Acids Res., 46: W200-W204. DOI:10.1093/nar/gky448

Qiao, X., Li, Q., Yin, H., et al., 2019. Gene duplication and evolution in recurring polyploidization-diploidization cycles in plants. Genome Biol., 20: 38. DOI:10.1186/s13059-019-1650-2

Qiao, X., Zhang, S., Paterson, A.H., 2022. Pervasive genome duplications across the plant tree of life and their links to major evolutionary innovations and transitions. Comput. Struct. Biotechnol. J., 20: 3248-3256. DOI:10.1016/j.csbj.2022.06.026

Qin, S.Y., Zuo, Z.Y., Guo, C., et al., 2023. Phylogenomic insights into the origin and evolutionary history of evergreen broadleaved forests in East Asia under Cenozoic climate change. Mol. Ecol., 32: 2850-2868. DOI:10.1111/mec.16904

Quesneville, H., Bergman, C.M., Andrieu, O., et al., 2005. Combined evidence annotation of transposable elements in genome sequences. PLoS Comput. Biol., 1: 166-175.

Ranallo-Benavidez, T.R., Jaron, K.S., Schatz, M.C., 2020. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun., 11: 1432. DOI:10.1038/s41467-020-14998-3

Razgour, O., Forester, B., Taggart, J.B., et al., 2019. Considering adaptive genetic variation in climate change vulnerability assessment reduces species range loss projections. Proc. Natl. Acad. Sci. U.S.A., 116: 10418-10423. DOI:10.1073/pnas.1820663116

Ren, R., Wang, H., Guo, C., et al., 2018. Widespread whole genome duplications contribute to genome complexity and species diversity in Angiosperms. Mol. Plant, 11: 414-428. DOI:10.1016/j.molp.2018.01.002

Roach, M.J., Schmidt, S.A., Borneman, A.R., 2018. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics, 19: 460. DOI:10.1186/s12859-018-2485-7

Servant, N., Varoquaux, N., Lajoie, B.R., et al., 2015. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol., 16: 259. DOI:10.1186/s13059-015-0831-x

Siepielski, A.M., Morrissey, M.B., Buoro, M., et al., 2017. Precipitation drives global variation in natural selection. Science, 355: 959-962. DOI:10.1126/science.aag2773

Simao, F.A., Waterhouse, R.M., Ioannidis, P., et al., 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 31: 3210-3212. DOI:10.1093/bioinformatics/btv351

Soltis, P.S., Soltis, D.E., 2016. Ancient WGD events as drivers of key innovations in angiosperms. Curr. Opin. Plant Biol., 30: 159-165. DOI:10.1016/j.pbi.2016.03.015

Stanke, M., Diekhans, M., Baertsch, R., et al., 2008. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics, 24: 637-644. DOI:10.1093/bioinformatics/btn013

Sun, P., Jiao, B., Yang, Y., et al., 2022. WGDI: A user-friendly toolkit for evolutionary analyses of whole-genome duplications and ancestral karyotypes. Mol. Plant, 15: 1841-1851. DOI:10.1016/j.molp.2022.10.018

Sun, X.J., Wang, P.X., 2005. How old is the Asian monsoon system? Palaeobotanical records from China. Palaeogeogr. Palaeoclimatol. Palaeoecol., 222: 181-222. DOI:10.1016/j.palaeo.2005.03.005

Suyama, M., Torrents, D., Bork, P., 2006. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res., 34: W609-W612. DOI:10.1093/nar/gkl315

Svenning, J.C., Eiserhardt, W.L., Normand, S., et al., 2015. The influence of paleoclimate on present-day patterns in biodiversitv and ecosystems. Annu. Rev. Ecol. Evol. Syst., 46: 551-572. DOI:10.1146/annurev-ecolsys-112414-054314

Tarailo-Graovac, M., Chen, N., 2009. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics Chapter, 4: 4.10.1-4.10.14.

Travers, K.J., Patil, C.K., Wodicka, L., et al., 2000. Functional and genomic analyses reveal an essential coordination between the unfolded protein response and ER-associated degradation. Cell, 101: 249-258. DOI:10.1016/S0092-8674(00)80835-1

Vaghela, B., Vashi, R., Rajput, K., et al., 2022. Plant chitinases and their role in plant defense: A comprehensive review. Enzym. Microb. Technol., 159: 110055. DOI:10.1016/j.enzmictec.2022.110055

Vurture, G.W., Sedlazeck, F.J., Nattestad, M., et al., 2017. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics, 33: 2202-2204. DOI:10.1093/bioinformatics/btx153

Wan, Q., Huang, K., Chen, S.F., et al., 2023. Fagus diversification in China in relation to East Asian monsoon evolution. Quat. Sci. Rev., 320: 108350. DOI:10.1016/j.quascirev.2023.108350

Wang, D., Zhang, Y., Zhang, Z., et al., 2010. KaKs_Calculator 2.0: A toolkit incorporating gamma-series methods and sliding window strategies. Genom. Proteom. Bioinform., 8: 77-80. DOI:10.1016/s1672-0229(10)60008-3

Wang, Y., Tang, H., Debarry, J.D., et al., 2012. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res., 40: e49. DOI:10.1093/nar/gkr1293

Wang, M., Li, J., Wang, P., et al., 2021. Comparative genome analyses highlight transposon-mediated genome expansion and the evolutionary architecture of 3D genomic folding in cotton. Mol. Biol. Evol., 38: 3621-3636. DOI:10.1093/molbev/msab128

Wang, T.R., Ning, X., Zheng, S.S., et al., 2025. Genomic insights into ecological adaptation of oaks revealed by phylogenomic analysis of multiple species. Plant Divers., 47: 53-67. DOI:10.1016/j.pld.2024.07.008

Wang, X., Xu, M., Gao, C., et al., 2020. The roles of endomembrane trafficking in plant abiotic stress responses. J. Integr. Plant Biol., 62: 55-69. DOI:10.1111/jipb.12895

Wei, S.G., Li, L., Bai, K.D., et al., 2024. Community structure and species diversity dynamics of a subtropical evergreen broad-leaved forest in China: 2005 to 2020. Plant Divers., 46: 70-77. DOI:10.1016/j.pld.2023.07.005

Westerhold, T., Marwan, N., Drury, A.J., et al., 2020. An astronomically dated record of Earth's climate and its predictability over the last 66 million years. Science, 369: 1383-1387. DOI:10.1126/science.aba6853

Wicker, T., Sabot, F., Hua-Van, A., et al., 2007. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet., 8: 973-982. DOI:10.1038/nrg2165

Wu, F., Fang, X., Yang, Y., et al., 2022. Reorganization of Asian climate in relation to Tibetan Plateau uplift. Nat. Rev. Earth Environ., 3: 684-700. DOI:10.1038/s43017-022-00331-7

Xu, Q.H., Chen, F.H., Zhang, S.G., et al., 2017. Vegetation succession and East Asian summer monsoon changes since the last deglaciation inferred from high-resolution pollen record in Gonghai lake, Shanxi Province, China. Holocene, 27: 835-846. DOI:10.1177/0959683616675941

Xu, Z., Wang, H., 2007. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res., 35: 265-268.

Yang, Z., 2007. PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol., 24: 1586-1591. DOI:10.1093/molbev/msm088

Ye, J.W., Tian, B., Li, D.Z., 2022. Monsoon intensification in East Asia triggered the evolution of its flora. Front. Plant Sci., 13: 1046538. DOI:10.3389/fpls.2022.1046538

Zachos, J.C., Dickens, G.R., Zeebe, R.E., 2008. An early Cenozoic perspective on greenhouse warming and carbon-cycle dynamics. Nature, 451: 279-283. DOI:10.1038/nature06588

Zhang, C.Y., Low, S.L., Song, Y.G., et al., 2020. Shining a light on species delimitation in the tree genus Engelhardia Leschenault ex Blume (Juglandaceae). Mol. Phylogenet. Evol., 152: 106918. DOI:10.1016/j.ympev.2020.106918

Zhang, C.Y., Cao, G.L., Hu, J.L., et al., 2025. The reconstructed evolutionary history of the Engelhardia spicata complex highlights the impact of a three-tiered landform in the Indo-Burma ecoregion. Am. J. Bot., 112: e70077. DOI:10.1002/ajb2.70077

Zhang, Y., Duan, M.G., Huang, P.H., et al., 2023. Blinded by the bright—afforestation is affecting widespread sampling deficiency in plant collections. For. Ecol. Manag., 530: 120765. DOI:10.1016/j.foreco.2022.120765

Zhao, J., Li, S., Huang, J., et al., 2025. Heterogeneous occurrence of evergreen broad-leaved forests in East Asia: Evidence from plant fossils. Plant Divers., 47: 1-12. DOI:10.1016/j.pld.2024.07.004

Zulak, K.G., Bohlmann, J., 2010. Terpenoid biosynthesis and specialized vascular cells of conifer defense. J. Integr. Plant Biol., 52: 86-97. DOI:10.1111/j.1744-7909.2010.00910.x

Zvereva, E.L., Kozlov, M.V., 2014. Effects of herbivory on leaf life span in woody plants: a meta-analysis. J. Ecol., 102: 873-881. DOI:10.1111/1365-2745.12252

Article Information

Article history

Workspace