b. Key Laboratory of Palaeobiology and Petroleum Stratigraphy, Nanjing Institute of Geology and Palaeontology, Chinese Academy of Sciences, Nanjing 210008, China;
c. School of Earth Sciences, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, United Kingdom;
d. College of Marine Science and Biological Engineering, Qingdao University of Science and Technology, Qingdao 266042, China;
e. Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024-5192, USA;
f. Facultad de Ciencias Biológicas, Universidad Nacional Mayor de San Marcos, Lima 15081, Peru;
g. Departamento de Entomología, Museo de Historia Natural, Universidad Nacional Mayor de San Marcos, Lima 15072, Peru
1. Introduction
Angiosperms, which have experienced one of the most remarkable terrestrial radiations of all life forms, are a model group for understanding rapid diversification (e.g., Leebens-Mack et al., 2019; Zuntini et al., 2024). The vast majority of angiosperms (ca. 99.95%) belong to the Mesangiospermae, a clade comprising five well-supported subclades: magnoliids, Chloranthales, monocots, Ceratophyllales (coontails), and eudicots (Soltis et al., 2019). However, due to their ancient rapid radiation (Moore et al., 2007; Li et al., 2021), the interrelationships among Mesangiospermae lineages remain intractable.
One reason that relationships within Mesangiospermae have remained unclear is the long-standing reliance on plastid genomes and plastome-based phylogenomics (Gitzendanner et al., 2018a, 2018b; Li et al., 2021). Even with increased taxon sampling of easily obtained plastomes (Gitzendanner et al., 2018a, 2018b; Li et al., 2019, 2021; Yang et al., 2022), these data fail to resolve deep phylogenetic relationships. For instance, the latest version of the Angiosperm Phylogeny Group classification, APG IV (APG IV, 2016), left unresolved the systematic relationships among Chloranthales, magnoliids, and the monocots–Ceratophyllales–eudicots clade. More recently, a phylogenomic study of almost 8000 angiosperm genera using a standardized set of 353 nuclear genes was unable to accurately place the Ceratophyllales (Zuntini et al., 2024).
One alternative to angiosperm plastid and mitochondrial genomes is nuclear genomic/transcriptomic data. These nuclear data, which have many more gene markers, hold enormous potential to better resolve the tricky nodes mired in rapid radiations. For instance, over the last decade, the One Thousand Plant Transcriptomes Initiative (1kp) has formulated a promising template for phylogenomic and evolutionary studies as well as providing a robust framework for understanding the evolution of green plants by extensively sequencing vegetative transcriptomes that span the diversity of plants (Wickett et al., 2014; Leebens-Mack et al., 2019). Moreover, the efficacy of expansive nuclear genome-scale datasets has been demonstrated in studies that have deciphered notoriously difficult-to-resolve relationships of green seaweeds, bryophytes, and gymnosperms (Stull et al., 2021; Su et al., 2021; Hou et al., 2022; Liu et al., 2022). Recent studies have attempted to understand the early rapid radiation of Mesangiospermae groups by sequencing high-quality genomes of comparatively species-poor lineages such as Ceratophyllales, Chloranthales, and magnoliids (e.g., Chaw et al., 2019; Chen et al., 2019; Hu et al., 2023; Yang et al., 2020b; Zhang et al., 2020; Guo et al., 2021; Ma et al., 2021; Qin et al., 2021). These genome resources of the key angiosperm lineages are invaluable for comparative explorations of the genic evolution that underpins morphological, physiological, and ecological diversification in early angiosperms. However, these data have failed to provide full resolution of the phylogenetic relationships of early diverging angiosperms, mainly because the effects of substitution models on phylogenomic analyses were not carefully evaluated (Liu et al., 2014; Guo et al., 2023).
In a recent study designed specifically for understanding mesangiosperm relationships, genome-scale datasets of up to 1594 protein-coding genes of 151 angiosperm taxa have been analyzed using both coalescent- and concatenation-based methods (Yang et al., 2020a). However, as in most recent large-scale phylogenomic studies, the substitution models for inferring either the gene tree (coalescence) or species tree (concatenation) were site-homogeneous ones (Zeng et al., 2014; Yang et al., 2020a; Zuntini et al., 2024) and were arbitrarily selected without proper model comparisons (Yang et al., 2020a; Zuntini et al., 2024). Even if model selection was used in a recent study (Hu et al., 2023), only site-homogeneous models were considered under the default settings; furthermore, the variation in substitution patterns among sites was not considered. Given the heterogeneous nature of molecular sequences of land plants, modeling among-site compositional heterogeneity (e.g., CAT, C20–C60 models) and reducing composition heterogeneity and saturation (e.g., Dayhoff-6 recoding of amino acid data) have been proven efficient and useful for reliably recovering relationships of the four major lineages of terrestrial plants (i.e., mosses, liverworts, hornworts, and vascular plants) and consistently supporting the monophyly of bryophytes (Puttick et al., 2018; de Sousa et al., 2019; Harris et al., 2020; Donoghue et al., 2021; Su et al., 2021). All these explorations highlight the significance of data curation and compositional modeling in current phylogenomic studies of land plants.
A fully resolved mesangiosperm relationship will greatly influence our understanding of the macroevolution of flowering plants and reconstruction of the ancestral flower and its mode of pollination in the floristic transition from Acrogymnosperms. Thus, we caution against relying exclusively on 'standard' phylogenomic analyses (one partition and site-homogeneous models selected arbitrarily or not; see Guo et al., 2021; Hu et al., 2023; Ma et al., 2021; Yang et al., 2020a), all of which failed to consider modeling among-site compositional heterogeneity and properly reduce saturation. Here we tackle this problem by analyzing comprehensive genome-scale datasets (Yang et al., 2020a) using models that allow for compositional heterogeneity among sites. Additionally, our results show that smaller datasets based on either plastomes (Li et al., 2019) or mitogenomes (Xue et al., 2022) are inherently insufficient for resolving the deeper phylogeny of mesangiosperms.
2. Material and methods
2.1. Dataset collation
We used the published supermatrices of the plastid genomes of Li et al. (2019) and the mitochondrial genomes of Xue et al. (2022) available from the Dryad Digital Repository (https://doi.org/10.5061/dryad.bq091cg and https://doi.org/10.5061/dryad.fj6q573rq, respectively), and the genomic/transcriptomic data of Yang et al. (2020a) deposited in Figshare (https://figshare.com/s/27c41bba65a30dbfd3c7).
To focus on basal relationships of angiosperms and the five component clades of mesangiosperms, we generated two subsets of the complete plastid nucleotide (NT) dataset (2881 taxa, 82,822 sites) of Li et al. (2019): one dataset contained 126 taxa, with all families of early diverging angiosperms included, and the other (108 taxa) included fewer monocot taxa than the first. We generated two datasets (NT and AA) of 113 mitochondrial genomes (38 genes; 38,574 NT and 12,858 AA sites) containing all major clades of angiosperms based on the data in Xue et al. (2022).
The nuclear genomic/transcriptomic data in our analyses were derived from the three 153-taxon supermatrices assembled by Yang et al. (2020a), comprising 1594, 756, and 296 genes, respectively. To reduce computational burden and balance taxon sampling of the five mesangiosperm clades, we sampled 53 representatives of the 1594-gene supermatrix and filtered the resultant dataset using BMGE v.1.1 (Criscuolo and Gribaldo, 2010) (-m BLOSUM95, -h 0.4) to select slow-evolving regions that are less prone to saturation and informative for inferring deep phylogeny, resulting in Matrix 1 (M1: 53 taxa, 149,781 AA sites). Similarly, for the 756-gene supermatrix, we subsampled 56 representative taxa to ensure even representation across the five focal lineages and filtered using BMGE (-m BLOSUM95, -h 0.4), yielding Matrix 2-1 (M2-1: 56 taxa, 267,305 AA sites). To further reduce compositional heterogeneity (Laumer et al., 2019; Lozano-Fernandez, 2022), we applied the widely used Dayhoff 6-state recoding strategy (Dayhoff et al., 1978) to Matrix 2-1 using PhyloBayes, in which chemically related amino acids are grouped into six categories and only inter-category substitutions are counted in the analysis (Matrix 2-1-recoded) (Kosiol et al., 2004). To test the impact of taxon sampling on tree inference, we further subsampled 29 (Matrix 2-2) and 19 (Matrix 2-3) representative species from Matrix 2-1 (both with 267,305 AA sites). For the 296-gene supermatrix (Matrix 3-original: 153 taxa, 114,981 AA sites), we kept all 153 sampled species and filtered it using BMGE (-m BLOSUM95, -h 0.4), resulting in Matrix 3 (153 taxa, 109,927 AA sites).
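The recoding step can be illustrated with a minimal Python sketch. It assumes the standard Dayhoff-6 grouping (AGPST, DENQ, HKR, ILMV, FWY, C); the state labels 0–5, the function name, and the treatment of gaps and ambiguous residues are hypothetical choices made for illustration and are not taken from the PhyloBayes implementation used in this study.

```python
# Illustrative sketch of Dayhoff 6-state recoding (not the PhyloBayes code).
# Assumption: the six classes follow the standard Dayhoff-6 grouping; the
# numeric labels "0".."5" are arbitrary and chosen here for readability.
DAYHOFF6_GROUPS = {
    "0": "AGPST",
    "1": "DENQ",
    "2": "HKR",
    "3": "ILMV",
    "4": "FWY",
    "5": "C",
}

# Invert the grouping into a per-residue lookup table.
AA_TO_STATE = {aa: state for state, aas in DAYHOFF6_GROUPS.items() for aa in aas}


def recode_dayhoff6(sequence: str) -> str:
    """Recode an aligned amino acid sequence into six Dayhoff classes.

    Gaps ("-") are kept as gaps; any other unrecognised symbol (e.g. "X")
    is written as "?" (an assumption of this sketch).
    """
    recoded = []
    for residue in sequence.upper():
        if residue == "-":
            recoded.append("-")
        else:
            recoded.append(AA_TO_STATE.get(residue, "?"))
    return "".join(recoded)


if __name__ == "__main__":
    # Substitutions within a class (e.g. I <-> L) become invisible after recoding.
    print(recode_dayhoff6("MKIL-X"))
    print(recode_dayhoff6("MKLI-X"))
```

Because residues within a class collapse to the same state, only substitutions between classes remain visible to the model, which is the property exploited to reduce compositional heterogeneity and saturation.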
Considering the underlying codon structure of protein-coding nucleotide sequences, we also used MACSE v.2.06 (Ranwez et al., 2018) to align the selected 296 protein-coding genes. Each aligned gene at the AA level was trimmed using BMGE under the default setting (-m BLOSUM62, -h 0.5) before they were concatenated into Matrix 3-realigned (153 taxa, 113,956 AA sites).
2.2. Phylogenetic analyses
Phylogenomic analyses of the nuclear genome-scale (amino acid) datasets were conducted in IQ-TREE v.2.1.3 (Minh et al., 2020) using the mixture models LG4X + R, LG + C20 + F + G, and the posterior mean site frequency (PMSF) approximation (Wang et al., 2018) of LG + C20 + F + G, LG + C40 + F + G and LG + C60 + F + G (see Supplementary Data for details on which models were used for which matrix). The PMSF profiles were computed based on the guide tree estimated under LG4X (Matrix 1, Matrix 2-3, and Matrix 3) or LG + C20 + F + G (Matrix 2-2). For Matrix 2-1, LG4X + R and CAT-GTR + G4 models were used. For Matrix 3-realigned, the best partition model selected by ModelFinder (Kalyaanamoorthy et al., 2017) in IQ-TREE was also tested. For each analysis, the ultrafast bootstrap support values were calculated from 1000 replicates (Hoang et al., 2018).
Phylogenetic analyses of the nuclear genome/transcriptome, plastome, and mitogenome datasets were also performed under the compositionally site-heterogeneous infinite mixture model CAT-GTR + G4 in PhyloBayes MPI 1.7a (Lartillot et al., 2013). Two independent chains were computed for each Bayesian analysis, with their convergence evaluated using the bpcomp program. Ideally, the analyses were stopped once the maximum difference across bipartitions (maxdiff) fell below 0.3 (achieved in the analyses on Matrix 2-1-recoded and Matrix 2-3 and the plastome and mitogenome matrices). Achieving convergence with the CAT-GTR + G4 model in PhyloBayes generally becomes challenging when the datasets have more than 20,000 sites (even when using parallelization). In the cases where maxdiff remained higher than 0.3, the results were regarded as acceptable as long as the discrepancies between chains did not affect the mesangiosperm interrelationships (Lartillot, 2020).
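The convergence criterion described above can be illustrated with a short Python sketch. It is not the bpcomp program itself: it only shows the quantity being thresholded, namely the largest absolute difference in bipartition posterior frequencies between two chains. The dictionary representation of bipartitions and the function names are hypothetical.

```python
# Illustrative sketch of the maxdiff statistic (not the bpcomp implementation).
# Assumption: each chain is summarised as a mapping from a bipartition label
# to its posterior frequency; a bipartition absent from a chain has frequency 0.
from typing import Dict


def maxdiff(chain_a: Dict[str, float], chain_b: Dict[str, float]) -> float:
    """Largest absolute difference in bipartition frequencies between two chains."""
    bipartitions = set(chain_a) | set(chain_b)
    return max(
        (abs(chain_a.get(bp, 0.0) - chain_b.get(bp, 0.0)) for bp in bipartitions),
        default=0.0,
    )


def has_converged(chain_a: Dict[str, float], chain_b: Dict[str, float],
                  threshold: float = 0.3) -> bool:
    """Apply the maxdiff < 0.3 rule of thumb used in the text."""
    return maxdiff(chain_a, chain_b) < threshold


if __name__ == "__main__":
    # Toy four-taxon example with hypothetical frequencies.
    chain_1 = {"AB|CD": 1.0, "AC|BD": 0.2}
    chain_2 = {"AB|CD": 0.9, "AD|BC": 0.25}
    print(maxdiff(chain_1, chain_2), has_converged(chain_1, chain_2))
```

A single poorly sampled bipartition is enough to push maxdiff above the threshold, which is why chains that disagree only outside the nodes of interest were still regarded as acceptable here.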
Since the site-heterogeneous model (CAT-GTR) used in our focal analyses is exclusively implemented in PhyloBayes, we chose not to use other commonly used software in previous studies (e.g., RAxML and MrBayes) for analyzing our matrices. Additionally, we opted not to use the two-step coalescent method, which relies on gene trees as data and does not account for phylogenetic errors in gene tree estimates (see Discussion below).
2.3. Model comparison
Model selection in Bayesian phylogenetics is a challenge. For our large nuclear AA dataset, we used comparatively efficient and reliable approaches, i.e., the leave-one-out cross-validation (LOO-CV) and the widely applicable information criterion (wAIC; Watanabe, 2009), to estimate the relative fit of alternative models (CAT-GTR, LG, LG + C20, LG + C40, and LG + C60) in the latest PhyloBayes MPI 1.9 (Lartillot, 2023). Considering the huge computational burden, we performed model comparison based on Matrix 2-3 (19 taxa, 267,305 AA sites). The LOO-CV and wAIC scores were compared to determine and select the best-fitting model used for phylogenomic analyses.
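For readers unfamiliar with these criteria, the following Python sketch shows how per-site wAIC and importance-sampling LOO-CV scores can be obtained from site log-likelihoods sampled along an MCMC run. It follows the general definitions (log pointwise predictive density minus the posterior variance of the log-likelihood for wAIC; the harmonic-mean estimator for LOO-CV) and is an assumption-laden illustration, not the PhyloBayes MPI 1.9 implementation; the input layout and function names are hypothetical.

```python
# Illustrative sketch of site-wise wAIC and LOO-CV (not the PhyloBayes code).
# Assumption: loglik_samples[s][i] is the log-likelihood of site i under the
# s-th posterior sample; at least two samples are required for the variance.
import math
import statistics
from typing import List, Tuple


def log_mean_exp(values: List[float]) -> float:
    """Numerically stable log of the mean of exp(values)."""
    highest = max(values)
    return highest + math.log(sum(math.exp(v - highest) for v in values) / len(values))


def per_site_scores(loglik_samples: List[List[float]]) -> Tuple[float, float]:
    """Return (wAIC, LOO-CV) averaged over sites; higher means better fit."""
    n_sites = len(loglik_samples[0])
    waic_total = 0.0
    loo_total = 0.0
    for i in range(n_sites):
        site = [sample[i] for sample in loglik_samples]
        lppd = log_mean_exp(site)             # log pointwise predictive density
        penalty = statistics.variance(site)   # posterior variance of log-likelihood
        waic_total += lppd - penalty
        # Importance-sampling LOO: log of the harmonic mean of site likelihoods.
        loo_total += -log_mean_exp([-v for v in site])
    return waic_total / n_sites, loo_total / n_sites


if __name__ == "__main__":
    toy_samples = [[-1.0, -2.1], [-1.2, -1.9], [-0.9, -2.0]]  # 3 samples, 2 sites
    print(per_site_scores(toy_samples))
```

When the sampled site log-likelihoods vary little across the posterior, the two scores nearly coincide, which is the behaviour expected for a large alignment.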
In the maximum likelihood framework (IQ-TREE), the Bayesian information criterion (BIC) and corrected Akaike information criterion (AICc) were used to compare and select models for our phylogenetic analyses. Given the extremely heavy computational burden of the large dataset (Matrix 3-realigned), LG + I + G, LG4X, the partition model selected by ModelFinder (Kalyaanamoorthy et al., 2017) and LG + C20 + F + G were tested, but the more computationally expensive LG + C40 + F + G and LG + C60 + F + G (Bujaki and Rodrigue, 2022) were not.
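As a reminder of how the two criteria trade likelihood against model complexity, the sketch below computes BIC and AICc from a log-likelihood, the number of free parameters, and the sample size. The formulas are the standard definitions; treating the number of alignment sites as the sample size is an assumption of this sketch, and the function names are hypothetical.

```python
# Illustrative sketch of BIC and AICc (standard textbook definitions).
# Assumption: the sample size n is taken to be the number of alignment sites.
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion; lower values indicate better fit."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def aicc(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """AIC with the small-sample correction; lower values indicate better fit."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    correction = (2.0 * n_params * (n_params + 1)) / (n_sites - n_params - 1)
    return aic + correction


if __name__ == "__main__":
    # Toy numbers only, chosen to make the arithmetic easy to follow.
    print(bic(-100.0, 2, 10), aicc(-100.0, 2, 10))
```

For alignments with hundreds of thousands of sites the AICc correction term is negligible, whereas the BIC penalty per parameter grows with the logarithm of the alignment length.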
2.4. Evaluating alternative hypotheses of mesangiosperm relationships
To evaluate support for the different hypotheses concerning the deeper interrelationships of mesangiosperms proposed by recent phylogenomic analyses based on nuclear genome/transcriptome data, we ran approximately unbiased (AU) tests (Shimodaira, 2002) in IQ-TREE using both partition (according to ModelFinder) and mixture models (LG + C20 + F + G, LG + C40 + F + G, and LG + C60 + F + G) for the comprehensive dataset Matrix 3-realigned.
3. Results
3.1. Analyses of plastomes from Li et al. (2019)
Under the site-heterogeneous CAT-GTR + G4 model, the deeper nodes of mesangiosperms were not resolved, with the support (Bayesian posterior probabilities, BPP) ranging from 0.26 to 0.61 in the 126-taxon dataset, and 0.43 to 0.81 in the 108-taxon dataset (Figs. S1A, B and S2). Interestingly, among the weakly supported relationships of the two independent analyses, Ceratophyllales were recovered as sister to monocots, with the highest support (BPP = 0.81).
As shown in Li et al. (2019) and other plastome-based analyses (Moore et al., 2007; Gitzendanner et al., 2018b; Soltis et al., 2019; Hu et al., 2023), the five component clades of mesangiosperms all have long subtending branches connected by exceptionally short internodes, which indicate rapid radiation of these clades. Therefore, even with better modeling that mitigates potential systematic error (long-branch attraction), a total of 80 plastid genes apparently lacks sufficient phylogenetic signal to robustly resolve the interrelationships of mesangiosperms.
3.2. Analyses of mitogenomes from Xue et al. (2022)
Under CAT-GTR + G4, the nucleotide dataset yielded a strongly supported tree of mesangiosperms, with Ceratophyllales sister to Chloranthales (BPP = 1), as recovered in the original analyses by Xue et al. (2022) (Figs. S1C and S3A). However, under the same site-heterogeneous model, the amino acid tree is much less supported (BPPs range from 0.44 to 0.75), especially for the deeper nodes of mesangiosperms (Figs. S1D and S3B). In particular, Amborellales (Amborella) were weakly supported as sister to Nymphaeales (BPP = 0.44) and the sister-group relationship of Ceratophyllales + Chloranthales was weakly supported (BPP = 0.75). As such, we argue that the 38-gene matrices were not able to reliably decipher the deep phylogeny of mesangiosperms.
3.3. Analyses of genome-scale data from Yang et al. (2020a)
We generated eight matrices (Matrix 1, Matrix 2-1, Matrix 2-1-recoded, Matrix 2-2, Matrix 2-3, Matrix 3, Matrix 3-realigned, and Matrix 3-original) based on the genomic/transcriptomic data from Yang et al. (2020a), using different gene and taxon sampling, or different filtering or recoding strategies. Matrix 1 yielded an identical topology under the four-matrix mixture model LG4X + R and the finite profile mixture models (LG + C20 + F + G, LG + C40 + F + G and LG + C60 + F + G) in IQ-TREE (Fig. S4): Ceratophyllales were moderately to strongly supported as sister to monocots (BS = 74–98) and magnoliids were weakly to strongly supported as sister to Chloranthales (BS = 63–100) (Fig. 1A). Under the same set of mixture models, Matrix 2-1, Matrix 2-2, Matrix 3, and Matrix 3-realigned yielded a topology (Topology 1, Figs. 1B, S5, S6, S8 and S10) consistent with fig. 2A in Yang et al. (2020a), in which Ceratophyllales were recovered as sister to the remaining mesangiosperms (BS = 55–100). Two exceptions are that Matrix 3 and Matrix 3-realigned under the LG + C20 + F + G model yielded another topology (Figs. S8B and S11B): (Ceratophyllales, monocots), (magnoliids, (Chloranthales, eudicots)) (Topology 2), although the Ceratophyllales + monocots clade was only moderately supported (BS = 85 and 93, respectively). Matrix 3-realigned under the LG + C60 + F + G-PMSF model (LG + C20 tree as guide tree) yielded an identical topology (Fig. S12) to the one under LG + C20 + F + G, but slightly different from the CAT-GTR + G4 tree (Fig. S13) in the phylogenetic positions of two eudicot lineages. For Matrix 2-3, Topology 2 was recovered under all tested models (LG + I + G, LG4X, LG + C20 + F + G, LG + C40 + F + G and LG + C60 + F + G) (Fig. S6).
Fig. 1 Phylogenomic analyses of nuclear genome-scale data from Yang et al. (2020a). (A) Topology 1 inferred from Matrix 1 under various models in IQ-TREE. (B) Topology inferred from Matrix 2-1, Matrix 2-2 and Matrix 3 under various models in IQ-TREE. (C) Maximally supported T3 inferred from Matrix 2-1-recoded (PhyloBayes runs converged), consistent with results based on all other matrices (in green box) under the CAT-GTR + G4 model. (D) Simplified tree of (C), summarizing the preferred backbone phylogeny of angiosperms. Cerat., Ceratophyllales; Chlor., Chloranthales. Support values for maximum likelihood analyses (A, B) correspond to the models in the red boxes, respectively. Green, monocots; purple, Ceratophyllales; brown, Chloranthales; light red, magnoliids; blue, eudicots. Plant silhouettes are from Phylopic.
Under the infinite profile mixture model CAT-GTR + G4 in PhyloBayes, all eight matrices, regardless of whether filtered, Dayhoff-6 recoded or not, yielded a congruent topology (Topology 2, Figs. 1C, S4E, S5C, S6D, S7F and S9), and the interrelationships of the five clades were all maximally supported (BPP = 1 in all analyses). We first recovered the Ceratophyllales + monocots clade as sister to the remaining mesangiosperms. Magnoliids formed the sister group to Chloranthales + eudicots (Fig. 1D). All of our CAT-GTR + G4 analyses strongly supported (BPP = 1) an identical relationship of the early-diverging ANA grade: Amborella as the first-branching lineage, followed successively by Nymphaeales (water lilies) and then Austrobaileyales, which are sister to all other extant flowering plants (or Mesangiospermae) (Zhang et al., 2020; Guo et al., 2023). Although the monophyly of eudicots was strongly supported, some of the deeper nodes within this radiation (also characterized by exceedingly short internode branch lengths) were not robustly resolved in some analyses, which requires further phylogenomic interrogation and is beyond the scope of this study. Our results also showed that taxon sampling (19, 29, 53, 56 and 153 taxa) does not affect the tree topology under the CAT-GTR + G4 model.
3.4. Model comparison
The leave-one-out cross-validation (LOO-CV) scores and the widely applicable information criterion (wAIC) (Lartillot, 2023) obtained here based on Matrix 2-3 were quite close to each other, indicating that wAIC is a close approximation of LOO-CV for a large amino acid dataset. Based on the model fitness scores given in Table 1, the infinite profile mixture model CAT-GTR + G4 (PhyloBayes) fit clearly better than the finite profile mixture models LG + C20, LG + C40, and LG + C60 and the site-homogeneous model LG (IQ-TREE) for the amino acid dataset, both according to LOO-CV (Δcv > 0.23) and according to wAIC (ΔwAIC > 0.23). The site-heterogeneous models displayed distinctly better fit (Δcv/ΔwAIC > 0.15) than the site-homogeneous LG model, which has been commonly used in previous studies. Therefore, we selected the tree topology under the best-fitting CAT-GTR + G4 model as our preferred tree for explaining the interrelationships of mesangiosperms.
| Model type | Model | LOO-CV | wAIC |
| --- | --- | --- | --- |
| Site-heterogeneous | CAT-GTR | −10.8978 | −10.8924 |
| Site-heterogeneous | LG + C60 | −11.1262 | −11.1262 |
| Site-heterogeneous | LG + C40 | −11.1429 | −11.1429 |
| Site-heterogeneous | LG + C20 | −11.1694 | −11.1694 |
| Site-homogeneous | LG | −11.3159 | −11.3159 |
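The score differences quoted in the text can be reproduced directly from Table 1. The short Python sketch below transcribes the tabulated values and computes the gap between CAT-GTR and the best finite mixture (LG + C60), and between the weakest site-heterogeneous model (LG + C20) and the site-homogeneous LG model; the variable and function names are hypothetical.

```python
# Worked arithmetic on the scores transcribed from Table 1.
# Higher (less negative) scores indicate better fit.
SCORES = {
    "CAT-GTR": {"LOO-CV": -10.8978, "wAIC": -10.8924},
    "LG + C60": {"LOO-CV": -11.1262, "wAIC": -11.1262},
    "LG + C40": {"LOO-CV": -11.1429, "wAIC": -11.1429},
    "LG + C20": {"LOO-CV": -11.1694, "wAIC": -11.1694},
    "LG": {"LOO-CV": -11.3159, "wAIC": -11.3159},
}


def delta(better: str, worse: str, criterion: str) -> float:
    """Score difference between two models under one criterion."""
    return SCORES[better][criterion] - SCORES[worse][criterion]


if __name__ == "__main__":
    for criterion in ("LOO-CV", "wAIC"):
        cat_vs_c60 = delta("CAT-GTR", "LG + C60", criterion)
        c20_vs_lg = delta("LG + C20", "LG", criterion)
        print(f"{criterion}: CAT-GTR vs LG + C60 = {cat_vs_c60:.4f}; "
              f"LG + C20 vs LG = {c20_vs_lg:.4f}")
```

The smallest gap between CAT-GTR and any finite mixture is roughly 0.23 per site, and even the weakest site-heterogeneous model improves on LG by roughly 0.15 per site, in line with the thresholds reported above.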
In addition, in the maximum likelihood framework (IQ-TREE) for Matrix 3-realigned, LG + C20 + F + G was clearly better fitting than LG + I + G, LG4X + R, and the partition model selected by ModelFinder (Kalyaanamoorthy et al., 2017), according to both Bayesian information criterion (BIC) and corrected Akaike information criterion (AICc) scores (Fig. 2). This result is not unexpected, as we have also shown similar results in the Bayesian cross validation analysis.
Fig. 2 Bayesian information criterion (BIC) and corrected Akaike information criterion (AICc) scores of Matrix 3-realigned under four tested models in IQ-TREE. According to both values, the site-heterogeneous LG + C20 + F + G model is the best-fitting of tested models. Topology 1 (T1): (Ceratophyllales, (monocots, (magnoliids, (Chloranthales, eudicots)))); Topology 2 (T2, our preferred topology): ((Ceratophyllales, monocots), (magnoliids, (Chloranthales, eudicots))).
Topology tests accounting for across-site compositional heterogeneity (under better-fitting site-heterogeneous models) consistently supported the tree recovered under CAT-GTR + G4 and, in some analyses, under LG + C20 + F + G4: (Ceratophyllales, monocots), (magnoliids, (Chloranthales, eudicots)) (Topology 2, Table 2). The rival topology (Topology 1), in which Ceratophyllales is sister to the remaining mesangiosperms, was supported only under the partition model and was significantly rejected under the LG + C40 + F + G4 and LG + C60 + F + G4 models. As the partition model was worse-fitting than site-heterogeneous models according to the model comparison mentioned above, Topology 2 was the best supported tree of mesangiosperm interrelationships.
| Topology | Refs. | Partition model: Δln L | Partition model: PAU | LG + C20 + F + G4: Δln L | LG + C20 + F + G4: PAU | LG + C40 + F + G4: Δln L | LG + C40 + F + G4: PAU | LG + C60 + F + G4: Δln L | LG + C60 + F + G4: PAU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Topology 1: (Ceratophyllales, (monocots, (magnoliids, (Chloranthales, eudicots)))) | Yang et al. (2020a) (concatenation); Hu et al. (2023) (concatenation) | 0 | 0.969 | 186.42 | 0.054 | 215.93 | 0.0332∗ | 210.54 | 0.0284∗ |
| Topology 2: ((Ceratophyllales, monocots), (magnoliids, (Chloranthales, eudicots))) | Present study (preferred topology) | 301.12 | 0.0309∗ | 0 | 0.946 | 0 | 0.967 | 0 | 0.972 |
| Topology 3: (monocots, (magnoliids, (Chloranthales, (Ceratophyllales, eudicots)))) | Yang et al. (2020a) (coalescent) | 834.18 | 0.000475∗ | 712.4 | 3.63e-47∗ | 699.52 | 9.9e-11∗ | 676.97 | 0.000412∗ |
| Topology 4: (monocots, ((magnoliids, Chloranthales), (Ceratophyllales, eudicots))) | Guo et al. (2021); Ma et al. (2021); Hu et al. (2023) (coalescent) | 1242.1 | 1.04e-08∗ | 1023.5 | 3.15e-05∗ | 1014.7 | 1.96e-92∗ | 964.29 | 9.2e-36∗ |
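The pattern of rejection in Table 2 can be summarized programmatically. The Python sketch below transcribes the AU-test p-values and lists, for each model, the topologies rejected at a 0.05 significance level; it assumes that the asterisks in the table mark p < 0.05, and the data structure and function name are hypothetical.

```python
# Summary of the AU-test p-values transcribed from Table 2.
# Assumption: an asterisk in the table denotes rejection at p < 0.05.
P_AU = {
    "Partition model": {"T1": 0.969, "T2": 0.0309, "T3": 0.000475, "T4": 1.04e-08},
    "LG + C20 + F + G4": {"T1": 0.054, "T2": 0.946, "T3": 3.63e-47, "T4": 3.15e-05},
    "LG + C40 + F + G4": {"T1": 0.0332, "T2": 0.967, "T3": 9.9e-11, "T4": 1.96e-92},
    "LG + C60 + F + G4": {"T1": 0.0284, "T2": 0.972, "T3": 0.000412, "T4": 9.2e-36},
}


def rejected_topologies(model: str, alpha: float = 0.05) -> list:
    """Topologies whose AU p-value falls below alpha under the given model."""
    return sorted(topo for topo, p in P_AU[model].items() if p < alpha)


if __name__ == "__main__":
    for model in P_AU:
        print(f"{model}: rejected = {rejected_topologies(model)}")
```

Under this reading, Topology 2 is rejected only under the partition model, Topology 1 is rejected under the C40 and C60 mixtures but narrowly retained under C20, and Topologies 3 and 4 are rejected under every model tested.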
4. Discussion
Phylogenetic resolution of interrelationships of the five major clades of Mesangiospermae remains one of the most challenging problems in plant evolution (Davis et al., 2014; Gitzendanner et al., 2018b). More than a dozen topologies have been proposed for mesangiosperm interrelationships (Davis et al., 2014; Zeng et al., 2014; Gitzendanner et al., 2018b; Yang et al., 2020a; Guo et al., 2023; Hu et al., 2023; Zuntini et al., 2024). Unfortunately, recent large-scale phylogenomic studies based on more extensive sampling of both taxa and genes (Gitzendanner et al., 2018a; Li et al., 2019, 2021; Yang et al., 2020a, 2022; Guo et al., 2023; Hu et al., 2023) have failed to resolve relationships within Mesangiospermae. It is now widely accepted that even the whole plastid genomes of a broad sampling of angiosperms are not sufficient to unravel the interrelationships of mesangiosperms (Gitzendanner et al., 2018b; Yang et al., 2022), because the phylogenetic signal harbored in the circular plastid and mitochondrial DNA may have eroded during the rapid diversification of mesangiosperms within just a few to some 27 million years (Moore et al., 2007; Yang et al., 2020a). Thus, relationships among Mesangiospermae based on plastomes remain essentially unresolved. The frequently discussed conflicts between the nuclear and chloroplast trees, as highlighted in studies by Guo et al. (2021) and Hu et al. (2023), are purported but lack substantial grounding. Stull et al. (2023) underscored the growing challenge of detecting ancient hybridization. The signal of past hybridization in the nuclear genome can diminish or become obscured over time, with well-supported cytonuclear discordance potentially serving as the most reliable evidence for ancient hybridization. However, when both nuclear and chloroplast data display uncertainty for deep angiosperm phylogeny, cytonuclear discordance loses its reliability as an indicator (Stull et al., 2023). Through our analyses, we showed compelling evidence that the chloroplast genomes alone are inherently inadequate for inferring deep phylogeny, rendering the detection of cytonuclear discordance unfeasible, despite a robustly supported tree based on nuclear genome-scale data.
In contrast with plastomic and mitochondrial data, nuclear transcriptomic and genomic data are becoming increasingly obtainable at a reasonable cost (Baker et al., 2022). The first large-scale phylotranscriptomic analysis of the land plants (Wickett et al., 2014), however, did not include one of the key mesangiosperm clades, Ceratophyllales. Interestingly, concatenation ML analysis of 674 genes revealed a topology congruent with our CAT-GTR + G4 trees. The coalescent-based tree, however, differed from the concatenation tree regarding the placement of Chloranthales, which was weakly supported as sister to magnoliids (Wickett et al., 2014). Such a variation in relationships inferred by different analytical methods has been suggested to stem from model misspecification (Wickett et al., 2014) or perhaps incomplete lineage sorting (ILS) (Wickett et al., 2014; Leebens-Mack et al., 2019; Yang et al., 2020a). The more recent broadly sampled phylotranscriptomic analysis of green plants, including all five clades, still failed to recover a consistent and robust tree of mesangiosperms (Leebens-Mack et al., 2019). As observed previously, the time intervals between the branching of the five mesangiosperm clades are exceedingly short (Leebens-Mack et al., 2019); consequently, using fewer genes (up to 410 genes) in exchange for broader sampling of green plants may have caused a phylogenomic artifact of the complexities of analyses across such a large and heterogeneous clade (Stull et al., 2021). All these attempts based on plastomic and transcriptomic data show that for the rapid radiation of mesangiosperms simply increasing taxon sampling is unlikely to recover a natural topology or increase support, but instead merely increase computational burden (Yang et al., 2020a). This is especially evident in the recent Angiosperms353-based study, where the placement of Ceratophyllales remained unresolved (Zuntini et al., 2024). As such, Yang et al. (2020a) assembled large phylogenomic datasets focusing on angiosperms (up to 1594 genes, 151 angiosperm taxa) to explore the phylogeny of mesangiosperms. However, as shown in Wickett et al. (2014), trees inferred from coalescent and concatenation-based methods are incongruent, especially in terms of the position of the monotypic Ceratophyllales (with a single extant genus Ceratophyllum) (Yang et al., 2020a). Based on the coalescent analysis, similar frequencies of gene-tree conflict were detected, a phenomenon explained by ILS (Yang et al., 2020a). However, two-step coalescent methods use gene trees as data and do not consider phylogenetic errors in gene-tree estimates. Even though low support branches have been collapsed to mitigate the conflict, data choice (nucleotide or amino acid) and model comparison (the GTR + G model for nucleotide was often used, skipping model testing/comparison) were arbitrary in recent phylogenomic studies (Leebens-Mack et al., 2019; Yang et al., 2020a; Zuntini et al., 2024). Such phylogenetic errors may cause two-step methods to underestimate internal branches in species trees and exaggerate the importance of ILS by inflating gene tree vs species tree discordance (Springer and Gatesy, 2014, 2016; Rannala et al., 2020). Moreover, for deep phylogenies of angiosperms, such as the relationships of Mesangiospermae, other complicating factors that influence analyses may also affect species-tree inference, including heterogeneity in the substitution process across sites and across lineages (Yang, 2014; Springer and Gatesy, 2016; Guo et al., 2023). Thus, observed variations in relationships inferred by different analytical methods may result from model misspecification. Because increased taxon sampling of mesangiosperms seems unlikely to improve the accuracy of tree inference (Hu et al., 2023), especially when the monotypic Ceratophyllales remains the confounding taxon (Yang et al., 2020a), problems with model misspecification may be resolved by utilizing better-fitting models that describe the natural properties of sequence evolution (Wickett et al., 2014).
Modeling among-site compositional heterogeneity (CAT-GTR + G4 model, Lartillot et al., 2013) has been proven useful for effectively resolving some of the most recalcitrant nodes in the tree of life (Feuda et al., 2017; Puttick et al., 2018; Marlétaz et al., 2019; Kapli et al., 2020; Williams et al., 2020; Tihelka et al., 2021; Cai et al., 2022; Cai, 2024). Unfortunately, the CAT-GTR + G4 model has rarely been formally applied and tested in recent phylogenomic studies of land plants (Wickett et al., 2014; Gitzendanner et al., 2018a; Leebens-Mack et al., 2019; Li et al., 2019, 2021; Yang et al., 2020a; Zuntini et al., 2024). By using the CAT-GTR + G4 model, Gnetales, a critical and morphologically peculiar lineage of gymnosperms, were resolved as sister to Pinaceae (the Gnepine hypothesis, Li et al., 2017), a relationship later confirmed by other phylogenomic studies (Ran et al., 2018; Stull et al., 2021; Liu et al., 2022) (but see the coalescent tree in Leebens-Mack et al., 2019). Similarly, under the site-heterogeneous model, a recent reanalysis of the concatenation dataset from Wickett et al. (2014) recovered the monophyly of bryophytes (Puttick et al., 2018). Now that bryophyte monophyly is consistently supported (de Sousa et al., 2019; Harris et al., 2020; Donoghue et al., 2021; Su et al., 2021), the implementation of the CAT-GTR + G4 model, which explicitly accommodates site-specific compositional heterogeneity, opens a new avenue for inferring phylogenetic relationships of rapid radiations.
Instead of using the more heterogeneous nucleotide data under the GTR + G model (Yang et al., 2020a), our analyses of the more conserved amino acid datasets comprising all representative angiosperm lineages explored the better-fitting CAT-GTR + G4 and LG + C20 + F + G models that specifically account for among-site compositional heterogeneity (Lartillot et al., 2013) and the Dayhoff-6 recoding strategy that reduces lineage-specific compositional heterogeneity (Foster et al., 2022; Giacomelli et al., 2022, but see Hernandez and Ryan, 2021). Modeling of amino acid replacement is central to phylogenomic inference, particularly so when dealing with deeper relationships and rapid radiations (Kapli et al., 2021; Lozano-Fernandez, 2022). As such, a careful model comparison is crucial in modern phylogenetic analyses (Lartillot, 2023), even though it is skipped in most phylogenomic studies. Consistent with our Bayesian cross-validation analyses comparing model fitness for angiosperm phylogeny (Table 1), a recent simulation study based on five datasets also demonstrates that amino acid mixture models outperform all simple models and free finite mixtures (CAT-GTR + G4) consistently outperform empirical finite mixtures (e.g. C60 and UDM256) (Bujaki and Rodrigue, 2022). As such, we suggest that our analyses under the CAT-GTR + G4 model recovered the most plausible tree of angiosperm evolution. Additionally, the AU test under the site-heterogeneous models (LG + C40 + F + G and LG + C60 + F + G) in the maximum likelihood framework also strongly favored our preferred topology. The interrelationships of mesangiosperms are robustly and consistently resolved as follows: Ceratophyllales (coontails) are sister to monocots, and magnoliids sister to Chloranthales + eudicots (Fig. 1D).
In general, incongruence among phylogenetic reconstructions based on different datasets can be attributed to two primary sources: biological and methodological factors. Biological sources encompass processes such as hybridization, ILS, and horizontal gene transfer, whereas methodological sources include falsely assigned data and model violation. While the former offer intriguing insights into the evolutionary history of the studied groups, the latter should be mitigated to the greatest extent possible. It is crucial to eliminate or reduce methodological errors, as done here by employing better-fitting models, before attributing incongruence to biological sources (Fleming et al., 2023). Future studies in plant molecular phylogenetics should not only sample taxa and genes more broadly, but also explore the causes of phylogenomic discordance using different data types and models that account for both across-site and across-lineage compositional heterogeneity.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (42222201, 42288201). We are grateful to Prof. Li-Mi Mao (Nanjing, China) for providing the flower photos used in Fig. 1.
Data availability statement
The data sets and output files are available from https://doi.org/10.5061/dryad.bnzs7h4fq.
CRediT authorship contribution statement
Yongli Wang: Writing – review & editing, Writing – original draft, Visualization, Validation, Data curation, Conceptualization. Yan-Da Li: Writing – review & editing, Writing – original draft, Formal analysis, Data curation, Validation. Shuo Wang: Writing – review & editing, Validation. Erik Tihelka: Writing – review & editing, Writing – original draft, Visualization, Software. Michael S. Engel: Writing – review & editing, Validation. Chenyang Cai: Writing – review & editing, Visualization, Validation, Supervision, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization.
Declaration of competing interest
The authors have no conflict of interest.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.pld.2024.07.007.
References
Angiosperm Phylogeny Group (APG) IV, 2016. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot. J. Linn. Soc., 181: 1-20. DOI:10.1111/boj.12385
Baker, W.J., Bailey, P., Barber, V., et al., 2022. A comprehensive phylogenomic platform for exploring the angiosperm tree of life. Syst. Biol., 71: 301-319. DOI:10.1093/sysbio/syab035
Bujaki, T., Rodrigue, N., 2022. Bayesian cross-validation comparison of amino acid replacement models: contrasting profile mixtures, pairwise exchangeabilities, and gamma-distributed rates-across-sites. J. Mol. Evol., 90: 468-475. DOI:10.1007/s00239-022-10076-y
Cai, C., 2024. Ant backbone phylogeny resolved by modelling compositional heterogeneity among sites in genomic data. Commun. Biol., 7: 106. DOI:10.1038/s42003-024-05793-7
Cai, C., Tihelka, E., Giacomelli, M., et al., 2022. Integrated phylogenomics and fossil data illuminate the evolution of beetles. Roy. Soc. Open Sci., 9: 211771. DOI:10.1098/rsos.211771
Chaw, S.M., Liu, Y.C., Wu, Y.W., et al., 2019. Stout camphor tree genome fills gaps in understanding of flowering plant genome evolution. Nat. Plants, 5: 63-73. DOI:10.1038/s41477-018-0337-0
Chen, J.H., Hao, Z.D., Guang, X.M., et al., 2019. Liriodendron genome sheds light on angiosperm phylogeny and species-pair differentiation. Nat. Plants, 5: 18-25.
Criscuolo, A., Gribaldo, S., 2010. BMGE (Block Mapping and Gathering with Entropy): selection of phylogenetic informative regions from multiple sequence alignments. BMC Evol. Biol., 10: 210. DOI:10.1186/1471-2148-10-210
Davis, C.C., Xi, Z., Mathews, S., 2014. Plastid phylogenomics and green plant phylogeny: almost full circle but not quite there. BMC Biology, 12: 11. DOI:10.1186/1741-7007-12-11
Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C., 1978. A model of evolutionary change in proteins. In: Dayhoff, M.O., Ech, R.V. (Eds.), Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, Maryland, pp. 345–352.
de Sousa, F., Foster, P.G., Donoghue, P.C.J., et al., 2019. Nuclear protein phylogenies support the monophyly of the three bryophyte groups (Bryophyta Schimp.). New Phytol., 222: 565-575. DOI:10.1111/nph.15587
Donoghue, P.C.J., Harrison, C.J., Paps, J., et al., 2021. The evolutionary emergence of land plants. Curr. Biol., 31: R1281-R1298. DOI:10.1016/j.cub.2021.07.038
Feuda, R., Dohrmann, M., Walker, P., et al., 2017. Improved modelling of compositional heterogeneity supports sponges as sister to all other animals. Curr. Biol., 27: 3864-3870. DOI:10.1016/j.cub.2017.11.008
Fleming, J.F., Valero-Gracia, A., Struck, T.H., 2023. Identifying and addressing methodological incongruence in phylogenomics: a review. Evol. Appl., 16: 1087-1104. DOI:10.1111/eva.13565
Foster, P.G., Schrempf, D., Szöllősi, G.J., et al., 2022. Recoding amino acids to a reduced alphabet may increase or decrease phylogenetic accuracy. Syst. Biol., 72: 723-737.
Giacomelli, M., Rossi, M.E., Lozano-Fernandez, J., et al., 2022. Resolving tricky nodes in the tree of life through amino acid recoding. iScience, 25: 105594. DOI:10.1016/j.isci.2022.105594
Gitzendanner, M.A., Soltis, P.S., Wong, G.K.S., et al., 2018a. Plastid phylogenomic analysis of green plants: a billion years of evolutionary history. Am. J. Bot., 105: 291-301. DOI:10.1002/ajb2.1048
Gitzendanner, M.A., Soltis, P.S., Yi, T.S., et al., 2018b. Plastome phylogenetics: 30 years of inferences into plant evolution. In: Shu, M.C., Robert, K.J. (Eds.), Advances in Botanical Research, vol. 85. Academic Press, Cambridge, pp. 293–313.
Guo, X., Fang, D.M., Sahu, S.K., et al., 2021. Chloranthus genome provides insights into the early diversification of angiosperms. Nat. Commun., 12: 6930. DOI:10.1038/s41467-021-26922-4
Guo, C., Luo, Y., Gao, L.M., et al., 2023. Phylogenomics and the flowering plant tree of life. J. Integr. Plant Biol., 65: 299-323. DOI:10.1111/jipb.13415
Harris, B.J., Harrison, C.J., Hetherington, A.M., et al., 2020. Phylogenomic evidence for the monophyly of bryophytes and the reductive evolution of stomata. Curr. Biol., 30: 2001-2012. DOI:10.1016/j.cub.2020.03.048
Hernandez, A.M., Ryan, J.F., 2021. Six-state amino acid recoding is not an effective strategy to offset compositional heterogeneity and saturation in phylogenetic analyses. Syst. Biol., 70: 1200-1212. DOI:10.1093/sysbio/syab027
Hoang, D.T., Chernomor, O., von Haeseler, A., et al., 2018. UFBoot2: improving the ultrafast bootstrap approximation. Mol. Biol. Evol., 35: 518-522. DOI:10.1093/molbev/msx281
Hou, Z., Ma, X.Y., Shi, X., et al., 2022. Phylotranscriptomic insights into a Mesoproterozoic–Neoproterozoic origin and early radiation of green seaweeds (Ulvophyceae). Nat. Commun., 13: 1610. DOI:10.1038/s41467-022-29282-9
Hu, H., Sun, P., Yang, Y., et al., 2023. Genome-scale angiosperm phylogenies based on nuclear, plastome, and mitochondrial datasets. J. Integr. Plant Biol., 65: 1479-1489. DOI:10.1111/jipb.13455
Kalyaanamoorthy, S., Minh, B.Q., Wong, T.K., et al., 2017. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat. Methods, 14: 587-589. DOI:10.1038/nmeth.4285
Kapli, P., Flouri, T., Telford, M.J., 2021. Systematic errors in phylogenetic trees. Curr. Biol., 31: R59-R64. DOI:10.1016/j.cub.2020.11.043
Kapli, P., Yang, Z., Telford, M.J., 2020. Phylogenetic tree building in the genomic age. Nat. Rev. Genet., 21: 428-444. DOI:10.1038/s41576-020-0233-0
Kosiol, C., Goldman, N., Buttimore, N.H., 2004. A new criterion and method for amino acid classification. J. Theor. Biol., 228: 97-106. DOI:10.1016/j.jtbi.2003.12.010
Lartillot, N., 2020. PhyloBayes: Bayesian phylogenetics using site-heterogeneous models. In: Scornavacca, C., Delsuc, F., Galtier, N. (Eds.), Phylogenetics in the Genomic Era, pp. 1.5: 1–1.5: 16. No Commercial Publisher, Authors Open Access.
Lartillot, N., 2023. Identifying the best approximating model in Bayesian phylogenetics: Bayes factors, cross-validation or wAIC? Syst. Biol., 72: 616-638. DOI:10.1093/sysbio/syad004
Lartillot, N., Rodrigue, N., Stubbs, D., et al., 2013. PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst. Biol., 62: 611-615. DOI:10.1093/sysbio/syt022
Laumer, C.E., Fernández, R., Lemer, S., et al., 2019. Revisiting metazoan phylogeny with genomic sampling of all phyla. Proc. R. Soc. B-Biol. Sci., 286: 20190831. DOI:10.1098/rspb.2019.0831
Leebens-Mack, J.H., Barker, M.S., Carpenter, E.J., et al., 2019. One thousand plant transcriptomes and the phylogenomics of green plants. Nature, 574: 679-685. DOI:10.1038/s41586-019-1693-2
Li, H.T., Luo, Y., Gan, L., et al., 2021. Plastid phylogenomic insights into relationships of all flowering plant families. BMC Biology, 19: 232. DOI:10.1186/s12915-021-01166-2
Li, H.T., Yi, T.S., Gao, L.M., et al., 2019. Origin of angiosperms and the puzzle of the Jurassic gap. Nat. Plants, 5: 461-470. DOI:10.1038/s41477-019-0421-0
Li, Z., De La Torre, A.R., Sterck, L., et al., 2017. Single-copy genes as molecular markers for phylogenomic studies in seed plants. Genome Biol. Evol., 9: 1130-1147. DOI:10.1093/gbe/evx070
Liu, Y., Cox, C.J., Wang, W., et al., 2014. Mitochondrial phylogenomics of early land plants: mitigating the effects of saturation, compositional heterogeneity, and codon-usage bias. Syst. Biol., 63: 862-878. DOI:10.1093/sysbio/syu049
Liu, Y., Wang, S.B., Li, L.Z., et al., 2022. The Cycas genome and the early evolution of seed plants. Nat. Plants, 8: 389-401. DOI:10.1038/s41477-022-01129-7
Lozano-Fernandez, J., 2022. A practical guide to design and assess a phylogenomic study. Genome Biol. Evol., 14: evac129. DOI:10.1093/gbe/evac129
Ma, J.X., Sun, P.C., Wang, D.D., et al., 2021. The Chloranthus sessilifolius genome provides insight into early diversification of angiosperms. Nat. Commun., 12: 6929. DOI:10.1038/s41467-021-26931-3
Marlétaz, F., Peijnenburg, K.T.C.A., Goto, T., et al., 2019. A new spiralian phylogeny places the enigmatic arrow worms among gnathiferans. Curr. Biol., 29: 312-318. DOI:10.1016/j.cub.2018.11.042
Moore, M.J., Bell, C.D., Soltis, P.S., et al., 2007. Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms. Proc. Natl. Acad. Sci. U.S.A., 104: 19363-19368. DOI:10.1073/pnas.0708072104
Puttick, M.N., Morris, J.L., Williams, T.A., et al., 2018. The interrelationships of land plants and the nature of the ancestral embryophyte. Curr. Biol., 28: 733-745. DOI:10.1016/j.cub.2018.01.063
Qin, L.Y., Hu, Y.H., Wang, J.P., et al., 2021. Insights into angiosperm evolution, floral development and chemical biosynthesis from the Aristolochia fimbriata genome. Nat. Plants, 7: 1239-1253. DOI:10.1038/s41477-021-00990-2
Ran, J.H., Shen, T.T., Wang, M.M., et al., 2018. Phylogenomics resolves the deep phylogeny of seed plants and indicates partial convergent or homoplastic evolution between Gnetales and angiosperms. Proc. R. Soc. B-Biol. Sci., 285: 20181012. DOI:10.1098/rspb.2018.1012
Rannala, B., Edwards, S.V., Leache, A., et al., 2020. The multi-species coalescent model and species tree inference. In: Scornavacca, C., Delsuc, F., Galtier, N. (Eds.), Phylogenetics in the Genomic Era, pp. 3.3: 1–3.3: 21. No Commercial Publisher, Authors Open Access.
Ranwez, V., Douzery, E.J., Cambon, C., et al., 2018. MACSE v2: toolkit for the alignment of coding sequences accounting for frameshifts and stop codons. Mol. Biol. Evol., 35: 2582-2584. DOI:10.1093/molbev/msy159
Shimodaira, H., 2002. An approximately unbiased test of phylogenetic tree selection. Syst. Biol., 51: 492-508. DOI:10.1080/10635150290069913
Soltis, P.S., Folk, R.A., Soltis, D.E., 2019. Darwin review: angiosperm phylogeny and evolutionary radiations. Proc. R. Soc. B-Biol. Sci., 286: 20190099. DOI:10.1098/rspb.2019.0099
Springer, M.S., Gatesy, J., 2014. Land plant origins and coalescence confusion. Trends Plant Sci., 19: 267-269. DOI:10.1016/j.tplants.2014.02.012
Springer, M.S., Gatesy, J., 2016. The gene tree delusion. Mol. Phylogenet. Evol., 94: 1-33. DOI:10.1016/j.ympev.2015.07.018
Stull, G.W., Qu, X.J., Parins-Fukuchi, C., et al., 2021. Gene duplications and phylogenomic conflict underlie major pulses of phenotypic evolution in gymnosperms. Nat. Plants, 7: 1015-1025. DOI:10.1038/s41477-021-00964-4
Stull, G.W., Pham, K.K., Soltis, P.S., et al., 2023. Deep reticulation: the long legacy of hybridization in vascular plant evolution. Plant J., 114: 743-766. DOI:10.1111/tpj.16142
Su, D.Y., Yang, L.X., Shi, X., et al., 2021. Large-scale phylogenomic analyses reveal the monophyly of bryophytes and Neoproterozoic origin of land plants. Mol. Biol. Evol., 38: 3332-3344. DOI:10.1093/molbev/msab106
Tihelka, E., Cai, C.Y., Giacomelli, M., et al., 2021. The evolution of insect biodiversity. Curr. Biol., 31: R1299-R1311. DOI:10.1016/j.cub.2021.08.057
Wang, H.C., Minh, B.Q., Susko, E., et al., 2018. Modelling site heterogeneity with posterior mean site frequency profiles accelerates accurate phylogenomic estimation. Syst. Biol., 67: 216-235. DOI:10.1093/sysbio/syx068
Watanabe, S., 2009. Algebraic Geometry and Statistical Learning Theory. Cambridge Univ. Press.
Wickett, N.J., Mirarab, S., Nguyen, N., et al., 2014. Phylotranscriptomic analysis of the origin and early diversification of land plants. Proc. Natl. Acad. Sci. U.S.A., 111: E4859-E4868.
Williams, T.A., Cox, C.J., Foster, P.G., et al., 2020. Phylogenomics provides robust support for a two-domains tree of life. Nat. Ecol. Evol., 4: 138-147.
Xue, J.Y., Dong, S.S., Wang, M.Q., et al., 2022. Mitochondrial genes from 18 angiosperms fill sampling gaps for phylogenomic inferences of the early diversification of flowering plants. J. Syst. Evol., 60: 773-788. DOI:10.1111/jse.12708
Yang, L.X., Su, D.Y., Chang, X., et al., 2020a. Phylogenomic insights into deep phylogeny of angiosperms based on broad nuclear gene sampling. Plant Commun., 1: 100027. DOI:10.1016/j.xplc.2020.100027
Yang, Y.Z., Sun, P.C., Lv, L.K., et al., 2020b. Prickly waterlily and rigid hornwort genomes shed light on early angiosperm evolution. Nat. Plants, 6: 215-222. DOI:10.1038/s41477-020-0594-6
Yang, T., Sahu, S.K., Yang, L.X., et al., 2022. Comparative analyses of 3654 plastid genomes unravel insights into evolutionary dynamics and phylogenetic discordance of green plants. Front. Plant Sci., 13: 808156. DOI:10.3389/fpls.2022.808156
Yang, Z., 2014. Molecular Evolution: A Statistical Approach. Oxford Univ. Press, Oxford, England xv+492.
Zeng, L.P., Zhang, Q., Sun, R.R., et al., 2014. Resolution of deep angiosperm phylogeny using conserved nuclear genes and estimates of early divergence times. Nat. Commun., 5: 4956. DOI:10.1038/ncomms5956
Zhang, L., Chen, F., Zhang, X., et al., 2020. The water lily genome and the early evolution of flowering plants. Nature, 577: 79-84. DOI:10.1038/s41586-019-1852-5
Zuntini, A.R., Carruthers, T., Maurin, O., et al., 2024. Phylogenomics and the rise of the angiosperms. Nature, 629: 843-850. DOI:10.1038/s41586-024-07324-0