b. State Key Laboratory of Tropical Crop Breeding, Ministry of Agriculture and Rural Affairs Key Laboratory of Biology and Genetic Resources of Rubber Tree, State Key Laboratory Breeding Base of Cultivation and Physiology for Tropical Crops, Rubber Research Institute, Chinese Academy of Tropical Agricultural Sciences, Haikou 571101, China;
c. College of Tropical Agriculture and Forestry, Hainan University, Danzhou 571737, China;
d. Sanya Research Institute, Chinese Academy of Tropical Agricultural Sciences, Sanya 572025, China
Tamarindus indica L. is a perennial tree species native to tropical Africa and cultivated pantropically, including in Asia and South America (Khadivi et al., 2024; Singh et al., 2025). Its pulp is highly valued for its unique sour flavor, driven by high levels of organic acids (Amssayef et al., 2023). Tamarind pulp is integral to culinary traditions across Asia, Africa, and Latin America, where it is used in sauces, beverages, and confections (De Caluwé et al., 2010). Beyond its gastronomic role, tamarind pulp and seeds also possess significant nutritional and medicinal value, with reported antioxidant, anti-inflammatory, and other therapeutic properties (Martinello et al., 2017; Sookying et al., 2022; Toscano Oviedo et al., 2024). Despite the importance of tamarind to rural livelihoods, food industries, and nutraceutical markets, genomic studies of tamarind remain scarce, limiting progress in molecular breeding and genetic improvement.
Recent advances in long-read sequencing and assembly algorithms have revolutionized plant genomics, enabling telomere-to-telomere (T2T) assemblies in many non-model crops (Li et al., 2025; Lu et al., 2025; Wang et al., 2025). Within the Fabaceae, more than ten species have been sequenced, providing insights into genome evolution, whole-genome duplication (WGD), and trait diversification (Liu et al., 2025). However, genomic resources are largely concentrated in Papilionoideae (e.g., soybean, Medicago, Lotus) and Caesalpinioideae (e.g., Senna, Chamaecrista), whereas Detarioideae, the subfamily to which tamarind belongs, remains underrepresented. Consequently, little is known about the evolutionary trajectory of Detarioideae genomes or how polyploidization has shaped their gene repertoires. A gap-free T2T genome of tamarind would thus fill a critical void in Fabaceae genomics and provide a unique resource for studying the evolution of organic acid metabolism in legumes.
Here, we report the first T2T genome assembly of Tamarindus indica, generated using PacBio HiFi long reads and Hi-C scaffolding. The final assembly spans 809.5 Mb across 12 pseudochromosomes, resolves all telomeric and centromeric regions, and exhibits high completeness (98.8% BUSCO). By presenting a gap-free T2T genome and multi-omics platform, our study fills a longstanding void in tamarind genomics, elucidates the evolutionary and regulatory basis of tartaric acid metabolism, and provides a foundation for molecular breeding and biotechnology. Beyond tamarind, this work also broadens our understanding of Fabaceae genome evolution and the diversification of fruit acidity in tropical tree crops.
Flow cytometry estimated the tamarind genome size to be 800 Mb. An initial assembly with PacBio HiFi reads yielded primary contigs totaling 820,861,317 bp, with an N50 of 64,253,138 bp (Tables S1 and S2). Subsequently, Hi-C data were used to anchor the contigs onto 12 pseudochromosomes (2n = 2x = 24; Fig. S1; Tables S3 and S4). After gap filling, we identified all 24 telomeres and 12 centromeres across the 12 chromosomes (Tables S5 and S6). The final gap-free T2T genome assembly of Tamarindus indica spans 809,508,951 bp, with an N50 of 67,459,079 bp (Fig. S2; Tables S7 and S8). Assembly quality was evaluated using BUSCO, which indicated 98.8% completeness (Table S9). A total of 30,753 protein-coding genes were predicted through genome annotation (Table S10). Repetitive sequences comprised 65.66% (531,508,399 bp) of the assembled genome.
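For readers less familiar with the N50 statistic quoted above, the following is a minimal sketch of how it is conventionally computed from a list of sequence lengths. The function name `n50` and the toy lengths are illustrative assumptions, not the software or data used in this study; the last line simply re-derives the reported repeat percentage from the two base-pair totals given in the text.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical toy lengths, not the tamarind contigs.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))

# Repeat content as a percentage of the final assembly (figures from the text).
repeat_pct = 100 * 531_508_399 / 809_508_951
print(round(repeat_pct, 2))
```

An N50 of roughly 67 Mb on an assembly of about 810 Mb in 12 pseudochromosomes is what one would expect when the N50 sequence is itself a complete chromosome.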
Phylogenetic relationships and WGD events. Phylogenetic analysis using single-copy orthologous protein sequences from Tamarindus indica and related legume species revealed four distinct subclades: Papilionoideae, Caesalpinioideae, Detarioideae, and Cercidoideae (Fig. 1b). The results indicated that T. indica shares the closest genetic relationship with Cercis canadensis, with the divergence between the two species estimated at ~56 million years ago (MYA). Ks peak analysis suggested that T. indica and C. canadensis underwent a shared WGD event prior to their divergence. Notably, T. indica experienced an additional lineage-specific WGD event following its divergence from C. canadensis (Figs. S3 and S4).
Fig. 1 Overview of the Tamarindus indica genome and population analyses. a: The telomere-to-telomere genome assembly of T. indica. b: Phylogenetic tree of T. indica and related legume species. Numbers on branches indicate gene family expansions (red) and contractions (blue). Red stars denote whole-genome triplication events, and green stars denote whole-genome duplication events. Species abbreviations are defined as follows: P. coccineus, Phaseolus coccineus; P. vulgaris, Phaseolus vulgaris; P. acutifolius, Phaseolus acutifolius; P. lunatus, Phaseolus lunatus; G. soja, Glycine soja; G. max, Glycine max; L. culinaris, Lens culinaris; L. ervoides, Lens ervoides; M. truncatula, Medicago truncatula; C. arietinum, Cicer arietinum; L. japonicus, Lotus japonicus; A. hypogaea, Arachis hypogaea; L. albus, Lupinus albus; S. saman, Samanea saman; C. fasciculata, Chamaecrista fasciculata; T. indica, Tamarindus indica; C. canadensis, Cercis canadensis; A. thaliana, Arabidopsis thaliana. c: Longitudinal sections of T. indica fruits at three developmental stages: young fruit (Y), fruit swelling and ripening (S), and mature fruit (M). d: Accumulation patterns of key acidic metabolites during T. indica fruit development. L-Tartaric acid exhibits substantial accumulation at the mature stage, indicating its role as a major contributor to fruit acidity. e: The tartaric acid biosynthesis pathway in T. indica. Heatmaps show the expression levels of pathway genes across fruit developmental stages, with green stars highlighting genes derived from WGD. f: Weighted gene co-expression network analysis characterized the blue module, most strongly correlated with tartaric acid biosynthesis, with a GRAS transcription factor as a hub gene. g: Phylogenetic tree of the 84 tamarind accessions, grouped into four distinct clades (T1–T4). h: Population structure plot generated by ADMIXTURE, with accession order consistent with the phylogenetic tree in (g). T1 represents the Hainan population; T2, T3, and T4 represent three Yunnan subpopulations. i: Overview of the Tamarind Multi-Omics Database (TIMDB) web interface.
Tartaric acid biosynthesis and transcriptional regulation during fruit development. To elucidate the mechanism underlying the formation of sour-tasting metabolites during fruit development, we performed metabolomic profiling of Tamarindus indica across different developmental stages (Figs. 1c and S5a; Table S11). We identified 317 differentially accumulated metabolites (DAMs) that were shared across the three developmental stage comparisons (Fig. S5b and c and Table S12). A subset of these DAMs exhibited a significant increase in abundance during fruit development (Fig. S5d and e and Table S12). Further analysis revealed that tartaric acid accumulates at high levels in tamarind fruit and is the major contributor to its sour flavor (Fig. 1d). Tartaric acid is primarily synthesized via the ascorbic acid metabolism pathway (Burbidge et al., 2021). We identified 23 gene families involved in tartaric acid biosynthesis (including IdnDH, GME, and 2-KGR), comprising 77 genes in total (Fig. 1e and Table S13). Among them, IPS2 and GME2 were highly expressed during fruit development. We systematically characterized types of duplicated genes involved in the tartaric acid biosynthetic pathway. Approximately 57% of the pathway's genes were derived from WGD events (Table S14). Further analysis of the contributions of different WGD events revealed that the β-WGD played a key role in expanding this gene repertoire. Specifically, 28 of the 77 tartaric acid biosynthetic genes were derived from the β-WGD (Table S14). Using weighted gene co-expression network analysis, we identified a GRAS transcription factor as a hub gene in the blue module, which was most strongly associated with tartaric acid accumulation (Figs. 1f and S6a, b). Notably, the GRAS family has been implicated in fruit development (Liu et al., 2021; Neves et al., 2023).
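To make the notion of a "hub gene" concrete, the sketch below ranks genes in a module by intramodular connectivity, i.e., the sum of soft-thresholded absolute correlations with every other gene in the module. This is a simplified illustration of the general idea behind weighted co-expression networks, not the analysis pipeline used here; the soft-threshold power of 6, the gene names, and the expression values are all assumptions chosen for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def intramodular_connectivity(expression, power=6):
    """Sum of |correlation| ** power between each gene and all others."""
    genes = list(expression)
    return {
        g: sum(
            abs(pearson(expression[g], expression[h])) ** power
            for h in genes
            if h != g
        )
        for g in genes
    }


# Hypothetical expression values across four samples; names are placeholders.
toy_module = {
    "TF_candidate": [6.0, 4.0, 4.0, 2.0],
    "enzyme_1": [5.0, 3.0, 5.0, 3.0],
    "enzyme_2": [5.0, 5.0, 3.0, 3.0],
}
connectivity = intramodular_connectivity(toy_module)
hub = max(connectivity, key=connectivity.get)
print(hub)
```

In this toy module the two placeholder enzymes are uncorrelated with each other but both correlate with `TF_candidate`, so it receives the highest connectivity and is reported as the hub.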
Population genomics and genetic diversity. Using Illumina resequencing data of 84 tamarind accessions collected from Hainan and Yunnan, we identified 22,333,416 SNPs and 2,596,728 indels. A phylogenetic tree constructed from 922,654 filtered SNPs revealed a clear divergence between the Hainan (T1) and Yunnan (T2–T4) populations. The Yunnan population was further divided into three subclades corresponding to their geographic origins (Fig. 1g). Population structure analysis using ADMIXTURE and PCA consistently supported the existence of four genetic clusters (T1–T4) (Figs. 1h and S7a). The Hainan population showed significantly faster linkage disequilibrium (LD) decay compared to the Yunnan groups, indicating distinct demographic histories (Fig. S7b). Nucleotide diversity (π) was relatively low across all populations (4.94 × 10⁻³ to 5.94 × 10⁻³), with the highest diversity in the Hainan (T1) population and the lowest in the Yunnan T3 population. Genetic differentiation was strongest between the Hainan (T1) and Yunnan subgroups (FST = 0.069–0.095), particularly between T1 and T3 (FST = 0.095). Within Yunnan, populations T2 and T3 showed the lowest differentiation (FST = 0.046). Other pairwise FST values ranged from 0.072 to 0.073, reflecting their geographic proximity despite some genetic distinction (Fig. S7c).
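The two summary statistics reported above can be illustrated with a small worked example. The sketch below computes nucleotide diversity (π) as the mean number of pairwise differences per site, and FST using a Hudson-style ratio of within-population to between-population differences. It assumes short haploid toy sequences and is not the estimator or software used for the tamarind resequencing data, which the text does not specify; all function names and sequences are hypothetical.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def nucleotide_diversity(haplotypes):
    """Mean pairwise differences per site (pi) within one population."""
    pairs = list(combinations(haplotypes, 2))
    n_sites = len(haplotypes[0])
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * n_sites)


def hudson_fst(pop1, pop2):
    """Hudson-style FST: 1 - (mean within-population pi / between-population divergence)."""
    n_sites = len(pop1[0])
    within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    cross = list(product(pop1, pop2))
    between = sum(hamming(a, b) for a, b in cross) / (len(cross) * n_sites)
    return 1 - within / between


# Hypothetical 10-bp haplotypes; population B carries a fixed difference at site 5.
pop_a = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAC"]
pop_b = ["ACGTTCGTAC", "ACGTTCGTAT", "ACGTTCGTAC"]

pi_a = nucleotide_diversity(pop_a)
pi_b = nucleotide_diversity(pop_b)
fst_ab = hudson_fst(pop_a, pop_b)
print(round(pi_a, 4), round(pi_b, 4), round(fst_ab, 4))
```

In this toy case the fixed difference between the two groups pushes FST to roughly 0.44, far above the 0.046–0.095 range reported for the tamarind populations, which indicates that the real populations share most of their variation.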
We present the Tamarind Multi-Omics Database (TIMDB) (Fig. 1i), a database that integrates high-quality genomic, transcriptomic, and metabolomic datasets to support multi-omics research and comparative genomic analyses (https://bioinformatics.hainanu.edu.cn/TIMDB/). The database provides interactive tools to query gene functions and sequence information, and download multi-omics data for applications in breeding, ecological adaptation, and medicinal studies (Fig. S8).
Acknowledgments
This work was supported by the National Natural Science Foundation of China (32172614), the Hainan Province Science and Technology Special Fund (ZDYF2023XDNY050), and the Hainan Provincial Natural Science Foundation of China (324RC452).
Data availability
The raw sequencing data generated in this study have been deposited in the National Genomics Data Center (NGDC, https://ngdc.cncb.ac.cn/) under BioProject accession numbers PRJCA030600, PRJCA030512, and PRJCA051051. The genome assembly and annotation files are available for download from the Tamarind Multi-Omics Database (TIMDB) at https://bioinformatics.hainanu.edu.cn/TIMDB/.
CRediT authorship contribution statement
Zhi-Dong Li: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Formal analysis, Data curation. Sheng-Hao Wang: Visualization, Resources, Formal analysis, Data curation. Shu-Ling Wang: Writing – review & editing, Conceptualization, Supervision, Resources, Project administration. Chong Wang: Resources, Investigation, Formal analysis. Hong-Bin Zhang: Resources, Formal analysis. Fei Chen: Writing – review & editing. Wen-Quan Wang: Writing – review & editing, Funding acquisition, Conceptualization, Supervision, Resources, Investigation, Project administration.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.pld.2025.12.011.
Amssayef, A., Bouadid, I., Eddouks, M., 2023. L-Tartaric acid exhibits antihypertensive and vasorelaxant effects: the possible role of eNOS/NO/cGMP pathways. Cardiovasc. Hematol. Agents Med. Chem., 21: 202-212. DOI:10.2174/1871525721666230111150501
Burbidge, C.A., Ford, C.M., Melino, V.J., et al., 2021. Biosynthesis and cellular functions of tartaric acid in grapevines. Front. Plant Sci., 12: 643024. DOI:10.3389/fpls.2021.643024
De Caluwé, E., Halamová, K., Van Damme, P., 2010. Tamarindus indica L.: a review of traditional uses, phytochemistry and pharmacology. Afr. Focus, 23: 53-83. DOI:10.21825/af.v23i1.5039
Khadivi, A., Mirheidari, F., Saeidifar, A., et al., 2024. Multivariate analysis of morphological variables in tamarind (Tamarindus indica L.). BMC Plant Biol., 24: 1154. DOI:10.1186/s12870-024-05872-1
Li, C.C., Yuan, Y., Nie, Z.Y., et al., 2025. The haplotype-resolved telomere-to-telomere genome and OMICS analyses reveal genetic responses to tapping in rubber tree. Nat. Commun., 16: 6255. DOI:10.1038/s41467-025-61527-1
Liu, N., Feng, W.J., Zhang, G.W., et al., 2025. Genomic and population evidence uncovers divergent improvement of vegetable soybean from grain soybean. Mol. Plant, 18: 1094-1097. DOI:10.1016/j.molp.2025.06.009
Liu, Y.D., Shi, Y., Su, D.D., et al., 2021. SlGRAS4 accelerates fruit ripening by regulating ethylene biosynthesis genes and SlMADS1 in tomato. Hortic. Res., 8: 3. DOI:10.1038/s41438-020-00431-9
Lu, J.Y., Wu, H.L., Wang, F., et al., 2025. Telomere to telomere flax (Linum usitatissimum L.) genome assembly unlocks insights beyond fatty acid metabolism pathways. Hortic. Res., 12: uhaf127. DOI:10.1093/hr/uhaf127
Martinello, F., Kannen, V., Franco, J.J., et al., 2017. Chemopreventive effects of a Tamarindus indica fruit extract against colon carcinogenesis depends on the dietary cholesterol levels in hamsters. Food Chem. Toxicol., 107: 261-269. DOI:10.1016/j.fct.2017.07.005
Neves, C., Ribeiro, B., Amaro, R., et al., 2023. Network of GRAS transcription factors in plant development, fruit ripening and stress responses. Hortic. Res., 10: uhad220. DOI:10.1093/hr/uhad220
Singh, A.K., Yadav, V., Rao, V.V.A., et al., 2025. Characterization and evaluation of tamarind (Tamarindus indica L.) germplasm: implications for tree improvement strategies. BMC Plant Biol., 25: 396. DOI:10.1186/s12870-025-06415-y
Sookying, S., Duangjai, A., Saokaew, S., et al., 2022. Botanical aspects, phytochemicals, and toxicity of Tamarindus indica leaf and a systematic review of antioxidant capacities of T. indica leaf extracts. Front. Nutr., 9: 977015. DOI:10.3389/fnut.2022.977015
Toscano Oviedo, M.A., García Zapateiro, L.A., Quintana, S.E., 2024. Tropical fruits as a potential source for the recovery of bioactive compounds: Tamarindus indica L., Annona muricata, Psidium guajava and Mangifera indica. J. Food Sci. Technol., 61: 2027-2035. DOI:10.1007/s13197-024-05983-5
Wang, F.Q., Jiang, Z.N., Gao, J.G., et al., 2025. Analysis of the Rehmannia chingii geneome identifies RcCYP72H7 as an epoxidase in iridoid glycoside biosynthesis. Nat. Commun., 16: 6035. DOI:10.1038/s41467-025-60909-9



