An Improved Chromosome-Level Genome Assembly and Annotation of Belted Beard Grunt (Hapalogenys analis)

Citation

GAO Tianxiang, WANG Yiting, SHI Huilai, et al. An Improved Chromosome-Level Genome Assembly and Annotation of Belted Beard Grunt (Hapalogenys analis)[J]. Journal of Ocean University of China, 2024, 23(4): 1026-1034.

Corresponding author

LIU Qi, E-mail: liuqi_agr@163.com.

History

Received June 30, 2023
revised September 22, 2023
accepted December 11, 2023

GAO Tianxiang¹⁾ , WANG Yiting¹⁾ , SHI Huilai²⁾ , PING Hongling²⁾ , LIU Qi³⁾ , and ZHANG Yang⁴⁾

1) Fishery College, Zhejiang Ocean University, Zhoushan 316022, China;
2) Zhejiang Province Key Laboratory of Mariculture and Enhancement, Zhejiang Marine Fisheries Research Institute, Zhoushan 316021, China;
3) Wuhan Onemore-Tech Co., Ltd., Wuhan 430076, China;
4) Shenzhen Institute of Quality & Safety Inspection and Research, Shenzhen 518000, China

Abstract: Hapalogenys analis (order Lobotiformes) is an economically and ecologically significant fish species. It is a typical sedentary rocky reef fish found primarily in the northern Pacific Ocean. Here, we used Hi-C and PacBio sequencing techniques to assemble a high-quality, chromosome-level genome for this species. The 539 Mb genome had a contig N50 of 3.43 Mb, and 755 contigs clustered into 24 chromosomal groups with an anchoring rate of 99.02%. Of the total genomic sequence, 132.74 Mb (24.39%) was annotated as repeat elements. A total of 21360 protein-coding genes were identified, of which 20787 (97.32%) were successfully annotated against public databases. The BUSCO evaluation indicated that 96.90% of the total orthologous genes were matched. A phylogenetic analysis of H. analis and 14 other bony fish species indicated that, compared with the common ancestor of H. analis and Sciaenidae, the H. analis genome contained 364 expanded gene families, which were enriched in functions related to olfactory receptor activity. Comparative genomic analysis further identified 3584 contracted gene families. Branch-site modeling identified 277 genes under positive selection, which may facilitate adaptation to rocky reef environments. The genome reported here will be useful for ecological and evolutionary studies of H. analis.

Key words: Hapalogenys analis; de novo assembly; PacBio; comparative genomics

1 Introduction

Fish, one of the largest groups of vertebrates, comprise at least 20000 species and have colonized virtually all freshwater environments as well as tropical, temperate, and polar saltwater environments. Olfaction is critical to many animals, including fish, because it facilitates migration, predator avoidance, reproduction, and feeding (Laberge and Hara, 2001). For example, fish inhabiting rocky reefs tend to be weak swimmers, but their strong sense of smell helps them to avoid predators (Wei et al., 2014). The recent availability of taxonomically diverse fish genome sequences makes it possible to conduct evolutionary analyses of olfactory receptor (OR) repertoires across fish species. Most studies on the genetic basis of fish OR repertoires have focused on the identification and characterization of individual genes, leaving the evolutionary history of fish OR genes and their roles in ecological adaptation relatively unexplored.

The order Lobotiformes (class Actinopterygii) contains three families (Datnioididae, Hapalogenyidae, and Lobotidae) and several economically important species (Betancur et al., 2013; Betancur et al., 2017). Lobotiformes is primarily composed of rocky reef species with limited or no long-distance migration. Hapalogenys analis belongs to the family Hapalogenyidae and is a typical rocky reef fish native to East Asia, inhabiting the coastal waters of China, Japan, the Korean Peninsula, and some countries in Southeast Asia. H. analis has high economic value, with delicate meat containing a high content of amino acids, and is used in marine ranching and offshore cage aquaculture in Korea, Japan, and China (Xu et al., 2010). As a nocturnal predator, what factors support its feeding and odorant-oriented movement? Is it the olfactory system? Genomic analyses can provide information relevant to these questions. However, the only high-quality genome sequence available for a member of Lobotiformes is that of Datnioides undecimradiatus (GenBank assembly accession: GCA_008933995.1). This lack of genomic resources hinders the continued study of the taxonomy, evolution, and biology of H. analis as well as other Lobotiformes.

In this study, a chromosome-level genome assembly of H. analis was constructed using Hi-C and PacBio long-read sequencing data. The genome was assembled to provide insight into the evolutionary history of the gene families related to the OR repertoire of H. analis and the mechanisms by which this species adapts to rocky reef environments. Through comparative genomics, we investigated the genetic basis of enhanced olfactory function and weak swimming capability in rocky reef fish. This genome assembly will be valuable for studies on the ecology, biology, and evolutionary history of H. analis, as well as other species of Lobotiformes.

2 Materials and Methods

2.1 Sample Collection and Extraction of DNA

In September 2020, an adult female fish weighing approximately 650 g was sampled from the Zhoushan fishery (Zhejiang, China), located southeast of the Yangtze Estuary. To prevent DNA degradation, the collected muscle tissue was stored in liquid nitrogen prior to DNA extraction. Genomic DNA was isolated from muscle tissue using the Blood & Cell Culture DNA Mini Kit (QIAGEN, Cat. No. 13343), which is based on the phenol/chloroform method (Sambrook et al., 1989). Total RNA was extracted using TRIzol reagent (Invitrogen). The quality and concentration of the extracted DNA/RNA were assessed using a Pultton DNA/Protein Analyzer (Plextech), and their integrity was further evaluated on a 1% agarose gel stained with ethidium bromide (EB). The extracted DNA/RNA were stored at −80℃ for further analyses. All tissue collection and DNA/RNA extraction were conducted following the ethical regulations of the Institutional Animal Care and Use Committee of Zhejiang Ocean University. Only high-quality DNA was used for PacBio Sequel (genome assembly), Illumina (genome survey and genome correction), and Hi-C (assisted genome assembly) sequencing. Other tissues, including the eyes, liver, and spleen, were used for RNA extraction, and the resultant sequencing data were used for genome annotation. Fresh tissue was used for Hi-C sequencing; the remaining samples were rapidly frozen in liquid nitrogen for 1 h and then stored in an ultra-low temperature freezer at −80℃.

2.2 Construction of the Library and Sequencing of the Genome

Short-read genomic sequencing was carried out on the Illumina NovaSeq 6000 platform (Illumina, United States), and long reads were generated on the PacBio Sequel Ⅱ platform (Pacific Biosciences). The Illumina data were used for genome size estimation, genome assembly correction, and genome assembly evaluation. Following Illumina's standard procedure, a paired-end library was constructed with a 300 bp insert size. Reads containing more than 10% N bases or low-quality bases (≤ 5), adapter sequences, and duplicated sequences were removed. For PacBio long-read sequencing, a SMRTbell (single-molecule real-time, SMRT) library with a fragment size of 20 kb was prepared using the SMRTbell Template Prep Kit 1.0 (PacBio) following the manufacturer's instructions. The library was sequenced on the PacBio Sequel Ⅱ system in Circular Consensus Sequencing (CCS) mode, and data from one SMRT cell were generated.

2.3 Assembly of the Genome and Estimation of Genome Size

Illumina short reads were subjected to K-mer analysis in order to estimate the repeat content, heterozygosity, and size of the H. analis genome (Liu et al., 2013). Genome size was calculated as follows:

$ G\text{(enome)} = \frac{\text{Number of } K\text{-mers}}{\text{Average depth of } K\text{-mers}}, $

where

$ \text{Number of } K\text{-mers} = \text{Total } K\text{-mer count} - \text{Abnormal } K\text{-mer count}, $

and the average depth of K-mers is taken as the peak depth.

Canu was used to assemble the PacBio long reads (Koren et al., 2017). The long reads were also used to polish the assembly, and redundant sequences and haplotigs were removed.
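
As an illustration of the genome size formula above, the following minimal sketch applies it to a toy K-mer depth histogram. The function name, the `min_depth` error cutoff, and the histogram values are assumptions made for this example only; the text does not state how abnormal K-mers were identified in the actual analysis.

```python
def estimate_genome_size(histogram, min_depth):
    """Estimate genome size (in bases) from a K-mer depth histogram.

    histogram maps a K-mer depth to the number of distinct K-mers observed
    at that depth. min_depth is a hypothetical error cutoff: K-mers below it
    are treated as "abnormal" and excluded.
    """
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no K-mers at or above min_depth")
    # Number of K-mers = total K-mer count - abnormal K-mer count
    n_kmers = sum(depth * count for depth, count in usable.items())
    # Depth of K-mers = peak depth (the depth shared by the most K-mers)
    peak_depth = max(usable, key=usable.get)
    return n_kmers / peak_depth


# Toy histogram: a low-depth error tail plus a main peak at depth 20.
toy_histogram = {1: 900, 2: 300, 18: 40, 19: 90, 20: 120, 21: 90, 22: 40}
print(estimate_genome_size(toy_histogram, min_depth=5))  # 380.0
```

In this toy case the 7600 usable K-mer occurrences divided by the peak depth of 20 give an estimate of 380 bases, which matches the number of distinct non-error K-mers in the histogram.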

A two-step quality control process was adopted because of the errors intrinsic to long reads. First, the long reads were used to correct the assembly with the Arrow algorithm of the GenomicConsensus package (version 2.3.3), with minCoverage set to 15 (Chin et al., 2013). Second, the assembly was subjected to two rounds of correction using Illumina short reads with PILON (version 1.2.3) (Walker et al., 2014). Following quality control, Purge_haplotigs was used to remove redundancy according to sequence similarity and read depth (Roach et al., 2018).

2.4 Chromosome-Level Genome Assembly with Hi-C Library

The method of Kajitani et al. (2014) was used to assemble a chromosome-level genome from a Hi-C library. The library was sequenced on an Illumina NovaSeq 6000 platform, and Bowtie was used to map the reads to the quality-controlled genome with standard settings (Langmead, 2010). The two ends of each read pair were aligned independently, and only read pairs with a unique alignment at each end were retained for further analysis. Lachesis (version 1.03) was used to assemble the chromosome-level genome from the corrected contigs and Hi-C reads with standard settings (Burton et al., 2013). Juicebox (version 1.5) was used to manually adjust the positions and orientations of small contigs according to the strength of contig interactions (Durand et al., 2016).

Two methods were employed to evaluate the consistency and accuracy of the assembled chromosome-level genome. First, Blasr (version 5.3.3) was used to map the long reads to the genome (Chaisson and Tesler, 2012), and BWA-MEM (version 0.7.10) was used to map the short reads (Li et al., 2010), both with default settings. Second, the completeness of the genome assembly was estimated using Benchmarking Universal Single-Copy Orthologs (BUSCO version 2.0), based on single-copy homologs (Simão et al., 2015).

2.5 Gene Model Prediction and Functional and Repeat Annotation

Repeat elements were annotated by homology-based and de novo prediction methods. Homology-based prediction was conducted with the Repbase database (https://www.girinst.org/repbase/) (Jurka et al., 2005). RepeatProteinMask (version 4.1.0) and RepeatMasker (version 4.0.7) were used to extract repeat regions (Tarailo-Graovac and Chen, 2009). Tandem repeats were identified by ab initio prediction using Tandem Repeats Finder (version 4.09) (Benson, 1999). LTR_FINDER (version 1.0.2) (Xu and Wang, 2007), RepeatModeler (version 2.0) (Flynn et al., 2020), and RepeatScout (version 1.0.6) (Price et al., 2005) were used for de novo identification of transposable elements (TEs). All TEs and repeats were combined into a repeat library to further verify the annotation of repeat sequences.

Gene structure prediction combined de novo, homology-based, and transcriptome-based strategies. Both GENSCAN (Burge and Karlin, 1997) and AUGUSTUS (version 2.7) (Stanke et al., 2006) were used for de novo prediction. For homology-based annotation, the complete protein sequences of five related species (Acanthochromis polyacanthus, Anabas testudineus, Astyanax mexicanus, Amphiprion ocellaris, and Astatotilapia calliptera) were downloaded from the National Center for Biotechnology Information (NCBI) and aligned to the H. analis genome using TBLASTN (E-value ≤ 1e−5). GeneWise (version 2.4.0) was used to predict gene models from the corresponding gene region of each alignment (Birney et al., 2004), and the intron/exon structure was defined (Doerks et al., 2002). For transcriptome-based prediction, transcriptome data were aligned to the assembled genome using TopHat (version 2.0.11) to identify exon regions and splice positions (Kim et al., 2013). The alignments were then used as input for Cufflinks (version 2.2.1) for genome-based transcript assembly (Ghosh and Chan, 2016). The gene sets predicted by the different methods were integrated into a complete gene set using HiCESAP (Wuhan Gooalgene Co., Ltd., https://www.gooalgene.com/) and MAKER (version 2.31.10) (Cantarel et al., 2008). Functional annotation of the predicted genes was carried out using Blastx/n (E-value ≤ 1e−5) against the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa and Goto, 2000), TrEMBL (Boeckmann et al., 2003), InterPro, SwissProt, and the NCBI non-redundant (NR) databases. Blast2GO (version 5.2.5) was used to perform Gene Ontology (GO) annotation (Conesa et al., 2005).

Four kinds of non-coding RNA were annotated: miRNA, tRNA, rRNA, and snRNA. MicroRNAs and ribosomal RNAs were identified by searching the Rfam database using Infernal (version 1.1) (Griffiths-Jones et al., 2005; Nawrocki and Eddy, 2013). tRNAscan-SE (version 1.3.1) was used to predict transfer RNAs (Lowe and Eddy, 1997), and rRNAs were predicted using BLASTN.

2.6 Selection Analysis and Genome Evolution Analysis

OrthoMCL (version 2.0.9) was used to detect orthologous groups by aligning the protein sequences of H. analis and 14 other species: Liparis tanakae, Anarrhichthys ocellatus, Pseudochaenichthys georgianus, Trematomus bernacchii, Notothenia coriiceps, Etheostoma spectabile, Sebastes schlegelii, Sebastes umbrosus, Epinephelus lanceolatus, Larimichthys crocea, Argyrosomus japonicus, Collichthys lucidus, Danio rerio, and Lepisosteus oculatus. MUSCLE (version 3.8.31) was used to align the single-copy orthologs shared among all 15 species, using default parameters (Edgar, 2004). RAxML (version 8.2.12) was used to construct the phylogenetic tree from the multiple-sequence alignment (Stamatakis, 2014). Divergence times were estimated with r8s (version 1.7) and the MCMCTree program of PAML (Yang, 2007). Calibration was based on the divergence time between L. oculatus and D. rerio (245.1–335.6 Mya), obtained from TimeTree (http://www.timetree.org/) (Kumar et al., 2017). Gene family contraction and expansion were evaluated using CAFE (version 4.0) (De Bie et al., 2006). The contracted and expanded gene families in H. analis, and the contracted gene families shared among five sedentary species (H. analis, A. ocellatus, S. schlegelii, S. umbrosus, and E. lanceolatus), were tested for enrichment in GO terms and KEGG pathways, with the Benjamini and Hochberg FDR correction applied. Using the KEGG and GO annotations of H. analis as background, significantly overrepresented KEGG pathways and GO terms were identified (corrected P value ≤ 0.05). Positively selected genes (PSGs) in the H. analis genome were detected using single-copy orthologous genes, with H. analis as the foreground branch and the other species in the phylogenetic tree as background branches. The branch-site model and the likelihood ratio test in the CodeML program of PAML were used to test whether individual codon sites were under positive selection (based on site-wise dN/dS values, i.e., the ratio of the non-synonymous to the synonymous substitution rate), and the candidate PSGs were subjected to GO and KEGG enrichment analyses.
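
The likelihood ratio test mentioned above can be sketched as follows. This is a minimal illustration, not the CodeML implementation: it assumes that the log-likelihoods of the null model (foreground ω fixed at 1) and the alternative model (ω free) have already been obtained, and that the test statistic is compared against a chi-square distribution with one degree of freedom, a common convention for this test. The function name and the log-likelihood values are hypothetical.

```python
import math


def branch_site_lrt(lnl_null, lnl_alt):
    """Return (LRT statistic, p-value) for nested null/alternative models."""
    # Twice the log-likelihood difference; clipped at 0 for numerical noise.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods for one gene.
stat, p = branch_site_lrt(lnl_null=-1000.0, lnl_alt=-998.0)
print(stat, round(p, 4))  # 4.0 0.0455
```

A gene would be retained as a candidate PSG only if its p-value remained significant after multiple-testing correction across all tested genes.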

3 Results and Discussion

3.1 Genome Assembly and Assessment

Illumina NovaSeq sequencing of a 300 bp library generated 83.02 Gb of sequence data. Based on the K-mer depth and number, the genome size was estimated at approximately 550 Mb, which was corrected to 543 Mb after removal of erroneous K-mers resulting from sequencing errors. The heterozygosity rate was 0.45% and the repeat rate was 26.78%. Approximately 120.67 Gb (about 222×) of PacBio sequencing data were obtained on the PacBio Sequel Ⅱ platform. The assembled genomic sequences were 544.26 Mb in length, consistent with the K-mer-based estimate. The assembly contained 755 contigs, with contig N50 and scaffold N50 values of 3.43 Mb and 23.53 Mb, respectively. BUSCO analysis against the Actinopterygii database indicated that 97.27% (4459/4584) of the BUSCO genes were complete in the assembled genome (94.20% complete and single-copy; 3.08% complete and duplicated).
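
The sequencing depth and BUSCO completeness quoted above follow from simple ratios, as the short check below shows. It assumes that the depth is expressed relative to the corrected 543 Mb genome size estimate; the variable names are illustrative.

```python
# Values taken from the text above.
pacbio_bases = 120.67e9       # PacBio data, in bases
genome_size = 543e6           # corrected K-mer-based genome size, in bases
complete_buscos = 4459
total_buscos = 4584

# Sequencing depth = total sequenced bases / genome size.
depth = pacbio_bases / genome_size
# BUSCO completeness = complete BUSCOs / BUSCOs searched.
busco_complete_pct = 100 * complete_buscos / total_buscos

print(round(depth))                  # 222
print(round(busco_complete_pct, 2))  # 97.27
```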

Hi-C scaffolding was used to anchor and orient the contigs on each chromosome (Fig. 1). The Hi-C library generated 60.235 Gb of clean data, of which 54.34% was validly paired. Using Lachesis, we anchored 99.02% of the assembled sequences to 24 chromosomes, with lengths ranging from 10.36 Mb to 28.90 Mb (Fig. 2). The 24 chromosomes were clearly distinguished in the Hi-C heatmap, and the strong intra-chromosomal interactions indicated high-quality anchoring. The final assembled genome was 539.00 Mb in length, with a scaffold N50 of 23.53 Mb and a contig N50 of 3.43 Mb (Table 1).
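
For readers unfamiliar with the N50 statistic reported above, the sketch below shows the standard definition: the length of the sequence at which the cumulative length of sequences, sorted from longest to shortest, first reaches half of the total assembly length. The function name and the toy lengths are illustrative and are not taken from the assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the running sum to >= 50%.
        if running * 2 >= total:
            return length
    raise ValueError("lengths must not be empty")


print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # 8
```

Applied to contig lengths this gives the contig N50, and applied to scaffold or chromosome lengths it gives the scaffold N50.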

Fig. 1 Hi-C heatmap of H. analis genome.

Fig. 2 Chromosome-level assembly.

Table 1 Statistics related to the Hi-C assembly

3.2 Gene Model Prediction and Functional and Repeat Annotation

A total of 132.74 Mb of repeat sequences were annotated, accounting for 24.39% of the genome. The predominant repeat elements were DNA transposons (12.63%), long terminal repeats (LTRs, 5.40%), and long interspersed nuclear elements (LINEs, 4.79%). A combination of RNA-seq-based, homology-based, and de novo methods predicted 21360 protein-coding genes. The average number of exons per gene, exon length, intron length, CDS length, and gene length were 10.44, 237.66 bp, 1126 bp, 1704 bp, and 13105 bp, respectively. The gene model statistics were similar to those of A. polyacanthus, A. testudineus, A. mexicanus, A. ocellaris, and A. calliptera (Fig. 3). The BUSCO evaluation of the predicted genes indicated that 96.90% of the orthologous genes were matched (93.10% complete and single-copy; 3.80% complete and duplicated). A total of 20787 genes (97.32% of all genes) were annotated in public databases. We also annotated 511 miRNAs, 2904 tRNAs, 1897 rRNAs, and 753 snRNAs, with average lengths of 85, 75, 122, and 146 bp, respectively. The complete assembly and annotation results were visualized in a Circos plot (Fig. 4).
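
The percentages above can be reproduced from the reported totals. The short check below suggests, although the text does not state it, that the repeat fraction is expressed relative to the 544.26 Mb assembly obtained before Hi-C anchoring rather than the final 539.00 Mb genome; that reading is an inference from the arithmetic only.

```python
# Values taken from the text.
repeat_mb = 132.74
annotated_genes = 20787
total_genes = 21360

# Repeat fraction against the two assembly sizes reported in Section 3.1.
repeat_pct_pre_anchoring = 100 * repeat_mb / 544.26
repeat_pct_final = 100 * repeat_mb / 539.00
# Share of predicted genes with a functional annotation.
annotated_pct = 100 * annotated_genes / total_genes

print(round(repeat_pct_pre_anchoring, 2))  # 24.39
print(round(repeat_pct_final, 2))          # 24.63
print(round(annotated_pct, 2))             # 97.32
```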

Fig. 3 Comparison of gene model statistics.

Fig. 4 Circos plot of H. analis genome. From the outer to the inner circle: a, GC content; b, gene distribution; c, repeats; d, long terminal repeats (LTR); e, long interspersed nuclear elements (LINE); f, DNA TE. Bar height is proportional to the number of items mapped to each statistical interval within the genome.

3.3 Comparative Genomics

A comparative genomic assessment between H. analis and 14 other species (five sedentary fish and seven migrating fish) revealed that the H. analis genome contained 3584 contracted and 364 expanded gene families, compared to the most recent common ancestor of H. analis and Sciaenidae (L. crocea, C. lucidus) (Fig. 5). Further analysis indicated that gene families under expansion were significantly enriched in 4 KEGG pathways and 36 GO terms associated with olfaction, including olfactory receptor activity (corrected P-value = 8.43E−11, GO:0004984), G protein-coupled receptor activity (corrected P-value = 1.04E−05, GO:0004930), transmembrane signaling receptor activity (corrected P-value = 5.38E−05, GO:0004888), olfactory transduction (corrected P-value = 8.55E−11, ko04740), and necroptosis (corrected P-value = 8.91E−07, ko04217). Olfaction is crucial to an array of animal behaviors, including predator avoidance, mate selection, and feeding (Hara, 1975; Su et al., 2009; Bazáes et al., 2013; Hughes et al., 2018). Through regulation of the OR gene family, animals can detect and discriminate between a wide variety of odoriferous compounds. In both fish and mammals, OR gene expression is highest in the sensory neurons of the olfactory epithelium (OE), which are located within the nasal cavity (Vassar et al., 1993; Churcher et al., 2015; van der Linden et al., 2018; Cong et al., 2019). Odorant discrimination and, by extension, environmental adaptation are driven largely by the diversity of OR genes. Among placental mammals in particular, there is considerable variation in the number of intact OR genes, with most species carrying a large number of OR pseudogenes (Niimura et al., 2014). This variation in the OR repertoire is primarily driven by frequent evolutionary losses and gains of genes through pseudogenization and duplication events (Vandewege et al., 2016; Hughes et al., 2018; Niimura et al., 2018).
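
The corrected P-values reported here were obtained with the Benjamini and Hochberg FDR procedure (Section 2.6). A minimal sketch of that adjustment is given below; the function name and the raw P-values are hypothetical, and the enrichment software used in the study may differ in implementation details.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw enrichment P-values
print(benjamini_hochberg(raw))
```

A term or pathway is then reported as significantly enriched when its adjusted value is at most 0.05.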

Fig. 5 Gene families experiencing contraction or expansion in H. analis and 13 other bony fish species. Each branch is annotated with the number of significantly contracted (−, red) or expanded (+, green) gene families. Blue numbers indicate the estimated species divergence time (million years ago). Red dots represent divergences used for time recalibration.

The H. analis genome was found to contain contracted gene families related to motor function, including myosin complex (corrected P-value = 1.48E−28, GO:0016459), actin cytoskeleton (corrected P-value = 3.02E−24, GO:0015629), cytoskeletal motor activity (corrected P-value = 9.13E−23, GO:0003774), carbohydrate derivative binding (corrected P-value = 5.83E−10, GO:0097367), vascular smooth muscle contraction (corrected P-value = 1.01E−10, ko04270), cardiac muscle contraction (corrected P-value = 1.35E−8, ko04260), thyroid hormone signaling pathway (corrected P-value = 1.93E−7, ko04919), and arachidonic acid metabolism (corrected P-value = 3.31E−4, ko00590). Notably, the five sedentary species (A. ocellatus, E. lanceolatus, H. analis, S. schlegelii, and S. umbrosus) shared 19 contracted gene families associated with cellular aromatic compound metabolic process (corrected P-value = 0.0E+00, GO:0006725), nitrogen compound metabolic process (corrected P-value = 0.0E+00, GO:0006807), metabolic process (corrected P-value = 0.0E+00, GO:0008152), biosynthetic process (corrected P-value = 0.0E+00, GO:0009058), and primary metabolic process (corrected P-value = 0.0E+00, GO:0044238). These functions are critical for regulating energy metabolism, and there is a correlation between lower metabolic rates and lower DNA mutation rates. Specifically, the DNA mutation rate is related to environmental energy (Davies et al., 2004), metabolic rate (Martin and Palumbi, 1993), and life history characteristics (Bromham et al., 1996). Certain contracted gene families, for instance those involved in neutrophil extracellular trap formation (corrected P-value = 0.0E+00, ko04613), may reflect the reduced energy requirements and locomotor activity of sedentary fish compared with migratory fish. Sun et al. (2011) found that the protein-coding mitochondrial genes of migratory fishes are under greater selective pressure than those of non-migratory fishes. In addition, several studies have shown that a lower metabolic rate may increase the tolerance of functional genes to mutations (Child, 1939; Zhu et al., 2020). For example, Cassidy et al. (2019) found that Drosophila can slow down development by reducing its metabolic rate. This process suppresses the gene expression disorders resulting from the deletion of miRNA-activating genes or the downregulation of miRNA expression, and allows the fly to develop a normal phenotype. This phenomenon may explain why sedentary fish develop normally despite accumulating more mutations than migratory fish.

A branch-site model identified a total of 277 genes undergoing positive selection (positively selected genes, PSGs). These PSGs were found to be significantly enriched in five KEGG pathways and 316 GO terms, including DNA binding (corrected P-value = 1.07E−02, GO:0003677), gene expression (corrected P-value = 1.82E−02, GO:0010467), response to stress (corrected P-value = 2.52E−02, GO:0006950), DNA repair (corrected P-value = 2.52E−02, GO:0006281), and homologous recombination (corrected P-value = 8.20E−03, ko03440). These functions are associated with macromolecule biosynthesis and environmental adaptation.

4 Conclusions

We used both Hi-C data and PacBio long reads to assemble the 539 Mb chromosome-level genome of H. analis, which represents the first reference genome of the family Hapalogenyidae. Owing to the long reads produced by PacBio sequencing, the assembled genome had a scaffold N50 of 23.53 Mb and a contig N50 of 3.43 Mb. The assembled genome and its annotation will benefit further studies of Lobotiformes species, as well as an array of breeding, conservation, and phylogenetic studies of the family Hapalogenyidae. Comparative genomic analysis revealed that OR gene families were expanded, while gene families related to motor capacity and energy metabolism were contracted in H. analis and other sedentary reef fishes. These results also help explain how H. analis adapts to rocky reef environments.

Data Availability Statement

The raw sequencing data, including RNA-seq [SRA accession: SRR16077758], Illumina short reads [SRA accession: SRR16002978], Hi-C [SRA accession: SRR16002979], and PacBio data [SRA accession: SRR16002979], are available in the NCBI Sequence Read Archive (SRA) database, and can be accessed with the project ID PRJNA765047. Please contact the corresponding author for more annotation information.

Acknowledgements

This work was supported by the Province Key Research and Development Program of Zhejiang (No. 2021C02047), the Special Projects of Zhejiang Provincial Science and Technology Department (Nos. HYS-CZ-004, HYS-CZ-202208), and the 'San Nong Jiu Fang' Science and Technology Cooperation Project of Zhejiang Province (No. 2022SNJF073).

References

Bazáes, A., Olivares, J., and Schmachtenberg, O., 2013. Properties, projections, and tuning of teleost olfactory receptor neurons. Journal of Chemical Ecology, 39(4): 451-464. DOI:10.1007/s10886-013-0268-1

Benson, G., 1999. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Research, 27(2): 573-580. DOI:10.1093/nar/27.2.573

Betancur, R. R., Broughton, R. E., Wiley, E. O., Carpenter, K., López, J. A., Li, C., et al., 2013. The tree of life and a new classification of bony fishes. PLoS Currents Tree of Life, 5(1): e1001550.

Betancur, R. R., Wiley, E. O., Arratia, G., Acero, A., Bailly, N., Miya, M., et al., 2017. Phylogenetic classification of bony fishes. BMC Ecology and Evolution, 17(1): 162.

Birney, E., Clamp, M. E., and Durbin, R., 2004. GeneWise and Genomewise. Genome Research, 14(5): 988-995. DOI:10.1101/gr.1865504

Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M. C., Estreicher, A., Gasteiger, E., et al., 2003. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Research, 31: 365-370. DOI:10.1093/nar/gkg095

Bromham, L., Rambaut, A., and Harvey, P. H., 1996. Determinants of rate variation in mammalian DNA sequence evolution. Journal of Molecular Evolution, 43(6): 610-621. DOI:10.1007/BF02202109

Burge, C., and Karlin, S., 1997. Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology, 268(1): 78-94.

Burton, J. N., Adey, A. C., Patwardhan, R., Qiu, R., Kitzman, J. O., and Shendure, J. A., 2013. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nature Biotechnology, 31: 1119-1125. DOI:10.1038/nbt.2727

Cantarel, B. L., Korf, I., Robb, S. M., Parra, G., Ross, E., Moore, B., et al., 2008. MAKER: An easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Research, 18(1): 188-196. DOI:10.1101/gr.6743907

Cassidy, J. J., Bernasek, S., Bakker, R. A., Giri, R., Peláez, N., Eder, B., et al., 2019. Repressive gene regulation synchronizes development with cellular metabolism. Cell, 178: 980-992.e17. DOI:10.1016/j.cell.2019.06.023

Chaisson, M. J. P., and Tesler, G., 2012. Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): Application and theory. BMC Bioinformatics, 13: 238. DOI:10.1186/1471-2105-13-238

Child, G. P., 1939. The effect of increasing time of development at constant temperature on the wing size of vestigial of Drosophila melanogaster. The Biological Bulletin, 77: 432-442. DOI:10.2307/1537653

Chin, C., Alexander, D. H., Marks, P. J., Klammer, A. A., Drake, J. P., Heiner, C. R., et al., 2013. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature Methods, 10: 563-569. DOI:10.1038/nmeth.2474

Churcher, A. M., Hubbard, P. C., Marques, J. P., Canário, A. V., and Huertas, M., 2015. Deep sequencing of the olfactory epithelium reveals specific chemosensory receptors are expressed at sexual maturity in the European eel Anguilla anguilla. Molecular Ecology, 24(4): 822-834. DOI:10.1111/mec.13065

Conesa, A., Götz, S., García-Gómez, J. M., Terol, J., Talón, M., and Robles, M., 2005. Blast2GO: A universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics, 21(18): 3674-3676. DOI:10.1093/bioinformatics/bti610

Cong, X. J., Zheng, Q., Ren, W. W., Chéron, J. B., Fiorucci, S., Wen, T. Q., et al., 2019. Zebrafish olfactory receptors ORAs differentially detect bile acids and bile salts. Journal of Biological Chemistry, 294(17): 6762-6771. DOI:10.1074/jbc.RA118.006483

Davies, T. J., Savolainen, V., Chase, M. W., Moat, J., and Barraclough, T. G., 2004. Environmental energy and evolutionary rates in flowering plants. Proceedings of the Royal Society B: Biological Sciences, 271(1553): 2195-2200. DOI:10.1098/rspb.2004.2849

De Bie, T., Cristianini, N., Demuth, J. P., and Hahn, M. W., 2006. CAFE: A computational tool for the study of gene family evolution. Bioinformatics, 22(10): 1269-1271. DOI:10.1093/bioinformatics/btl097

Doerks, T., Copley, R. R., Schultz, J., Ponting, C. P., and Bork, P., 2002. Systematic identification of novel protein domain families associated with nuclear functions. Genome Research, 12: 47-56. DOI:10.1101/gr.203201

Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S. P., Huntley, M. H., Lander, E. S., et al., 2016. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Systems, 3: 95-98. DOI:10.1016/j.cels.2016.07.002 (

Edgar, R. C., 2004. MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5): 1792-1797. DOI:10.1093/nar/gkh340 (

Edgar, R. C., and Myers, E. W., 2005. PILER: Identification and classification of genomic repeats. Bioinformatics, 21: i152-i158. DOI:10.1093/bioinformatics/bti1003 (

Flynn, J. M., Hubley, R., Goubert, C., Rosen, J., Clark, A. G., Feschotte, C., et al., 2020. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences of the United States of America, 117(17): 9451-9457. (

Ghosh, S., and Chan, C. K., 2016. Analysis of RNA-Seq data using tophat and cufflinks. Methods in Molecular Biology, 1374: 339-361. (

Griffiths-Jones, S., Moxon, S., Marshall, M., Khanna, A., Eddy, S. R., and Bateman, A., 2005. Rfam: Annotating non-coding RNAs in complete genomes. Nucleic Acids Research, 33 (Database issue): D121-D124. (

Griffiths-Jones, S., Saini, H. K., van Dongen, S., and Enright, A. J., 2008. miRBase: Tools for microRNA genomics. Nucleic Acids Research, 36 (Database issue): D154-D158.

Hamdani, E. H., and Døving, K. B., 2007. The functional organization of the fish olfactory system. Progress in Neurobiology, 82(2): 80-86. DOI:10.1016/j.pneurobio.2007.02.007

Hara, T. J., 1975. Olfaction in fish. Progress in Neurobiology, 5(4): 271-335.

Hughes, G. M., Boston, E. S. M., Finarelli, J. A., Murphy, W. J., Higgins, D. G., and Teeling, E. C., 2018. The birth and death of olfactory receptor gene families in mammalian niche adaptation. Molecular Biology and Evolution, 35(6): 1390-1406. DOI:10.1093/molbev/msy028

Jurka, J., Kapitonov, V. V., Pavlícek, A., Klonowski, P., Kohany, O., and Walichiewicz, J., 2005. Repbase update, a database of eukaryotic repetitive elements. Cytogenetic and Genome Research, 110: 462-467. DOI:10.1159/000084979

Kajitani, R., Toshimoto, K., Noguchi, H., Toyoda, A., Ogura, Y., Okuno, M., et al., 2014. Efficient de novo assembly of highly heterozygous genomes from whole-genome shotgun short reads. Genome Research, 24: 1384-1395. DOI:10.1101/gr.170720.113

Kanehisa, M., and Goto, S., 2000. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Research, 28(1): 27-30. DOI:10.1093/nar/28.1.27

Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S. L., 2013. TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biology, 14(4): R36. DOI:10.1186/gb-2013-14-4-r36

Koren, S., Walenz, B. P., Berlin, K., Miller, J. R., Bergman, N. H., and Phillippy, A. M., 2017. Canu: Scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Research, 27(5): 722-736. DOI:10.1101/gr.215087.116

Kumar, S., Stecher, G., Suleski, M., and Hedges, S. B., 2017. TimeTree: A resource for timelines, timetrees, and divergence times. Molecular Biology and Evolution, 34(7): 1812-1819. DOI:10.1093/molbev/msx116

Laberge, F., and Hara, T. J., 2001. Neurobiology of fish olfaction: A review. Brain Research Reviews, 36(1): 46-59. DOI:10.1016/S0165-0173(01)00064-9

Langmead, B., 2010. Aligning short sequencing reads with Bowtie. Current Protocols in Bioinformatics, 32: 11.7.1-11.7.14.

Liu, B. H., Shi, Y. J., Yuan, J. Y., Hu, X. S., Zhang, H., Li, N., et al., 2013. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. arXiv: 1308.2012.

Lowe, T. M., and Eddy, S. R., 1997. tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research, 25(5): 955-964. DOI:10.1093/nar/25.5.955

Manni, M., Berkeley, M. R., Seppey, M., and Zdobnov, E. M., 2021. BUSCO: Assessing genomic data quality and beyond. Current Protocols, 1(12): e323. DOI:10.1002/cpz1.323

Martin, A. P., and Palumbi, S. R., 1993. Body size, metabolic rate, generation time, and the molecular clock. Proceedings of the National Academy of Sciences of the United States of America, 90(9): 4087-4091.

Nawrocki, E. P., and Eddy, S. R., 2013. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics, 29: 2933-2935. DOI:10.1093/bioinformatics/btt509

Niimura, Y., Matsui, A., and Touhara, K., 2014. Extreme expansion of the olfactory receptor gene repertoire in African elephants and evolutionary dynamics of orthologous gene groups in 13 placental mammals. Genome Research, 24: 1485-1496. DOI:10.1101/gr.169532.113

Niimura, Y., Matsui, A., and Touhara, K., 2018. Acceleration of olfactory receptor gene loss in primate evolution: Possible link to anatomical change in sensory systems and dietary transition. Molecular Biology and Evolution, 35(6): 1437-1450. DOI:10.1093/molbev/msy042

Pertea, M., Kim, D., Pertea, G. M., Leek, J. T., and Salzberg, S. L., 2016. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nature Protocols, 11(9): 1650-1667. DOI:10.1038/nprot.2016.095

Pertea, M., Pertea, G., Antonescu, C., Chang, T. C., Mendell, J. T., and Salzberg, S. L., 2015. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology, 33: 290-295. DOI:10.1038/nbt.3122

Price, A. L., Jones, N. C., and Pevzner, P. A., 2005. De novo identification of repeat families in large genomes. Bioinformatics, 21: i351-i358. DOI:10.1093/bioinformatics/bti1018

Roach, M. J., Schmidt, S. A., and Borneman, A. R., 2018. Purge Haplotigs: Allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinformatics, 19(1): 460. DOI:10.1186/s12859-018-2485-7

Sambrook, J., Fritsch, E., and Maniatis, T., 1989. Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, New York, 9-55.

Sanderson, M. J., 2003. r8s: Inferring absolute rates of molecular evolution and divergence times in the absence of a molecular clock. Bioinformatics, 19(2): 301-302. DOI:10.1093/bioinformatics/19.2.301

Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V., and Zdobnov, E. M., 2015. BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics, 31(19): 3210-3212. DOI:10.1093/bioinformatics/btv351

Stamatakis, A., 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30(9): 1312-1313. DOI:10.1093/bioinformatics/btu033

Stanke, M., Keller, O., Gunduz, I., Hayes, A., Waack, S., and Morgenstern, B., 2006. AUGUSTUS: Ab initio prediction of alternative transcripts. Nucleic Acids Research, 34 (Web Server issue): W435-W439.

Su, C. Y., Menuz, K., and Carlson, J. R., 2009. Olfactory perception: Receptors, cells, and circuits. Cell, 139(1): 45-59. DOI:10.1016/j.cell.2009.09.015

Sun, Y. B., Shen, Y. Y., Irwin, D. M., and Zhang, Y. P., 2011. Evaluating the roles of energetic functional constraints on teleost mitochondrial-encoded protein evolution. Molecular Biology and Evolution, 28(1): 39-44. DOI:10.1093/molbev/msq256

Tarailo-Graovac, M., and Chen, N., 2009. Using RepeatMasker to identify repetitive elements in genomic sequences. Current Protocols in Bioinformatics, 25: 4.10.1-4.10.14.

van der Linden, C., Jakob, S., Gupta, P., Dulac, C., and Santoro, S. W., 2018. Sex separation induces differences in the olfactory sensory receptor repertoires of male and female mice. Nature Communications, 9(1): 5081. DOI:10.1038/s41467-018-07120-1

Vandewege, M. W., Mangum, S. F., Gabaldón, T., Castoe, T. A., Ray, D. A., and Hoffmann, F. G., 2016. Contrasting patterns of evolutionary diversification in the olfactory repertoires of reptile and bird genomes. Genome Biology and Evolution, 8(3): 470-480.

Vassar, R., Ngai, J., and Axel, R., 1993. Spatial segregation of odorant receptor expression in the mammalian olfactory epithelium. Cell, 74(2): 309-318. DOI:10.1016/0092-8674(93)90422-M

Walker, B. J., Abeel, T., Shea, T. P., Priest, M. E., Abouelliel, A., Sakthikumar, S., et al., 2014. Pilon: An integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One, 9(11): e112963. DOI:10.1371/journal.pone.0112963

Wei, T., Sun, Y. N., Zhang, B., Wang, R. X., and Xu, T. J., 2014. A mitogenomic perspective on the phylogenetic position of the Hapalogenys genus (Acanthopterygii: Perciformes) and the evolutionary origin of Perciformes. PLoS One, 9(7): e103011. DOI:10.1371/journal.pone.0103011

Xu, T. J., Wang, J. X., Sun, Y. N., Shi, G., and Wang, R. X., 2010. Phylogeny of Hapalogenys with discussion on its systematic position in Percoidea using cytochrome b gene sequences. Acta Zoologica Sinica, 35: 530-536.

Xu, Z., and Wang, H., 2007. LTR_FINDER: An efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Research, 35 (Web Server issue): W265-W268.

Yang, Z., 2007. PAML 4: Phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution, 24(8): 1586-1591. DOI:10.1093/molbev/msm088

Zhu, J., Lou, Y., Shi, Q. S., Zhang, S., Zhou, W. T., Yang, J., et al., 2020. Slowing development restores the fertility of thermo-sensitive male-sterile plant lines. Nature Plants, 6: 360-367. DOI:10.1038/s41477-020-0622-6

Received: June 30, 2023; revised: September 22, 2023; accepted: December 11, 2023