Journal of Zhejiang University (Agriculture and Life Sciences)  2017, Vol. 43 Issue (2): 164-152
Association analysis revealed importance of dominance effects on days to silk of maize nested association mapping (NAM) population
MONIR Md Mamun, Jun ZHU    
Institute of Bioinformatics, Zhejiang University, Hangzhou 310058, China
Summary: A full model and a multi-locus additive model were used to analyze days to silk (DS, female flowering) in the maize nested association mapping (NAM) population. Analysis with the full model revealed that DS in the maize NAM population is controlled by small additive, dominance, and epistasis effects of many loci and by their environmental interactions. Dominance-related effects had large impacts on the trait: the estimated total heritability was 79.86%, of which 50.52% was due to dominance-related effects. Environment-specific genetic effects were also important for DS, explaining 27.31% of the phenotypic variation. The full model identified 50 highly significant (-log10 PEW > 5) quantitative trait SNPs (QTSs), whereas the additive model identified 47 with a low heritability (31.65%). Using the association results for DS together with the genotypes and total genetic effects of superior lines, superior hybrids were predicted that could be useful for future breeding programs.
Keywords: genome-wide association study    maize    days to silk    dominance effects    
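Association analysis reveals the importance of dominance effects on days to silk in the maize nested association mapping population
马姆·茂尼, 朱军    
Institute of Bioinformatics, Zhejiang University, Hangzhou 310058
Abstract: A full association-mapping model and a multi-locus additive model were used to analyze the genetic effects on days to silk in the maize nested association mapping population. The full-model analysis revealed that days to silk is controlled by the additive, dominance, and epistasis effects of many small-effect loci and by their interactions with the environment, with dominance effects being the most important. Of the estimated total heritability (79.86%), as much as 50.52% was attributable to dominance-related effects, followed by environment-interaction effects (27.31%). The numbers of highly significant (-log10 PEW > 5) quantitative trait SNPs detected were 50 for the full model and 47 for the additive model (heritability = 31.65%). Based on the association results for days to silk, the genotype combinations and corresponding genetic effects of the best inbred lines and best hybrid combinations were predicted, which can guide precise molecular selection of superior loci in maize populations.
Keywords: genome-wide association study    maize    days to silk    dominance effects    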

Flowering time is an important trait that reflects the capacity of plants to adapt to local environments[1-2]. The transition from vegetative growth to flowering, which integrates different environmental cues, is crucial for plant reproductive success[3]. Flowering time is considered a major selection criterion in plant breeding[4]. Maize originated from Balsas teosinte (Zea mays ssp. parviglumis) in the Mexican highlands approximately 9 000 years ago and has since adapted to diverse ecological conditions[1]. Dissecting the genetic mechanisms of maize flowering time is therefore important for evolutionary analysis and future breeding programs. Several studies have investigated the genetic architecture of maize flowering time using quantitative trait locus (QTL) mapping and genome-wide association studies (GWAS)[1-2, 5].

Dominance and epistasis are important phenomena in quantitative genetics. The complexity of genetic architecture can be largely attributed to epistasis, which plays a significant role in heterosis, inbreeding depression, adaptation, reproductive isolation, and speciation[6]. However, most GWAS in different organisms have ignored the effects of dominance, epistasis, and environmental interaction, and ignoring these factors could be a major cause of the missing heritability in GWAS. Heterozygous genotypes generally occur at high frequency in random-mating and other specially designed populations. In whole-genome sequencing data with a large number of single nucleotide polymorphisms (SNPs), a small proportion of heterozygous genotypes can also be found in inbred lines of animals and crops, and these could have large impacts on phenotypic traits[7-8]. In this study, we attempted to assess the impact of heterozygous genotypes on days to silk (DS) in the maize nested association mapping (NAM) population. To do so, a full model including additive, dominance, and epistasis effects and their environmental interactions was fitted with QTXNetwork[9] to dissect the genetic architecture of DS. The maize NAM population was constructed by only five generations of selfing within 25 diverse families[1, 5, 10]. The genotype data contained no heterozygous genotypes, only a small proportion of missing genotypes, and the missing genotypes were treated as heterozygous in this study. An additive model including only additive (a) and additive-by-environment interaction (ae) effects was also fitted for comparison. The genotypes and total genetic effects of the best line (BL), superior line (SL), and superior hybrid (SH) were tabulated to assess the scope for improvement in future maize breeding.

1 Materials and methods

1.1 Genotype and phenotype data

The maize nested association mapping (NAM) population developed in the United States (US-NAM) was used in this study. It was derived by crossing 25 diverse lines with B73, followed by self-pollination for five generations[5, 10]. Days to silk (DS) was scored in nine environments; to reduce the computational burden, data from four of these environments were analyzed. The genotype and phenotype data sets were downloaded from http://www.panzea.org/.
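As a rough guide to how little heterozygosity such a selfing design is expected to leave, the following sketch computes the expected heterozygous fraction at a segregating locus after repeated selfing. It assumes the standard result that each generation of selfing halves heterozygosity, with no selection, and that counting starts from a fully heterozygous F1; the function name and this way of counting generations are illustrative assumptions, not details taken from the NAM protocol.

```python
def expected_heterozygosity(generations_of_selfing, initial=1.0):
    """Expected heterozygous fraction after n generations of selfing.

    Assumes each selfing generation halves heterozygosity (no selection).
    """
    if generations_of_selfing < 0:
        raise ValueError("generations_of_selfing must be non-negative")
    return initial * 0.5 ** generations_of_selfing


# Print the expected fraction for 0 to 5 generations of selfing.
for n in range(0, 6):
    print(f"after {n} selfing generation(s): {expected_heterozygosity(n):.4f}")
```

Under these assumptions only a few percent of genotypes at any locus would remain heterozygous after five generations, which is the sense in which heterozygotes are rare in this kind of population.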

1.2 Statistical analysis

A recently developed association-mapping approach implemented in QTXNetwork was used. The approach has two parts. First, the generalized multifactor dimensionality reduction (GMDR) method, run with the GMDR-GPU module[11] of QTXNetwork, scans SNPs in 1D for main effects and in 2D and 3D for epistatic interactions. Second, association mapping is conducted on the detected SNPs with the quantitative trait SNP (QTS) module of QTXNetwork. Two models were used for association mapping in this study: a full genetic model and a multi-locus additive model. The full genetic model treats the SNP locus effects (a, d, aa, ad, da, dd) as fixed, and the environment (e) and locus-by-environment interaction effects (ae, de, aae, ade, dae, dde) as random, for four environments (1 for E1, 2 for E2, 3 for E4, and 4 for E9). Association analyses followed the statistical approaches described for the full and additive models[12].
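To make the effect labels concrete, the sketch below shows one common way of coding genotypes for additive (a), dominance (d), and pairwise epistasis (aa, ad, da, dd) terms. The coding used here (QQ = 1, Qq = 0, qq = -1 for additive; Qq = 1 and homozygotes = 0 for dominance; epistasis codes as products of the single-locus codes) is an assumption for illustration, and the coding used inside QTXNetwork may differ. All function names are hypothetical.

```python
# Assumed design codes for one biallelic locus (illustrative only).
ADDITIVE = {"QQ": 1, "Qq": 0, "qq": -1}
DOMINANCE = {"QQ": 0, "Qq": 1, "qq": 0}


def main_effect_codes(genotype):
    """Return (a, d) design codes for a single-locus genotype."""
    return ADDITIVE[genotype], DOMINANCE[genotype]


def epistasis_codes(genotype_i, genotype_j):
    """Return design codes for aa, ad, da, dd between loci i and j."""
    a_i, d_i = main_effect_codes(genotype_i)
    a_j, d_j = main_effect_codes(genotype_j)
    return {"aa": a_i * a_j, "ad": a_i * d_j, "da": d_i * a_j, "dd": d_i * d_j}


print(main_effect_codes("Qq"))
print(epistasis_codes("QQ", "Qq"))
print(epistasis_codes("Qq", "Qq"))
```

With this coding, a dominance term contributes only for heterozygous genotypes, and a dd term only when both loci are heterozygous, so the treatment of heterozygous calls directly determines which individuals inform the dominance-related effects.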

Henderson method Ⅲ[13] was used to compute the F-statistic for association tests. A total of 2 000 permutations were performed to obtain the critical F-value that controls the experiment-wise type Ⅰ error (αEW < 0.05). Parameters were estimated with a Markov chain Monte Carlo (MCMC) algorithm using 20 000 Gibbs sampling iterations[9, 14-16]. The experiment-wise P-value (PEW-value) was calculated while controlling the experiment-wise type Ⅰ error (PEW < 0.05).
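The idea of a permutation-based experiment-wise critical F-value can be sketched as follows. This is a minimal illustration only: it uses a simple single-marker regression F-statistic rather than the Henderson method Ⅲ test in QTXNetwork, and the function names, toy data, and quantile rule are assumptions. For each permutation the phenotype is shuffled, the maximum F over all markers is recorded, and the (1 - α) quantile of those maxima is taken as the critical value.

```python
import random


def f_statistic(x, y):
    """F statistic (1 and n-2 df) for simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # constant marker or phenotype carries no signal
    r2 = sxy * sxy / (sxx * syy)
    if r2 >= 1.0:
        return float("inf")
    return r2 * (n - 2) / (1.0 - r2)


def critical_f(markers, phenotype, n_perm=2000, alpha=0.05, seed=0):
    """Experiment-wise critical F: the (1 - alpha) quantile of the
    per-permutation maximum F over all markers."""
    rng = random.Random(seed)
    shuffled = list(phenotype)
    max_f = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any marker-phenotype association
        max_f.append(max(f_statistic(m, shuffled) for m in markers))
    max_f.sort()
    index = min(len(max_f) - 1, int((1 - alpha) * len(max_f)))
    return max_f[index]


# Toy data: 3 markers coded -1/0/1 for 40 lines, phenotype unrelated to them.
toy_rng = random.Random(1)
markers = [[toy_rng.choice([-1, 0, 1]) for _ in range(40)] for _ in range(3)]
phenotype = [toy_rng.gauss(0.0, 1.0) for _ in range(40)]
threshold = critical_f(markers, phenotype, n_perm=200)
print(f"critical F at alpha_EW = 0.05: {threshold:.2f}")
```

Taking the maximum over markers within each permutation is what makes the threshold experiment-wise rather than per-test: a marker is declared significant only if its observed F exceeds what the best of all markers achieves by chance.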

2 Results

2.1 Estimated heritability using full model

DS in the maize NAM population is a highly heritable trait[5]. The total heritability estimated with the full model was 79.86% for DS, mostly due to dominance and dominance-related epistasis effects ($\hat{h}_{D+}^2 = 50.52\%$) (Table 1), which indicates the importance of analyzing dominance-related effects even in inbred lines. A recent study showed that environment-specific effects are relatively unimportant for leaf orientation traits in the maize NAM population, contributing only 4.98%-7.32% of the phenotypic variation[7]. In contrast, a large share of the heritability of DS was due to environment-specific effects ($\hat{h}_{GE}^2 = 27.31\%$), indicating that the genetic effects varied across environments.

Table 1 Estimated heritability (%) of genetic effects for days to silk using full model and additive model
2.2 Genetic architecture of DS

Association analyses for DS identified multiple loci with different genetic effects. The full model identified a total of 50 highly significant (-log10 PEW > 5) QTSs (Fig. 1, Table S1 available at http://www.zjujournals.com/agr/EN/article/showSupportInfo.do?id=10459). The identified QTSs had 64 genetic main effects and 54 environment-specific effects, so environment-specific effects of QTSs play important roles in DS in the NAM population. Despite the low frequency of heterozygous genotypes at the identified loci (8.21%-9.24% for loci with dominance effects and 3.51%-9.03% for loci with dominance-related epistatic interactions), we observed large impacts of dominance-related effects on DS: although only three QTSs had highly significant dominance effects, five pairs of QTSs had highly significant dominance-related epistatic interactions (Table S1). Flowering time in plants results from interacting molecular pathways[17], and epistasis effects have been observed in Arabidopsis[18] and rice[19]. In this study, the full model identified a total of 24 pairs of highly significant epistasis effects for DS in the NAM population. In contrast to self-fertilizing crop species, flowering time in the maize NAM population was reported, on the basis of QTL mapping, to be controlled by many loci of small effect[5]. Consistent with that QTL mapping of DS, association analysis with the full model estimated small genetic effects for the DS QTSs. The largest positive individual effect was the dominance effect of QTS S10_113745101, only 1.43 days (-log10 PEW = 47.3), which explained 2.92% of the phenotypic variation. The largest negative individual effect was the additive × environment 1 (ae1) effect of QTS S1_172281879, -0.912 day (-log10 PEW = 51.5), which contributed 0.85% of the phenotypic variation, although the total additive effect of this QTS in environment 1 (a + ae1) was only -0.559 day. Like the individual effects of loci, the estimated epistasis effects were small. The largest epistasis effect was the dominance × dominance (dd) effect between QTSs S4_53677782 and S8_37237820, only 2.688 days (-log10 PEW = 22.3), which explained 10.31% of the phenotypic variation. QTS S3_159869611 had the largest positive additive effect ($\hat{a} = 0.486$ day, -log10 PEW = 61.1), and QTS S2_109001252 had the largest negative additive effect ($\hat{a} = -0.408$ day, -log10 PEW = 43.3).

Circle: QTS with additive effect; Square: QTS with dominant effect; Line between two QTSs: Epistasis effect; Red: QTS with general effects for two environments; Green: QTS with environment-specific effects; Blue: QTS with both general and environment-specific effects; Black: QTS with significant epistasis effects but without detected individual effects.
Fig. 1 G × G plot of detected significant QTSs (PEW < 0.05) for DS by using full model (DS_ADI) and additive model (DS_A) approaches
2.3 Candidate gene annotation

Candidate genes corresponding to the DS QTSs were retrieved from the Gramene database (http://ensembl.gramene.org/Zea_mays/). Their functions were searched in UniProt (http://www.uniprot.org/uniprot/) using the accession numbers obtained from Gramene, descriptions of some candidate genes were taken from the NCBI gene database, and additional functional information was collected through a literature search in Google. The functions of some candidate genes are tabulated in supplementary Table 2 (Table S2 available at http://www.zjujournals.com/agr/EN/article/showSupportInfo.do?id=10459). Some of the candidate genes belong to well-known gene families with crucial functions in plants. For example, QTS S1_172281879 is a variant near the C3HC4-type RING finger family protein gene GRMZM2G116714; C3HC4-type RING finger genes play important roles in various physiological processes, including growth, development, and stress responses[20]. QTS S3_54472637 is a variant of the MYB transcription factor protein gene GRMZM2G051256; MYB transcription factors play regulatory roles in developmental processes and defense responses in plants[21]. The functions of most of the candidate genes are still unknown.

Table 2 Prediction of total genetic effects of days to silk
2.4 Prediction of best line, superior line, and superior hybrid for DS

From the association mapping results, the best line (BL), superior line (SL), and superior hybrid (SH) can be predicted for DS, which may help breeders in future breeding programs (Table 2). The overall total genetic effect of the non-B73 allele homozygous (QQ) combination was 2.25 days across environments, but it varied from 0.20 to 4.18 days among the four environments. The predicted total genetic effect of the F1 hybrid (1.95 days) was smaller than that of the non-B73 allele homozygous (QQ) genotype.

The maximum positive total genetic effect across environments was found for line Z012E0020 (6.83 days), designated the positive best line (best line (+)), whereas the environment-specific positive best lines were Z008E0050 (9.89 days) in environment 1, Z012E0124 (9.72 days) in environment 2, Z007E0043 (6.89 days) in environment 3, and Z012E0058 (9.27 days) in environment 4 (Table S3 available at http://www.zjujournals.com/agr/EN/article/showSupportInfo.do?id=10459). The maximum negative total genetic effect across environments was found for line Z019E0177 (-5.72 days), designated the negative best line (best line (-)); its total genetic value ranged from -1.87 to -8.56 days across the four environments. The environment-specific negative best lines were Z024E0182 (-9.05 days) in environment 1, Z024E0114 (-6.16 days) in environment 2, Z010E0020 (-5.48 days) in environment 3, and Z024E0094 (-8.69 days) in environment 4. The total genetic values of these environment-specific best lines varied widely across environments: from -2.50 to -9.05 days for line Z024E0182, from -2.57 to -7.36 days for line Z024E0114, from -2.11 to -5.48 days for line Z010E0020, and from -1.41 to -8.69 days for line Z024E0094. Therefore, no single line was best across all environments for DS.

The predicted negative superior line (superior line (-)), which carries the optimum combination of homozygous genotypes (QQ, qq), could provide insight for crop improvement. Its total genetic effect across environments was -7.11 days, which is 1.39 days earlier than that of the existing best line Z019E0177 (-5.72 days).

The negative superior hybrid, which uses the optimum combination of homozygous (QQ, qq) and heterozygous (Qq) genotypes, had a total genetic effect of -11.80 days, 6.08 days earlier than the existing best line Z019E0177. This indicates that the predicted superior hybrid offers greater scope for further improvement than the predicted superior line. The optimum genotypes at each locus of the predicted lines are tabulated in Table S4 (available at http://www.zjujournals.com/agr/EN/article/showSupportInfo.do?id=10459) and could help breeders in further crop improvement.
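The logic of predicting a superior line and a superior hybrid can be illustrated with a small sketch. The effect values, locus names, and genotype coding below are hypothetical and are not estimates from this study; environment-specific effects are omitted, and the exhaustive search is only feasible for a handful of loci, so this is not how the prediction was implemented for the 50 QTSs. A superior line is searched among homozygous genotypes only (QQ, qq), and a superior hybrid among all three genotypes (QQ, Qq, qq).

```python
from itertools import product

ADD = {"QQ": 1, "Qq": 0, "qq": -1}  # assumed additive coding
DOM = {"QQ": 0, "Qq": 1, "qq": 0}   # assumed dominance coding

# Hypothetical effects in days; NOT estimates from this study.
additive = {"L1": 0.5, "L2": -0.4, "L3": 0.2}
dominance = {"L1": -1.0, "L2": 0.0, "L3": 0.3}
dd_epistasis = {("L1", "L3"): -2.0}  # dominance x dominance pairs


def total_genetic_effect(genotype):
    """Sum additive, dominance, and dd epistasis effects for one genotype."""
    total = 0.0
    for locus, g in genotype.items():
        total += additive[locus] * ADD[g] + dominance[locus] * DOM[g]
    for (i, j), effect in dd_epistasis.items():
        total += effect * DOM[genotype[i]] * DOM[genotype[j]]
    return total


def best_combination(allowed):
    """Exhaustively search for the genotype with the most negative effect."""
    loci = sorted(additive)
    candidates = (
        dict(zip(loci, combo)) for combo in product(allowed, repeat=len(loci))
    )
    return min(candidates, key=total_genetic_effect)


superior_line = best_combination(["QQ", "qq"])
superior_hybrid = best_combination(["QQ", "Qq", "qq"])
print(superior_line, total_genetic_effect(superior_line))
print(superior_hybrid, total_genetic_effect(superior_hybrid))
```

Because the hybrid search space contains every homozygous combination, the best hybrid can never be worse than the best line under the same effects; it is strictly better whenever dominance-related effects point in the desired direction, as in this toy example.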

2.5 Association mapping with additive model

The additive model identified 47 highly significant QTSs, 31 of which were also identified by the full model (Fig. 1). As with the full model, the effects estimated with the additive model were small. The total heritability estimated with the additive model was 31.65%, less than half of that estimated with the full model (Table 1), which illustrates the problem of missing heritability under an additive model. Ignoring dominance and epistatic interactions may therefore lead to substantial underestimation of the heritability of complex traits.
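The comparison between the two models can be checked with simple arithmetic on the heritability estimates quoted in the text. The variable names below are illustrative; the numbers are those reported for the full model (total and dominance-related) and for the additive model.

```python
# Heritability estimates (%) quoted in the text.
h2_full_total = 79.86
h2_dominance_related = 50.52
h2_additive_model = 31.65

# Fraction of the full-model heritability recovered by the additive model.
ratio = h2_additive_model / h2_full_total
# Percentage points of heritability not captured by the additive model.
gap = h2_full_total - h2_additive_model
# Share of the full-model heritability that is dominance-related.
dominance_share = h2_dominance_related / h2_full_total

print(f"additive-model / full-model heritability: {ratio:.3f}")
print(f"heritability not captured by the additive model: {gap:.2f} points")
print(f"dominance-related share of full-model heritability: {dominance_share:.3f}")
```

The ratio falls below 0.5, consistent with the statement that the additive model recovers less than half of the heritability estimated with the full model.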

3 Discussion

The role of heterozygous genotypes has been ignored in GWAS under the assumption that most genetic variation in animals and plants results from the additive effects of multiple loci. Environmental effects have also been ignored, or adjusted for by subtracting them from the phenotypic data. However, ignoring or adjusting away important factors can result in the loss of information about the genetic architecture of complex traits. The full model approach was designed to estimate or predict the effects of different types of factors (additive, dominance, epistasis, and their environmental interactions) and can therefore provide more information about the mechanisms underlying complex traits. In this study, maize days to silk was analyzed with the full model approach, which provided new insight into this complex trait. DS is related to the adaptation of maize to various environments, a major criterion in selection breeding[1]. We observed that the genetic effects of multiple loci varied across environments, and the estimated heritability of environment-specific effects was 27.31%. In the full model analysis, dominance and dominance-related epistatic interactions had large effects on DS. The additive model, which was also fitted in this study, gave a smaller heritability than the full model. The correlation between predicted genotypic values and phenotypes was very high for the full model (r≈0.96), suggesting that the results can accurately predict the phenotypes. Epistasis effects were unimportant for DS in a previous QTL mapping study[5]. In contrast, we observed a large impact of epistasis effects on DS, contributing around 49.37% of the phenotypic variation (Table 1). This result is in concordance with observations in Arabidopsis[18] and rice[19].

By calculating the total genetic effects of the lines, we found that no single line had a large genetic effect across all environments; instead, different lines had large effects in different environments. This result suggests that maize flowering time is very sensitive to the environment and that different environments require different genotype combinations for better performance. The predicted genotypes of SL and SH support the same conclusion, since the superior genotypes at individual loci differed among environments (Table S4). The predicted SL and SH had larger genetic effects than the best lines, suggesting scope for further improvement of maize days to silk with the predicted genotype combinations.

References
[1] LI Y X, LI C, BRADBURY P J, et al. Identification of genetic variants associated with maize flowering time using an extremely large multi-genetic background population. The Plant Journal: For Cell and Molecular Biology, 2016,86 (5):391–402. DOI: 10.1111/tpj.2016.86.issue-5.
[2] XU J, LIU Y, LIU J, et al. The genetic architecture of flowering time and photoperiod sensitivity in maize as revealed by QTL review and Meta analysis. Journal of Integrative Plant Biology, 2012,54 (6):358–373. DOI: 10.1111/jipb.2012.54.issue-6.
[3] GRILLO M A, LI C, HAMMOND M, et al. Genetic architecture of flowering time differentiation between locally adapted populations of Arabidopsis thaliana. The New Phytologist, 2013,197 (4):1321–1331. DOI: 10.1111/nph.12109.
[4] JUNG C, MULLER A E. Flowering time control and applications in plant breeding. Trends in Plant Science, 2009,14 (10):563–573. DOI: 10.1016/j.tplants.2009.07.005.
[5] BUCKLER E S, HOLLAND J B, BRADBURY P J, et al. The genetic architecture of maize flowering time. Science, 2009,325 (5941):714–718. DOI: 10.1126/science.1174276.
[6] YANG J, ZHU J. Methods for predicting superior genotypes under multiple environments based on QTL effects. Theoretical and Applied Genetics, 2005,110 (7):1268–1274. DOI: 10.1007/s00122-005-1963-2.
[7] MONIR M M. Comparing different genetic models and statistical approaches of GWAS for complex traits. Hangzhou: Zhejiang University, 2016: 44-64.
[8] LIYUAN Z. Genetic association studies for complex traits of crops and linear-model-based multiple dimensionality reduction method developing. Hangzhou: Zhejiang University, 2016: 10-23.
[9] ZHANG F T, ZHU Z H, TONG X R, et al. Mixed linear model approaches of association mapping for complex traits based on omics variants. Scientific Reports, 2015,5 :10298. DOI: 10.1038/srep10298.
[10] TIAN F, BRADBURY P J, BROWN P J, et al. Genome-wide association study of leaf architecture in the maize nested association mapping population. Nature Genetics, 2011,43 (2):159–162. DOI: 10.1038/ng.746.
[11] ZHU Z, TONG X, ZHU Z, et al. Development of GMDR-GPU for gene-gene interaction analysis and its application to WTCCC GWAS data for type 2 diabetes. PLoS One, 2013,8 (4):e61943. DOI: 10.1371/journal.pone.0061943.
[12] MONIR M M, ZHU J. Comparing GWAS results of complex traits using full genetic model and additive models for revealing genetic architecture. Scientific Reports, 2017,7 :38600. DOI: 10.1038/srep38600.
[13] SEARLE S R, CASELLA G, MCCULLOCH C E. Variance Components. New York, USA: John Wiley & Sons, 2009.
[14] YANG J, ZHU J, WILLIAMS R W. Mapping the genetic architecture of complex traits in experimental populations. Bioinformatics, 2007,23 (12):1527–1536. DOI: 10.1093/bioinformatics/btm143.
[15] YANG J, HU C C, HU H, et al. QTLNetwork: Mapping and visualizing genetic architecture of complex traits in experimental populations. Bioinformatics, 2008,24 (5):721–723. DOI: 10.1093/bioinformatics/btm494.
[16] QI T, JIANG B, ZHU Z, et al. Mixed linear model approach for mapping quantitative trait loci underlying crop seed traits. Heredity, 2014,113 (3):224–232. DOI: 10.1038/hdy.2014.17.
[17] KOMEDA Y. Genetic regulation of time to flower in Arabidopsis thaliana. Annual Review of Plant Biology, 2004,55 :521–535. DOI: 10.1146/annurev.arplant.55.031903.141644.
[18] EL-LITHY M E, BENTSINK L, HANHART C J, et al. New Arabidopsis recombinant inbred line populations genotyped using SNPWave and their use for mapping flowering-time quantitative trait loci. Genetics, 2006,172 (3):1867–1876.
[19] UWATOKO N, ONISHI A, IKEDA Y, et al. Epistasis among the three major flowering time genes in rice: Coordinate changes of photoperiod sensitivity, basic vegetative growth and optimum photoperiod. Euphytica, 2007,163 (2):167–175.
[20] MA K, XIAO J H, LI X H, et al. Sequence and expression analysis of the C3HC4-type RING finger gene family in rice. Gene, 2009,444 (1/2):33–45.
[21] CHEN Y H, YANG X Y, HE K, et al. The MYB transcription factor superfamily of Arabidopsis: Expression analysis and phylogenetic comparison with the rice MYB family. Plant Molecular Biology, 2006,60 (1):107–124. DOI: 10.1007/s11103-005-2910-y.