Association analysis revealed importance of dominance effects on days to silk of maize nested association mapping (NAM) population
Flowering time is an important trait that reflects the capacity of plants to adapt to local environments[1-2]. The transition from vegetative growth to flowering, which integrates different environmental cues, is crucial for plant reproductive success[3]. Flowering time is considered a major selection criterion in plant breeding[4]. Maize originated from Balsas teosinte (Zea mays ssp. parviglumis) in the Mexican highlands approximately 9 000 years ago and has since adapted to diverse ecological conditions[1]. Dissecting the genetic mechanisms of maize flowering time is crucial for evolutionary analysis and future breeding programs. Several studies have investigated the genetic architecture of maize flowering time by using quantitative trait locus (QTL) mapping and genome-wide association studies (GWAS)[1-2, 5].
Dominance and epistasis are important phenomena in quantitative genetics. The complexity of genetic architecture can be largely attributed to epistasis, which plays a significant role in heterosis, inbreeding depression, adaptation, reproductive isolation, and speciation[6]. However, most GWAS in different organisms have ignored the impacts of dominance, epistasis, and environmental interaction. Ignoring these factors could be a major cause of the missing heritability in GWAS. Heterozygous genotypes are generally found in high proportion in random mating and other specially designed populations. However, in whole genome sequencing data with a large number of single nucleotide polymorphisms (SNPs), a small portion of heterozygous genotypes can be found even in inbred lines of animals and crops, and these could have large impacts on phenotypic traits[7-8]. In this study, we attempted to assess the impacts of heterozygous genotypes on days to silk (DS) in the maize nested association mapping (NAM) population. For that purpose, a full model with additive, dominance, and epistasis effects and their environmental interactions was fitted to dissect the genetic architecture of DS by using QTXNetwork[9]. The maize NAM population was constructed by only five generations of selfing within 25 diverse families[1, 5, 10]. However, the genotype data contained no heterozygous genotypes, only a small portion of missing genotypes. The missing genotypes were replaced by heterozygous genotypes in this study. An additive model with only additive (a) and additive by environment interaction (ae) effects was also analyzed for comparison. Genotypes and total genetic effects of the best line (BL), superior line (SL), and superior hybrid (SH) were compiled to assess the scope for improvement in future maize breeding.
1 Materials and methods
1.1 Genotype and phenotype data
The maize nested association mapping (NAM) population developed in the United States (US-NAM) was used in this study. It was derived by crossing 25 diverse lines with B73, followed by self-pollination for five generations[5, 10]. Days to silk (DS) were scored in nine environments. However, to reduce computational complexity, data from four environments were analyzed. We downloaded the genotype and phenotype data sets from http://www.panzea.org/.
1.2 Statistical analysis
A newly developed approach implemented in QTXNetwork was used for association mapping. The approach has two distinct parts. First, the generalized multi-factor dimensionality reduction (GMDR) method, run with the GMDR-GPU module[11] of QTXNetwork, scans SNPs in 1D for main effects and in 2D and 3D for epistasis interactions. Second, association mapping is conducted on the detected SNPs by using the quantitative trait SNP (QTS) module of QTXNetwork. Two models for association mapping were used in this study: the full genetic model and the multi-locus additive model. The full genetic model includes SNP locus effects (a, d, aa, ad, da, dd) as fixed effects, and environment (e) and locus by environment interaction effects (ae, de, aae, ade, dae, dde) as random effects for four environments (1 for E1, 2 for E2, 3 for E4, and 4 for E9). The statistical approaches of the full and additive models[12] were used for conducting the association analyses.
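The effect terms of the full genetic model can be illustrated with a small sketch of how the corresponding design columns might be built for a pair of SNPs. The coding used here (additive: QQ = +1, Qq = 0, qq = -1; dominance: 1 for Qq and 0 otherwise; epistasis as products of these codes) is a common convention and is an assumption of this sketch; it is not taken from the QTXNetwork source. The function names and the 0/1/2 genotype encoding are hypothetical.

```python
import numpy as np

def code_additive(g):
    """Additive code (assumed): genotype 2 (QQ) -> +1, 1 (Qq) -> 0, 0 (qq) -> -1."""
    return np.asarray(g, dtype=float) - 1.0

def code_dominance(g):
    """Dominance code (assumed): 1 for the heterozygote Qq, 0 for both homozygotes."""
    return (np.asarray(g) == 1).astype(float)

def full_model_columns(g1, g2):
    """Main-effect and pairwise epistasis columns (a, d, aa, ad, da, dd) for two SNPs."""
    a1, d1 = code_additive(g1), code_dominance(g1)
    a2, d2 = code_additive(g2), code_dominance(g2)
    return {
        "a1": a1, "d1": d1, "a2": a2, "d2": d2,
        "aa": a1 * a2, "ad": a1 * d2, "da": d1 * a2, "dd": d1 * d2,
    }

def by_environment(column, env, n_env):
    """Expand one effect column into n_env columns (e.g. a -> ae1..ae4).

    env holds the 0-based environment index of each record; each record
    contributes its code only to the column of its own environment.
    """
    column = np.asarray(column, dtype=float)
    out = np.zeros((column.shape[0], n_env))
    out[np.arange(column.shape[0]), np.asarray(env)] = column
    return out

# Hypothetical genotypes of four individuals at two SNPs (count of Q alleles).
g1 = np.array([2, 1, 0, 1])
g2 = np.array([1, 1, 2, 0])
cols = full_model_columns(g1, g2)
ae = by_environment(cols["a1"], env=np.array([0, 1, 2, 3]), n_env=4)
```

Under this coding, a dominance-related epistasis column such as dd is non-zero only for individuals that are heterozygous at both loci, which is why such effects rest on few observations when heterozygotes are rare.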
Henderson method Ⅲ[13] was used to calculate the F-statistic for association analysis. A total of 2 000 permutations were conducted to calculate the critical F-value for controlling the experiment-wise type Ⅰ error (αEW < 0.05). Parameters were estimated by using the Markov chain Monte Carlo (MCMC) algorithm with 20 000 Gibbs sampling iterations[9, 14-16]. The experiment-wise critical P value (PEW-value) was calculated by controlling the experiment-wise type Ⅰ error (PEW < 0.05).
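The idea of a permutation-based experiment-wise critical F-value can be sketched as follows. This is a simplified illustration and not the QTXNetwork procedure: it replaces the Henderson method Ⅲ F-test of the mixed model with an ordinary single-marker regression F-test, and all function names are hypothetical. In each permutation the phenotypes are shuffled, the largest F-value over all markers is recorded, and the (1 - α) quantile of these maxima serves as the critical value.

```python
import numpy as np

def single_marker_f(y, x):
    """F statistic (1 and n-2 df) for a simple regression of y on one marker code x."""
    n = y.shape[0]
    xc = x - x.mean()
    yc = y - y.mean()
    sxx = float(xc @ xc)
    if sxx == 0.0:  # monomorphic marker: no test possible
        return 0.0
    ss_reg = float(xc @ yc) ** 2 / sxx
    ss_res = float(yc @ yc) - ss_reg
    if ss_res <= 0.0:  # perfect fit
        return float("inf")
    return ss_reg / (ss_res / (n - 2))

def max_f(y, X):
    """Largest single-marker F statistic over all columns of X."""
    return max(single_marker_f(y, X[:, j]) for j in range(X.shape[1]))

def permutation_critical_f(y, X, n_perm=2000, alpha=0.05, seed=0):
    """Experiment-wise critical F: (1 - alpha) quantile of the permutation maxima."""
    rng = np.random.default_rng(seed)
    null_max = np.array([max_f(rng.permutation(y), X) for _ in range(n_perm)])
    return float(np.quantile(null_max, 1.0 - alpha))

# Simulated example: 200 individuals, 20 markers, one true effect at marker 3.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 20)).astype(float) - 1.0
y = 1.5 * X[:, 3] + rng.normal(size=200)
crit = permutation_critical_f(y, X, n_perm=200, seed=2)
significant = [j for j in range(X.shape[1]) if single_marker_f(y, X[:, j]) > crit]
```

Because the maximum over all markers is taken within each permutation, the resulting threshold controls the error rate for the whole scan rather than for each marker separately.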
2 Results
2.1 Estimated heritability using full model
Days to silk (DS) of the maize NAM population is a highly heritable trait[5]. The total heritability estimated with the full model approach was 79.86% for DS, mostly due to dominance and dominance-related epistasis effects (Table 1).
Association analyses for DS identified multiple loci with different genetic effects. The full model approach identified a total of 50 highly significant (-log10 PEW > 5) QTSs (Fig. 1, Table S1 available at http://www.zjujournals.com/agr/EN/article/showSupportInfo.do?id=10459). The identified QTSs had 64 genetic main effects and 54 environment-specific effects. Therefore, environment-specific effects of QTSs play important roles in DS of the NAM population. Despite the low frequency of heterozygous genotypes at the identified loci (8.21%-9.24% for the loci with dominance effects, and 3.51%-9.03% for the loci with dominance-related epistasis interactions), we observed large impacts of dominance-related effects on DS. Although only three QTSs had highly significant dominance effects, there were five pairs of QTSs with highly significant dominance-related epistasis interactions (Table S1 available at http://www.zjujournals.com/agr/EN/article/showSupportInfo.do?id=10459). Flowering time in plants results from interacting molecular pathways[17], and epistasis effects have been observed in Arabidopsis[18] and rice[19]. In this study, the full model identified a total of 24 pairs of highly significant epistasis effects for DS of the NAM population. In contrast to self-fertilizing crop species, flowering time in maize was reported to be controlled by many loci with small effects in QTL mapping of the NAM population[5]. Similar to the previous QTL mapping of DS in the NAM population, association analysis with the full model estimated small genetic effects for the DS QTSs. The largest positive individual effect was the dominance effect of QTS S10_113745101, only 1.43 days (-log10 PEW = 47.3), which could explain 2.92% of the phenotypic variation. The largest negative individual effect was the additive × environment 1 (ae1) effect of QTS S1_172281879, -0.912 day (-log10 PEW = 51.5), which contributed 0.85% of the phenotypic variation, although the total additive effect of this QTS in environment 1 (a + ae1) was only -0.559 day. Similar to the individual genetic effects of loci, the estimated epistasis effects were also small. The largest epistasis effect was the dominance × dominance (dd) effect between QTSs S4_53677782 and S8_37237820, only 2.688 days (-log10 PEW = 22.3), which could explain 10.31% of the phenotypic variation. The identified QTS S3_159869611 had the largest positive additive effect (a
Candidate genes corresponding to the DS QTSs were collected from the Gramene database (http://ensembl.gramene.org/Zea_mays/). Functions of the candidate genes were searched in UniProt (http://www.uniprot.org/uniprot/) by using the accession numbers of the genes collected from the Gramene database. Descriptions of some of the candidate genes were collected from the NCBI gene database. Moreover, functions of candidate genes were collected via literature search in Google. Functions of some candidate genes are tabulated in supplementary Table 2 (Table S2 available at http://www.zjujournals.com/agr/EN/article/showSupportInfo.do?id=10459). We observed that some of the candidate genes were members of well-known gene families that have crucial functions in plant life. For example, QTS S1_172281879 is a variant near the C3HC4-type RING finger family protein gene GRMZM2G116714. The C3HC4-type RING finger genes play important roles in various physiological processes including growth, development, and stress responses[20]. QTS S3_54472637 is a variant of the MYB transcription factor protein gene GRMZM2G051256. The MYB transcription factor proteins play regulatory roles in developmental processes and defense responses in plants[21]. The functions of most of the candidate genes are still unknown.
Based on the association mapping results, the best line (BL), superior line (SL), and superior hybrid (SH) can be predicted for DS, which may help breeders in future breeding programs (Table 2). The overall total genetic effect of the non-B73 allele homozygous (QQ) combination was 2.25 days across environments, but it varied from 0.20 to 4.18 days among the four environments. The predicted total genetic effect of the F1 hybrid (1.95 days) was smaller than that of the non-B73 allele homozygous (QQ) genotype.
The maximum positive total genetic effect across environments was found for line Z012E0020 (6.83 days), referred to as the positive best line (best line (+)), whereas the environment-specific positive best lines were Z008E0050 (9.89 days) in environment 1, Z012E0124 (9.72 days) in environment 2, Z007E0043 (6.89 days) in environment 3, and Z012E0058 (9.27 days) in environment 4 (Table S3 available at http://www.zjujournals.com/agr/EN/article/showSupportInfo.do?id=10459). The maximum negative total genetic effect across environments was found for line Z019E0177 (-5.72 days), referred to as the negative best line (best line (-)), and its total genetic values ranged from -1.87 to -8.56 days across the four environments. The environment-specific negative best lines were Z024E0182 (-9.05 days) in environment 1, Z024E0114 (-6.16 days) in environment 2, Z010E0020 (-5.48 days) in environment 3, and Z024E0094 (-8.69 days) in environment 4. The total genetic values of the environment-specific best lines varied widely across environments: from -2.50 to -9.05 days for line Z024E0182, from -2.57 to -7.36 days for line Z024E0114, from -2.11 to -5.48 days for line Z010E0020, and from -1.41 to -8.69 days for line Z024E0094. Therefore, no single line was the best across all environments for DS.
The predicted negative superior line (superior line (-)), with the optimum combination of homozygous genotypes (QQ, qq), could provide insight for crop improvement. The overall total genetic effect of the predicted superior line was -7.11 days, which was lower than that of the existing best line (Z019E0177).
Furthermore, the total genetic effect of the negative superior hybrid, which combines the optimum homozygous (QQ, qq) and heterozygous (Qq) genotypes, was -11.80 days, which was 6.08 days earlier than the existing line Z019E0177. This indicates that the predicted superior hybrid has greater scope for further improvement than the predicted superior line. We tabulated the optimum genotypes at each locus of the predicted lines (Table S4 available at http://www.zjujournals.com/agr/EN/article/showSupportInfo.do?id=10459), which could be helpful to breeders for further crop improvement.
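The logic behind predicting a superior line and a superior hybrid can be sketched under strong simplifying assumptions. The sketch below considers only additive (a) and dominance (d) main effects and ignores the epistasis and environment interaction effects that the full model also includes; the genotypic values are assumed to be +a for QQ, d for Qq, and -a for qq. The effect values and function names are hypothetical and are not estimates from this study.

```python
import numpy as np

LINE_GENOTYPES = np.array(["QQ", "qq"])            # an inbred line is homozygous at every locus
HYBRID_GENOTYPES = np.array(["QQ", "Qq", "qq"])    # a hybrid may also be heterozygous

def predict_superior(a, d, direction=-1):
    """Pick the best genotype per locus for a line and for a hybrid.

    direction=-1 minimizes the total effect (earlier silking);
    direction=+1 maximizes it. Main effects only (assumption).
    """
    a = np.asarray(a, dtype=float)
    d = np.asarray(d, dtype=float)
    loci = np.arange(a.size)
    line_vals = np.stack([a, -a])          # rows: QQ, qq
    hybrid_vals = np.stack([a, d, -a])     # rows: QQ, Qq, qq
    pick = np.argmin if direction < 0 else np.argmax
    line_idx = pick(line_vals, axis=0)
    hybrid_idx = pick(hybrid_vals, axis=0)
    return {
        "line_genotype": LINE_GENOTYPES[line_idx].tolist(),
        "line_total": float(line_vals[line_idx, loci].sum()),
        "hybrid_genotype": HYBRID_GENOTYPES[hybrid_idx].tolist(),
        "hybrid_total": float(hybrid_vals[hybrid_idx, loci].sum()),
    }

# Hypothetical effects (in days) at three loci.
result = predict_superior(a=[0.5, -0.3, 0.2], d=[-1.0, 0.1, 0.0], direction=-1)
```

Because the hybrid chooses from a superset of the genotypes available to an inbred line, its predicted total in the chosen direction can never be worse than that of the line in this simplified setting, which is consistent with the larger scope reported for the superior hybrid.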
2.5 Association mapping with additive model
The additive model identified 47 highly significant QTSs, of which 31 were also identified by the full model (Fig. 1). As with the full model, the effects estimated by the additive model were small. The total heritability estimated by the additive model was 31.65%, less than half of the total heritability estimated by the full model (Table 1), illustrating the problem of missing heritability under the additive model. Therefore, ignoring dominance and epistasis interactions may lead to substantial underestimation of the heritability of complex traits.
3 Discussion
The role of heterozygous genotypes has often been ignored in GWAS under the assumption that most genetic variation in animals and plants results from additive effects of multiple loci. Environmental impacts have also been ignored or adjusted for by subtracting their effects from the phenotypic data. However, ignoring or adjusting away important factors can result in missing information about the genetic architecture of complex traits. The full model approach was designed to estimate or predict the effects of different types of factors (additive, dominance, epistasis, and their environmental interactions), which can provide more information about the underlying mechanisms of complex traits. In this study, maize days to silk was analyzed by using the full model approach, which revealed new insights into this complex trait. DS is related to the adaptation of maize to various environments, a major criterion in selection breeding[1]. We observed that the genetic effects of multiple loci varied across environments. The estimated heritability of environment-specific effects was 27.31%. In the full model analyses, dominance and dominance-related epistasis interactions had large effects on DS. An additive model was also analyzed in this study, and it yielded a smaller heritability than the full model. The correlation between predicted genotypic values and phenotypes was very high for the full model approach (r≈0.96), suggesting that the results can accurately predict the phenotypes. Epistasis effects were reported to be unimportant for DS in a previous QTL mapping study[5]. However, we observed a large impact of epistasis effects on DS, contributing around 49.37% of the phenotypic variation (Table 1). This result is in concordance with the results observed in Arabidopsis[18] and rice[19].
By calculating the total genetic effects of lines, we observed that no single line had a large genetic effect across all environments; rather, different lines had large effects in different environments. This result suggests that maize flowering time is very sensitive to the environment, and that different environments need different combinations of genotypes for better performance. The predicted genotypes of SL and SH support the same hypothesis, since the superior genotypes at the loci differed among environments (Table S4). The predicted SL and SH had larger genetic effects than the best lines, suggesting scope for further improvement of maize days to silk with the predicted genotype combinations.
[1] LI Y X, LI C, BRADBURY P J, et al. Identification of genetic variants associated with maize flowering time using an extremely large multi-genetic background population. The Plant Journal: For Cell and Molecular Biology, 2016, 86(5): 391–402. DOI: 10.1111/tpj.2016.86.issue-5.
[2] XU J, LIU Y, LIU J, et al. The genetic architecture of flowering time and photoperiod sensitivity in maize as revealed by QTL review and Meta analysis. Journal of Integrative Plant Biology, 2012, 54(6): 358–373. DOI: 10.1111/jipb.2012.54.issue-6.
[3] GRILLO M A, LI C, HAMMOND M, et al. Genetic architecture of flowering time differentiation between locally adapted populations of Arabidopsis thaliana. The New Phytologist, 2013, 197(4): 1321–1331. DOI: 10.1111/nph.12109.
[4] JUNG C, MULLER A E. Flowering time control and applications in plant breeding. Trends in Plant Science, 2009, 14(10): 563–573. DOI: 10.1016/j.tplants.2009.07.005.
[5] BUCKLER E S, HOLLAND J B, BRADBURY P J, et al. The genetic architecture of maize flowering time. Science, 2009, 325(5941): 714–718. DOI: 10.1126/science.1174276.
[6] YANG J, ZHU J. Methods for predicting superior genotypes under multiple environments based on QTL effects. Theoretical and Applied Genetics, 2005, 110(7): 1268–1274. DOI: 10.1007/s00122-005-1963-2.
[7] MONIR M M. Comparing different genetic models and statistical approaches of GWAS for complex traits. Hangzhou: Zhejiang University, 2016: 44-64.
[8] LIYUAN Z. Genetic association studies for complex traits of crops and linear-model-based multiple dimensionality reduction method developing. Hangzhou: Zhejiang University, 2016: 10-23.
[9] ZHANG F T, ZHU Z H, TONG X R, et al. Mixed linear model approaches of association mapping for complex traits based on omics variants. Scientific Reports, 2015, 5: 10298. DOI: 10.1038/srep10298.
[10] TIAN F, BRADBURY P J, BROWN P J, et al. Genome-wide association study of leaf architecture in the maize nested association mapping population. Nature Genetics, 2011, 43(2): 159–162. DOI: 10.1038/ng.746.
[11] ZHU Z, TONG X, ZHU Z, et al. Development of GMDR-GPU for gene-gene interaction analysis and its application to WTCCC GWAS data for type 2 diabetes. PloS One, 2013, 8(4): e61943. DOI: 10.1371/journal.pone.0061943.
[12] MONIR M M, ZHU J. Comparing GWAS results of complex traits using full genetic model and additive models for revealing genetic architecture. Scientific Reports, 2017, 7: 38600. DOI: 10.1038/srep38600.
[13] SEARLE S R, CASELLA G, MCCULLOCH C E. Variance Components. New York, USA: John Wiley & Sons, 2009.
[14] YANG J, ZHU J, WILLIAMS R W. Mapping the genetic architecture of complex traits in experimental populations. Bioinformatics, 2007, 23(12): 1527–1536. DOI: 10.1093/bioinformatics/btm143.
[15] YANG J, HU C C, HU H, et al. QTLNetwork: Mapping and visualizing genetic architecture of complex traits in experimental populations. Bioinformatics, 2008, 24(5): 721–723. DOI: 10.1093/bioinformatics/btm494.
[16] QI T, JIANG B, ZHU Z, et al. Mixed linear model approach for mapping quantitative trait loci underlying crop seed traits. Heredity, 2014, 113(3): 224–232. DOI: 10.1038/hdy.2014.17.
[17] KOMEDA Y. Genetic regulation of time to flower in Arabidopsis thaliana. Annual Review of Plant Biology, 2004, 55: 521–535. DOI: 10.1146/annurev.arplant.55.031903.141644.
[18] EL-LITHY M E, BENTSINK L, HANHART C J, et al. New Arabidopsis recombinant inbred line populations genotyped using SNPWave and their use for mapping flowering-time quantitative trait loci. Genetics, 2006, 172(3): 1867–1876.
[19] UWATOKO N, ONISHI A, IKEDA Y, et al. Epistasis among the three major flowering time genes in rice: Coordinate changes of photoperiod sensitivity, basic vegetative growth and optimum photoperiod. Euphytica, 2007, 163(2): 167–175.
[20] MA K, XIAO J H, LI X H, et al. Sequence and expression analysis of the C3HC4-type RING finger gene family in rice. Gene, 2009, 444(1/2): 33–45.
[21] CHEN Y H, YANG X Y, HE K, et al. The MYB transcription factor superfamily of Arabidopsis: Expression analysis and phylogenetic comparison with the rice MYB family. Plant Molecular Biology, 2006, 60(1): 107–124. DOI: 10.1007/s11103-005-2910-y.