Genome-wide association study and conditional analysis reveal the importance of non-additive effects and ethnicity interaction for coronary heart disease
Cardiovascular disease (CVD) is the leading cause of mortality and disability in America. Coronary heart disease (CHD), also known as coronary artery disease (CAD), atherosclerotic heart disease, or ischemic heart disease (IHD), is a major type of CVD. According to a report from the American Heart Association (AHA), over 2 150 Americans died of CVD every day, and CHD alone caused approximately 1 of every 6 deaths in the United States in 2010[1]. Although familial aggregation provides evidence of a genetic background for CHD, with heritability estimates ranging from 30% to 60%[2], genetic risk factors alone do not cause the disease. Environmental risk factors such as cigarette smoking[3] and physical inactivity[4] are also major risks for CHD. The AHA report shows that Americans with CVD are much more likely to be current or former smokers than Americans without CVD, and that Americans with intermediate or poor levels of physical activity are more likely to develop CVD. Thus, genetic predisposition and environmental factors together determine the development of CVD through an intricate interactive network.
Recent genome-wide association studies (GWAS) have identified multiple genetic loci associated with CHD or related traits. Previous studies have reported over 40 genes or regions associated with CHD risk at genome-wide significance (PEW < 5×10⁻⁸)[5]. However, the genetic risk variants reported to date account for only a small fraction of heritability. In the recent report by the Coronary Artery Disease Genome-wide Replication and Meta-analysis plus the Coronary Artery Disease Genetics (CARDIoGRAMplusC4D) consortium, it was estimated that 15 newly discovered and 31 previously identified loci, together with a further set of 104 likely independent single nucleotide polymorphisms (SNPs), explained only 10.6% of the genetic variance of CAD, suggesting that important genetic loci remain to be discovered[6]. Neglecting dominance, epistasis, and the interaction between genetic and environmental factors, as most studies do, may impair the ability to detect genuine signals.
Herein, we report a two-step GWAS of CHD using the software QTXNetwork, based on data from the Multi-Ethnic Study of Atherosclerosis (MESA). Association between genotyped SNPs and the target trait, the Framingham risk score (JAMA version), was analyzed with a mixed linear model (MLM), setting ethnicity as the environment and each of six lifestyles, including smoking status, as an individual cofactor. The aim of the study is to identify novel CHD-associated loci and to explore the complicated network of genetic and environmental factors across different ethnic groups.
1 Materials and methods
1.1 Participants
The MESA data were obtained from dbGaP (database of Genotypes and Phenotypes, http://www.ncbi.nlm.nih.gov/gap). MESA is a prospective population-based study focusing on the characterization of subclinical CVD and the risk factors that enable prediction of the progression of CVD. The study participants comprise 6 500 men and women from four ethnic groups, in nearly equal numbers, who were aged 45-84 years and free of clinical CVD at baseline, and who were initially recruited in 2000 from six US communities: Baltimore, MD; Chicago, IL; Forsyth County, NC; Los Angeles County, CA; Northern Manhattan, NY; and St. Paul, MN. Of the recruited participants, 38% are European-American (E-A), 28% African-American (A-A), 22% Hispanic-American (H-A), and 12% Asian-American, predominantly of Chinese descent (C-A). MESA's enrollment and exclusion criteria have been described previously. All participants provided written informed consent as approved by all participating Institutional Review Boards. Details of the study design and cohort characteristics have been described elsewhere.
1.2 Genotyping and quality control (QC)
Non-duplicate, unrelated participants were selected for analysis. DNA was isolated from blood samples collected from participants at the time of enrollment, using the Puregene DNA kit (Puregene, Gentra Systems, Minneapolis, MN, USA). Whole-genome genotyping was conducted in 2009 using the Affymetrix Human SNP array 6.0. SNPs were included in the analysis if they met the following QC criteria: (1) SNP call rate > 95%; (2) subject call rate > 95%; (3) polymorphic in at least one ethnic group (i.e., no monomorphic SNPs), with no filtering on minor allele frequency because allele frequencies differ among the MESA ethnic groups; (4) heterozygosity < 53%, as the uniform heterozygosity distribution was restricted to the range of 0%-53%, which removed the < 0.01% of SNPs with heterozygosity > 53%. The original genotypes of the 866 435 SNPs from the 22 autosomes that met the QC criteria were used for the analysis.
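The four QC criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the 0/1/2 genotype coding, the -1 missing-call marker, and the function name `qc_filter` are assumptions made for illustration, and the polymorphism check is applied to the pooled sample (a SNP that is polymorphic in at least one ethnic group is also polymorphic in the pooled sample).

```python
import numpy as np

def qc_filter(genotypes, min_call_rate=0.95, max_heterozygosity=0.53, missing=-1):
    """Return boolean masks (keep_subjects, keep_snps) for a subjects x SNPs matrix.

    Assumed coding (not taken from the paper): 0/1/2 copies of one allele,
    with `missing` marking a failed genotype call.
    """
    g = np.asarray(genotypes)
    called = g != missing

    # Criteria (1) and (2): SNP and subject call rates must exceed 95%.
    snp_call_rate = called.mean(axis=0)
    subject_call_rate = called.mean(axis=1)

    # Criterion (4): fraction of called genotypes that are heterozygous.
    n_called = called.sum(axis=0)
    het = np.divide((g == 1).sum(axis=0), n_called,
                    out=np.zeros(g.shape[1]), where=n_called > 0)

    # Criterion (3): more than one distinct genotype among the called ones.
    # No minor allele frequency filter is applied, as in the text.
    polymorphic = np.array([np.unique(col[ok]).size > 1
                            for col, ok in zip(g.T, called.T)])

    keep_snps = (snp_call_rate > min_call_rate) & polymorphic & (het < max_heterozygosity)
    keep_subjects = subject_call_rate > min_call_rate
    return keep_subjects, keep_snps

# Toy data: 4 subjects x 4 SNPs (hypothetical values).
toy = np.array([[0, 1, 2, 0],
                [0, 1, 1, -1],
                [0, 1, 0, -1],
                [0, 1, 2, -1]])
keep_subjects, keep_snps = qc_filter(toy)
```

In the toy matrix, the first SNP is monomorphic, the second is heterozygous in every subject, and the fourth has a low call rate, so only the third SNP passes.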
1.3 Phenotype and covariate measurements
The dependent variable in this study is the Framingham risk score calculated from the JAMA Framingham risk survival model (Frjama), which is used to predict the risk of developing hard CHD within 10 years. Frjama was developed using multiple variables, including age, gender, hypertension stage, total cholesterol, high density lipoprotein (HDL) cholesterol, fasting glucose, diabetes mellitus and current smoking status. Six covariates were included in our study: (1) moderate walking (min/wk M-Su): walking to get to places such as the bus, car, work, or the store; (2) light leisure read (MET-min/wk M-Su): read, knit, sew, visit, do nothing, non-work recreational computer; (3) light transportation (min/wk M-Su): drive or ride in car, ride the bus/subway, including travel to work; (4) total intentional exercise (MET-min/wk); (5) light leisure TV (min/wk M-Su): sit or recline and watch TV; and (6) pack-years of cigarette smoking. Each of the six lifestyles was set in turn as an individual cofactor for conditional mapping. The conditional models are (1) Frjama|Walk, (2) Frjama|Read, (3) Frjama|Trans, (4) Frjama|Exer, (5) Frjama|TV, and (6) Frjama|Smoke.
1.4 Statistical analysis
The genetic model for the phenotypic value of the k-th individual in the h-th ethnic population (yhk) can be expressed by the following MLM:
$ \begin{aligned} y_{hk} = {} & \mu + s_k + \sum\limits_i a_i x_{A_{ik}} + \sum\limits_i d_i x_{D_{ik}} + \sum\limits_{i < j} aa_{ij} x_{AA_{ijk}} + \sum\limits_{i < j} ad_{ij} x_{AD_{ijk}} + \sum\limits_{i < j} da_{ij} x_{DA_{ijk}} + \sum\limits_{i < j} dd_{ij} x_{DD_{ijk}} \\ & + e_h + \sum\limits_i ae_{ih} u_{AE_{ihk}} + \sum\limits_i de_{ih} u_{DE_{ihk}} + \sum\limits_{i < j} aae_{ijh} u_{AAE_{ijhk}} + \sum\limits_{i < j} ade_{ijh} u_{ADE_{ijhk}} \\ & + \sum\limits_{i < j} dae_{ijh} u_{DAE_{ijhk}} + \sum\limits_{i < j} dde_{ijh} u_{DDE_{ijhk}} + \varepsilon_{hk}. \end{aligned} $
where μ is the population mean; sk is the fixed sex effect of the k-th individual (0 for female, 1 for male); ai is the additive effect of the i-th locus with coefficient xAik (1 for QQ, 0 for Qq, -1 for qq); di is the dominance effect of the i-th locus with coefficient xDik (1 for Qq, 0 for QQ and qq); aaij, adij, daij and ddij are the digenic epistasis effects with coefficients xAAijk (1 for QQ×QQ and qq×qq, -1 for QQ×qq and qq×QQ, and 0 for others), xADijk (1 for QQ×Qq, -1 for qq×Qq, and 0 for others), xDAijk (1 for Qq×QQ, -1 for Qq×qq, and 0 for others) and xDDijk (1 for Qq×Qq, and 0 for others), respectively; eh is the effect of the h-th ethnic population (h = 1 for E-A, 2 for C-A, 3 for A-A, 4 for H-A); aeih is the additive × race interaction effect of the i-th locus in the h-th ethnic population with coefficient uAEihk; deih is the dominance × race interaction effect of the i-th locus in the h-th ethnic population with coefficient uDEihk; aaeijh, adeijh, daeijh and ddeijh are the digenic epistasis × race interaction effects of loci i and j in the h-th ethnic population with coefficients uAAEijhk, uADEijhk, uDAEijhk and uDDEijhk, respectively; and εhk is the residual effect of the k-th individual in the h-th ethnic population. In this model, the random effects are assumed to follow normal distributions with zero mean and variances δv2.
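The coefficient coding defined above can be written out as a short Python sketch. The function names are hypothetical, and the observation that each digenic coefficient equals the product of the corresponding single-locus coefficients is read off the definitions rather than stated in the text.

```python
def additive(genotype):
    """x_A: 1 for QQ, 0 for Qq, -1 for qq."""
    return {"QQ": 1, "Qq": 0, "qq": -1}[genotype]

def dominance(genotype):
    """x_D: 1 for Qq, 0 for QQ and qq."""
    return 1 if genotype == "Qq" else 0

def epistasis_coefficients(genotype_i, genotype_j):
    """Digenic coefficients x_AA, x_AD, x_DA, x_DD for a pair of loci (i, j).

    Each one is the product of the single-locus codes, which reproduces
    the values listed in the text (e.g. x_AD = 1 for QQ x Qq).
    """
    a_i, a_j = additive(genotype_i), additive(genotype_j)
    d_i, d_j = dominance(genotype_i), dominance(genotype_j)
    return {"AA": a_i * a_j, "AD": a_i * d_j, "DA": d_i * a_j, "DD": d_i * d_j}

# Example: locus i is QQ and locus j is Qq.
example = epistasis_coefficients("QQ", "Qq")
```

For the example pair, only the additive × dominance coefficient is non-zero.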
To reduce the computational burden of mixed model-based GWAS analysis, a two-step strategy was employed to dissect the genetic architecture. First, we used the GMDR (generalized multifactor dimensionality reduction) module in the QTXNetwork software to scan 866 435 SNP markers, using two years of records from 5 336 subjects, for 1D-3D significant candidate SNP markers, and obtained 304 candidate SNPs. The quantitative trait SNP (QTS) mapping module in QTXNetwork (http://ibi.zju.edu.cn/software/QTXNetwork/) was then used to dissect the genetic architecture of the base model and the 6 conditional models. A total of 2 000 permutation tests were run to calculate the critical P-value for controlling the experiment-wise type I error when testing SNPs for association with the phenotype. The QTS effects were predicted using the Markov Chain Monte Carlo method with 20 000 Gibbs sampler iterations. The correlation coefficient (
We used a full genetic model including additive, dominance, epistasis and ethnicity-specific effects for our GWAS of CHD[7]. A total of 61 QTSs and 24 pairs of epistasis were significantly associated with the Framingham risk score. One QTS resided within a coding sequence and caused a missense mutation, 30 QTSs were located within intronic regions of genes, and the other QTSs were located near genes. The estimated heritability explained by the identified QTSs and epistasis under the seven models is listed in Table 1. The total heritability varied across models, with the base model exhibiting a heritability of 64.68%, model Frjama|Walk exhibiting the lowest value (
Comparing the results of the base model and the conditional models shows that the genetic architecture of CHD changed greatly after conditioning on the cofactors TV, smoke, transportation and exercise, but changed little after conditioning on walk and read. This indicates that TV, smoke, transportation and exercise could have large impacts on CHD, whereas the effects of walk and read might be limited. We can further separate the effects into two groups: the first group contains QTS effects detected in the base model and in certain conditional models (Table 2); the second group contains QTS effects detected only in conditional models, which indicates that these effects were suppressed by lifestyles (Table 3).
Only four single effects and two pairs of epistasis effects remained unchanged across the base model and the six conditional models. Some main genetic effects were unaltered by lifestyles, namely the additive effect of rs317258 (318 kb 5' of GCFC2), the dominance effect of rs12621362 (20 kb 5' of C2orf51), the dominance effect of rs6996584 (SAMD12), the additive effects of rs10965365 (63 kb 3' of DMRTA1) and rs1930368 (166 kb 3' of RP11-165H23.1), and the additive × additive effect of rs317258 (318 kb 5' of GCFC2) × rs6996584 (SAMD12). Their robustness across models and ethnic groups indicates that these loci play fundamental roles in the genetic architecture of CHD that are not affected by the six lifestyles.
Main dominance effects of rs6974603 (WBSCR17) and E-A, A-A, H-A specific dominance effects of rs17116652 (LPAR3) could also be detected in all models, but their magnitudes fluctuated across models. This indicates that although these effects are not eliminated by any cofactor, they are still susceptible to certain cofactors to some extent. For instance, after removing the effects of transportation, TV, and smoke, the main additive effect of rs6974603 (WBSCR17) decreased, indicating that people homozygous for the major allele (G/G) at this locus can benefit from less frequent driving, TV watching or smoking. WBSCR17 was also reported to be associated with type 2 diabetes in African Americans in the GENNID study[8].
2.3 Genes detected in the base model but not in certain conditional models
The A-A specific dominance effect (de3=0.010 9) of locus rs6996584 (SAMD12) was lost in all conditional models, indicating that the effect depended on all cofactors and that giving up all of the corresponding habits could help to reduce the CHD risk of A-A individuals who are heterozygous (C/G) at this locus. SAMD12 has previously been detected in a GWAS of carotid artery intima-media thickness[9], which shares a similar mechanism with CHD progression.
Other effects were lost only in some conditional models, such as the dominance effect of rs2455801, located 6.9 kb 3' of the gene ANKRD28, which regulates focal adhesion and cell migration through the ANKRD28-DOCK180 interaction[10]. This effect remained unchanged after removing the effects of transportation, walk, and smoke, but disappeared after conditioning on TV, read, and exercise. This suggests that, in terms of CHD, rs2455801 responds to TV, read, and exercise but not to the other three cofactors.
2.4 Effects detected in certain conditional models but not in the base model
Some effects were not detected in the base model but appeared in the conditional models. Only a few such effects were detected after conditioning on walk, read, or exercise (1 locus with single effects and 1 pair of epistasis for |Walk; 6 loci with single effects and 2 pairs of epistasis for |Read; 8 loci with single effects and 2 pairs of epistasis for |Exer), whereas more effects appeared under |Trans, |TV and |Smoke, suggesting that these three activities more strongly suppress the expression of genetic components for CHD.
Genetic effects respond differently to the corresponding cofactors. For example, the C-A specific dominance effect of rs8048681 (WWOX) was detected in only 3 conditional models (|Smoke, |Trans, and |Exer), and not in the other three models. In another gene-smoking interaction GWAS, researchers also found WWOX to be associated with coronary artery calcification in smokers but not in non-smokers, supporting our result that the expression of WWOX is susceptible to cigarette smoking[11].
2.5 Ethnicity-specific effects
In our study, we found that some effects were quite stable across ethnic populations, while others exhibited strong ethnic specificity. For example, SNPs in the 9p21.3 region have been repeatedly associated with CHD in different populations, such as populations of European ancestry[2, 12-15], South Asian populations[15] and East Asian populations[16-18], since the region was identified in the first GWAS for CHD in 2007[19]. In our study, rs10965365 (63 kb 3' of DMRTA1) also tagged this region, with a main additive effect (a=0.005 4) irrespective of ethnicity, in concordance with previous findings. It could also be detected under all models, indicating that it may play a fundamental role in CHD progression.
In contrast, the SNP rs17116652, located near LPAR3, belongs to the ethnicity-specific group, with dominance effects that varied among the four ethnic populations. LPAR3 encodes a subtype of lysophosphatidic acid (LPA) receptors. Pharmacological studies have identified LPAR3 as the primary mediator of LPA-induced platelet activation during thrombogenesis[20-21]. In our study, we found that in addition to its universal main dominance effect (d = 0.034 7 in the base model), rs17116652 also exhibited ethnicity-specific effects in response to distinct cofactors. Ethnicity-specific dominance effects for the E-A, A-A, and H-A populations were detected in the base model (de1 = -0.029 1, de3 = -0.028 6, and de4 = -0.033 1), and all three decreased after removing the effects of TV and smoke. For the E-A population, a further decrease was observed after removing the effect of read. This suggests that heterozygous (C/T) carriers of rs17116652 in these three populations can reduce their CHD risk by giving up watching TV or by stopping smoking, and that European Americans could obtain a further reduction by giving up leisure reading.
2.6 Gene network of detected genes conditional on lifestyles
To summarize the biological pathways highlighted by our research, we used Biopubinfo to examine whether the genes harboring the identified loci are involved in particular diseases, pathways or molecular networks (Fig. 2). Genes were classified into two categories according to their response to conditional analysis: unaltered, or lost after conditioning. We found that the additive genetic effects of 2 genes (DMRTA1 and TLE4) remained unaltered after conditioning on different lifestyles. Disorders of the cardiovascular system and type 2 diabetes mellitus were associated with both gene sets affected by lifestyles. In particular, acquired immunodeficiency syndrome and bipolar affective psychoses were associated with the gene sets suppressed or caused by lifestyles, respectively. Besides diseases related to the cardiovascular system, we also detected many other diseases associated with the gene sets suppressed by lifestyles, indicating a pleiotropic role of those genes.
We performed a genome-wide association study of the 10-year hard CHD risk of individuals in the MESA cohorts, using a full model that includes additive, dominance, epistasis and ethnicity-specific genetic effects to unveil the complex architecture of CHD. We also used conditional models with six lifestyles (walk, read, transportation, exercise, TV, and smoke) as cofactors to study the influence of human lifestyles on CHD. In contrast to previous findings that dominance contributed little to the missing heritability, we found that dominance and ethnicity-specific dominance contributed almost half of the total phenotypic variance (42.34%-51.56%) for CHD. We also found that the genetic background of CHD varied greatly across the four ethnic populations, with ethnicity-specific heritability ranging from 31.18% to 43.99%.
Missing heritability has long been a persistent problem in genomic association studies. To address it, we introduced epistasis effects into our full model, including four types of effects (additive × additive, additive × dominance, dominance × additive and dominance × dominance) along with their ethnic interactions. We observed that epistasis contributed a large portion (11.72%-20.27%) of the total heritability. Our finding offers a successful example of exploring the missing heritability accounted for by gene interaction (G×G), as proposed by ZUK et al.[22].
One of the main aspirations of GWAS is to provide patients with personalized risk prediction. In our method, the genetic effects of individual loci and epistatic SNP pairs were predicted, and from these we can predict the optimal genotype combination of the superior line (all loci are homozygotes) and the superior hybrid (loci can be either homozygote or heterozygote) for each population (Tables 4 and 5). We found that setting rs17116652 as C/T, rs6706330 as A/G, rs423711 as G/A, and rs11250700 as G/T could simultaneously increase or decrease CHD risk in all ethnic groups. However, some loci were effective only in specific populations. For example, the heterozygote T/C of rs8048681 was detrimental only to African Americans, owing to de3 = 0.012 1. The heterozygotes A/G of rs16876162 and C/G of rs6996584 could decrease CHD risk exclusively in European Americans, by 0.005 3 and 0.007 9 compared with the homozygotes A/A and G/G, respectively. Our method may offer a road map for disease risk prediction.
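How a single locus contributes to the predicted score in a given ethnic group, and how a lowest-risk genotype could be chosen from predicted effects, can be sketched in Python as follows. This is a simplified illustration and not the QTXNetwork procedure: it ignores epistasis, it assumes that the ethnicity-interaction coefficients equal the genotype coefficients for individuals in that group, and the function names are hypothetical. The rs8048681 value (de3 = 0.012 1 for A-A) is taken from the text. The value of -0.005 3 for rs16876162 in E-A is an illustrative reading of the reported decrease.

```python
GENOTYPES = ("QQ", "Qq", "qq")
X_A = {"QQ": 1, "Qq": 0, "qq": -1}  # additive coefficient
X_D = {"QQ": 0, "Qq": 1, "qq": 0}   # dominance coefficient

def locus_value(genotype, group, a=0.0, d=0.0, ae=None, de=None):
    """Contribution of one locus to the risk score in one ethnic group.

    a, d   : main additive and dominance effects
    ae, de : dicts mapping group label -> ethnicity-specific effect
    """
    ae = ae or {}
    de = de or {}
    return ((a + ae.get(group, 0.0)) * X_A[genotype]
            + (d + de.get(group, 0.0)) * X_D[genotype])

def lowest_risk_genotype(group, allow_heterozygote=True, **effects):
    """Genotype with the smallest contribution; homozygotes only if requested."""
    candidates = [g for g in GENOTYPES if allow_heterozygote or g != "Qq"]
    return min(candidates, key=lambda g: locus_value(g, group, **effects))

# rs8048681: the heterozygote raises the score only in African Americans.
rs8048681 = {"de": {"A-A": 0.0121}}
risk_aa = locus_value("Qq", "A-A", **rs8048681)
risk_ea = locus_value("Qq", "E-A", **rs8048681)

# rs16876162 (illustrative value): the heterozygote lowers the score in E-A.
hybrid_choice = lowest_risk_genotype("E-A", de={"E-A": -0.0053})
line_choice = lowest_risk_genotype("E-A", allow_heterozygote=False,
                                   de={"E-A": -0.0053})
```

Restricting the candidates to homozygotes corresponds to the superior line, while allowing heterozygotes corresponds to the superior hybrid.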
[1] GO A S, MOZAFFARIAN D, ROGER V L, et al. Heart disease and stroke statistics—2014 update. Circulation, 2014, 129: e28. DOI: 10.1161/01.cir.0000441139.02102.80.
[2] SCHUNKERT H, KÖNIG I R, KATHIRESAN S, et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nature Genetics, 2011, 43(4): 333–338. DOI: 10.1038/ng.784.
[3] VERHEUGT F. Passive smoking and the risk of coronary heart disease. Nederlands Tijdschrift Voor Geneeskunde, 2004, 148: 645–647.
[4] CARNETHON M R. Physical activity and cardiovascular disease: how much is enough? American Journal of Lifestyle Medicine, 2009, 3: 44–49. DOI: 10.1177/1559827609332737.
[5] HINDORFF L A, JUNKINS H A, HALL P, et al. A catalog of published genome-wide association studies. National Human Genome Research Institute, 2011.
[6] DELOUKAS P, KANONI S, WILLENBORG C, et al. Large-scale association analysis identifies new risk loci for coronary artery disease. Nature Genetics, 2013, 45: 25–33.
[7] D'AGOSTINO R B, VASAN R S, PENCINA M J, et al. General cardiovascular risk profile for use in primary care: the Framingham heart study. Circulation, 2008, 117: 743–753. DOI: 10.1161/CIRCULATIONAHA.107.699579.
[8] HASSTEDT S J, HIGHLAND H M, ELBEIN S C, et al. Five linkage regions each harbor multiple type 2 diabetes genes in the African American subset of the GENNID study. Journal of Human Genetics, 2013, 58: 378–383. DOI: 10.1038/jhg.2013.21.
[9] DONG C, DELLA-MORTE D, BEECHAM A, et al. Genetic variants in LEKR1 and GALNT10 modulate sex-difference in carotid intima-media thickness: a genome-wide interaction study. Atherosclerosis, 2015, 240: 462–467. DOI: 10.1016/j.atherosclerosis.2015.04.019.
[10] KIYOKAWA E, MATSUDA M. Regulation of focal adhesion and cell migration by ANKRD28-DOCK180 interaction. Cell Adhesion & Migration, 2009, 3: 281–284.
[11] POLFUS L M, SMITH J A, SHIMMIN L C, et al. Genome-wide association study of gene by smoking interactions in coronary artery calcification. PLoS ONE, 2013, 8: e74642. DOI: 10.1371/journal.pone.0074642.
[12] BURTON P R, CLAYTON D G, CARDON L R, et al. Genome-wide association study of 14 000 cases of seven common diseases and 3 000 shared controls. Nature, 2007, 447: 661–678. DOI: 10.1038/nature05911.
[13] SAMANI N J, ERDMANN J, HALL A S, et al. Genomewide association analysis of coronary artery disease. New England Journal of Medicine, 2007, 357: 443–453. DOI: 10.1056/NEJMoa072366.
[14] WILD P S, ZELLER T, SCHILLERT A, et al. A genome-wide association study identifies LIPA as a susceptibility gene for coronary artery disease. Circulation: Cardiovascular Genetics, 2011.
[15] Coronary Artery Disease (CAD) Genetics Consortium. A genomewide association study in Europeans and South Asians identifies five new loci for coronary artery disease. Nature Genetics, 2011, 43: 339–344. DOI: 10.1038/ng.782.
[16] TAKEUCHI F, YOKOTA M, YAMAMOTO K, et al. Genome-wide association study of coronary artery disease in the Japanese. European Journal of Human Genetics, 2012, 20: 333–340. DOI: 10.1038/ejhg.2011.184.
[17] LU X F, WANG L Y, CHEN S F, et al. Genome-wide association study in Han Chinese identifies four new susceptibility loci for coronary artery disease. Nature Genetics, 2012, 44: 890–894. DOI: 10.1038/ng.2337.
[18] LEE J Y, LEE B S, SHIN D J, et al. A genome-wide association study of a coronary artery disease risk variant. Journal of Human Genetics, 2013, 58: 120–126. DOI: 10.1038/jhg.2012.124.
[19] MCPHERSON R, PERTSEMLIDIS A, KAVASLAR N, et al. A common allele on chromosome 9 associated with coronary heart disease. Science, 2007, 316: 1488–1491. DOI: 10.1126/science.1142447.
[20] GARDELL S E, DUBIN A E, CHUN J. Emerging medicinal roles for lysophospholipid signaling. Trends in Molecular Medicine, 2006, 12: 65–75. DOI: 10.1016/j.molmed.2005.12.001.
[21] ROTHER E, BRANDL R, BAKER D L, et al. Subtype-selective antagonists of lysophosphatidic acid receptors inhibit platelet activation triggered by the lipid core of atherosclerotic plaques. Circulation, 2003, 108: 741–747. DOI: 10.1161/01.CIR.0000083715.37658.C4.
[22] ZUK O, HECHTER E, SUNYAEV S R, et al. The mystery of missing heritability: genetic interactions create phantom heritability. Proceedings of the National Academy of Sciences, 2012, 109: 1193–1198. DOI: 10.1073/pnas.1119675109.