Journal of China Medical University  2023, Vol. 52 Issue (10): 910-916

Article information

DENG Ying, XIONG Anxiu, LIU Jingzhen, QI Shanshan, XIONG Hao
Screening of key genes and pathways involved in childhood acute myeloid leukemia based on TARGET database
Journal of China Medical University, 2023, 52(10): 910-916

Article history

Received: 2022-11-17
Published online: 2023-10-16 18:39:29
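Screening of key genes and pathways involved in childhood acute myeloid leukemia based on TARGET database
DENG Ying1, XIONG Anxiu2, LIU Jingzhen3, QI Shanshan4, XIONG Hao5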
1. Department of Public Health, Wuhan Children's Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430015, China;
2. Department of Pediatrics, Yichang Central People's Hospital, Yichang 443003, China;
3. Department of Pediatric Hematology, Gastroenterology, Vasculocardiology and Nephrology, The Center Hospital of EnShi TuJia and Miao Autonomous Prefecture, Enshi 445099, China;
4. Institute of Pediatric Hematology, Wuhan Children's Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430015, China;
5. Department of Hematology and Oncology, Wuhan Children's Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430015, China
Abstract: Objective To screen key genes and signaling pathways involved in childhood acute myeloid leukemia (AML) through bioinformatics analysis of the TARGET database. Methods Clinical and gene expression data of children with AML were downloaded from the TARGET database. R software was used to identify differentially expressed genes (DEGs) between low-risk and intermediate/high-risk children, and between newly diagnosed and relapsed children. Enrichment analysis of the DEGs was performed using the DAVID online database. Hub genes were identified using STRING and Cytoscape. The relationship between hub genes and overall survival (OS) was analyzed using Cox regression. Results A total of 96 DEGs associated with both risk stratification and relapse were identified, including 38 up-regulated and 58 down-regulated genes. Protein-protein interaction (PPI) network analysis identified 15 hub genes: the 7 up-regulated hub genes were positively correlated with risk group and the 8 down-regulated hub genes were negatively correlated with risk group (all P < 0.05). Cox regression analysis showed that these 15 hub genes affected the OS of children with AML, and that DNMT3B, DPP4, CENPE, and H3C10 were independent risk factors for OS. Conclusion The 15 hub genes identified by PPI network analysis are potential molecular markers of childhood AML and may guide further study of its pathogenesis and therapeutic targets.

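Acute myeloid leukemia (AML) accounts for approximately 20%-25% of childhood leukemia [1]. Although childhood AML is less common than acute lymphoblastic leukemia, its prognosis is worse. At present, the overall survival rate of AML is below 70%, and the relapse rate is as high as 25%-35% [2-3]. Cytogenetics is regarded as the main basis for risk stratification in AML; in clinical practice, however, nearly half of children with AML have normal cytogenetics, yet their outcomes differ markedly [4]. In recent years, with the development of next-generation sequencing, recurrent genetic abnormalities associated with AML have been progressively identified and have become increasingly important in the diagnosis, treatment, and prognosis of AML, but some children still carry none of the known genetic abnormalities. Identifying new molecular biomarkers of childhood AML would therefore help with risk stratification. In this study, gene expression data and clinical information for childhood AML were downloaded from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) database and curated, and bioinformatics methods were used to mine AML-related pathogenic genes, with the aim of providing new directions for exploring the pathogenesis of AML and screening molecular markers.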

1 Materials and methods

1.1 Gene expression data

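Clinical information and gene expression data for childhood AML were retrieved and downloaded from the TARGET website (https://ocg.cancer.gov/programs/target). The TARGET database contains clinical information on 121 children with AML, of whom 63 are female and 58 are male. These gene expression data and clinical information come from AAML0531, a phase III clinical trial of the Children's Oncology Group (COG).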

1.2 Screening of differentially expressed genes

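Differentially expressed genes (DEGs) were screened from the TARGET gene expression data of children with AML using the DESeq2 package in R. The screening criteria were an up- or down-regulation of at least 2-fold, that is, |log2FC| ≥ 1, together with P < 0.05.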
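The screening rule above can be sketched in a few lines of Python. This is an illustration only: the study used the DESeq2 package in R, and the record layout, the function name `screen_degs`, and the example values below are hypothetical.

```python
from typing import NamedTuple


class GeneResult(NamedTuple):
    gene: str        # gene symbol
    log2_fc: float   # log2 fold change between the two groups
    p_value: float   # P value from the differential expression test


def screen_degs(results, min_abs_log2_fc=1.0, max_p=0.05):
    """Split results into up- and down-regulated DEGs.

    A gene passes when |log2FC| >= 1 (at least a 2-fold change) and P < 0.05.
    """
    up, down = [], []
    for r in results:
        if abs(r.log2_fc) >= min_abs_log2_fc and r.p_value < max_p:
            (up if r.log2_fc > 0 else down).append(r.gene)
    return up, down


# Made-up values for illustration; not data from the study.
example = [
    GeneResult("GENE_A", 1.8, 0.001),    # up-regulated, passes both thresholds
    GeneResult("GENE_B", -1.2, 0.020),   # down-regulated, passes both thresholds
    GeneResult("GENE_C", 0.6, 0.001),    # fold change too small
    GeneResult("GENE_D", 2.5, 0.200),    # not significant
]
up_genes, down_genes = screen_degs(example)
```

Keeping the two thresholds as parameters makes it easy to see how the DEG count would change under a stricter fold-change or P value criterion.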

1.3 Bioinformatics analysis of differentially expressed genes

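The DAVID online database was used to perform Gene Ontology (GO) annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation of the DEGs, in order to analyze the biological processes (BP) and pathways in which they are involved; P < 0.05 was the inclusion criterion. The STRING online database was used to construct the protein-protein interaction (PPI) network of the DEGs, which was then visualized in Cytoscape 3.7.2, and hub genes were screened with the cytoHubba plugin.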
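Enrichment P values of this kind are usually based on an over-representation test. The sketch below computes the plain one-sided hypergeometric tail probability. DAVID itself reports the EASE score, a more conservative modified Fisher exact test, so this is not the tool's exact calculation; the function name and the toy counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(background, annotated, selected, hits):
    """P(X >= hits) when `selected` genes are drawn without replacement
    from `background` genes, of which `annotated` carry the term."""
    total = comb(background, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 3 annotated to the term,
# 4 genes in the DEG list, 2 of which are annotated.
p = hypergeom_enrichment_p(background=10, annotated=3, selected=4, hits=2)
```

With these toy counts the tail probability is (3*21 + 1*7) / 210 = 1/3, which can be checked by hand.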

1.4 Validation of hub genes

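Statistical analysis was performed using SPSS 22.0. The survival package in R was used to calculate the optimal cut-off value for the expression of each hub gene; expression < cut-off was defined as low expression and expression ≥ cut-off as high expression. Associations between hub gene expression and the clinicopathological features of children with AML were assessed with the χ2 test and Spearman correlation coefficients. Measurement data of 2 groups were compared with the Mann-Whitney test. Univariate and multivariate Cox regression analyses were used to calculate the hazard ratio (HR) and 95% confidence interval (CI) of each hub gene. P < 0.05 was considered statistically significant.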
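As a rough illustration of the grouping step, the sketch below dichotomizes expression values at a given cut-off and computes a Pearson χ2 statistic for the resulting 2 x 2 table. It does not show how the optimal cut-off was searched for, and it is not the authors' SPSS or R workflow; the cut-off of 5.0, the function names, and all data values are made up.

```python
def dichotomize(expression, cutoff):
    """Label each value 'high' if >= cutoff, otherwise 'low'."""
    return ["high" if x >= cutoff else "low" for x in expression]


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
        [[a, b],
         [c, d]]
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Made-up expression values and risk labels, for illustration only.
expression = [2.1, 7.4, 5.0, 9.3, 1.2, 6.8]
risk = ["low", "high", "high", "high", "low", "low"]
groups = dichotomize(expression, cutoff=5.0)

# Cross-tabulate expression group against risk group.
a = sum(g == "high" and r == "high" for g, r in zip(groups, risk))
b = sum(g == "high" and r == "low" for g, r in zip(groups, risk))
c = sum(g == "low" and r == "high" for g, r in zip(groups, risk))
d = sum(g == "low" and r == "low" for g, r in zip(groups, risk))
statistic = chi_square_2x2(a, b, c, d)
```

Note that a value equal to the cut-off falls in the high-expression group, matching the definition given in the text.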

2 Results

2.1 Screening of DEGs

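The TARGET database contains clinical information on 121 children with AML. Risk stratification was unknown for 7; of the remaining 114 children, 48 were low risk, 61 intermediate risk, and 5 high risk. Gene expression in the bone marrow samples taken at initial diagnosis from these 114 children was then analyzed. Compared with low-risk children, intermediate/high-risk children had 2,092 DEGs, of which 1,167 were up-regulated and 925 down-regulated (Fig.1A). Compared with newly diagnosed children, the 39 relapsed children had 785 DEGs, of which 184 were up-regulated and 601 down-regulated (Fig.1B). A Venn diagram of the two DEG sets yielded 96 shared DEGs, of which 38 were up-regulated (Fig.1C) and 58 down-regulated (Fig.1D).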
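The Venn step amounts to a set intersection carried out separately for each direction of regulation. The sketch below assumes that a shared DEG must change in the same direction in both comparisons, which is how panels C and D of Fig.1 are organized; the function name and the gene labels are placeholders, not genes from the study.

```python
def shared_degs(risk_up, risk_down, relapse_up, relapse_down):
    """Return genes regulated in the same direction in both comparisons."""
    shared_up = set(risk_up) & set(relapse_up)
    shared_down = set(risk_down) & set(relapse_down)
    return shared_up, shared_down


# Placeholder gene labels, for illustration only.
risk_up = ["GENE_A", "GENE_B", "GENE_C"]
risk_down = ["GENE_X", "GENE_Y"]
relapse_up = ["GENE_B", "GENE_C", "GENE_D"]
relapse_down = ["GENE_Y", "GENE_Z"]

shared_up, shared_down = shared_degs(risk_up, risk_down, relapse_up, relapse_down)
total_shared = len(shared_up) + len(shared_down)
```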

A, volcano plot of DEGs between the low-risk and intermediate/high-risk groups; B, volcano plot of DEGs between the newly diagnosed and relapsed groups; C, Venn diagram of up-regulated DEGs shared by the low vs intermediate/high-risk comparison and the newly diagnosed vs relapsed comparison; D, Venn diagram of down-regulated DEGs shared by the same two comparisons. Fig.1 Screening of DEGs of childhood AML using the TARGET database

2.2 GO and KEGG enrichment analysis of DEGs

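GO enrichment analysis of the 96 DEGs was performed using the DAVID database. For cellular component (CC), the DEGs were mainly enriched in nucleosome, cytoplasm, late endosome membrane, nuclear chromosome, and condensed chromosome outer kinetochore; for BP, in nucleosome assembly, chromosome segregation, response to toxic substance, chromatin silencing, and spindle organization; and for molecular function (MF), in protein heterodimerization activity, DNA binding, chromatin binding, microtubule binding, and MAP kinase tyrosine/serine/threonine phosphatase activity (Fig.2A). KEGG pathway enrichment analysis showed that the 96 DEGs clustered in pathways such as alcoholism, systemic lupus erythematosus, and viral carcinogenesis (Fig.2B).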

A, GO enrichment analysis of DEGs; B, KEGG enrichment analysis of DEGs. Fig.2 Enrichment analysis of DEGs

2.3 Identification of AML-related hub genes

The PPI network of the 96 DEGs was constructed using the STRING database (Fig.3A). After removal of isolated, unconnected protein nodes, the PPI network was visualized in Cytoscape (Fig.3B); the redder the node, the stronger the association. In the cytoHubba module of Cytoscape, 5 algorithms (Betweenness, EPC, MCC, Radiality, and Stress) were each used to compute the top 10 hub genes with the highest connectivity among the PPI network nodes (Tab.1). The 15 hub genes obtained were cell division cycle associated 2 (CDCA2), cyclin dependent kinase 1 (CDK1), centromere protein E (CENPE), DNA methyltransferase 3 beta (DNMT3B), dipeptidyl peptidase 4 (DPP4), exonuclease 1 (EXO1), TTK protein kinase (TTK), Fos proto-oncogene (FOS), H2B clustered histone 5 (H2BC5), H3 clustered histone 4 (H3C4), H3 clustered histone 10 (H3C10), H2A clustered histone 19 (H2AC19), H2A clustered histone 20 (H2AC20), H2B clustered histone 21 (H2BC21), and transforming growth factor beta 1 induced transcript 1 (TGFB1I1).

A, PPI network analysis of DEGs; B, visualization of the PPI network of the associated genes. Fig.3 PPI analysis of DEGs

Tab.1 The top 10 hub genes identified by five centrality methods
No.  EPC        Stress     MCC        Radiality  Betweenness
1    CDK1       FOS        CDK1       CDK1       FOS
2    H2AC20     CDK1       H2BC21     FOS        CDK1
3    H2BC21     H2BC21     H2AC20     H2BC21     EGR1
4    HIST2H2AA  H2AC20     EXO1       H2AC20     H2BC21
5    EXO1       H2BC5      HIST2H2AA  H2BC5      H2AC20
6    H2BC5      EXO1       H2BC5      EXO1       H2BC5
7    CENPE      EGR1       CENPE      HIST2H2AA  EXO1
8    CDCA2      HIST2H2AA  H3C4       CENPE      TGFB1I1
9    H3C10      CENPE      H3C10      DNMT3B     DPP4
10   H3C4       DPP4       TTK        TTK        HIST2H2AA
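One simple way to read Tab.1 is to count how many of the five rankings contain each gene. The sketch below does this with the symbols exactly as printed in the table. It is only a reading aid: it does not reproduce how cytoHubba computes its scores, and it is not a statement of how the final hub gene list was selected.

```python
from collections import Counter

# Top-10 lists copied from Tab.1, symbols as printed.
rankings = {
    "EPC": ["CDK1", "H2AC20", "H2BC21", "HIST2H2AA", "EXO1",
            "H2BC5", "CENPE", "CDCA2", "H3C10", "H3C4"],
    "Stress": ["FOS", "CDK1", "H2BC21", "H2AC20", "H2BC5",
               "EXO1", "EGR1", "HIST2H2AA", "CENPE", "DPP4"],
    "MCC": ["CDK1", "H2BC21", "H2AC20", "EXO1", "HIST2H2AA",
            "H2BC5", "CENPE", "H3C4", "H3C10", "TTK"],
    "Radiality": ["CDK1", "FOS", "H2BC21", "H2AC20", "H2BC5",
                  "EXO1", "HIST2H2AA", "CENPE", "DNMT3B", "TTK"],
    "Betweenness": ["FOS", "CDK1", "EGR1", "H2BC21", "H2AC20",
                    "H2BC5", "EXO1", "TGFB1I1", "DPP4", "HIST2H2AA"],
}

# Number of rankings (0-5) in which each gene symbol appears.
appearances = Counter(gene for top10 in rankings.values() for gene in top10)

# Genes listed by all five methods.
in_all_five = sorted(g for g, n in appearances.items() if n == len(rankings))
```

Genes that appear in every ranking are the most robust to the choice of scoring method, whereas genes listed by a single method depend more heavily on that method's definition of centrality.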

2.4 Correlation between hub genes and clinicopathological features of children with AML

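The χ2 test was used to analyze the correlation between the clinicopathological features of children with AML (sex, age, peripheral white blood cell (WBC) count at initial diagnosis, central nervous system involvement, and risk stratification) and the expression of the 15 hub genes. Hub gene expression was not correlated with sex ratio, age distribution, or central nervous system involvement (all P > 0.05). High expression of DNMT3B, DPP4, CENPE, TTK, CDCA2, EXO1, and CDK1 was positively correlated with risk stratification (all P < 0.05), whereas high expression of H2BC21, H2AC19, H3C10, FOS, H3C4, TGFB1I1, H2BC5, and H2AC20 was negatively correlated with risk stratification (all P < 0.05). Children with high expression of DPP4, CENPE, TTK, or CDCA2 had a higher peripheral WBC count at initial diagnosis than those with low expression (all P < 0.05); children with high expression of H2BC21, H2AC19, H3C10, FOS, H3C4, H2BC5, or H2AC20 had a lower peripheral WBC count at initial diagnosis than those with low expression (all P < 0.05); and for DNMT3B, EXO1, CDK1, and TGFB1I1, the WBC count at initial diagnosis did not differ significantly between the high- and low-expression groups (all P > 0.05).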

2.5 Relationship between hub genes and prognosis of children with AML

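Univariate Cox regression analysis of the 15 hub genes showed that high expression of DNMT3B, DPP4, CENPE, TTK, CDCA2, EXO1, and CDK1 and low expression of H2BC21, H2AC19, H3C10, FOS, H3C4, TGFB1I1, H2BC5, and H2AC20 were risk factors for overall survival in children with AML. These factors were then entered into a multivariate Cox proportional hazards model, which showed that, among the 15 hub genes, high expression of DNMT3B, high expression of DPP4, high expression of CENPE, and low expression of H3C10 were independent risk factors for overall survival in children with AML (Tab.2).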

Tab.2 Univariate and multivariate Cox regression analyses of the hub genes for constructing prognostic risk models
         Univariate analysis           Multivariate analysis
Gene     HR (95% CI)           P       HR (95% CI)          P
DNMT3B   10.00 (1.37-100.00)   0.023   7.60 (1.04-55.44)    0.046
DPP4     2.86 (1.45-5.88)      0.003   2.41 (1.05-5.49)     0.037
CENPE    2.63 (1.47-4.76)      0.001   2.41 (1.34-4.35)     0.003
TTK      2.44 (1.09-5.26)      0.030   0.41 (0.12-1.37)     0.146
CDCA2    2.33 (1.28-4.17)      0.005   1.19 (0.47-3.03)     0.716
EXO1     2.27 (1.19-4.35)      0.013   1.22 (0.52-2.88)     0.648
CDK1     2.08 (1.11-3.85)      0.022   1.03 (0.36-2.91)     0.960
H2BC21   0.56 (0.32-0.96)      0.035   1.25 (0.52-3.00)     0.613
H2AC19   0.55 (0.31-0.97)      0.041   0.78 (0.30-2.00)     0.601
H3C10    0.54 (0.32-0.93)      0.028   0.50 (0.28-0.87)     0.013
FOS      0.54 (0.29-0.98)      0.043   0.66 (0.34-1.32)     0.240
H3C4     0.51 (0.28-0.93)      0.027   1.15 (0.44-3.04)     0.775
TGFB1I1  0.46 (0.26-0.80)      0.006   0.69 (0.37-1.30)     0.203
H2BC5    0.42 (0.22-0.79)      0.008   1.12 (0.39-3.20)     0.834
H2AC20   0.33 (0.15-0.73)      0.006   0.62 (0.22-1.75)     0.364
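The columns of Tab.2 are linked: in a Cox model the HR is exp(β), and a Wald-type 95% CI is exp(β ± 1.96 × SE). The sketch below works backwards from a reported HR and CI to the standard error, z statistic, and two-sided P value. It assumes Wald intervals, which the article does not state explicitly, and the function name is hypothetical; small differences from the reported P values are expected because the table entries are rounded.

```python
from math import erfc, log, sqrt


def wald_from_hr(hr, ci_low, ci_high, z_crit=1.96):
    """Recover the log-HR standard error, Wald z and two-sided P value
    from a hazard ratio and its 95% confidence interval."""
    beta = log(hr)                                    # log hazard ratio
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)  # CI width on the log scale
    z = beta / se
    p = erfc(abs(z) / sqrt(2))                        # two-sided normal tail probability
    return se, z, p


# CENPE, multivariate analysis in Tab.2: HR 2.41 (1.34-4.35), reported P = 0.003
se, z, p = wald_from_hr(2.41, 1.34, 4.35)
```

For the CENPE row this back-calculation gives a P value close to the reported 0.003, which is a quick consistency check on the table rather than a re-analysis of the data.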

3 Discussion

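By analyzing the gene expression data of children with AML in the TARGET database, this study identified 96 DEGs associated with both risk stratification and relapse. GO and KEGG enrichment analyses showed that the proteins encoded by the DEGs were mainly enriched in the nucleus and cytoplasm and were mainly involved in processes such as nucleosome assembly and chromosome segregation, with DNA binding among their main molecular functions.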

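Among the hub genes identified, DNMT3B is related to DNA methylation: it is responsible for de novo DNA methylation. Although DNMT3B mutations are rare in AML, high DNMT3B expression predicts high rates of drug resistance and relapse [5-6]. Myeloperoxidase (MPO) is a diagnostic biomarker of AML, and its high expression is associated with a better prognosis. DNMT3B has been reported to increase methylation of the MPO promoter in AML cells and thereby suppress MPO expression. Moreover, DNMT3B-mediated methylation of the MPO promoter is not affected by the common AML mutations (FLT3-ITD, CEBPA, and NPM1 mutations) [7]. In addition, DNA hypermethylation caused by high DNMT3B expression has also been reported in T-cell acute lymphoblastic leukemia and Burkitt lymphoma [8].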

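DPP4 is expressed in bone marrow-derived cells, skeletal muscle cells, vascular smooth muscle cells, and adipocytes, among others [9-11]. In chronic leukemia, particularly B-cell chronic lymphocytic leukemia, several studies [12-14] have demonstrated a tumor-promoting role for DPP4. DPP4 expression affects clinical stage, time to remission, overall survival, and disease-free survival, and is a negative prognostic factor. In acute leukemia samples, including T-cell acute lymphoblastic leukemia, B-cell acute lymphoblastic leukemia, and AML, DPP4 expression on the leukemic cell membrane did not differ from that in non-leukemia patients, but plasma sCD26/DPP4 was significantly higher in leukemia patients than in non-leukemia patients [15].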

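CENPE, CDCA2, CDK1, TTK, EXO1, and FOS are closely related to the cell cycle and cell proliferation. In AML, the tumor-promoting role of CDK1 is relatively well established, but there are few reports on CENPE, CDCA2, TTK, EXO1, and FOS. H2BC5, H2AC19, H2AC20, H2BC21, H3C4, and H3C10 encode members of the histone H2 and H3 families, which are essential components of the nucleosome. To date, the roles of TGFB1I1, H2BC5, H2AC19, H2AC20, H2BC21, H3C4, and H3C10 in the pathogenesis of AML have rarely been reported.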

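In summary, through a combined analysis of bone marrow DEGs between the low-risk and intermediate/high-risk groups at initial diagnosis and between initial diagnosis and relapse in children with AML from the TARGET database, this study identified 15 hub genes associated with childhood AML: CDCA2, CDK1, CENPE, DNMT3B, DPP4, EXO1, TTK, FOS, H2BC5, H3C4, H3C10, H2AC19, H2AC20, H2BC21, and TGFB1I1. All 15 genes were associated with prognosis; in particular, DNMT3B, DPP4, CENPE, and H3C10, which were independent risk factors for prognosis, may become new targets for studying the molecular mechanisms of childhood AML and for prognostic assessment.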

References
[1] GAMIS AS, ALONZO TA, PERENTESIS JP, et al. Children's oncology group's 2013 blueprint for research: acute myeloid leukemia[J]. Pediatr Blood Cancer, 2013, 60(6): 964-971. DOI:10.1002/pbc.24432
[2] ZWAAN CM, KOLB EA, REINHARDT D, et al. Collaborative efforts driving progress in pediatric acute myeloid leukemia[J]. J Clin Oncol, 2015, 33(27): 2949-2962. DOI:10.1200/jco.2015.62.8289
[3] APLENC R, MESHINCHI S, SUNG L, et al. Bortezomib with standard chemotherapy for children with acute myeloid leukemia does not improve treatment outcomes: a report from the Children's Oncology Group[J]. Haematologica, 2020, 105(7): 1879-1886. DOI:10.3324/haematol.2019.220962
[4] GRIMWADE D, HILLS RK, MOORMAN AV, et al. Refinement of cytogenetic classification in acute myeloid leukemia: determination of prognostic significance of rare recurring chromosomal abnormalities among 5876 younger adult patients treated in the United Kingdom Medical Research Council trials[J]. Blood, 2010, 116(3): 354-365. DOI:10.1182/blood-2009-11-254441
[5] HAYETTE S, THOMAS X, JALLADES L, et al. High DNA methyltransferase DNMT3B levels: a poor prognostic marker in acute myeloid leukemia[J]. PLoS One, 2012, 7(12): e51527. DOI:10.1371/journal.pone.0051527
[6] LAMBA JK, CAO XY, RAIMONDI SC, et al. Integrated epigenetic and genetic analysis identifies markers of prognostic significance in pediatric acute myeloid leukemia[J]. Oncotarget, 2018, 9(42): 26711-26723. DOI:10.18632/oncotarget.25475
[7] ITONAGA H, IMANISHI D, WONG YF, et al. Expression of myeloperoxidase in acute myeloid leukemia blasts mirrors the distinct DNA methylation pattern involving the downregulation of DNA methyltransferase DNMT3B[J]. Leukemia, 2014, 28(7): 1459-1466. DOI:10.1038/leu.2014.15
[8] POOLE CJ, ZHENG WL, LODH A, et al. DNMT3B overexpression contributes to aberrant DNA methylation and MYC-driven tumor maintenance in T-ALL and Burkitt's lymphoma[J]. Oncotarget, 2017, 8(44): 76898-76920. DOI:10.18632/oncotarget.20176
[9] CASROUGE A, SAUER AV, BARREIRA DA SILVA R, et al. Lymphocytes are a major source of circulating soluble dipeptidyl peptidase 4[J]. Clin Exp Immunol, 2018, 194(2): 166-179. DOI:10.1111/cei.13163
[10] LAMERS D, FAMULLA S, WRONKOWITZ N, et al. Dipeptidyl peptidase 4 is a novel adipokine potentially linking obesity to the metabolic syndrome[J]. Diabetes, 2011, 60(7): 1917-1925. DOI:10.2337/db10-1707
[11] RASCHKE S, ECKARDT K, BJØRKLUND HOLVEN K, et al. Identification and validation of novel contraction-regulated myokines released from primary human skeletal muscle cells[J]. PLoS One, 2013, 8(4): e62008. DOI:10.1371/journal.pone.0062008
[12] CRO L, MORABITO F, ZUCAL N, et al. CD26 expression in mature B-cell neoplasia: its possible role as a new prognostic marker in B-CLL[J]. Hematol Oncol, 2009, 27(3): 140-147. DOI:10.1002/hon.888
[13] MATUSZAK M, LEWANDOWSKI K, CZYŻ A, et al. The prognostic significance of surface dipeptidylpeptidase Ⅳ (CD26) expression in B-cell chronic lymphocytic leukemia[J]. Leuk Res, 2016, 47: 166-171. DOI:10.1016/j.leukres.2016.06.002
[14] IBRAHEM L, ELDERINY WE, ELHELW L, et al. CD49d and CD26 are independent prognostic markers for disease progression in patients with chronic lymphocytic leukemia[J]. Blood Cells Mol Dis, 2015, 55(2): 154-160. DOI:10.1016/j.bcmd.2015.05.010
[15] DE ANDRADE CFCG, BIGNI R, POMBO-DE-OLIVEIRA MS, et al. CD26/DPPIV cell membrane expression and DPPIV activity in plasma of patients with acute leukemia[J]. J Enzyme Inhib Med Chem, 2009, 24(3): 708-714. DOI:10.1080/14756360802334800