Simulation Study on Genomic Selection of Sex-limited Traits Using Multilayer Perceptron in Sheep

Cite this article

WANG Wannian, CHEN Sijia, GAO Jinrong, WEN Zhonghao, YUAN Mengjiao, ZHANG Hongzhi, PANG Zhixu, QIAO Liying, LIU Wenzhong. Simulation Study on Genomic Selection of Sex-limited Traits Using Multilayer Perceptron in Sheep[J]. Acta Veterinaria et Zootechnica Sinica, 2023, 54(7): 2824-2835.

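WANG Wannian, CHEN Sijia, GAO Jinrong, WEN Zhonghao, YUAN Mengjiao, ZHANG Hongzhi, PANG Zhixu, QIAO Liying, LIU Wenzhong

College of Animal Science, Shanxi Agricultural University, Taigu 030801, China

Received: 2022-12-28

Funding: Joint Research Project on Seed Industry Innovation and Improved Breeds of "Yanyun White Sheep" (2022xczx09); Biological Breeding Engineering Project of Shanxi Agricultural University (YTCG126)

Author biography: WANG Wannian (1999-), male, from Jinzhong, Shanxi; master's student, mainly engaged in research on animal quantitative genetics. E-mail: wannian1876@163.com

Corresponding author: LIU Wenzhong, mainly engaged in research on molecular evaluation and germplasm innovation of animal genetic resources. E-mail: tglwzyc@163.com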

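Abstract: This study aimed to apply the multilayer perceptron (MLP) to genomic selection for sex-limited traits in sheep and to compare it with classical genomic selection methods under a range of scenarios. Phenotypic and genotypic data for two sheep populations, Pop1 and Pop2, were simulated with QMSim. Genetic parameters of each population were estimated with an artificial neural network (ANN) in the MLP and with residual maximum likelihood (REML) in the linear models. The MLP model was written by the authors in Python, and best linear unbiased prediction (BLUP), genomic BLUP (GBLUP) and single-step GBLUP (SSGBLUP) were run in DMU, in order to evaluate how the methods differed in estimating heritability (h²) and breeding values across scenarios. In all scenarios, MLP and SSGBLUP were significantly (P < 0.05) better than GBLUP and BLUP. The h² estimates of MLP did not differ significantly from those of SSGBLUP in three groups of cases: at h² = 0.05, Pop2 with 10K markers and 100 QTLs; at h² = 0.2, Pop1 with 500 QTLs at both marker numbers and Pop2 with 100 QTLs and 50K markers; and at h² = 0.5 with 100 QTLs, Pop1 with 10K markers and Pop2 with 50K markers. In all other cases, the h² estimates of MLP were significantly (P < 0.05) better than those of SSGBLUP, GBLUP and BLUP. Across initial h² values, QTL numbers and marker numbers, the h² estimates of MLP in both Pop1 and Pop2 deviated less from the current-population h² than those of SSGBLUP, GBLUP and BLUP; the genetic parameter estimates of SSGBLUP and GBLUP varied considerably between marker numbers, whereas those of MLP varied little. In all scenarios, the genomic estimated breeding values (GEBV) from MLP were the most accurate. At an initial h² of 0.05, the GEBV accuracy of MLP with 10K markers was slightly higher than the prediction accuracy of SSGBLUP with 50K markers. With h², QTL number and marker number held equal, the EBV prediction accuracy of every method was higher in Pop2 than in Pop1. These simulation results indicate that MLP outperforms the other classical genomic selection methods in genomic selection for sex-limited traits in sheep.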
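The abstract does not give the network architecture, the training procedure or the validation design, so the following is only a minimal sketch of the general idea and not the authors' model. It assumes a toy additive trait, a single hidden layer with ReLU units trained by plain stochastic gradient descent, and it treats "accuracy" as the Pearson correlation between predicted and true breeding values. All names (`simulate`, `TinyMLP`, `run_demo`) and all parameter values are hypothetical. The sex-limited aspect is mimicked by training only on females, which have records, and predicting males, which do not.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def simulate(n_animals, n_snps, h2, rng):
    """Toy additive trait: centred genotypes, true breeding values, phenotypes, sex."""
    effects = [rng.gauss(0.0, 1.0) for _ in range(n_snps)]
    geno = [[rng.choice((-1.0, 0.0, 1.0)) for _ in range(n_snps)]
            for _ in range(n_animals)]
    raw = [sum(g * b for g, b in zip(row, effects)) for row in geno]
    mean = sum(raw) / n_animals
    sd_a = math.sqrt(sum((a - mean) ** 2 for a in raw) / n_animals)
    # Scale so that var(TBV) = h2 and var(residual) = 1 - h2 (assumed toy setup).
    tbv = [math.sqrt(h2) * (a - mean) / sd_a for a in raw]
    pheno = [a + rng.gauss(0.0, math.sqrt(1.0 - h2)) for a in tbv]
    is_female = [rng.random() < 0.5 for _ in range(n_animals)]
    return geno, tbv, pheno, is_female


class TinyMLP:
    """One hidden ReLU layer and a linear output, trained by per-sample SGD."""

    def __init__(self, n_in, n_hidden, rng):
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return hidden, sum(w * h for w, h in zip(self.w2, hidden)) + self.b2

    def train_epoch(self, xs, ys, lr):
        """One pass over the data; returns the mean squared error seen during the pass."""
        total = 0.0
        for x, y in zip(xs, ys):
            hidden, pred = self.forward(x)
            err = pred - y
            total += err * err
            for j, h in enumerate(hidden):
                # Gradient through the ReLU, using w2[j] before it is updated.
                grad_h = err * self.w2[j] if h > 0.0 else 0.0
                self.w2[j] -= lr * err * h
                if grad_h != 0.0:
                    for i, v in enumerate(x):
                        self.w1[j][i] -= lr * grad_h * v
                    self.b1[j] -= lr * grad_h
            self.b2 -= lr * err
        return total / len(xs)


def run_demo(seed=1, n_animals=300, n_snps=20, h2=0.5, epochs=40, lr=0.005):
    rng = random.Random(seed)
    geno, tbv, pheno, is_female = simulate(n_animals, n_snps, h2, rng)
    # Sex-limited trait: only females have a record, so only they can train the model.
    females = [i for i in range(n_animals) if is_female[i]]
    males = [i for i in range(n_animals) if not is_female[i]]
    model = TinyMLP(n_snps, 4, rng)
    xs, ys = [geno[i] for i in females], [pheno[i] for i in females]
    losses = [model.train_epoch(xs, ys, lr) for _ in range(epochs)]
    gebv = [model.forward(geno[i])[1] for i in males]
    accuracy = pearson(gebv, [tbv[i] for i in males])
    return losses, accuracy


losses, accuracy = run_demo()
print(f"training MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"accuracy in unphenotyped males: {accuracy:.3f}")
```

Correlating predictions with true breeding values is only possible in simulation, where the true values are known; with real data the accuracy would have to be approximated some other way.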

Keywords: multilayer perceptron; genomic selection; simulation; prediction; sex-limited traits


| Population | Method | 10K markers, [QTL number missing] | 10K markers, 100 QTLs | 10K markers, 500 QTLs | 50K markers, [QTL number missing] | 50K markers, 100 QTLs | 50K markers, 500 QTLs |
|---|---|---|---|---|---|---|---|
| Pop1 | Current-population h² | 0.050±0.016^Aa | 0.051±0.011^Aa | 0.050±0.015^Aa | 0.049±0.014^Aa | 0.051±0.012^Aa | 0.050±0.013^Aa |
| Pop1 | MLP | 0.054±0.007^Aa | 0.056±0.008^Aa | 0.052±0.005^Aa | 0.046±0.004^Aa | 0.057±0.004^Aa | 0.044±0.004^Aa |
| Pop1 | SSGBLUP | 0.057±0.011^Bb | 0.064±0.014^Bb | 0.054±0.011^Ab | 0.043±0.017^Bb | 0.068±0.012^Bb | 0.041±0.012^Bb |
| Pop1 | GBLUP | 0.062±0.018^Cc | 0.068±0.015^Cc | 0.053±0.017^Ab | 0.054±0.019^Cc | 0.069±0.015^Bb | 0.058±0.015^Cc |
| Pop1 | BLUP | 0.042±0.024^Dd | 0.074±0.024^Dd | 0.058±0.021^Bc | 0.028±0.024^Dd | 0.075±0.020^Cc | 0.035±0.028^Dd |
| Pop2 | Current-population h² | 0.047±0.013^Aa | 0.050±0.009^Aa | 0.049±0.011^Aa | 0.050±0.016^Aa | 0.048±0.018^Aa | 0.050±0.018^Aa |
| Pop2 | MLP | 0.049±0.008^Aa | 0.051±0.005^Aa | 0.046±0.009^Aa | 0.053±0.003^Aa | 0.049±0.005^Aa | 0.055±0.007^Aa |
| Pop2 | SSGBLUP | 0.052±0.011^Ab | 0.052±0.011^Aa | 0.042±0.011^Bb | 0.057±0.012^Bb | 0.051±0.014^Bb | 0.058±0.012^Bb |
| Pop2 | GBLUP | 0.051±0.016^Ab | 0.054±0.014^Bb | 0.042±0.017^Bb | 0.058±0.017^Bb | 0.051±0.017^Bb | 0.057±0.016^Bb |
| Pop2 | BLUP | 0.055±0.023^Bc | 0.052±0.025^Bc | 0.040±0.025^Bc | 0.063±0.021^Cc | 0.056±0.024^Cc | 0.065±0.027^Cc |

Different capital letters indicate highly significant differences (P < 0.01), and different lowercase letters indicate significant differences (P < 0.05). The same applies below.
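The abstract states that the MLP h² estimates deviate less from the current-population h² than those of the other methods. As a rough illustration, the sketch below recomputes that deviation from the Pop1 means reported above for an initial h² of 0.05. The helper name `mean_abs_deviation` and the choice of mean absolute deviation as the summary are assumptions made for this example; the source does not say how the authors summarised the differences.

```python
# h² estimates (means only) for Pop1 in the table above, in column order.
current = [0.050, 0.051, 0.050, 0.049, 0.051, 0.050]
estimates = {
    "MLP":     [0.054, 0.056, 0.052, 0.046, 0.057, 0.044],
    "SSGBLUP": [0.057, 0.064, 0.054, 0.043, 0.068, 0.041],
    "GBLUP":   [0.062, 0.068, 0.053, 0.054, 0.069, 0.058],
    "BLUP":    [0.042, 0.074, 0.058, 0.028, 0.075, 0.035],
}


def mean_abs_deviation(est, ref):
    """Average absolute gap between estimated and current-population h²."""
    return sum(abs(e - r) for e, r in zip(est, ref)) / len(ref)


deviation = {m: mean_abs_deviation(v, current) for m, v in estimates.items()}
ranking = sorted(deviation, key=deviation.get)  # smallest deviation first

for method in ranking:
    print(f"{method:8s} {deviation[method]:.4f}")
```

For these rows the mean absolute deviation is smallest for MLP (about 0.004) and largest for BLUP (about 0.017), which matches the ordering described in the abstract. The calculation ignores the reported standard deviations and significance letters.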

| Population | Method | 10K markers, [QTL number missing] | 10K markers, 100 QTLs | 10K markers, 500 QTLs | 50K markers, [QTL number missing] | 50K markers, 100 QTLs | 50K markers, 500 QTLs |
|---|---|---|---|---|---|---|---|
| Pop1 | Current-population h² | 0.196±0.016^Aa | 0.199±0.016^Aa | 0.197±0.015^Aa | 0.200±0.018^Aa | 0.207±0.019^Aa | 0.202±0.013^Aa |
| Pop1 | MLP | 0.175±0.009^Ab | 0.198±0.007^Aa | 0.196±0.011^Aa | 0.196±0.010^Aa | 0.216±0.009^Aa | 0.204±0.010^Aa |
| Pop1 | SSGBLUP | 0.160±0.017^Bc | 0.194±0.013^Ab | 0.195±0.020^Aa | 0.218±0.018^Bb | 0.229±0.017^Bb | 0.206±0.017^Aa |
| Pop1 | GBLUP | 0.170±0.019^Cd | 0.207±0.015^Bc | 0.192±0.018^Ab | 0.221±0.014^Bb | 0.195±0.018^Cc | 0.199±0.020^Bb |
| Pop1 | BLUP | 0.152±0.024^De | 0.184±0.026^Cd | 0.242±0.024^Bc | 0.229±0.024^Cc | 0.180±0.027^Dd | 0.210±0.031^Cc |
| Pop2 | Current-population h² | 0.197±0.015^Aa | 0.195±0.021^Aa | 0.207±0.017^Aa | 0.197±0.021^Aa | 0.195±0.027^Aa | 0.194±0.024^Aa |
| Pop2 | MLP | 0.185±0.011^Aa | 0.197±0.012^Aa | 0.203±0.009^Aa | 0.194±0.010^Aa | 0.194±0.008^Aa | 0.197±0.015^Aa |
| Pop2 | SSGBLUP | 0.170±0.016^Bb | 0.181±0.019^Bb | 0.197±0.021^Bb | 0.206±0.019^Bb | 0.193±0.018^Aa | 0.199±0.021^Bb |
| Pop2 | GBLUP | 0.180±0.021^Cc | 0.188±0.022^Cc | 0.191±0.028^Cc | 0.213±0.019^Cc | 0.201±0.023^Bb | 0.188±0.021^Cc |
| Pop2 | BLUP | 0.162±0.027^Dd | 0.168±0.035^Dd | 0.218±0.032^Dd | 0.215±0.030^Cd | 0.178±0.034^Cc | 0.204±0.029^Dd |
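The abstract also states that SSGBLUP and GBLUP estimates change more between marker numbers than MLP estimates do. The sketch below illustrates one way to look at this with the Pop1 means above (initial h² of 0.2). It assumes that the first three value columns belong to 10K markers and the last three to 50K markers, in matching QTL order; this layout is inferred from the header order and from the cases listed in the abstract. The name `marker_shift` is hypothetical.

```python
# Pop1 h² means from the table above, split as (10K columns, 50K columns).
# The 10K/50K column split is an assumption inferred from the header order.
pop1 = {
    "MLP":     ([0.175, 0.198, 0.196], [0.196, 0.216, 0.204]),
    "SSGBLUP": ([0.160, 0.194, 0.195], [0.218, 0.229, 0.206]),
    "GBLUP":   ([0.170, 0.207, 0.192], [0.221, 0.195, 0.199]),
}


def marker_shift(low_density, high_density):
    """Mean absolute change in the h² estimate when moving from 10K to 50K markers."""
    pairs = list(zip(low_density, high_density))
    return sum(abs(hi - lo) for lo, hi in pairs) / len(pairs)


shift = {method: marker_shift(lo, hi) for method, (lo, hi) in pop1.items()}
least_sensitive = min(shift, key=shift.get)

for method, value in shift.items():
    print(f"{method:8s} {value:.4f}")
print("smallest shift:", least_sensitive)
```

On these rows the average shift is about 0.016 for MLP, 0.023 for GBLUP and 0.035 for SSGBLUP. The current-population h² itself differs slightly between columns, so part of each shift reflects variation between simulated scenarios and not only the effect of marker number.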

| Population | Method | 10K markers, [QTL number missing] | 10K markers, 100 QTLs | 10K markers, 500 QTLs | 50K markers, [QTL number missing] | 50K markers, 100 QTLs | 50K markers, 500 QTLs |
|---|---|---|---|---|---|---|---|
| Pop1 | Current-population h² | 0.493±0.031^Aa | 0.492±0.021^Aa | 0.492±0.019^Aa | 0.512±0.028^Aa | 0.506±0.031^Aa | 0.494±0.021^Aa |
| Pop1 | MLP | 0.480±0.014^Ab | 0.479±0.006^Ab | 0.496±0.010^Aa | 0.519±0.010^Aa | 0.508±0.012^Aa | 0.489±0.011^Aa |
| Pop1 | SSGBLUP | 0.474±0.025^Bc | 0.479±0.013^Ab | 0.453±0.024^Bb | 0.531±0.022^Bb | 0.503±0.021^Ab | 0.487±0.018^Ab |
| Pop1 | GBLUP | 0.473±0.025^Bc | 0.458±0.014^Bc | 0.448±0.027^Bb | 0.532±0.024^Bb | 0.510±0.029^Bc | 0.478±0.024^Bc |
| Pop1 | BLUP | 0.530±0.029^Cd | 0.418±0.033^Cd | 0.530±0.035^Cc | 0.490±0.033^Cc | 0.520±0.042^Cd | 0.546±0.049^Cd |
| Pop2 | Current-population h² | 0.491±0.031^Aa | 0.504±0.024^Aa | 0.507±0.021^Aa | 0.504±0.042^Aa | 0.511±0.036^Aa | 0.486±0.038^Aa |
| Pop2 | MLP | 0.482±0.014^Aa | 0.485±0.015^Ab | 0.500±0.014^Aa | 0.503±0.013^Aa | 0.511±0.015^Aa | 0.488±0.016^Aa |
| Pop2 | SSGBLUP | 0.477±0.023^Bb | 0.481±0.021^Ac | 0.499±0.021^Bb | 0.508±0.026^Bb | 0.513±0.027^Aa | 0.490±0.021^Ab |
| Pop2 | GBLUP | 0.479±0.021^Bb | 0.438±0.028^Bd | 0.498±0.031^Bb | 0.507±0.031^Bb | 0.503±0.032^Bb | 0.482±0.029^Bc |
| Pop2 | BLUP | 0.440±0.033^Cc | 0.412±0.039^Ce | 0.522±0.046^Cc | 0.528±0.044^Cc | 0.499±0.045^Bb | 0.440±0.059^Cd |



Acta Veterinaria et Zootechnica Sinica, 2023, Vol. 54, Issue 7: 2824-2835. DOI: 10.11843/j.issn.0366-6964.2023.07.015