Effect of Reference Population Selection Method and Size on Genotype Imputation Accuracy

Cite this article

YANG Wenpan, YE Shaopan, YE Haoqiang, LIN Qing, WEI Chen, ZHANG Zhigang, ZHANG Xiquan, CHEN Zanmou, ZHANG Zhe. Effect of Reference Population Selection Method and Size on Genotype Imputation Accuracy[J]. Acta Veterinaria et Zootechnica Sinica, 2021, 52(12): 3357-3365.

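YANG Wenpan^1,2, YE Shaopan^1,4, YE Haoqiang^1, LIN Qing^1, WEI Chen^1, ZHANG Zhigang^3, ZHANG Xiquan^1, CHEN Zanmou^1, ZHANG Zhe^1

1. National Engineering Research Center for Breeding Swine Industry, College of Animal Science, South China Agricultural University, Guangzhou 510642, China;
2. Fujian Aonong Biological Science and Technology Group Co. Ltd., Zhangzhou 363000, China;
3. State Key Laboratory of Food Safety Technology for Meat Products, Xiamen Yinxiang Group Co. Ltd., Xiamen 361100, China;
4. Guangdong Provincial Key Laboratory of Marine Biotechnology, College of Science, Shantou University, Shantou 515063, China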

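Received: 2021-04-30

Funding: Supported by the China Agriculture Research System of the Ministry of Finance and the Ministry of Agriculture and Rural Affairs

About the author: YANG Wenpan (1994-), male, from Hanchuan, Hubei, master's student, mainly engaged in research on animal genetics and breeding, E-mail: oywpan@163.com.

Corresponding author: ZHANG Zhe, mainly engaged in research on animal genetics and breeding, E-mail: zhezhang@scau.edu.cn.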

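Abstract: This study investigated how the reference population selection method and the reference population size affect genotype imputation accuracy. Five selection methods were compared: maximizing the expected genetic relationship for matrix A (RELA), minimizing the target population genetic variance for matrix A (MCA), the highest mean kinship coefficients (KIN), random selection (RAN), and common ancestor selection (CA). A dwarf yellow-feathered broiler population was used as the experimental population. Birds were genotyped with the chicken 600K SNP chip (Affymetrix Axiom HD genotyping array), and body weight at 45, 56, 70, 84, and 91 days of age was recorded for 435 male offspring. Low-density SNP chip data were imputed to high-density SNP chip data with Beagle. The selection methods and reference population sizes were then compared for genotype imputation accuracy, and the genomic prediction accuracy of the imputed chips was evaluated. Imputation was most accurate with Beagle 4.0 combined with pedigree information, followed by Beagle 4.0 alone, and least accurate with Beagle 5.1. Among the selection methods, MCA gave the highest imputation accuracy and RAN the lowest, while MCA, RELA, and CA differed only slightly. Compared with the other methods, selecting the reference population with MCA and imputing the low-density chip to high density gave higher genomic prediction accuracy, which differed very little from that obtained with the real high-density SNP chip. Imputation accuracy increased with reference population size, but the rate of increase gradually declined and eventually leveled off. In conclusion, choosing a suitable reference population selection method and controlling the reference population size can maintain imputation and genomic prediction accuracy while reducing cost. This study provides a technical reference for applying genotype imputation in livestock and poultry genetics and breeding.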

Keywords: chicken; genotype imputation; reference population selection method; reference population size; imputation accuracy
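As a rough illustration of two of the selection strategies named in the abstract (KIN and RAN), and of how imputed genotypes might be scored against true genotypes, the following Python sketch may help. It is not the authors' implementation. The function names, the toy relationship matrix, the choice to average each animal's relationship over all other animals, and the two accuracy measures (genotype concordance and Pearson correlation) are all assumptions made for illustration.

```python
import numpy as np


def select_kin(rel_matrix, n_ref):
    """Pick the n_ref animals with the highest mean relationship to all others.

    Assumption: "mean kinship" is taken over every other animal in the matrix;
    the paper may define the comparison group differently.
    """
    rel = np.asarray(rel_matrix, dtype=float)
    n = rel.shape[0]
    # Exclude each animal's relationship with itself (the diagonal).
    mean_rel = (rel.sum(axis=1) - np.diag(rel)) / (n - 1)
    order = np.argsort(-mean_rel, kind="stable")
    return np.sort(order[:n_ref])


def select_ran(n_animals, n_ref, seed=0):
    """Pick n_ref animals uniformly at random without replacement."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_animals, size=n_ref, replace=False))


def imputation_accuracy(true_geno, imputed_geno):
    """Return (concordance rate, Pearson correlation) for 0/1/2 genotype codes."""
    t = np.asarray(true_geno, dtype=float).ravel()
    i = np.asarray(imputed_geno, dtype=float).ravel()
    concordance = float(np.mean(t == i))
    correlation = float(np.corrcoef(t, i)[0, 1])
    return concordance, correlation


# Toy (hypothetical) relationship matrix for four animals.
toy_rel = [
    [1.00, 0.50, 0.25, 0.00],
    [0.50, 1.00, 0.25, 0.00],
    [0.25, 0.25, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
]
kin_ref = select_kin(toy_rel, n_ref=2)     # animals 0 and 1 are the most related
ran_ref = select_ran(n_animals=4, n_ref=2, seed=1)

# Toy genotypes: one of six calls is imputed incorrectly.
conc, corr = imputation_accuracy([0, 1, 2, 1, 0, 2], [0, 1, 2, 2, 0, 2])
print(kin_ref, ran_ref, round(conc, 3), round(corr, 3))
```

In practice the relationship matrix would come from the pedigree (the A matrix mentioned in the abstract), and accuracy would be evaluated only on the markers that were masked in the low-density panel.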


| Method | Number | BW45 | BW56 | BW70 | BW84 | BW91 |
|---|---|---|---|---|---|---|
| RAN | | 0.2795±0.1234 | 0.3023±0.1346 | 0.2797±0.1431 | 0.3212±0.1575 | 0.3359±0.1457 |
| RAN | | 0.2984±0.1248 | 0.3204±0.1329 | 0.2826±0.1455 | 0.3273±0.1623 | 0.3276±0.1563 |
| RAN | | 0.3044±0.1246 | 0.3130±0.1383^* | 0.2857±0.1470 | 0.3313±0.1665 | 0.3287±0.1613 |
| MCA | | 0.3058±0.1288^* | 0.3298±0.1407 | 0.2996±0.1524 | 0.3461±0.1691 | 0.3399±0.1588 |
| MCA | | 0.3057±0.1231 | 0.3196±0.1371 | 0.2967±0.1457 | 0.3425±0.1655 | 0.3370±0.1600 |
| MCA | | 0.3077±0.1236 | 0.3171±0.1364 | 0.2919±0.1450^* | 0.3361±0.1638^* | 0.3319±0.1586^* |
| RAW | | 0.3067±0.1228 | 0.3145±0.1367 | 0.2912±0.1440 | 0.3343±0.1647 | 0.3322±0.1596 |

BW45, BW56, BW70, BW84, BW91: body weight at 45, 56, 70, 84, and 91 days of age.
Bold indicates the chip with the best prediction; ^* marks the imputed chip whose prediction is closest to that of the real chip; RAW denotes the real chip data. The same applies below.
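The footnote's ^* rule (the imputed chip whose prediction is closest to the real chip) can be checked against the table values with a short Python sketch. The means below are transcribed from the table; the row labels such as `RAN-1` are hypothetical and only record row order, since the Number column values are not available here. Standard deviations are ignored, which is a simplifying assumption.

```python
TRAITS = ["BW45", "BW56", "BW70", "BW84", "BW91"]

# Mean prediction accuracies from the table, one list per imputed-chip row.
# Labels are hypothetical (method name plus row order within the method).
IMPUTED = {
    "RAN-1": [0.2795, 0.3023, 0.2797, 0.3212, 0.3359],
    "RAN-2": [0.2984, 0.3204, 0.2826, 0.3273, 0.3276],
    "RAN-3": [0.3044, 0.3130, 0.2857, 0.3313, 0.3287],
    "MCA-1": [0.3058, 0.3298, 0.2996, 0.3461, 0.3399],
    "MCA-2": [0.3057, 0.3196, 0.2967, 0.3425, 0.3370],
    "MCA-3": [0.3077, 0.3171, 0.2919, 0.3361, 0.3319],
}

# Mean prediction accuracies obtained with the real high-density chip (RAW row).
RAW = [0.3067, 0.3145, 0.2912, 0.3343, 0.3322]


def closest_to_raw(imputed, raw):
    """For each trait, return the imputed row whose mean is nearest the RAW mean."""
    result = {}
    for j, trait in enumerate(TRAITS):
        result[trait] = min(imputed, key=lambda row: abs(imputed[row][j] - raw[j]))
    return result


closest = closest_to_raw(IMPUTED, RAW)
for trait, row in closest.items():
    print(trait, row)
# BW45 MCA-1
# BW56 RAN-3
# BW70 MCA-3
# BW84 MCA-3
# BW91 MCA-3
```

These rows coincide with the cells carrying the ^* marker in the table: an MCA row is closest to the real chip for four of the five traits, and a RAN row for BW56.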



Acta Veterinaria et Zootechnica Sinica 2021, Vol. 52 Issue (12): 3357-3365. DOI: 10.11843/j.issn.0366-6964.2021.012.004