Study on the Genotype Imputation Effect of 10K-50K Genotype of Pig SNP Liquid Chip

Cite this article

CHEN Yu, QIU Ao, ZHANG Zipeng, DU Hehe, BAI Junyan, WANG Guijiang, LUO Wenxue, NI Junqing, LI Kai, DING Xiangdong. Study on the Genotype Imputation Effect of 10K-50K Genotype of Pig SNP Liquid Chip[J]. Acta Veterinaria et Zootechnica Sinica, 2022, 53(10): 3368-3376.

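CHEN Yu¹,², QIU Ao², ZHANG Zipeng², DU Hehe², BAI Junyan¹, WANG Guijiang³, LUO Wenxue³, NI Junqing³, LI Kai⁴, DING Xiangdong²

1. College of Animal Science and Technology, Henan University of Science and Technology, Luoyang 471000, China;
2. Key Laboratory of Animal Genetics and Breeding of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory of Animal Breeding, College of Animal Science and Technology, China Agricultural University, Beijing 100193, China;
3. Hebei Province Animal Husbandry and Improved Breeds Work Station, Shijiazhuang 050061, China;
4. Henan Province Animal Husbandry Station, Zhengzhou 450008, China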

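Received: 2022-03-28

Funding: Key Research and Development Program of Hebei Province (19226376D); Ministry of Finance and Ministry of Agriculture and Rural Affairs: China Agriculture Research System (CARS-35); National Key Research and Development Program of China (2019YFE0106800)

About the author: CHEN Yu (1998-), male, from Ma'anshan, Anhui; master's student, mainly engaged in research on animal genetics, breeding and reproduction. E-mail: 1970374577@qq.com.

Corresponding author: DING Xiangdong, mainly engaged in research on pig genetics and breeding and statistical genetics. E-mail: xding@cau.edu.cn.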

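Abstract: This study aimed to assess the practicality of low-density liquid chips in production and thereby reduce breeding costs. A total of 3 761 healthy Large White pigs, about 160 days old and weighing about 110 kg, were used. One hundred pigs were randomly selected as the imputation (target) population, and their 10K genotypes were generated by extracting from the 50K chip the markers listed in the 10K chip. From the remaining pigs, 800, 2 000, or 3 600 individuals were randomly selected as the reference population, and Beagle 4.1 was used to impute the 100 target pigs up to the 50K chip. Each scenario was repeated 10 times, and imputation accuracy was evaluated by genotype concordance and the genotype correlation coefficient. The average linkage disequilibrium (r²) of the 10K and 50K chips was 0.227 and 0.258, respectively, a small difference. A minor allele frequency (MAF) of 0.05 was the inflection point for imputation accuracy: accuracy rose markedly after markers with MAF < 0.05 were removed. Imputation accuracy increased with reference population size: as the reference population grew from 800 to 3 600 pigs, accuracy rose from 0.90 to 0.95, and the standard deviation across the 10 replicates fell from 0.006 to 0.002. With smaller reference populations, imputation accuracy fluctuated considerably among chromosomes; as the reference population grew, accuracy differed little among chromosomes. These results indicate that imputing pig liquid chips from 10K to 50K is feasible and can be applied at scale in genomic selection to reduce its breeding cost.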

Keywords: pig; liquid chip; genotype imputation; molecular breeding
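The two accuracy measures named in the abstract, genotype concordance and genotype correlation, together with the MAF < 0.05 filter, can be illustrated with a minimal Python sketch. It assumes genotypes coded as 0/1/2 allele counts, computes MAF from the true genotypes, and pools all retained markers before computing the measures; these choices, the toy data, and the function names (`maf`, `concordance`, `pearson`) are hypothetical and are not taken from the study, which used Beagle 4.1 for the imputation itself.

```python
from math import sqrt

def maf(genotypes):
    """Minor allele frequency of one marker from 0/1/2 allele counts."""
    p = sum(genotypes) / (2 * len(genotypes))
    return min(p, 1 - p)

def concordance(true, imputed):
    """Fraction of genotypes that were imputed exactly right."""
    return sum(t == i for t, i in zip(true, imputed)) / len(true)

def pearson(x, y):
    """Pearson correlation between true and imputed genotype codes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Toy data (hypothetical): 3 markers x 12 animals, true vs. imputed genotypes.
true_gt = {
    "snp1": [0, 1, 2, 1, 0, 2, 1, 1, 0, 2, 1, 0],
    "snp2": [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],  # rare marker, MAF = 1/24
    "snp3": [2, 2, 1, 2, 1, 2, 2, 0, 2, 1, 2, 2],
}
imputed_gt = {
    "snp1": [0, 1, 2, 1, 0, 2, 1, 0, 0, 2, 1, 0],
    "snp2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "snp3": [2, 2, 1, 2, 2, 2, 2, 0, 2, 1, 2, 2],
}

# Drop markers with MAF < 0.05, then pool the remaining genotypes.
kept = [m for m in true_gt if maf(true_gt[m]) >= 0.05]
pooled_true = [g for m in kept for g in true_gt[m]]
pooled_imputed = [g for m in kept for g in imputed_gt[m]]

conc = concordance(pooled_true, pooled_imputed)
r = pearson(pooled_true, pooled_imputed)
print(kept, round(conc, 3), round(r, 3))
```

In this toy case the rare marker is dropped by the filter, and both measures are then computed on the two remaining markers. Whether the measures are computed per marker, per animal, or on pooled genotypes is a design choice that the abstract does not specify.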

| Chromosome | 10K: number of markers | 10K: average distance (bp) | 10K: r² | 50K: number of markers | 50K: average distance (bp) | 50K: r² |
|---|---|---|---|---|---|---|
| Chr1 | 1 042 | 262 593 | 0.267 | 5 388 | 50 871 | 0.292 |
| Chr2 | 662 | 228 606 | 0.239 | 3 191 | 47 438 | 0.289 |
| Chr3 | 600 | 221 266 | 0.209 | 2 956 | 44 852 | 0.235 |
| Chr4 | 623 | 210 220 | 0.219 | 3 179 | 41 144 | 0.270 |
| Chr5 | 467 | 223 033 | 0.170 | 2 309 | 45 171 | 0.234 |
| Chr6 | 661 | 258 673 | 0.192 | 2 889 | 59 127 | 0.268 |
| Chr7 | 548 | 222 318 | 0.191 | 3 068 | 39 651 | 0.232 |
| Chr8 | 601 | 231 336 | 0.263 | 2 999 | 46 298 | 0.246 |
| Chr9 | 654 | 213 340 | 0.238 | 3 141 | 44 366 | 0.266 |
| Chr10 | 284 | 244 420 | 0.196 | 1 868 | 37 049 | 0.230 |
| Chr11 | 382 | 207 370 | 0.196 | 1 852 | 42 684 | 0.228 |
| Chr12 | 287 | 213 056 | 0.315 | 1 727 | 35 304 | 0.277 |
| Chr13 | 820 | 253 962 | 0.234 | 4 009 | 51 895 | 0.303 |
| Chr14 | 664 | 213 624 | 0.275 | 3 499 | 40 490 | 0.294 |
| Chr15 | 599 | 233 962 | 0.214 | 2 913 | 48 096 | 0.273 |
| Chr16 | 375 | 211 907 | 0.232 | 1 883 | 42 111 | 0.243 |
| Chr17 | 296 | 208 808 | 0.194 | 1 663 | 38 405 | 0.237 |
| Chr18 | 258 | 216 590 | 0.249 | 1 352 | 41 202 | 0.223 |
| Total/Average | 9 823 | 226 394 | 0.227 | 49 886 | 44 231 | 0.258 |
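The Total/Average row can be reproduced from the per-chromosome rows: marker counts are summed, while average distance and r² are unweighted means over the 18 chromosomes. The short Python sketch below checks this for the 10K columns; the variable names are illustrative only.

```python
# Per-chromosome values for the 10K chip, Chr1 to Chr18, copied from the table.
markers_10k = [1042, 662, 600, 623, 467, 661, 548, 601, 654,
               284, 382, 287, 820, 664, 599, 375, 296, 258]
distance_10k = [262593, 228606, 221266, 210220, 223033, 258673,
                222318, 231336, 213340, 244420, 207370, 213056,
                253962, 213624, 233962, 211907, 208808, 216590]
r2_10k = [0.267, 0.239, 0.209, 0.219, 0.170, 0.192, 0.191, 0.263, 0.238,
          0.196, 0.196, 0.315, 0.234, 0.275, 0.214, 0.232, 0.194, 0.249]

# Marker counts are totalled across chromosomes.
total_markers = sum(markers_10k)

# Distance and r² are simple (unweighted) means of the chromosome values.
mean_distance = sum(distance_10k) / len(distance_10k)
mean_r2 = sum(r2_10k) / len(r2_10k)

print(total_markers, round(mean_distance), round(mean_r2, 3))
```

Because the averages are unweighted, a short chromosome such as Chr18 contributes as much to the reported mean r² as Chr1, even though it carries far fewer markers.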


Acta Veterinaria et Zootechnica Sinica 2022, Vol. 53 Issue (10): 3368-3376. DOI: 10.11843/j.issn.0366-6964.2022.10.010