Study on an early warning model for influenza-like illness in Zhejiang Province based on support vector machine

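1. Department of Epidemiology and Health Statistics, School of Public Health, Zhejiang University, Hangzhou 310058, Zhejiang, China;
2. Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou 310051, Zhejiang, China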

Construction of early warning model of influenza-like illness in Zhejiang Province based on support vector machine
LU Han-ti¹, LI Fu-dong², LIN Jun-fen², HE Fan², SHEN Yi¹
1. Department of Epidemiology and Biostatistics, Zhejiang University School of Public Health, Hangzhou 310058, China;
2. The Center for Disease Control and Prevention of Zhejiang Province, Hangzhou 310051, China
Abstract: Objective: To construct a forecasting model of influenza-like illness (ILI) in Zhejiang Province. Methods: The numbers of influenza-like illness cases and the related pathogen detection results among outpatient and emergency patients were obtained from 11 sentinel hospitals in Zhejiang Province from 2012 to 2013 (104 weeks in total), and the corresponding meteorological data were also collected. The epidemiological characteristics of influenza during the period were then analyzed. Linear correlation and rank correlation analyses were conducted to explore the association between influenza-like illness and related factors. Optimal parameters were selected by cross validation. A support vector machine was used to construct the forecasting model of influenza-like illness in Zhejiang Province, and the model was verified against historical data. Results: Correlation analysis indicated that 8 factors were associated with the number of influenza-like illness cases in a given week. Cross validation gave optimal parameters of C=3, ε=0.009 and γ=0.4. In verification, the support vector machine model predicted the exact epidemic level with an accuracy of 50.0%, and its prediction fell within one level above or below the actual level with an accuracy of 96.7%. Conclusion: Support vector machine is suitable for early warning of influenza-like illness.
Key words: Influenza, human/epidemiology; Artificial intelligence; Models, statistical; Forecasting/methods

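1 Materials and methods
1.1 Data on influenza and ILI-related diseases

1.2 Meteorological data

1.3 Pathogen detection data

1.4 Data preprocessing

1.5 Correlation analysis between ILI and other factors

1.6 Construction of the ILI early warning model

1.7 Statistical software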

2 Results
2.1 Basic characteristics of reported ILI cases

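Over the 104 weeks, the 11 hospitals reported 313 244 ILI person-visits in total. Of these, 180 409 (57.59%) were in patients under 5 years of age; 71 336 (22.77%) in patients aged 5 to under 15 years; 17 374 (5.55%) in patients aged 15 to under 25 years; 35 902 (11.46%) in patients aged 25 to under 60 years; and 8 223 (2.63%) in patients aged 60 years and over.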
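As a quick arithmetic check, each percentage above is that age group's share of the sum of the five age-group counts. The sketch below recomputes the shares from the counts given in the text; the group labels and variable names are illustrative only.

```python
# Age-group ILI counts as reported in the text (person-visits).
counts = {
    "<5": 180409,
    "5-14": 71336,
    "15-24": 17374,
    "25-59": 35902,
    ">=60": 8223,
}

# Denominator: the sum of the five age-group counts.
total = sum(counts.values())

# Share of each age group in percent, rounded to two decimals.
shares = {group: round(100 * n / total, 2) for group, n in counts.items()}

print(total)
for group, pct in shares.items():
    print(f"{group}: {pct:.2f}%")
```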

2.2 Correlation between ILI case counts and other factors

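| Factor | Lag 0 weeks | Lag 1 week | Lag 2 weeks | Lag 3 weeks | Lag 4 weeks |
|---|---|---|---|---|---|
| Number of influenza-related disease cases | 0.825** | 0.756** | 0.626** | 0.438** | 0.303** |
| Weekly mean air pressure | -0.035 | 0.006 | 0.043 | 0.097 | 0.156 |
| Weekly mean wind speed | -0.091 | -0.148 | -0.200* | -0.246* | -0.192 |
| Weekly mean temperature | -0.124 | -0.199* | -0.247* | -0.302** | -0.357** |
| Weekly mean vapor pressure | -0.042 | -0.126 | -0.197* | -0.250* | -0.320** |
| Weekly mean relative humidity | 0.007 | 0.122 | 0.132 | 0.220* | 0.198* |
| Weekly minimum temperature | -0.101 | -0.165 | -0.212* | -0.260** | -0.313** |
| Weekly maximum temperature | -0.076 | -0.145 | -0.162 | -0.240* | -0.288** |
| Weekly mean temperature range | 0.060 | -0.077 | -0.081 | -0.097 | -0.132 |
| Pathogen detection positive rate | 0.485** | 0.458** | 0.418** | 0.372** | 0.324** |

*P<0.05; **P<0.01.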
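How a lagged coefficient of this kind can be computed is sketched below. The sketch assumes that "lag k" pairs a factor's value in week t-k with the ILI count in week t, and it uses Pearson's coefficient. The pairing convention, the function names, and the toy series are assumptions for illustration; they are not the study's data or code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(factor, ili, lag):
    """Correlate factor[t - lag] with ili[t] (assumed lag convention)."""
    if lag == 0:
        return pearson(factor, ili)
    return pearson(factor[:-lag], ili[lag:])


# Toy series: ILI repeats the factor with a two-week delay (not real data).
factor = [3, 5, 9, 4, 2, 7, 8, 1, 6, 5, 3, 9]
ili = [0, 0] + factor[:-2]

for lag in range(5):
    print(lag, round(lagged_correlation(factor, ili, lag), 3))
```

In the toy series the coefficient peaks at lag 2, because the second series was built as a two-week-delayed copy of the first.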
2.3 Fitting and tuning of the ILI early warning model

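(γ=0.1, ε=0.1)

| C | Mean squared error | Squared correlation coefficient |
|---|---|---|
| 0.001 | 0.0290 | 0.0289 |
| 0.01 | 0.0281 | 0.1359 |
| 0.1 | 0.0200 | 0.4227 |
| 1 | 0.0134 | 0.5421 |
| 2 | 0.0122 | 0.5788 |
| 3 | 0.0121 | 0.5804 |
| 4 | 0.0123 | 0.5708 |
| 5 | 0.0125 | 0.5662 |
| 10 | 0.0131 | 0.5459 |
| 100 | 0.0202 | 0.3816 |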

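(C=3, γ=0.1)

| ε | Mean squared error | Squared correlation coefficient |
|---|---|---|
| 0.0001 | 0.0115 | 0.6078 |
| 0.001 | 0.0114 | 0.6097 |
| 0.006 | 0.0113 | 0.6125 |
| 0.007 | 0.0112 | 0.6146 |
| 0.008 | 0.0111 | 0.6171 |
| 0.009 | 0.0111 | 0.6176 |
| 0.01 | 0.0111 | 0.6174 |
| 0.1 | 0.0121 | 0.5804 |
| 1 | 0.0357 | 0.0889 |
| 10 | 0.0357 | 0.0889 |
| 100 | 0.0357 | 0.0889 |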

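(C=3, ε=0.009)

| γ | Mean squared error | Squared correlation coefficient |
|---|---|---|
| 0.0001 | 0.0281 | 0.2244 |
| 0.001 | 0.0248 | 0.2701 |
| 0.01 | 0.0140 | 0.5212 |
| 0.1 | 0.0111 | 0.6176 |
| 0.2 | 0.0115 | 0.6039 |
| 0.3 | 0.0113 | 0.6135 |
| 0.4 | 0.0111 | 0.6198 |
| 0.5 | 0.0117 | 0.6015 |
| 1 | 0.0123 | 0.5901 |
| 10 | 0.0142 | 0.5067 |
| 100 | 0.0266 | 0.1180 |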
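The three tables above amount to a one-parameter-at-a-time search: C is tuned with γ and ε fixed, then ε, then γ. The sketch below replays that selection on the tabulated values. It assumes the rule "lowest mean squared error, ties broken by the highest squared correlation coefficient", which is inferred from the tables rather than stated as the procedure used; the function and variable names are illustrative.

```python
# Rows are (parameter value, mean squared error, squared correlation
# coefficient), copied from the three tuning tables.
c_grid = [
    (0.001, 0.0290, 0.0289), (0.01, 0.0281, 0.1359), (0.1, 0.0200, 0.4227),
    (1, 0.0134, 0.5421), (2, 0.0122, 0.5788), (3, 0.0121, 0.5804),
    (4, 0.0123, 0.5708), (5, 0.0125, 0.5662), (10, 0.0131, 0.5459),
    (100, 0.0202, 0.3816),
]
eps_grid = [
    (0.0001, 0.0115, 0.6078), (0.001, 0.0114, 0.6097), (0.006, 0.0113, 0.6125),
    (0.007, 0.0112, 0.6146), (0.008, 0.0111, 0.6171), (0.009, 0.0111, 0.6176),
    (0.01, 0.0111, 0.6174), (0.1, 0.0121, 0.5804), (1, 0.0357, 0.0889),
    (10, 0.0357, 0.0889), (100, 0.0357, 0.0889),
]
gamma_grid = [
    (0.0001, 0.0281, 0.2244), (0.001, 0.0248, 0.2701), (0.01, 0.0140, 0.5212),
    (0.1, 0.0111, 0.6176), (0.2, 0.0115, 0.6039), (0.3, 0.0113, 0.6135),
    (0.4, 0.0111, 0.6198), (0.5, 0.0117, 0.6015), (1, 0.0123, 0.5901),
    (10, 0.0142, 0.5067), (100, 0.0266, 0.1180),
]


def best(grid):
    """Pick the value with the lowest MSE; break ties by the highest R^2."""
    value, _, _ = min(grid, key=lambda row: (row[1], -row[2]))
    return value


params = {"C": best(c_grid), "epsilon": best(eps_grid), "gamma": best(gamma_grid)}
print(params)  # {'C': 3, 'epsilon': 0.009, 'gamma': 0.4}
```

Under this rule the selected values agree with the optimum reported in the abstract (C=3, ε=0.009, γ=0.4).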

2.4 Validation of the ILI early warning model

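| Week | Actual ILI cases | Predicted ILI cases | Actual influenza level | Predicted influenza level |
|---|---|---|---|---|
| 75 | 3717 | 2926 | 3 | 2 |
| 76 | 3620 | 2954 | 3 | 2 |
| 77 | 3537 | 2927 | 3 | 2 |
| 78 | 4045 | 3131 | 3 | 2 |
| 79 | 4185 | 3265 | 4 | 2 |
| 80 | 3359 | 3112 | 3 | 2 |
| 81 | 2785 | 2881 | 2 | 2 |
| 82 | 2823 | 2405 | 2 | 1 |
| 83 | 2490 | 2097 | 1 | 1 |
| 84 | 2589 | 2126 | 1 | 1 |
| 85 | 2508 | 2377 | 1 | 1 |
| 86 | 2220 | 2038 | 1 | 1 |
| 87 | 2263 | 1978 | 1 | 1 |
| 88 | 1865 | 1949 | 1 | 1 |
| 89 | 2451 | 2062 | 1 | 1 |
| 90 | 2304 | 2383 | 1 | 1 |
| 91 | 2407 | 2397 | 1 | 1 |
| 92 | 2524 | 2838 | 1 | 2 |
| 93 | 2265 | 2972 | 1 | 2 |
| 94 | 2356 | 2661 | 1 | 2 |
| 95 | 2450 | 2934 | 1 | 2 |
| 96 | 2649 | 3330 | 1 | 2 |
| 97 | 2634 | 3285 | 1 | 2 |
| 98 | 2392 | 2961 | 1 | 2 |
| 99 | 2740 | 3297 | 2 | 2 |
| 100 | 2963 | 3610 | 2 | 3 |
| 101 | 3650 | 3923 | 3 | 3 |
| 102 | 3900 | 3733 | 3 | 3 |
| 103 | 4645 | 4205 | 4 | 4 |
| 104 | 6005 | 4245 | 4 | 4 |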
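The two accuracy figures quoted in the abstract can be recomputed from the level columns of the validation table. The sketch below does so; the list and variable names are illustrative, and the levels are copied from weeks 75 to 104.

```python
# Actual and predicted influenza levels for validation weeks 75-104,
# copied from the table (ten weeks per row).
actual = [
    3, 3, 3, 3, 4, 3, 2, 2, 1, 1,  # weeks 75-84
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  # weeks 85-94
    1, 1, 1, 1, 2, 2, 3, 3, 4, 4,  # weeks 95-104
]
predicted = [
    2, 2, 2, 2, 2, 2, 2, 1, 1, 1,  # weeks 75-84
    1, 1, 1, 1, 1, 1, 1, 2, 2, 2,  # weeks 85-94
    2, 2, 2, 2, 2, 3, 3, 3, 4, 4,  # weeks 95-104
]

n = len(actual)

# Weeks where the predicted level equals the actual level.
exact = sum(a == p for a, p in zip(actual, predicted))

# Weeks where the prediction is at most one level off.
within_one = sum(abs(a - p) <= 1 for a, p in zip(actual, predicted))

print(f"exact level: {exact}/{n} = {100 * exact / n:.1f}%")
print(f"within one level: {within_one}/{n} = {100 * within_one / n:.1f}%")
```

The exact-level agreement is 15 of 30 weeks (50.0%), and all weeks except week 79 fall within one level (29 of 30, 96.7%).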

Fig. 1 Comparison between the values predicted by the SVM model and the actual values
3 Discussion

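SVM is a relatively new machine learning algorithm based on statistical learning theory[7]. Statistical learning theory is regarded as one of the best theoretical and mathematical frameworks currently available for statistical estimation and predictive learning from small samples. In this study we collected two years of data, 104 weeks in total, and used the first 74 weeks to train the model and the last 30 weeks to validate it. This falls within the small-sample regime, so in this respect SVM is suitable for the present study.

Second, small samples are closely tied to the problem of high dimensionality. Whether a sample is large or small is relative: in a low-dimensional space a small sample can describe the whole sample space, but in a high-dimensional space the number of samples required grows exponentially with the dimension[8], which gives rise to the curse of dimensionality. SVM addresses this problem by introducing kernel functions, so that computations are carried out directly in the input space rather than in the underlying high-dimensional space. SVM therefore does not require prior dimensionality reduction and handles high-dimensional data conveniently.

In this study we fitted the ILI early warning model using the number of influenza-related disease cases among hospital outpatient and emergency visits, various meteorological factors, and the pathogen detection positive rate. The relationships between these factors and the ILI case counts are not necessarily simple linear correlations; complex nonlinear relationships may also exist. Through "kernel mapping", SVM maps the input sample space into a high-dimensional feature space and performs linear regression in that feature space, which yields a nonlinear fit in the original space[9, 10]. SVM can therefore also handle nonlinear problems well and is likewise suitable for this study.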
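The scheme described above (fit on the first 74 weeks, validate on the last 30, with a kernel handling nonlinearity) can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's implementation: it uses scikit-learn's `SVR` as a stand-in, assumes an RBF kernel (suggested by the γ parameter), assumes min-max scaling of predictors and target, and runs on synthetic data with three made-up predictors.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic stand-in for 104 weekly records with 3 predictors (not study data).
n_weeks = 104
X = rng.normal(size=(n_weeks, 3))
y = (3000 + 600 * np.tanh(X[:, 0]) + 200 * X[:, 1] ** 2
     + rng.normal(scale=100, size=n_weeks))

# Chronological split: first 74 weeks for training, last 30 for validation.
X_train, X_test = X[:74], X[74:]
y_train, y_test = y[:74], y[74:]

# Assumption: scale predictors and target to [0, 1] using training weeks only.
x_scaler = MinMaxScaler().fit(X_train)
y_scaler = MinMaxScaler().fit(y_train.reshape(-1, 1))

# Epsilon-SVR with an RBF kernel and the parameter values reported in the text.
model = SVR(kernel="rbf", C=3, epsilon=0.009, gamma=0.4)
model.fit(x_scaler.transform(X_train),
          y_scaler.transform(y_train.reshape(-1, 1)).ravel())

# Predict the validation weeks and map back to the original case-count scale.
pred_scaled = model.predict(x_scaler.transform(X_test))
pred = y_scaler.inverse_transform(pred_scaled.reshape(-1, 1)).ravel()

# Mean squared error on the scaled target for the validation weeks.
y_test_scaled = y_scaler.transform(y_test.reshape(-1, 1)).ravel()
mse_scaled = float(np.mean((pred_scaled - y_test_scaled) ** 2))
print(pred.shape, round(mse_scaled, 4))
```

Fitting the scalers on the training weeks only keeps information from the validation period out of the model, which matters when the split is chronological.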

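SVM makes full use of the distributional characteristics of the samples, requires little prior information, and can construct the decision function from only part of the training samples. Moreover, its training ultimately reduces to a quadratic optimization problem, so in theory the solution obtained is the global optimum. This, first, effectively avoids the local-extremum problem to which other machine learning algorithms (such as neural networks) are prone[11, 12] and, second, speeds up training. By introducing kernel functions and nonlinear transformations, SVM also handles the high-dimensionality problem, so that the complexity of the algorithm is independent of the dimension of the samples. In addition, given limited sample data, it seeks the best trade-off between model complexity and learning ability, which ensures that the model generalizes well[13].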
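For reference, the quadratic optimization problem mentioned above has the following standard textbook form for ε-insensitive support vector regression. The notation (weight vector w, offset b, feature map φ, slack variables ξ and ξ*) is the conventional one and is assumed here rather than taken from this paper; C and ε are the same parameters tuned in section 2.3.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}}\quad
  & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \\
\text{s.t.}\quad
  & y_i - \langle w, \varphi(x_i)\rangle - b \le \varepsilon + \xi_i, \\
  & \langle w, \varphi(x_i)\rangle + b - y_i \le \varepsilon + \xi_i^{*}, \\
  & \xi_i \ge 0,\quad \xi_i^{*} \ge 0, \qquad i = 1,\dots,n .
\end{aligned}
```

The objective is convex and the constraints are linear, so any local minimum is also a global one. With a Gaussian (RBF) kernel, assumed here as $K(x, z) = \exp(-\gamma \lVert x - z\rVert^{2})$, the dual problem depends on the data only through $K$, so the feature map φ never has to be computed explicitly.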

[1] LI Lan-juan. Infectious diseases[M]. Beijing: Higher Education Press, 2004:13. (in Chinese)
[2] VAN DIJK A, ARAMINI J, EDGE G, et al. Real-time surveillance for respiratory disease outbreaks, Ontario, Canada[J]. Emerg Infect Dis, 2009,15(5):799-801.
[3] CHRETIEN J P, TOMICH N E, GAYDOS J C, et al. Real-time public health surveillance for emergency preparedness[J]. Am J Public Health, 2009,99(8):1360-1363.
[4] LIN Jun-fen, FANG Le, FANG Qiong-shan, et al. Epidemiological characteristics of novel influenza A (H1N1) in Zhejiang Province in year 2009[J]. Zhejiang Journal of Preventive Medicine, 2010,22(9):5-7. (in Chinese)
[5] HOU Yan. Year Book of Health in the People's Republic of China 2010[M]. Beijing: People's Health Publishing House, 2011:45. (in Chinese)
[6] Ministry of Health of the People's Republic of China. 2011 China Health Statistical Yearbook[M]. Beijing: Peking Union Medical College Press, 2011:266-273. (in Chinese)
[7] ZHANG Xue-gong. Introduction to statistical learning theory and support vector machines[J]. Acta Automatica Sinica, 2000,26(1):36-46. (in Chinese)
[8] LI Ying-hong, WEI Xun-kai. Fusion development of support vector machine and neural networks[J]. Journal of Air Force Engineering University (Natural Science Edition), 2005,6(4):74-77. (in Chinese)
[9] LI Jia, WANG Li, MA Guang-wen, et al. Application of least squares support vector machines in runoff forecast[J]. China Rural Water and Hydropower, 2008,(5):12-14. (in Chinese)
[10] CRISTIANINI N, SHAWE-TAYLOR J. An introduction to support vector machines[M]. Cambridge: Cambridge University Press, 2000.
[11] JIANG Wan-lu, LIU Qing-ping, LIU Tao. Drawbacks of neural network learning algorithms and countermeasures[J]. Machine Tool & Hydraulics, 2003,(5):29-32. (in Chinese)
[12] LU Min, ZHANG Zhan-yu. Application of support vector machine in runoff forecast[J]. China Rural Water and Hydropower, 2006,(2):50-52. (in Chinese)
[13] XU Jin-li. Application of support vector machine in water quality evaluation[J]. China Rural Water and Hydropower, 2007,(3):11-13. (in Chinese)

Article information

LU Han-ti, LI Fu-dong, LIN Jun-fen, HE Fan, SHEN Yi

Construction of early warning model of influenza-like illness in Zhejiang Province based on support vector machine

Journal of Zhejiang University(Medical Sciences), 2015, 44(6): 653-658.
http://dx.doi.org/10.3785/j.issn.1008-9292.2015.11.09