Forecasting Precipitation Experiment with KNN Based on Crossing Verification Technology

Cite this article

Zeng Xiaoqing, Shao Mingxuan, Wang Shigong, Liu Huanzhu. Forecasting Precipitation Experiment with KNN Based on Crossing Verification Technology[J]. Journal of Applied Meteorological Science, 2008, 19(4): 471-478.


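Zeng Xiaoqing¹, Shao Mingxuan², Wang Shigong¹, Liu Huanzhu²

1. Atmospheric Science School, Lanzhou University, Lanzhou 730000;
2. National Meteorological Center, Beijing 100081

Received 2007-10-18; revised manuscript received 2008-03-27.

Funding: Jointly supported by the National Natural Science Foundation of China (40675077), the China Meteorological Administration project "Development of Refined Objective Weather Forecasting", and the National Key Technology R&D Program (2007BAC29B03).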

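Abstract: Using T213 numerical forecast products from the National Meteorological Center for April to September of 2003 to 2005, multiple factors that relate well to precipitation are extracted, through dynamic diagnosis, from a large pool of numerical forecast factors at different levels and lead times. The K-nearest neighbor (KNN) method is then used to run forecast experiments for rain/no-rain and for precipitation of 10 mm or more at different representative stations. Because weather events occur with different probabilities, the search for the K nearest neighbors determines two separate values: K⁺ for positive samples (the weather event occurred) and K⁻ for negative samples (no weather event), which makes the choice of K in the nearest neighborhood more reasonable. With cross-validation, subsets of the historical data are taken in turn as the forecast test set, the forecasts are scored, and the K⁺ and K⁻ that yield the highest accuracy rate and the highest summary rate are chosen as the best neighborhood combination. Once the optimal K values are fixed, the historical samples are hindcast and, by comparison, a forecast discriminant value for the occurrence of a precipitation event at a given station is obtained, which reduces the false alarm rate to some extent. In forecast experiments for April to September 2006, the TS scores of the improved KNN method for 24 h and 48 h rain/no-rain forecasts and for forecasts of precipitation of 10 mm or more were mostly higher than those of the unimproved method, and also higher than those of the T213 model's own precipitation forecast and of the MOS dynamical-statistical interpretation forecast. In particular, the method overcomes the high false alarm rate seen in the model precipitation forecast and the MOS forecast, and achieves good forecast performance.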

Keywords: K-nearest neighbor; positive and negative samples; cross-validation; precipitation forecast
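The abstract does not spell out how K⁺ and K⁻ enter the forecast decision. As a minimal sketch of one possible reading, the code below searches the positive and negative historical samples separately and compares the radius needed to reach the K⁺-th nearest positive sample with the radius needed to reach the K⁻-th nearest negative sample. The function name `event_score`, the Euclidean distance, and the radius-ratio rule are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def event_score(x, positives, negatives, k_pos, k_neg):
    """Return a score in [0, 1]; larger values favor the weather event.

    Hypothetical rule (not taken from the paper): compare the distance to
    the k_pos-th nearest positive sample with the distance to the
    k_neg-th nearest negative sample.
    """
    # Sorted Euclidean distances from x to each class of historical samples.
    d_pos = np.sort(np.linalg.norm(positives - x, axis=1))
    d_neg = np.sort(np.linalg.norm(negatives - x, axis=1))
    r_pos = d_pos[k_pos - 1]  # radius enclosing k_pos positive samples
    r_neg = d_neg[k_neg - 1]  # radius enclosing k_neg negative samples
    total = r_pos + r_neg
    # If x coincides with samples of both classes, return a neutral score.
    return 0.5 if total == 0 else r_neg / total


# Toy predictor vectors (made-up numbers, two factors per sample).
rain_days = np.array([[0.1, 0.0], [0.2, 0.0], [5.0, 5.0]])
dry_days = np.array([[3.0, 0.0], [4.0, 0.0], [0.0, 6.0]])
today = np.array([0.0, 0.0])

score = event_score(today, rain_days, dry_days, k_pos=2, k_neg=2)
forecast_rain = score >= 0.5  # 0.5 is a placeholder discriminant value
```

A station-specific discriminant value, such as the one the abstract obtains by hindcasting the historical samples, would take the place of the placeholder 0.5.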

Forecasting Precipitation Experiment with KNN Based on Crossing Verification Technology

Zeng Xiaoqing¹, Shao Mingxuan², Wang Shigong¹, Liu Huanzhu²

1. Atmospheric Science School, Lanzhou University, Lanzhou 730000;
2. National Meteorological Center, Beijing 100081

Abstract: To improve objective precipitation forecasting, a nonparametric estimation technique is applied to the interpretation and application of numerical prediction products. T213 numerical prediction products from the National Meteorological Center for April to September of 2003 to 2005 are used as the primary data. Through diagnostic analysis and stepwise regression, 10 to 20 factors are selected from many factors at different levels and forecast times; the selected factors correlate well with observed precipitation. An improved K-nearest neighbor (KNN) approach is used to forecast rain/no-rain and precipitation of more than 10 mm at stations in different areas from April to September 2006. In the search for the K nearest neighbors, different types of weather events, such as rain-free days, drizzle days and moderate-rain days, occur with different probabilities, so separate K values (K⁺ and K⁻) are computed to match them: K⁺ counts the samples in which the weather event occurs, and K⁻ counts the samples in which it does not. This makes the KNN method more reasonable for different weather events. With cross-validation, forecast test patterns are selected in turn from the historical patterns and then replaced by others, until every historical pattern has served as a test pattern; the accuracy rate and the summary rate of the forecasts are then computed. To reduce the rate of missed forecasts while emphasizing the accuracy rate and the summary rate, K⁺ and K⁻ are adjusted repeatedly. Each pair of K⁺ and K⁻ yields its own accuracy rate and summary rate, and the results of these trial forecasts are compared. The pair for which both the accuracy rate and the summary rate are comparatively good is selected as the optimal K⁺ and K⁻. After K⁺ and K⁻ are chosen, the historical patterns are hindcast, and a forecast discriminant value for each station is obtained by comparing the results, which reduces the false alarm rate to some extent. In the forecast experiment from April 1 to September 30, 2006, covering 24 h and 48 h qualitative forecasts of 0 mm and 10 mm precipitation at stations in different areas, the improved KNN approach obtains a much higher technical score than the earlier KNN approach. It also scores higher than direct model output (DMO) and MOS precipitation forecasts; in particular, its false alarm rate is sharply lower than theirs. It is a useful model for operational precipitation forecasting.

Key words: KNN; positive and negative pattern; cross validation; precipitation forecast
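The selection of K⁺ and K⁻ described in the abstract can be sketched as a grid search wrapped in cross-validation. The sketch below is illustrative only: the contiguous folds, the radius-ratio scoring rule in `event_score`, the 0.5 threshold, and the use of the sum of accuracy rate and summary rate as the selection criterion are assumptions, as are all function names. The accuracy rate is taken here as hits divided by the number of event forecasts, and the summary rate as hits divided by the number of observed events; the abstract does not define either. Each training fold is assumed to contain at least one sample of each class.

```python
import itertools

import numpy as np


def event_score(x, pos, neg, k_pos, k_neg):
    """Hypothetical score in [0, 1]; larger values favor the event."""
    d_pos = np.sort(np.linalg.norm(pos - x, axis=1))
    d_neg = np.sort(np.linalg.norm(neg - x, axis=1))
    r_pos = d_pos[min(k_pos, len(d_pos)) - 1]  # clip k to the class size
    r_neg = d_neg[min(k_neg, len(d_neg)) - 1]
    total = r_pos + r_neg
    return 0.5 if total == 0 else r_neg / total


def cross_validate(X, y, k_pos, k_neg, n_folds=5, threshold=0.5):
    """Return (accuracy_rate, summary_rate) pooled over all folds."""
    hits = false_alarms = misses = 0
    for test_idx in np.array_split(np.arange(len(X)), n_folds):
        train = np.ones(len(X), dtype=bool)
        train[test_idx] = False  # hold this fold out as the test set
        pos = X[train & (y == 1)]
        neg = X[train & (y == 0)]
        for i in test_idx:
            forecast = event_score(X[i], pos, neg, k_pos, k_neg) >= threshold
            if forecast and y[i] == 1:
                hits += 1
            elif forecast:
                false_alarms += 1
            elif y[i] == 1:
                misses += 1
    accuracy = hits / (hits + false_alarms) if hits + false_alarms else 0.0
    summary = hits / (hits + misses) if hits + misses else 0.0
    return accuracy, summary


def select_k(X, y, k_values, **cv_options):
    """Try every (K+, K-) pair and keep the one with the best combined score."""
    best = None
    for k_pos, k_neg in itertools.product(k_values, repeat=2):
        accuracy, summary = cross_validate(X, y, k_pos, k_neg, **cv_options)
        combined = accuracy + summary  # assumed way to weigh the two rates
        if best is None or combined > best[0]:
            best = (combined, k_pos, k_neg)
    return best[1], best[2]
```

Calling `select_k(X, y, [1, 3, 5])` on an array of predictor vectors `X` and 0/1 event labels `y` returns the chosen pair; ties go to the first pair tried.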



Fig. 1. Schematic of cross-validation


Fig. 2. Comparison of verification scores for rain/no-rain (0 mm) forecasts from the four methods, April to September 2006: (a) 24 h TS, (b) 24 h false alarm rate, (c) 24 h summary rate, (d) 48 h TS, (e) 48 h false alarm rate, (f) 48 h summary rate


Fig. 3. Comparison of verification scores for forecasts of precipitation of 10 mm or more from the four methods, April to September 2006: (a) 24 h TS, (b) 24 h false alarm rate, (c) 24 h summary rate, (d) 48 h TS, (e) 48 h false alarm rate, (f) 48 h summary rate
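The three scores shown in the panels of Figs. 2 and 3 can be computed from paired yes/no forecasts and observations. The sketch below uses the standard threat score (TS), hits divided by the sum of hits, false alarms and misses. The false alarm rate is taken as false alarms divided by all event forecasts, and the summary rate as hits divided by all observed events; these two definitions are assumptions, since this page does not state them. The function name and the sample data are made up for illustration.

```python
def verification_scores(forecast, observed):
    """Compute TS, false alarm rate and summary rate from yes/no series."""
    pairs = list(zip(forecast, observed))
    hits = sum(1 for f, o in pairs if f and o)
    false_alarms = sum(1 for f, o in pairs if f and not o)
    misses = sum(1 for f, o in pairs if not f and o)

    def ratio(numerator, denominator):
        # Undefined when the event was never forecast or never observed.
        return numerator / denominator if denominator else float("nan")

    return {
        "ts": ratio(hits, hits + false_alarms + misses),
        "false_alarm_rate": ratio(false_alarms, hits + false_alarms),
        "summary_rate": ratio(hits, hits + misses),
    }


# Six made-up days: 1 = precipitation event, 0 = no event.
forecast = [1, 1, 0, 1, 0, 0]
observed = [1, 0, 0, 1, 1, 0]
scores = verification_scores(forecast, observed)
```

For this toy series there are 2 hits, 1 false alarm and 1 miss, so TS is 2/4, the false alarm rate is 1/3 and the summary rate is 2/3.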