文章快速检索 高级检索

1. 闽南师范大学 计算机科学与工程系，福建 漳州 363000;
2. 闽南师范大学 数学与统计学院，福建 漳州 363000

K-nearest neighbor classification algorithm fusing neighborhood information
LIN Yaojin1, LI Jinjin1,2, CHEN Jinkun2, MA Zhouming2
1. Department of Computer Science and Engineering, Zhangzhou 363000, China ;
2. School of Mathematics and Statistics, Zhangzhou 363000, China
Abstract: Distance measurement is one of the important factors which affect the classification accuracy of the k nearest neighbor(KNN)algorithm. In this paper, an improved k nearest neighbor algorithm fusing neighborhood information is presented. Firstly, the concept of the instance neighborhood is defined and two criterions are presented according to neighborhood influence; then, the influence of the instance neighborhood is comprehensively considered when the distance between the testing instances and the training instances is computed. This algorithm can characterize the distance among instances more precisely, and enhance the stability of the KNN to some extent. This presented method was tested on the UCI datasets, and the results showed that this proposed technique is better than or equal to other classifiers, and it is more robust under the noise disturbance.
Key words: k-nearest neighbor     neighborhood information     classification learning     distance measurement     noise disturbance

k-近邻法是一种非常简单有效的分类算法，广泛应用于数据挖掘和模式识别的各个领域[1-3]。其基本思想是通过计算寻找训练集中距离待分类样本最近的k个邻居，然后基于它们的类别信息，依据投票的原则对待分类样本的类别进行判定。k-近邻算法的分类精度很大程度受影响于样本之间距离的度量。

1 k-近邻分类法

k-近邻分类法是一种非常简单有效的用于分类学习和函数逼近的算法。给定由n个样本标签对组成的数据集DD={(x1, c1), (x2, c2), …, (xn, cn)}，其分类的任务在于获取映射函数f, 使得能正确预测无标签样本。设Nk(x)为测试样本x的k-近邻集合，k-近邻分类法在于通过测试样本x的k-近邻进行大众数投票进行确定x的标签，其公式为

 (1)

2 融合邻域信息的k-近邻分类算法 2.1 邻域信息的影响

 图 1 样本邻域图 Fig. 1 Neighborhood of sample

2.2 样本邻域的定义

N维欧氏空间中，给定任意2点xi=(xi1, xi2, …, xiN)和xj=(xj1, xj2, …, xjN)，其距离为

 (2)

 (1)

1)

2)

 图 2 δ减小后的样本邻域图 Fig. 2 Neighborhood of sample with smaller δ

3 算法设计

1)初始化c(x)=φ

2)根据式(2)获取测试样本与训练样本的k/2个近邻R1

3)根据式(3)获取测试样本邻域与训练样本邻域的k/2个近邻R2

4)获得测试样本x的融合近邻集合R=R1R2后，即测试样本xk近邻Nk(x)；

5)根据式(1)获得测试样本x的类标签c(x)。

4 实验结果及分析

 数据集 Instances 特征数 类别 Heart 270 13 2 ICU 200 20 3 Rice 104 5 2 Sonar 208 60 2 Wdbc 569 30 2 wpbc 198 33 2

 数据集 KNN NEC CART LSVM FK3N Heart 82.59±6.06 80.00±6.10 74.07±6.30 83.33±5.31 84.44±6.00 ICU 92.61±2.25 86.29±17.85 79.40±31.64 92.56±4.27 93.61±3.12 Rice 81.69±10.30 83.07±10.96 82.07±11.70 78.16±8.10 82.98±10.90 Sonar 72.62±7.05 82.74±5.48 72.07±13.94 77.86±7.05 79.31±5.59 Wdbc 96.67±2.09 95.79±2.37 90.50±4.55 97.73±2.48 97.01±2.04 Wpbc 76.26±5.89 78.26±7.24 70.63±7.54 77.37±7.73 76.79±7.96 平均 83.74 84.35 78.12 84.50 85.69

 数据集 KNN NEC CART LSVM FK3N Heart 82.22±6.49 80.37±4.96 77.78±7.61 83.33±6.11 82.96±5.30 ICU 92.61±2.25 87.29±18.03 84.19±29.92 91.55±5.44 92.61±2.24 Rice 81.80±6.61 81.78±7.96 77.05±10.69 77.05±3.84 81.98±8.96 Sonar 71.64±11.21 78.28±7.20 69.21±11.35 77.38±6.98 77.52±9.11 Wdbc 94.96±2.49 94.56±3.35 93.16±3.74 97.03±2.01 94.68±2.89 Wpbc 73.29±7.42 74.66±11.25 71.11±9.89 76.32±5.61 74.79±8.51 平均 82.71 82.82 78.75 83.78 84.09

5 结束语

 [1] COVER T, HART P. Nearest neighbor pattern classification[J]. IEEE Transactions on Information Theory , 1967 (13) : 21-27 [2] WU X D, KUMAR V, QUINLAN J R, et al. Top 10 algorithms in data mining[J]. Knowledge and Information Systems , 2008, 14 (1) : 1-37 DOI:10.1007/s10115-007-0114-2 [3] 吕锋, 杜妮, 文成林. 一种模糊-证据kNN分类方法[J]. 电子学报 , 2012, 40 (12) : 2930-2935 [4] WANG H. Nearest neighbors by neighborhood counting[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence , 2005, 28 (6) : 942-953 [5] WILSON D R, MARTINEZ T R. Improve heterogeneous distance functions[J]. Journal of Artificial Intelligence Research , 1997 (6) : 1-34 [6] HU Q H, ZHU P F, YANG Y B, et al. Large-margin nearest neighbor classifiers via sample weight learning[J]. Neurocomputing , 2011, 74 (4) : 656-660 DOI:10.1016/j.neucom.2010.09.006 [7] [8] KOHAVI R, LANGLEY P, YUNG Y. The utility of feature weighting in nearest neighbor algorithms[C]//Proceedings of the Ninth European Conference on Machine Learning. [S.l.], 1997. [9] SUN Y J. Iterative RELIEF for feature weighting: algorithms, theories, and applications[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence , 2007, 29 (6) : 1035-1051 DOI:10.1109/TPAMI.2007.1093 [10] MU Y, DING W, TAO D C. Local discriminative distance metrics ensemble learning[J]. Pattern Recognition , 2013, 46 (8) : 2337-2349 DOI:10.1016/j.patcog.2013.01.010 [11] SONG Y, HUANG J, ZHOU D, et al. IKNN: informative k-nearest neighbor pattern classification[C]//PKDD 2007. [S.l.], 2007: 248-264. [12] 林耀进, 胡学钢, 李慧宗. 基于用户群体影响的协同过滤推荐算法[J]. 情报学报 , 2013, 32 (3) : 299-350 [13] HU Q H, YU D R, XIE Z X. Neighborhood classifiers[J]. Expert Systems with Applications , 2008, 34 (2) : 866-876 DOI:10.1016/j.eswa.2006.10.043
DOI: 10.3969/j.issn.1673-4785.201307015

0

#### 文章信息

LIN Yaojin, LI Jinjin, CHEN Jinkun, MA Zhouming

K-nearest neighbor classification algorithm fusing neighborhood information

CAAI Transactions on Intelligent Systems, 2014, 9(2): 240-243
http://dx.doi.org/10.3969/j.issn.1673-4785.201307015