﻿ 融合并行混沌萤火虫算法的K-调和均值聚类
 文章快速检索 高级检索

K-harmonic means clustering merged with parallel chaotic firefly algorithm
ZHU Shuwei, ZHOU Zhiping, ZHANG Daowen
School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China
Abstract: The K-harmonic means algorithm(KHM) has the disadvantage of easily falling into a local optimum. To solve this problem, we propose a hybrid KHM based on an improved firefly algorithm(FA). In this paper, we combined raw FA-based searching with parallel chaotic FA-based elaborate searching. In the elaborate searching, we found the current best and second-best solutions using the FA, then we used an improved logistic map model combined with parallel chaotic optimization to search this area in order to enhance the searching ability of the algorithm. Finally, we used the improved FA to optimize the cluster centers obtained by the KHM. Experimental results demonstrate that the proposed algorithm not only had higher search precision for several test functions, but also improved the clustering accuracy and stability of six datasets, effectively avoiding being trapped into a local optimum.
Key words: K-harmonic means     local optimum     firefly algorithm     clustering     parallel chaotic optimization     chaotic local search     map model     diversity of population

1 算法概念与定义 1.1 K-调和均值算法

K-调和均值算法的原理基本上与K-means是相似的，不同的是其使用调和均值(harmonic means,HM)代替算术均值来计算目标函数，能够有效解决对初始类中心点选取的敏感性问题。假定数据集X=[x1 x2xn]包含n个数据，它们被划分到k个聚类簇，每个簇的中心用cj(j=1,2,…,k)表示，KHM的目标函数为[3]

1.2 萤火虫算法的相关定义

2 基于改进FA的K-调和均值聚类 2.1 并行混沌局部搜索策略改进的FA

 图 1 2种特殊情况的最优解与次优解局部搜索区域Fig. 1 Two particular types of local search region around the best and second best solutions

1)初始化萤火虫个体的位置并计算其对应的目标函数值Ii作为亮度，初始化迭代次数t=0，最大迭代次数设为Tmax1，粗搜索迭代次数Tmax11

2)执行FA不断更新亮度，最亮的个体即为当前最优解xpg，并且次优解为xps，若t>Tmax11，则采用PCLS在它们附近寻优作为细搜索。

3)设置当前混沌搜索次数l=0，在上文几个断点外的区域初始化混沌变量-1 < zij(0) < 1 (i=1,2,…,N; j=1,2,…,m)，N为并行变量数，m为单变量维数，则zij表示第i个并行变量的第j维。此外，中间变量矩阵为Y，PCLS最大迭代次数为Cmax

①考虑到大多数情况下xpg具有更好的搜索潜力，令，且,…,N，使用式(8)确定第l+1次迭代的混沌扰动变量zij(l+1)

②混沌变量与收缩因子βt成比例，通过混沌扰动产生N个新变量如式(10)所示。

③计算每个新变量所对应的目标函数值为f(yi(l+1))，并将Y(l+1)xpg(l+1)xps(l+1)合并，对这N+2个变量的适应度值进行排序，得到第l+1次迭代中的最优解xpg(l+1)和次优解xps(l+1)

l=l+1，若l < Cmax，转向①；否则转向4)。

4)t=t+1，若t < Tmax，转向2)，且随机选取一个萤火虫个体用3)中获得的xpg替换并更新其亮度；否则停止迭代，输出全局最优解。

2.2 提高种群多样性的策略

2.3 改进FA的收敛性分析及复杂度分析

2.4 KHM-PCLSFA算法流程

1)初始化算法的基本参数γ、 α、β、CmaxN、l并随机初始化萤火虫种群的位置。

2)根据萤火虫的位置计算其目标函数值作为亮度，初始化当前迭代次数gen=0。

3)执行PCLSFA进行搜索，迭代运行gen1次，求出当前的最优个体Gbest以及对应的最优目标函数值Fg，进入下一步操作。并且，选出占种群比例为nc%的最差个体并采用混沌重构法将其替换。

4)以Gbest为聚类中心执行KHM操作，迭代运行gen2次，得到目标函数值KHM(X,C)和聚类中心并将其转化为一维向量xKHM，若KHM(X,C) < Fg，则用xKHM代替Gbest，并以xKHM随机替换一个萤火虫。

5)gen=gen+1，若gen < Itermax，则转到3)继续执行，否则停止迭代得出聚类结果。

3 实验数据与分析 3.1 PCLSFA的性能测试

Ackley(xi[-30, 30])、Rosenbrock(xi[-2.048,2.048])、Rastrigin(xi∈[-5.12,5.12])、Griewank(xi[-600, 600]) 进行仿真测试，它们的最优解都是0。通过FA、采用串行CLS分别改进PSO和FA算法的CLSPSO[9]和CLSFA进行对比分析，以验证PCLSFA的收敛性能及寻优能力。各算法种群规模都为Npop=40，最大迭代数Tmax1=2 000，3种具有CLS机制的算法中取Cmax=10，C=5。考虑FA对不同函数的收敛性能不同，在CLSFA和PCLSFA前期执行粗搜索的迭代数也不相同，对f1f4Tmax11=500，对f2Tmax11=0，对f3Tmax11=200。CLSPSO中采用线性递减的惯性权重w[9]，且wmax=0.9，wmin=0.4，学习因子为c1=c2=1.496。FA型算法中统一设置γ=1，随机步长α随着迭代次数t的增加不断减小为

 图 2 各函数的收敛曲线Fig. 2 The convergence curves for test functions

 函数 算法 最小值 最大值 平均值 标准差 f 1 FACLSPSOCLSFAPCLSFA 3.763×10 -47.994×10 -151.139×10 -101.052×10 -10 6.981×10 -32.660 50.052 21.496×10 -10 3.084×10 -31.291 60.020 11.259×10 -10 1.962×10 -30.934 40.026 01.212×10 -11 f 2 FACLSPSOCLSFAPCLSFA 26.575 07.919×10 -40.030 22.208×10 -3 29.210 04.222 132.154 20.213 5 28.306 00.265 14.076 00.130 8 0.804 60.717 89.609 30.066 9 f 3 FACLSPSOCLSFAPCLSFA 20.895 059.697 010.861 76.574 8 47.820 0112.431 038.601 422.109 1 30.911 688.551 721.039 213.212 9 7.167 812.532 25.803 04.387 2 f 4 FACLSPSOCLSFAPCLSFA 3.089×10 -57.116×10 -141.112×10 -161.110×10 -16 3.441×10 -40.048 93.296×10 -45.541×10 -16 1.030×10 -49.674×10 -39.873×10 -53.331×10 -16 7.068×10 -50.011 71.508×10 -41.655×10 -16
3.2 KHM-PCLSFA的实验数据与分析

 数据集 类数 维数 数据个数 Iris 3 4 150 Ionosphere 2 33 351 Wine 3 13 178 Image 7 19 210 CMC 3 9 1473 Satellite 6 33 6435

 数据集 算法 KHM(X,C) F-measure 时间/s Iris KHMKHM-FAKHM-PSO本文算法 148.91148.84148.89148.83 0.8850.8850.8850.886 0.1761.3221.2272.236 Sphere KHMKHM-FAKHM-PSO本文算法 2 805.62 804.82 805.22 803.7 0.7060.7060.7060.706 0.6915.1085.1059.668 Wine KHMKHM-FAKHM-PSO本文算法 75 338 585.375 336 750.475 336 842.875 335 247.3 0.6890.7040.7010.707 0.2722.1852.2163.854 Image KHMKHM-FAKHM-PSO本文算法 54 906 284.340 937 350.740 178 361.829 926 352.6 0.5950.5970.5970.599 0.9216.6586.88312.034 CMC KHMKHM-FAKHM-PSO本文算法 96 201.4796 165.2396 185.2896 160.39 0.4650.4650.4640.465 1.98714.81414.92426.683 Satellite KHMKHM-FAKHM-PSO本文算法 1.954 3×10 81.953 6×10 81.953 7×10 81.953 3×10 8 0.7620.7750.7710.772 12.46292.11299.459175.71

 数据集 算法 KHM(X,C) F-measure 时间/s Iris KHMKHM-FAKHM-PSO本文算法 126.08125.86125.99125.81 0.8920.8920.8920.893 0.1811.2961.1982.195 Sphere KHMKHM-FAKHM-PSO本文算法 2648.02643.52646.52642.3 0.7000.7000.7000.700 0.6935.0875.0679.630 Wine KHMKHM-FAKHM-PSO本文算法 1.049 1×10 91.049 1×10 91.049 0×10 91.049 1×10 9 0.6220.6560.6320.658 0.2622.1932.1363.831 Image KHMKHM-FAKHM-PSO本文算法 1.790 6×10 81.783 5×10 81.788 4×10 81.771 3×10 8 0.5510.5600.5590.560 0.8766.5926.70312.210 CMC KHMKHM-FAKHM-PSO本文算法 187 018.21186 824.39186 943.89186 778.24 0.4580.4610.4600.464 1.93714.88514.86926.389 Satellite KHMKHM-FAKHM-PSO本文算法 8.805 4×10 88.802 7×10 88.802 8×10 88.796 5×10 8 0.7630.7760.7740.780 12.42891.25099.784175.18

 数据集 算法 KHM(X,C) F-measure 时间/s Iris KHMKHM-FAKHM-PSO本文算法 109.84109.53109.61109.22 0.8920.8920.8920.892 0.1781.3041.2352.311 Sphere KHMKHM-FAKHM-PSO本文算法 2 567.92 558.92 564.42 553.5 0.7000.7000.7000.700 0.6845.0965.0889.662 Wine KHMKHM-FAKHM-PSO本文算法 2.717 5×10 101.420 4×10 101.441 9×10 101.419 3×10 10 0.6300.6550.6360.661 0.2732.2012.1403.945 Image KHMKHM-FAKHM-PSO本文算法 7.089 9×10 92.324 2×10 91.668 5×10 91.494 6×10 9 0.5350.5370.5380.540 0.9286.6676.84312.162 CMC KHMKHM-FAKHM-PSO本文算法 380 733.23380 013.07380 381.58379 782.74 0.4550.4580.4580.460 1.96614.89214.85526.632 Satellite KHMKHM-FAKHM-PSO本文算法 4.171 4×10 94.163 0×10 94.150 6×10 94.159 7×10 9 0.7150.7210.7090.724 12.43792.17497.873175.75

4 结束语

 [1] JAIN A K. Data clustering:50 years beyond K-means[J]. Pattern Recognition Letters, 2010, 31(8):651-666. [2] ZHANG Bin, HSU M, DAYAL U. K-harmonic means-a data clustering algorithm. Technical Report HPL-1999-124[R]. Hewlett-Packard Laboratories, 1999. [3] YANG Fengqin, SUN Tieli, ZHANG Changhai. An efficient hybrid data clustering method based on K-harmonic means and particle swarm optimization[J]. Expert Systems with Applications, 2009, 36(6):9847-9852. [4] ALGUWAIZANI A, HANSEN P, MLADENOVIC N, et al. Variable neighborhood search for harmonic means clustering[J]. Applied Mathematical Modelling, 2011, 35(6):2688-2694. [5] HUNG C H, CHIOU H M, YANG Weining. Candidate groups search for K-harmonic means data clustering[J]. Applied Mathematical Modelling, 2013, 37(24):10123-10128. [6] 汪中, 刘贵全, 陈恩红. 基于模糊K-harmonic means的谱聚类算法[J]. 智能系统学报, 2009, 4(2):95-99. WANG Zhong, LIU Guiquan, CHEN Enhong. A spectral clustering algorithm based on fuzzy K-harmonic means[J]. CAAI Transactions on Intelligent Systems, 2009, 4(2):95-99. [7] WU Xiaohong, WU Bin, SUN Jun, et al. A hybrid fuzzy K-harmonic means clustering algorithm[J]. Applied Mathematical Modelling, 2015, 39(12):3398-3409. [8] 王建峰, 孙超, 姜守达. 基于粒子群优化的组合测试数据生成算法[J]. 哈尔滨工程大学学报, 2013, 34(4):477-482.WANG Jianfeng, SUN Chao, JIANG Shouda. Improved algorithm for combinatorial test data generation based on particle swarm optimization[J]. Journal of Harbin Engineering University, 2013, 34(4):477-482. [9] HE Yaoyao, YANG Shanlin, XU Qifa. Short-term cascaded hydroelectric system scheduling based on chaotic particle swarm optimization using improved logistic map[J]. Communications in Nonlinear Science and Numerical Simulation, 2013, 18(7):1746-1756. [10] HE Yaoyao, XU Qifa, YANG Shanlin, et al. A novel chaotic differential evolution algorithm for short-term cascaded hydroelectric system scheduling[J]. International Journal of Electrical Power & Energy Systems, 2014, 61:455-462. [11] 廖煜雷, 刘鹏, 王建, 等. 基于改进人工鱼群算法的无人艇控制参数优化[J]. 哈尔滨工程大学学报, 2014, 35(7):800-806.LIAO Yulei, LIU Peng, WANG Jian, et al. Control parameter optimization for the unmanned surface vehicle with the improved artificial fish swarm algorithm[J]. Journal of Harbin Engineering University, 2014, 35(7):800-806. [12] YANG Xinshe. Firefly algorithm, stochastic test functions and design optimisation[J]. International Journal of Bio-Inspired Computation, 2010, 2(2):78-84. [13] 赵玉新, YANG X S, 刘利强. 新兴元启发式优化方法[M]. 北京:科学出版社, 2013:148-157. [14] SENTHILNATH J, OMKAR S N, MANI V. Clustering using firefly algorithm:performance study[J]. Swarm and Evolutionary Computation, 2011, 1(3):164-171. [15] FISTER Jr I, PERC M, KAMAL S M. A review of chaos-based firefly algorithms:perspectives and research challenges[J]. Applied Mathematics and Computation, 2015, 252:155-165. [16] GANDOMI A H, YANG X S, TALATAHARI S, et al. Firefly algorithm with chaos[J]. Communications in Nonlinear Science and Numerical Simulation, 2013, 18(1):89-98. [17] YUAN Xiaofang, ZHAO Jingyi, YANG Yimin, et al. Hybrid parallel chaos optimization algorithm with harmony search algorithm[J]. Applied Soft Computing, 2014, 17:12-22. [18] YUAN Xiaofang, ZHANG Ting, XIANG Yongzhong, et al. Parallel chaos optimization algorithm with migration and merging operation[J]. Applied Soft Computing, 2015, 35:591-604. [19] YANG Dixiong, LIU Zhenjun, ZHOU Jilei. Chaos optimization algorithms based on chaotic maps with different probability distribution and search speed for global optimization[J]. Communications in Nonlinear Science and Numerical Simulation, 2014, 19(4):1229-1246. [20] 莫愿斌, 马彦追, 郑巧燕, 等. 单纯形法的改进萤火虫算法及其在非线性方程组求解中的应用[J]. 智能系统学报, 2014, 9(6):747-755. MO Yuanbin, MA Yanzhui, ZHENG Qiaoyan, et al. Improved firefly algorithm based on simplex method and its application in solving non-linear equation groups[J]. CAAI Transactions on Intelligent Systems, 2014, 9(6):747-755.
DOI: 10.11992.tis.201505043

0

#### 文章信息

ZHU Shuwei, ZHOU Zhiping, ZHANG Daowen

K-harmonic means clustering merged with parallel chaotic firefly algorithm

CAAI Transactions on Intelligent Systems, 2015, 10(6): 872-880.
DOI: 10.11992.tis.201505043