Robust fuzzy <i>C</i>-means clustering algorithm based on hybrid-distance learning

CAAI Transactions on Intelligent Systems, 2017, Vol. 12, Issue (4): 450-458. DOI: 10.11992/tis.201607019

### Cite this article

BIAN Zekang, WANG Shitong. Robust FCM clustering algorithm based on hybrid-distance learning[J]. CAAI Transactions on Intelligent Systems, 2017, 12(4): 450-458. DOI: 10.11992/tis.201607019.


Robust FCM clustering algorithm based on hybrid-distance learning
BIAN Zekang, WANG Shitong
School of Digital Media, Jiangnan University, Wuxi 214122, China
Abstract: The distance metric plays a vital role in the fuzzy C-means clustering algorithm. In practical applications, the data to be clustered often come with a certain amount of side information, such as pairwise constraints with labels. To utilize this side information fully, we first propose a hybrid-distance learning method in which the side information is used to obtain a distance metric for the data set. We then propose a semi-supervised robust fuzzy C-means clustering algorithm based on hybrid-distance learning (the HR-FCM algorithm). HR-FCM inherits the robustness of GIFP-FCM (the generalized FCM algorithm with improved fuzzy partitions) and achieves better clustering performance owing to its more appropriate distance metric. Experimental results confirm the effectiveness of the proposed algorithm.
Key words: distance metric; FCM clustering algorithm; pairwise constraints; side information; hybrid distance; semi-supervised; GIFP-FCM; robustness

1 Hybrid-distance learning

1.1 Hybrid distance

 $\begin{array}{*{20}{c}} {D\left( {\mathit{\boldsymbol{x}},\mathit{\boldsymbol{y}}} \right) = \sum\limits_{i = 1}^p {{\omega _i}{d_i}\left( {\mathit{\boldsymbol{x}},\mathit{\boldsymbol{y}}} \right)} }\\ {{\rm{s}}{\rm{.t}}{\rm{.}}\;\sum\limits_{i = 1}^p {{\omega _i} = 1} ,0 \le {\omega _i} \le 1,i = 1,2, \cdots ,p} \end{array}$ (1)
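A minimal Python sketch of the hybrid distance in Eq. (1); the two base metrics and the equal weights below are illustrative assumptions, not the paper's learned choices:

```python
import numpy as np

def hybrid_distance(x, y, base_dists, weights):
    """Eq. (1): D(x, y) = sum_i w_i * d_i(x, y), with the weights on the
    probability simplex (summing to 1, each in [0, 1])."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return sum(w * d(x, y) for w, d in zip(weights, base_dists))

# Illustrative base metrics: squared Euclidean and L1 (assumptions).
d_sq = lambda x, y: float(np.sum((x - y) ** 2))
d_l1 = lambda x, y: float(np.sum(np.abs(x - y)))

x, y = np.array([1.0, 2.0]), np.array([2.0, 4.0])
D = hybrid_distance(x, y, [d_sq, d_l1], [0.5, 0.5])  # 0.5*5 + 0.5*3 = 4.0
```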

 $\begin{array}{*{20}{c}} {\mathop {\min }\limits_{{\omega _i},\beta } J = \frac{1}{2}\sum\limits_{i = 1}^p {\omega _i^2} - C\sum\limits_{k = 1}^n {{y_k}\left( {\sum\limits_{i = 1}^p {{\omega _i}{d_i}\left( {\mathit{\boldsymbol{x}}_a^k,\mathit{\boldsymbol{x}}_b^k} \right)} - \beta } \right)} }\\ {{\rm{s}}{\rm{.t}}{\rm{.}}\;\;{y_k}\left( {\sum\limits_{i = 1}^p {{\omega _i}{d_i}\left( {\mathit{\boldsymbol{x}}_a^k,\mathit{\boldsymbol{x}}_b^k} \right)} - \beta } \right) > 0}\\ {\sum\limits_{i = 1}^p {{\omega _i} = 1} ,0 \le {\omega _i} \le 1,i = 1,2, \cdots ,p} \end{array}$ (2)

 $\begin{array}{*{20}{c}} {L = \frac{1}{2}\sum\limits_{i = 1}^p {\omega _i^2} - C\sum\limits_{k = 1}^{{n_p}} {{y_k}\left( {\sum\limits_{i = 1}^p {{\omega _i}{d_i}\left( {\mathit{\boldsymbol{x}}_a^k,\mathit{\boldsymbol{x}}_b^k} \right)} - \beta } \right)} + }\\ {\lambda \left( {1 - \sum\limits_{i = 1}^p {{\omega _i}} } \right) + \sum\limits_{i = 1}^p {{\varphi _i}{\omega _i}} } \end{array}$ (3)

 $\left\{ \begin{array}{l} \frac{{\partial L}}{{\partial {\omega _i}}} = 0\\ {\varphi _i} \ge 0\\ {\varphi _i}{\omega _i} = 0 \end{array} \right.$ (4)

 $\begin{array}{*{20}{c}} {L = \frac{1}{2}\sum\limits_{i = 1}^p {\omega _i^2} - C\sum\limits_{k = 1}^{{n_p}} {{y_k}\left( {\sum\limits_{i = 1}^p {{\omega _i}{d_i}\left( {\mathit{\boldsymbol{x}}_a^k,\mathit{\boldsymbol{x}}_b^k} \right)} - \beta } \right)} + }\\ {\lambda \left( {1 - \sum\limits_{i = 1}^p {{\omega _i}} } \right)} \end{array}$ (5)

 ${\omega _i} = \frac{1}{p} + C\sum\limits_{k = 1}^{{n_p}} {{y_k}d_i^k} - \frac{C}{p}\sum\limits_{j = 1}^p {\sum\limits_{k = 1}^{{n_p}} {{y_k}d_j^k} }$ (6)

 ${\omega _i} = \left\{ \begin{array}{l} 0,\;\;\;\;\;i \in {p^ - }\\ \frac{1}{{\left| {{p^ + }} \right|}} + C\sum\limits_{k = 1}^{{n_p}} {{y_k}d_i^k} - \frac{C}{{\left| {{p^ + }} \right|}}\sum\limits_{j = 1}^p {\sum\limits_{k = 1}^{{n_p}} {{y_k}d_j^k} } ,\;\;\;i \in {p^ + } \end{array} \right.$ (7)

 ${\nabla _\beta }J = C\sum\limits_{k = 1}^{{n_p}} {{y_k}}$ (8)

 ${\nabla _\beta }J = C\sum\limits_{k = 1}^{\left| {n_p^ + } \right|} {{y_k}}$ (9)

 ${\omega _i} = \left\{ \begin{array}{l} 0,\;\;\;\;\;i \in {p^ - }\\ \frac{1}{{\left| {{p^ + }} \right|}} + C{E_i},\;\;\;i \in {p^ + } \end{array} \right.$ (10)

 ${E_i} = \sum\limits_{k = 1}^{{n_p}} {{y_k}d_i^k} - \frac{1}{{\left| {{p^ + }} \right|}}\sum\limits_{j \in {p^ + }} {\sum\limits_{k = 1}^{{n_p}} {{y_k}d_j^k} }$ (11)

1) Initialization: $p_0^ + = \varnothing$, ${p_0} = \left\{ {1,2, \cdots ,p} \right\}$, $h = 0$;

2) $h = h + 1$, $p_h^ + = p_{h - 1}^ + \cup \left\{ i \right\}$, ${p_h} = {p_{h - 1}} - \left\{ i \right\}$, where $i = {\rm{arg}}\;\mathop {{\rm{max}}}\limits_{i \in {p_{h - 1}}} \left\{ {{E_i}} \right\}$;

3) Compute ${\omega _g}$ via Eq. (10) and check whether it is greater than 0, where $g = {\rm{arg}}\;\mathop {{\rm{max}}}\limits_{i \in p_h^ + } \left\{ {{E_i}} \right\}$. If ${\omega _g} > 0$, return to 2); otherwise set $p_h^ + = p_{h - 1}^ +$, ${p_h} = {p_{h - 1}}$ and terminate.
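The active-set procedure above (Eqs. (10)-(11)) can be sketched as follows, assuming the per-metric contributions $s_i = \sum_k y_k d_i^k$ have been precomputed; since the subtracted mean in Eq. (11) is shared by all candidates, ranking by $E_i$ is the same as ranking by $s_i$:

```python
import numpy as np

def active_set_weights(s, C):
    """Active-set sketch of Eqs. (10)-(11).

    s[i] = sum_k y_k * d_i(x_a^k, x_b^k), the contribution of base metric i
    over the constrained pairs (assumed precomputed)."""
    order = np.argsort(s)[::-1]            # candidates in decreasing order of E_i
    weights = np.zeros(len(s))
    active = []
    for i in order:
        trial = active + [int(i)]
        s_act = s[trial]
        E = s_act - s_act.mean()           # Eq. (11) restricted to the trial set p+
        w = 1.0 / len(trial) + C * E       # Eq. (10)
        if active and np.min(w) <= 0:      # adding i would drive a weight <= 0:
            break                          # keep the previous sets and terminate
        active = trial
        weights[:] = 0.0
        weights[active] = w                # weight 0 for every i in p-
    return weights

w = active_set_weights(np.array([2.0, 1.5, -5.0]), C=0.1)
```

Because $\sum_{i \in p^+} E_i = 0$, the returned weights sum to 1 by construction.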

1) Initialization: $\omega = {\omega ^{\left( 0 \right)}} = \frac{1}{p}$, $\beta = {\beta ^{\left( 0 \right)}}$ (take the maximum pairwise distance under the initial weights as the initial value of $\beta$);

2) Compute the distance matrix $D(i, k)$;

3) Set the iteration counter: $t = 1$;

4) Loop until convergence:

① Update the learning rate: $\gamma = 1/t$, $t = t + 1$;

② Update the training subset:

 $n_p^ + = \left\{ {{y_k} \in D:{y_k}\left( {\sum\limits_{i = 1}^p {{\omega _i}{d_i}\left( {x_a^k,x_b^k} \right)} - \beta } \right) > 0} \right\}$

③ Compute the gradient:

 ${\nabla _\beta }J = C\sum\limits_{k \in n_p^ + } {{y_k}}$

④ Update the threshold: $\beta ' = \beta - \gamma {\nabla _\beta }J$;

⑤ Update the sets ${p^ + }$ and $p$ using Algorithm 1;

⑥ Update $\omega$.
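The loop above can be sketched as follows. The sign convention (y_k = +1 for a cannot-link pair, y_k = -1 for a must-link pair) and the clip-and-renormalize step standing in for the full Algorithm-1 active-set update are assumptions of this sketch:

```python
import numpy as np

def learn_beta_omega(pairs_dist, y, C=0.1, T=100):
    """Gradient-descent sketch of the learning loop above.

    pairs_dist[k, i] = d_i(x_a^k, x_b^k), the i-th base distance on the
    k-th constrained pair; y[k] = +1 (cannot-link) or -1 (must-link),
    an assumed sign convention."""
    n_pairs, p = pairs_dist.shape
    omega = np.full(p, 1.0 / p)                   # 1): uniform initial weights
    beta = float(np.max(pairs_dist @ omega))      # 1): max distance as initial beta
    for t in range(1, T + 1):
        gamma = 1.0 / t                           # ①: learning rate
        margin = y * (pairs_dist @ omega - beta)
        active = margin > 0                       # ②: training subset n_p^+
        grad_beta = C * float(np.sum(y[active]))  # ③: gradient of J w.r.t. beta
        beta -= gamma * grad_beta                 # ④: threshold update
        s = pairs_dist[active].T @ y[active]      # per-metric contributions (Eq. (11))
        E = s - s.mean()
        omega = np.maximum(1.0 / p + C * E, 0.0)  # ⑤-⑥, simplified: clip and
        omega /= omega.sum()                      # renormalize onto the simplex
    return omega, beta

pairs_dist = np.array([[4.0, 3.0],    # cannot-link pair: large distances
                       [0.5, 1.0],    # must-link pairs: small distances
                       [0.2, 0.4]])
y = np.array([1.0, -1.0, -1.0])
omega, beta = learn_beta_omega(pairs_dist, y)
```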

1.2 Time complexity analysis

2 Robust FCM algorithm based on hybrid-distance learning

 $\begin{array}{*{20}{c}} {J = \sum\limits_{i = 1}^c {\sum\limits_{j = 1}^n {u_{ij}^md_p^2\left( {{\mathit{\boldsymbol{x}}_j},{\mathit{\boldsymbol{v}}_i}} \right)} } + \sum\limits_{j = 1}^n {{a_j}} \sum\limits_{i = 1}^c {{u_{ij}}\left( {1 - u_{ij}^{m - 1}} \right)} \;{\rm{s}}{\rm{.t}}{\rm{.}}\;\;\;m > 1}\\ {\sum\limits_{i = 1}^c {{u_{ij}} = 1} ,\;\;\;\;0 \le {u_{ij}} \le 1,\;\;\;1 \le i \le c,1 \le j \le n} \end{array}$ (12)

 ${\mathit{\boldsymbol{v}}_i} = \frac{{\sum\nolimits_{j = 1}^n {u_{ij}^m{\mathit{\boldsymbol{x}}_j}} }}{{\sum\nolimits_{j = 1}^n {u_{ij}^m} }}$ (13)
 ${u_{ij}} = \frac{1}{{\sum\nolimits_{k = 1}^c {{{\left( {\frac{{d_p^2\left( {{\mathit{\boldsymbol{x}}_j},{\mathit{\boldsymbol{v}}_i}} \right) - \alpha \times \mathop {\min }\limits_{1 \le s \le c} d_p^2\left( {{\mathit{\boldsymbol{x}}_j},{\mathit{\boldsymbol{v}}_s}} \right)}}{{d_p^2\left( {{\mathit{\boldsymbol{x}}_j},{\mathit{\boldsymbol{v}}_k}} \right) - \alpha \times \mathop {\min }\limits_{1 \le s \le c} d_p^2\left( {{\mathit{\boldsymbol{x}}_j},{\mathit{\boldsymbol{v}}_s}} \right)}}} \right)}^{\frac{1}{{m - 1}}}}} }}$ (14)
 ${d_p}\left( {{\mathit{\boldsymbol{x}}_j},{\mathit{\boldsymbol{v}}_i}} \right) = {\left( {\sum\limits_{k = 1}^n {{{\left| {{x_{jk}} - {v_{ik}}} \right|}^p}} } \right)^{\frac{1}{p}}}\;\;\;\;{\rm{s}}{\rm{.t}}{\rm{.}}\;\;\;\;1 < p < 2$ (15)

 ${v_i} = \frac{{\sum\nolimits_{j = 1}^n {u_{ij}^m{\mathit{\boldsymbol{x}}_j}} }}{{\sum\nolimits_{j = 1}^n {u_{ij}^m} }}$ (16)
 ${u_{ij}} = \frac{1}{{\sum\nolimits_{k = 1}^c {{{\left( {\frac{{{\mathit{\boldsymbol{D}}^2}\left( {{\mathit{\boldsymbol{x}}_j},{\mathit{\boldsymbol{v}}_i}} \right) - \alpha \times \mathop {\min }\limits_{1 \le s \le c} {\mathit{\boldsymbol{D}}^2}\left( {{\mathit{\boldsymbol{x}}_j},{\mathit{\boldsymbol{v}}_s}} \right)}}{{{\mathit{\boldsymbol{D}}^2}\left( {{\mathit{\boldsymbol{x}}_j},{\mathit{\boldsymbol{v}}_k}} \right) - \alpha \times \mathop {\min }\limits_{1 \le s \le c} {\mathit{\boldsymbol{D}}^2}\left( {{\mathit{\boldsymbol{x}}_j},{\mathit{\boldsymbol{v}}_s}} \right)}}} \right)}^{\frac{1}{{m - 1}}}}} }}$ (17)

 $\begin{array}{*{20}{c}} {\mathit{\boldsymbol{D}}\left( {\mathit{\boldsymbol{x}},\mathit{\boldsymbol{y}}} \right) = \sum\limits_{i = 1}^p {{\omega _i}{d_i}\left( {\mathit{\boldsymbol{x}},\mathit{\boldsymbol{y}}} \right)} }\\ {{\rm{s}}{\rm{.t}}{\rm{.}}\;\;\;\sum\limits_{i = 1}^p {{\omega _i} = 1} ,0 \le {\omega _i} \le 1} \end{array}$ (18)

1) Initialization: ${u_{ij}} = u_{ij}^1$, $t = 1$;

2) Compute the new cluster centers $v_i^{t + 1}$ using Eq. (16);

3) Compute the new membership matrix $u_{ij}^{t + 1}$ using Eq. (17);

4) If $\left\| {{U^{t + 1}} - {U^t}} \right\| < \varepsilon$ or $t \ge T$, output the final membership matrix; otherwise set $t = t + 1$ and return to 2).
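Steps 1)-4) can be sketched as follows, using the hybrid distance of Eq. (18) inside the GIFP-style updates of Eqs. (16)-(17); the base metrics, their weights, and the parameter values below are illustrative assumptions rather than learned or tuned choices:

```python
import numpy as np

def hr_fcm(X, c, weights, base_dists, m=2.0, alpha=0.9, eps=1e-5, T=100, seed=0):
    """Sketch of the HR-FCM iteration: hybrid distance (Eq. (18)) inside
    the membership/center updates of Eqs. (16)-(17)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                                         # 1): random fuzzy partition
    for _ in range(T):
        V = (U ** m @ X) / (U ** m).sum(axis=1, keepdims=True)  # Eq. (16): centers
        D = np.array([[sum(w * d(X[j], V[i]) for w, d in zip(weights, base_dists))
                       for j in range(n)] for i in range(c)])
        D2 = D ** 2 - alpha * (D ** 2).min(axis=0)             # subtract alpha*min_s D^2(x_j, v_s)
        D2 = np.maximum(D2, 1e-12)                             # numerical floor
        U_new = 1.0 / (D2 ** (1.0 / (m - 1)) *                 # Eq. (17): memberships
                       (1.0 / D2 ** (1.0 / (m - 1))).sum(axis=0))
        if np.linalg.norm(U_new - U) < eps:                    # 4): convergence test
            return U_new, V
        U = U_new
    return U, V

# Illustrative base metrics and equal weights (assumptions, not learned here).
d_sq = lambda x, y: float(np.sum((x - y) ** 2))
d_l1 = lambda x, y: float(np.sum(np.abs(x - y)))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
U, V = hr_fcm(X, c=2, weights=[0.5, 0.5], base_dists=[d_sq, d_l1])
labels = U.argmax(axis=0)
```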

By replacing the Euclidean distance of the traditional FCM algorithm with the new hybrid distance obtained through distance learning, the HR-FCM algorithm further improves its resistance to noise. Moreover, the hybrid distance, learned from the side information of the data set itself, fits the data better than the original Euclidean distance, which broadens the applicability of the algorithm. Compared with the traditional FCM algorithm, HR-FCM thus achieves better clustering performance and robustness.

3 Experimental study and analysis

3.1 Experimental setup and data

 $\left\{ \begin{array}{l} {d_1}\left( {\mathit{\boldsymbol{x}},\mathit{\boldsymbol{y}}} \right) = {\left( {\mathit{\boldsymbol{x}} - \mathit{\boldsymbol{y}}} \right)^{\rm{T}}}\frac{1}{{{\sigma ^2}}}\left( {\mathit{\boldsymbol{x}} - \mathit{\boldsymbol{y}}} \right)\\ {d_2}\left( {\mathit{\boldsymbol{x}},\mathit{\boldsymbol{y}}} \right) = \sum\limits_{i = 1}^d {\frac{{\left| {{x_i} - {y_i}} \right|}}{{{\sigma ^2}}}} \\ {d_3}\left( {\mathit{\boldsymbol{x}},\mathit{\boldsymbol{y}}} \right) = 1 - \exp \left( { - \frac{{3{{\left\| {\mathit{\boldsymbol{x}} - \mathit{\boldsymbol{y}}} \right\|}^2}}}{{{\sigma ^2}}}} \right) \end{array} \right.$ (19)
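A direct sketch of the three base distances in Eq. (19); the Gaussian-induced $d_3$ uses a negative exponent so that it stays in $[0, 1)$:

```python
import numpy as np

def base_distances(x, y, sigma):
    """The three base distances of Eq. (19): scaled squared Euclidean,
    scaled L1, and a Gaussian-kernel-induced distance in [0, 1)."""
    diff = x - y
    sq = float(diff @ diff)
    d1 = sq / sigma ** 2
    d2 = float(np.sum(np.abs(diff))) / sigma ** 2
    d3 = 1.0 - float(np.exp(-3.0 * sq / sigma ** 2))
    return d1, d2, d3

d1, d2, d3 = base_distances(np.array([1.0, 0.0]), np.array([0.0, 0.0]), sigma=1.0)
```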

3.2 Evaluation methods

 ${\rm{NMI}}\left( {X,Y} \right) = \frac{{I\left( {X,Y} \right)}}{{\sqrt {H\left( X \right) \cdot H\left( Y \right)} }}$ (20)
 ${\rm{RI}}\left( {X,Y} \right) = \frac{{a + b}}{{n \times \left( {n - 1} \right)/2}}$ (21)
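NMI (Eq. (20)) normalizes the mutual information between the two partitions by the geometric mean of their entropies; the Rand index (Eq. (21)) can be computed directly from its definition, where $a$ counts pairs grouped together in both partitions and $b$ counts pairs separated in both. A minimal sketch of the latter:

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Rand index, Eq. (21): a = pairs grouped together in both partitions,
    b = pairs separated in both, over all n(n-1)/2 pairs."""
    n = len(labels_true)
    a = b = 0
    for i, j in combinations(range(n), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1
        elif not same_true and not same_pred:
            b += 1
    return (a + b) / (n * (n - 1) / 2)

ri = rand_index([0, 0, 1, 1], [0, 0, 1, 2])  # a = 1, b = 4, 6 pairs -> 5/6
```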

3.3 Experimental results and analysis

 Fig. 1  The RI on the different data sets
 Fig. 2  The NMI on the different data sets

4 Conclusion
