«上一篇
 文章快速检索 高级检索

 智能系统学报  2018, Vol. 13 Issue (6): 994-998  DOI: 10.11992/tis.201806011 0

### 引用本文

ZHANG Meiqin, BAI Liang, WANG Junbin. Label propagation algorithm based on weighted clustering ensemble[J]. CAAI Transactions on Intelligent Systems, 2018, 13(6): 994-998. DOI: 10.11992/tis.201806011.

### 文章历史

1. 山西大学 计算机与信息技术学院，山西 太原 030006;
2. 山西大学 计算智能与中文信息处理教育部重点实验室，山西 太原 030006

Label propagation algorithm based on weighted clustering ensemble
ZHANG Meiqin1, BAI Liang2, WANG Junbin1
1. College of Computer Science and Technology, Shanxi University, Taiyuan 030006, China;
2. Key Laboratory of Symbol Computation and Knowledge Engineering (Shanxi University), Ministry of Education, Taiyuan 030006, China
Abstract: Label propagation algorithm (LPA) is one of the high-efficiency community detection algorithms for processing large-scale network data. It has attracted much attention because of its nearly linear time complexity with the number of nodes. However, in the algorithm, the label of each node depends on the labels of its neighbor nodes, which makes the iteration speed and clustering performance of the algorithm very sensitive to the order of label information update; this influences the accuracy and stability of the community detection result. To solve this problem, a new LPA is proposed based on weighted clustering ensemble. The new algorithm runs the LPAs many times to obtain several partition results, which can be regarded as a base clustering set. Furthermore, the modularity measure is used to evaluate the importance of each clustering. Based on the evaluation results, a weighted similarity measure is defined between nodes to obtain a weighted similarity matrix of pairwise nodes. Finally, hierarchical clustering on the similarity matrix is used to obtain a final community division result. In the experimental analysis, the new algorithm is compared with several other improved LPAs on five real representative network datasets. The experimental results show that the new algorithm is more effective for improving the community detection accuracy.
Key words: data mining    network data    community detection    label propagation algorithm    clustering ensemble    base clustering    modularity measure    weighted measure

1 基于加权聚类集成的标签传播算法 1.1 标签传播算法(LPA)

1)为所有节点初始化一个唯一的标签，每个标签代表一个社区。

2)产生随机遍历顺序遍历所有节点，每个节点选择其出现次数最多的邻居节点标签来更新自身的标签，以此达到标签的传播。

3)若存在节点的邻居节点标签数出现多个最大值，则随机选其中一个最大值所对应的标签来更新该节点的标签。重复2)，直到标签更新稳定，停止迭代。

4)将具有相同标签的节点划分为一个社区。

1.2 基于加权聚类集成的标签传播算法

 ${a_{ij}} = \left\{ {\begin{array}{*{20}{c}} {\begin{array}{*{20}{c}}\!\!\!\!\!\!\!\!1,&{ \left\langle { {v_i},{v_j} } \right\rangle \in { E}} \\ \!\!\!\!\!\!\!\! 0,&{ \left\langle { {v_i},{v_j} } \right\rangle \notin { E}} \end{array}} \end{array}} \right.，$ (1)

 ${{Q}} = \sum\nolimits_i {({e_{ii}} - b_i^2)}，$ (2)

$t$ 次运行标签传播算法所产生的单个聚类结果为一个基聚类，将多个标签传播算法的结果融合来代替单个标签传播算法的结果，使用聚类集成技术从结果集 ${{X}}$ 中发现最一致的结果。然而，聚类集成的结果会受到单个基聚类质量的影响，为了提高聚类结果的稳定性，因此提出加权相似性度量。

 ${{{S}}^{\mathop{w}\nolimits} }(i,j) = \sum\limits_{i,j = 1}^n {\sum\limits_{t = 1}^T {{{{Q}}_t}\sigma ({{{X}}_{it}},{{{X}}_{jt}})} }$ (3)

 Download: 图 1 算法框架 Fig. 1 The framework of the proposed algorithm

1)对网络 ${{G}}$ 中的节点集 ${{V}}$ 执行 $T$ 次标签传播算法运算，每次产生一列社区划分结果，最终形成 $n \times T$ 的基聚类集矩阵 ${{X}}$

2)根据定义2计算每次LPA结果的模块度值， ${{{Q}}_t}$ 来作为对应基聚类的权重。

3)根据定义4计算节点之间的加权相似性矩阵 ${{{S}}^w}$

4)对加权相似度矩阵 ${{{S}}^w}$ 采用层次聚类方法(linkage)产生最终的社区划分结果。

2 实验分析 2.1 实验数据

2.2 评价指标

 ${\rm NMI}({{A}},{{B}}) = \frac{{ - 2\displaystyle\sum\limits_{i = 1}^{{c_A}} {\sum\limits_{j = 1}^{{c_B}} {{c_{ij}} \cdot \log \frac{{{c_{ij}} \cdot n}}{{{b_i} \cdot b_j^\prime }}} } }}{{\displaystyle\sum\limits_{i = 1}^{{c_A}} {{b_i}\log (\frac{{{b_i}}}{n}) + \sum\limits_{j = 1}^{{c_B}} {{b_j}\prime \log (\frac{b_j^\prime }{n})} } }}，$ (4)

ARI的定义为

 ${\rm ARI} = \frac{{{r_0} - {r_s}}}{{\displaystyle\frac{{{r_1} - {r_2}}}{2} - {r_s}}}，$ (5)

2.3 实验结果分析

3 结束语

 [1] SARKAR P, MOORE A W. Dynamic social network analysis using latent space models[J]. ACM sigkdd explorations newsletter, 2005, 7(2): 31-40. DOI:10.1145/1117454 (0) [2] ZARE H, SHOOSHTARI P, GUPTA A, et al. Data reduction for spectral clustering to analyze high throughput flow cytometry data[J]. BMC bioinformatics, 2010, 11: 403. DOI:10.1186/1471-2105-11-403 (0) [3] SHI Jianbo, MALIK J. Normalized cuts and image segmentation[J]. IEEE transactions on pattern analysis and machine intelligence, 2000, 22(8): 888-905. DOI:10.1109/34.868688 (0) [4] DJIDJEV H N, ONUS M. Scalable and accurate graph clustering and community structure detection[J]. IEEE transactions on parallel and distributed systems, 2013, 24(5): 1022-1029. DOI:10.1109/TPDS.2012.57 (0) [5] DUCH J, ARENAS A. Community detection in complex networks using extremal optimization[J]. Physical review E, 2005, 72(2): 027104. DOI:10.1103/PhysRevE.72.027104 (0) [6] NEWMAN M E J. Fast algorithm for detecting community structure in networks[J]. Physical review E, 2004, 69(6): 066133. DOI:10.1103/PhysRevE.69.066133 (0) [7] GIRVAN M, NEWMAN M E J. Community structure in social and biological networks[J]. Proceedings of the national academy of sciences of the United States of America, 2002, 99(12): 7821-7826. DOI:10.1073/pnas.122653799 (0) [8] NEWMAN M E J. Modularity and community structure in networks[J]. Proceedings of the national academy of sciences of the United States of America, 2006, 103(23): 8577-8582. DOI:10.1073/pnas.0601602103 (0) [9] TANG Lei, WANG Xufei, LIU Huan. Community detection via heterogeneous interaction analysis[J]. Data mining and knowledge discovery, 2012, 25(1): 1-33. (0) [10] RAGHAVAN U N, ALBERT R, KUMARA S. Near linear time algorithm to detect community structures in large-scale networks[J]. Physical review E, 2007, 76(3): 036106. DOI:10.1103/PhysRevE.76.036106 (0) [11] BARBER M J, CLARK J W. Detecting network communities by propagating labels under constraints[J]. Physical review E, 2009, 80(2): 026129. DOI:10.1103/PhysRevE.80.026129 (0) [12] LIU X, MURATA T. Advanced modularity-specialized label propagation algorithm for detecting communities in networks[J]. Physica A: statistical mechanics and its applications, 2010, 389(7): 1493-1500. DOI:10.1016/j.physa.2009.12.019 (0) [13] XIE Jierui, SZYMANSKI B K. Community detection using a neighborhood strength driven label propagation algorithm[C]//Proceedings of 2011 IEEE Network Science Workshop. West Point, NY, USA: IEEE, 2011: 188–195. (0) [14] HE Miao, LENG Mingwei, LI Fan, et al. A node importance based label propagation approach for community detection[M]//SUN Fuchun, LI Tianrui, LI Hongbo. Knowledge Engineering and Management: Proceedings of the Seventh International Conference on Intelligent Systems and Knowledge Engineering, Beijing, China, Dec 2012 (ISKE 2012). Berlin Heidelberg: Springer, 2014: 249–257. (0) [15] SUN Heli, LIU Jiao, HUANG Jianbin, et al. CenLP: a centrality-based label propagation algorithm for community detection in networks[J]. Physica A: statistical mechanics and its applications, 2015, 436: 767-780. DOI:10.1016/j.physa.2015.05.080 (0) [16] LI Wei, HUANG Ce, WANG Miao, et al. Stepping community detection algorithm based on label propagation and similarity[J]. Physica A: statistical mechanics and its applications, 2017, 472: 145-155. DOI:10.1016/j.physa.2017.01.030 (0) [17] (0) [18] BLONDEL V D, GUILLAUME J L, LAMBIOTTE R, et al. Fast unfolding of communities in large networks[J]. Journal of statistical mechanics: theory and experiment, 2008(10): 10008. (0) [19] ELHADARY R S, TOLBA A S, ELSHARKAWY M A, et al. An efficient and robust combined clustering technique for mining in large spatial databases[C]//Proceedings of 2007 International Conference on Computer Engineering and Systems. Cairo, Egypt, 2007: 439–445. (0) [20] FRED A L N, JAIN A K. Data clustering using evidence accumulation[C]//Proceedings of the 16th International Conference on Pattern Recognition. Quebec City, Quebec, Canada, 2002: 276–280. (0) [21] YANG Yan, KAMEL M S. An aggregated clustering approach using multi-ant colonies algorithms[J]. Pattern recognition, 2006, 39(7): 1278-1289. DOI:10.1016/j.patcog.2006.02.012 (0)