文章快速检索 高级检索

1. 山西大学 计算机与信息技术学院, 山西 太原 030006;
2. 山西大学 计算智能与中文信息处理教育部重点实验室, 山西 太原 030006

Community detection algorithm based on dense subgraphs
ZHENG Wenping1,2, ZHANG Haojie1, WANG Jie1,2
1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;
2. Key Laboratory of Computation Intelligence and Chinese Information Processing, Ministry of Education, Shanxi University, Taiyuan 030006, China
Abstract: The density-based graph clustering algorithm has been widely used in community detection. However, because it identifies a community by searching a partially dense subgraph in the network, many nodes do not constitute a dense subgraph and are therefore difficult to cluster. In this paper, we present a soft clustering algorithm based on dense subgraphs (BDSG) for detecting communities in complex networks. First, we propose a method for detecting the central communities. Next, we define the degree of community attribution of a node, and put forward a core community extended strategy. Finally, we obtain the clustering results of a network. Compared with the clique percolation method (CPM), k-dense algorithms from Zachary's Karate Club, the dolphin social network, the American college football network, the email network, and the collaboration network, BDSG shows considerably better performance with respect to modularity and time efficiency. In addition, the proposed core community extended strategy may improve the effectiveness of the clustering-methods-based density, such as that in CPM, k-dense algorithms, and others.
Key words: complex network     community detection     graph clustering     soft clustering     density     core extended strategy     vertex betweenness     modularity

﻿

1 背景知识

 $B\left( {{v}_{k}} \right)=\sum\limits_{i=1}^{n}{\sum\limits_{j=i+1}^{n}{\frac{{{L}_{vi,vj}}\left( vk \right)}{{{L}_{vi,vj}}}}}$ (1)

2 结点对社区的归属度定义

Cui[22]基于结点v与社区C关联边数定义了结点v对于社区C的归属度${{b}_{p}}\left( v,C \right)=\frac{\left| {{N}_{v}}\cap c \right|}{{{k}_{v}}}$，其中Nv∩C=u|v,u∈E，u∈C表示结点v在社区C中相邻点的集合，kv是结点v的度。然而结点v与社区C的关联强度不仅与关联边数有关，也和社区C中v的相邻点在C中的重要性关系密切。

 图 1 空手道俱乐部中未聚类结点分析 Fig. 1 The analysis of subordinate vertices in zacharys karate club

 $b\left( v,c \right)=\alpha \times \frac{\left| {{N}_{v}}\cap C \right|}{{{k}_{v}}}+\left( 1-\alpha \right)\times \frac{\sum\nolimits_{ui\in {{N}_{v}}\bigcap c}{B\left( {{v}_{j}} \right)}}{\sum\nolimits_{vj\in c}{B\left( {{v}_{j}} \right)}}$ (2)

 数据集 未聚类节点 Q α=0.8 α=1 α=0.8 α=1 空手道 俱乐部 1 3 0.820 5 0.717 9 海豚 社交网络 0 1 0.773 5 0.761 0 大学生 足球网络 0 0 0.639 0 0.615 0 电子 邮件网络 34 41 0.722 4 0.715 1 合作网络 657 661 0.782 8 0.647 3
3 基于稠密子图的社区发现算法

3.1 确定聚类个数

 图 2 BDSG算法在空手道俱乐部中得到的稠密子图 Fig. 2 The dense subgraphs in zacharys karate club obtained by BDSG

 M=\left\{ \begin{align} & 4200 \\ & 2400 \\ & 0054 \\ & 0045 \\ \end{align} \right\}

3.2 中心社区扩展策略

1)令集合T0=VG－∪i=1kVCi0，β0=0.7，t=0。

2)令t=t+1，对每个社区C>1(t－1),C2(t－1),…,Ck(t－1)，执行下列操作：

①Cit=Ci(t－1)(1≤i≤k)，NGi(Ct－1)={u|u∈NG(x)∧x∈Ci(t－1)}。

②对任意元素v∈T∩NGCi(t－1)1≤i≤k，如果b(v,C i(t－1))≥βt－1，则Cit=Cit∪v。

③令βtt－1－0.1，Tt=VG－∪i=1kVCit，若βt≥0.3，且Tt≠，返回步骤2)。

3) 结束，输出社区集合C=C1t,C2t,…,Ckt

4 实验与结果分析

 图 3 BDSG算法在空手道俱乐部得到的聚类结果 Fig. 3 Clustering results on zacharys karate club obtained by BDSG

 图 4 BDSG算法在海豚社交网络上得到的聚类结果 Fig. 4 Clustering results on dolphins social network obtained by BDSG

 图 5 BDSG算法在大学生足球网络上得到的聚类结果 Fig. 5 Clustering results on college football network obtained by BDSG

 数据集 顶点数 边数 原始社区个数 算法 聚类个数 未聚类节点数 Q 运行时间/ms 空手道 俱乐部 34 78 2 BDSG 2 1 0.820 5 93 CPM 3 22 0.192 3 87 k-dense 2 22 0.294 8 129 CPM+CE 3 3 0.410 2 117 k-dense+CE 2 1 0.820 5 165 海豚社交网络 62 159 2 BDSG 4 0 0.773 5 149 CPM 4 34 0.408 8 175 k-dense 4 34 0408 8 568 CPM+CE 4 16 0.591 1 202 k-dense+CE 4 16 0.591 1 599 大学生足球网络 115 613 12 BDSG 12 0 0.639 0 480 CPM 13 2 0.595 1 920 k-dense 12 2 0.637 0 1 860 CPM+CE 13 0 0.601 0 1 028 k-dense+CE 12 0 0.648 0 1 986 电子邮件网络 1 133 5 451 — BDSG 28 34 0.722 4 60 797 CPM 55 562 0.268 7 592 410 k-dense 6 558 0.251 7 55 240 CPM+CE 55 341 0.289 7 601 835 k-dense+CE 6 14 0.503 4 63 938 合作网络 1 589 2 742 — BDSG 134 657 0.782 8 21 273 CPM 159 843 0.520 1 97 161 k-dense 91 843 0.730 5 15 352 CPM+CE 159 688 0.567 5 120 927 k-dense+CE 91 790 0.763 1 23 451

5 结束语

 [1] FORTUNATO S. Community detection in graphs[J]. Physics reports, 2010, 486(3/4/5): 75-174. [2] NEPUSZ T, YU Haiyuan, PACCANARO A. Detecting overlapping protein complexes in protein-protein interaction networks[J]. Nature methods, 2012, 9(5): 471-472. [3] DEYLAMI H A, ASADPOUR M. Link prediction in social networks using hierarchical community detection[C]//Proceedings of the 7th conference on information and knowledge technology. Urmia, Iran, 2015: 1-5. [4] SCHAEFFER S E. Graph clustering[J]. Computer science review, 2007, 1(1): 27-64. [5] 杨博, 刘大有, LIU Jiming, 等. 复杂网络聚类方法[J]. 软件学报, 2009, 20(1): 54-66. [6] 冀俊忠, 刘志军, 刘红欣, 等. 蛋白质相互作用网络功能模块检测的研究综述[J]. 自动化学报, 2014, 40(4): 577-593. [7] PALLA G, BARABáSI A L, VICSEK T. Quantifying social group evolution[J]. Nature, 2007, 446(7136): 664-667. [8] SIDIROPOULOS A, PALLIS G, KATSAROS D, et al. Prefetching in content distribution networks via web communities identification and outsourcing[J]. World wide web, 2008, 11(1): 39-70. [9] 陈克寒, 韩盼盼, 吴健. 基于用户聚类的异构社交网络推荐算法[J]. 计算机学报, 2013, 36(2): 349-359. [10] FREY B J, DUECK D. Clustering by passing messages between data points[J]. Science, 2007, 315(5814): 972-976. [11] NEWMAN M E J. Community detection and graph partitioning[J]. Europhysics letters, 2013, 103(2): 28003. [12] NEWMAN M E J. Spectral methods for community detection and graph partitioning[J]. Physical review E, 2013, 88(4): 042822. [13] LIN Chuncheng, KANG Jiarong, CHEN J Y. An integer programming approach and visual analysis for detecting hierarchical community structures in social networks[J]. Information sciences, 2015, 299: 296-311. [14] REN Jun, WANG Jianxin, LI Min, et al. Identifying protein complexes based on density and modularity in protein-protein interaction network[J]. BMC systems biology, 2013, 7(S4): S12. [15] LI Xiaoli, FOO C S, NG S K. Discovering protein complexes in dense reliable neighborhoods of protein interaction networks[C]//Proceedings of the computational systems bioinformatics conference. San Diego, USA, 2007, 6: 157-168. [16] PALLA G, DERéNYI I, FARKAS I, et al. Uncovering the overlapping community structure of complex networks in nature and society[J]. Nature, 2005, 435: 814-818. [17] SAITO K, YAMADA T, KAZAMA K. Extracting communities from complex networks by the k-dense method[J]. IEICE transactions on fundamentals of electronics, communications and computer sciences, 2008, E91-A(11): 3304-3311. [18] SUN Penggang, GAO Lin. Fast algorithms for detecting overlapping functional modules in protein-protein interaction networks[C]//Proceedings of the IEEE computational intelligence in bioinformatics and computational biology. Nashville, TN, USA, 2009: 247-254. [19] LIU Guimei, WONG L, CHUA H N. Complex discovery from weighted PPI networks[J]. Bioinformatics, 2009, 25(15): 1891-1897. [20] BADER G D, HOGUE C W V. An automated method for finding molecular complexes in large protein interaction networks[J]. BMC bioinformatics, 2003, 4(1): 2. [21] FREEMAN L C. A set of measures of centrality based on betweenness[J]. Sociometry, 1977, 40(1): 35-41. [22] CUI Yaizu, WANG Xingyuan, EUSTACE J. Detecting community structure via the maximal sub-graphs and belonging degrees in complex networks[J]. Physica A: statistical mechanics and its applications, 2014, 416: 198-207. [23] ZACHARY W W. An information flow model for conflict and fission in small groups[J]. Journal of anthropological research, 1977, 33(4): 452-473. [24] LUSSEAU D, SCHNEIDER K, BOISSEAU O J, et al. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations: Can geographic isolation explain this unique trait[J]. Behavioral ecology and sociobiology, 2003, 54(4): 396-405. [25] NEWMAN M E J. Finding community structure in networks using the eigenvectors of matrices[J]. Physical review E, 2006, 74(3): 036104. [26] GUIMERà R, DANON L, DíAZ-GUILERA A, et al. Self-similar community structure in a network of human interactions[J]. Physical review E, 2003, 68(6): 065103. [27] NEWMAN M E J, GIRVAN M. Finding and evaluating community structure in networks[J]. Physical review E, 2004, 69(2): 026113. [28] SHEN Huawei, CHENG Xueqi, CAI Kai, et al. Detect overlapping and hierarchical community structure in networks[J]. Physica A: statistical mechanics and its applications, 2009, 388(8): 1706-1712.
DOI: 10.11992.tis.201603045

0

#### 文章信息

ZHENG Wenping, ZHANG Haojie, WANG Jie

Community detection algorithm based on dense subgraphs

CAAI Transactions on Intelligent Systems, 2016, 11(3): 426-432.
DOI: 10.11992.tis.201603045