﻿ 综合线面特征分布的点目标多尺度聚类方法
 文章快速检索 高级检索

Multi-scale Clustering of Points Synthetically Considering Lines and Polygons Distribution
YU Li, GAN Shu, YUAN Xiping, YANG Minglong
Faculty of Land Resource Engineering, Kunming University of Science and Technology, Kunming 650093, China
First author: YU Li(1987—), female, PhD candidate，majors in spatial data mining, modeling and analysis.E-mail:woshiyuli@126.com
Abstract: Considering the complexity and discontinuity of spatial data distribution, a clustering algorithm of points was proposed. To accurately identify and express the spatial correlation among points,lines and polygons, a Voronoi diagram that is generated by all spatial features is introduced. According to the distribution characteristics of point's position, an area threshold used to control clustering granularity was calculated. Meanwhile, judging scale convergence by constant area threshold, the algorithm classifies spatial features based on multi-scale, with an O(n log n) running time.Results indicate that spatial scale converges self-adaptively according with distribution of points.Without the custom parameters, the algorithm capable to discover arbitrary shape clusters which be bound by lines and polygons, and is robust for outliers.
Key words: spatial clustering     multi-scale     Voronoi diagram of all features     constraints

1 引 言

2 Voronoi剖分下的点目标聚散特征表达

2.1 邻近关系

 图 1 空间目标分布的邻近程度测算 Fig. 1 Calculate adjacency of spatial objects distribution

2.2 势力范围

 图 2 Voronoi图表征的生长源分布特征 Fig. 1 Distribution characteristics of generators are expressed by Voronoi diagram

3 单一尺度的点目标聚类

MSSC算法中，σ值的选取决定了聚类的尺度，记:SmaxSmin分别是集合P的Voronoi多边形面积的最大值和最小值，则σ的取值情况为：当σ≤Smin，所有点目标各自成类；当σ≥Smax，所有点目标合为一类；当Smin<σ<Smax，点目标随σ值的变化实现多尺度的聚类，σ值越小，聚类粒度越细，类内归属信息越精确，但类间相似性越强；σ值越大，聚类粒度越粗，类间差异性越强，类内的相似性越低。

(1) 初始化类编号clusterNum←1，isBoundaryFirst←True表示类的边界点首次被遍历，新建临时栈pStack←Null。

(2) 遍历VoronoiLayer图层中的Voronoi多边形，以k次遍历为例，若vk._geoLyrType=Point，说明该Voronoi区域是点目标生成，将其压入栈中，pStack←vk

(3) 当栈非空，pStack=┐，出栈vj，若vj._isCluster=False，则：

1) 若vj._geoLyrType=Point &&vj._geoArea≤σ,根据定义1，找到vj的自然邻居集合，并将该集合中_isCluster属性为False的Voronoi多边形压入栈pStack，vj._clusterID←clusterNum，vj._isCluster←True，isBoundaryFirst←False；

2) 若vj._geoLyrType≠Point &&vj._geoArea≤σ,根据定义1，找到vj的自然邻居集合，并将该集合中_isCluster属性为False的Voronoi多边形压入栈pStack，vj._isCluster←True，isBoundaryFirst←False；

3) 若vj._geoLyrType=Point &&vj._geoArea>σ&&isBoundaryFirst=False，则vj._clusterID←clusterNum，vj._isCluster←True；

4) 若以上条件皆不满足，Continue。

(4) clusterNum++，转至[2]，第k次遍历结束。

(5) 直至遍历VoronoiLayer图层中所有对象，算法结束。}

ClustersLayer图层_clusterID字段值表示聚类的结果，同一_clusterID值的Voronoi多边形为同类，代表其生长源亦属同类。

4 多尺度点目标聚类

4.1 多尺度下的Voronoi聚类

 图 3 不同尺度下点目标粒度的Voronoi划分 Fig. 3 Points granularity is divided by Voronoi in different scales
4.2 面积阈值σ的计算

4.3 多尺度聚类的收敛条件

MSSC算法以面积阈值σ控制聚类的粒度，以σ值的变化趋势判断空间多尺度的收敛条件。主要表现为：σ的计算取决于类内点目标生成的Voronoi图的平均面积，即σ=PMA。根据点目标的Voronoi分布特征，当类内点目标非均匀分布时，聚集程度高的点目标生成的Voronoi图面积小于σ，即将该子目标划分为新的子类，在下一尺度计算新子类的PMA，继续聚类；当类内点目标均匀分布，无聚集趋势时，每个点目标生成的Voronoi图面积均大于或等于σ，保持原有分类，下一尺度计算该类的PMA时，值保持不变。由此可以判断：两个相邻尺度下σ值保持不变时，聚类停止，算法尺度收敛。

(1) 初聚类时，λ1=1，σ1为点目标P的全局平均面积GMA(P)，取唯一值，根据算法1进行聚类，得到子类集合SC[1][]={SC[1][1],…,SC[1][k1]}；

(2) 再聚类时，λ2=2，先计算子类SC[1][]集合中每个子类的局部平均面积，即σ2[]={PMA(SC[1][1]),…,PMA(SC[1][k1])}，若σ1σ2[]，根据算法1分别进行SC[1][]子类集合中各子类的再聚类，得到子类集合SC[2][]={SC[2][1],…,SC[2][k2]}；

(3) 继续再聚类至m尺度，λm=m，计算子类SC[m-1][]集合中每个子类的局部平均面积，即σm[]={PMA(SC[m-1][1]),…,PMA(SC[m-1][km-1])}，若σm[]≠σm-1[]，根据算法1分别进行SC[m-1][]子类集合中各子类的再聚类，得到子类集合SC[m][]={SC[m][1],…,SC[m][km]}，重复过程(3)；若σm[]=σm-1[]，算法结束，λm-1为点集P聚类的最小尺度。

(1) 初始化变量isFirstClustering←True，表示为初聚类。

1) 若isFirstClustering=True，σ值为所有点目标的全局平均面积，σ1=GMA(P)，执行算法1，isFirstClustering←False。

2) 若isFirstClustering=False，以尺度λm为例，σm[]为λm-1尺度下得到的各个子类局部平均面积的集合，若σm[]≠σm-1[]，对每个子类执行算法1；若σm[]=σm-1[]，说明σ已收敛，算法结束，点集P聚类的最小尺度为m-1。}

4.4 MSSC算法的时间复杂度

5 试验与讨论

5.1 MSSC算法的可行性

 图 4 研究区空间数据 Fig. 4 Spatial data of research region

 图 5 全要素Voronoi图 Fig. 5 Voronoi diagram with all features
 图 6 综合线面特征聚类 Fig. 6 Clustering with lines and polygons

 图 7 面积阈值与空间尺度的关系 Fig. 7 The relationship between area threshold and spatial scale

 图 8 不同空间尺度下的MSSC算法聚类结果 Fig. 8 Clustering result of MSSC algorithm in different spatial scales
5.2 MSSC算法的稳定性

 图 9 异常值的识别 Fig. 9 Distinguish outliers from points
6 结 论

MSSC算法利用全要素Voronoi剖分准确地识别了空间点线面目标的邻近关系，对河流、湖泊、建筑物等具有阻碍点目标空间连续性的障碍物的处理符合人类的空间认知习惯；根据点目标分布特征自适应地计算σ值，以其数值控制聚类的粒度，以其变化规律控制尺度的收敛，能够发现空间点目标的典型聚集模式；聚类过程中对异常值处理稳健，无须自定义参数。

MSSC算法适用于发现客观世界中表征地形、气候、经济、人口等位置分布的离散数据点聚集模式，进而解译隐含规律。结合试验对高程点的聚散特征分析，被聚类的高程点表征了地形起伏较大的区域，尺度收敛的速率亦体现了空间数据分布异质性的强弱，相关问题亟待进行下一步实证分析研究。

 [1] NG R T, HAN J W. Efficient and Effective Clustering Methods for Spatial Data Mining[C]//Proceedings of the 20th VLDB Conference. Santiago, Chile:[s.n.], 1994: 144-155. [2] ESTIVILL-CASTROV, LEEI. AUTOCLUST+: Automatic Clustering of Point-data Sets in the Presence of Obstacles[C]//Proceedings of the 1st International Workshop on Temporal, Spatial and Spatio-temporal Data Mining. Berlin: Springer-Verlag, 2001: 133-146. [3] LEUNG Y, ZHANGJ S, XUZ B. Clustering by Scale-space Filtering[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, 22(12): 1396-1410. [4] TUNG A K H, HOU J, Han J W. Spatial Clustering in the Presence of Obstacles[C]//Proceedings of the 17th International Conference on Data Engineering. Heidelberg: IEEE, 2001: 359-367. [5] ZAIANE O R, LEE C H. Clustering Spatial Data when Facing Physical Constraints[C]//Proceedings of the 2002 IEEE International Conference on Data Mining. Maebashi City, Japan:IEEE, 2002: 737-740. [6] FAN B. A Hybrid Spatial Data Clustering Method for Site Selection: The Data Driven Approach of GIS Mining[J]. Expert Systems with Applications, 2009, 36(2): 3923-3936. [7] YANG Yang, SUN Zhiwei, ZHAO Zheng. Density-based Spatial Clustering Method with Obstacle Constraints[J]. Computer Applications, 2007, 27(7): 1688-1691. (杨杨, 孙志伟, 赵政. 一种处理障碍约束的基于密度的空间聚类算法[J]. 计算机应用, 2007, 27(7): 1688-1691.) [8] CAO Keyan, WANG Guoren, HAN Donghong, et al. Clustering Algorithm of Uncertain Data in Obstacle Space[J]. Journal of Frontiers of Computer Science and Technology, 2012, 6(12): 1087-1097. (曹科研, 王国仁, 韩东红, 等. 障碍空间中不确定数据聚类算法[J]. 计算机科学与探索, 2012, 6(12): 1087-1097.) [9] WANG Min, ZHOU Chenghu, PEI Tao, et al. MSCMO: A Scale Space Clustering Algorithm Based on Mathematical Morphology Operators[J]. Journal of Remote Sensing, 2004, 8(1): 45-50.(汪闽, 周成虎, 裴韬, 等. MSCMO: 基于数学形态学算子的尺度空间聚类方法[J]. 遥感学报, 2004, 8(1): 45-50.) [10] KONG Juan, XU Futian, XUE Qingfeng. Clustering Algorithm Based on Mathematical Morphology in the Presence of Obstacles[J]. Computer Knowledge and Technology, 2008, 4(9): 2923-2925. (孔娟, 徐夫田, 薛庆峰. 基于数学形态学的带障碍约束的空间聚类算法研究[J]. 电脑知识与技术, 2008, 4(9): 2923-2925.) [11] ESTIVILL-CASTROV, LEEI. Clustering with Obstacles for Geographical Data Mining[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2004, 59(1-2): 21-34. [12] SHI Yan, LIU Qiliang, DENG Min, et al. A Novel Spatial Clustering Method with Spatial Obstacles[J]. Geomatics and Information Science of Wuhan University, 2012, 37(1): 96-100. (石岩, 刘启亮, 邓敏, 等. 一种顾及障碍约束的空间聚类方法[J]. 武汉大学学报: 信息科学版, 2012, 37(1): 96-100.) [13] LIUD Q, NOSOVSKIYG V, SOURINAO. Effective Clustering and Boundary Detection Algorithm Based on Delaunay Triangulation[J]. Pattern Recognition Letters, 2008, 29(9): 1261-1273. [14] EDLA D R, JANA P K. A Novel Clustering Algorithm Using Voronoi Diagram[C]//Proceedings of the 7th International Conference on Digital Information Management. Macau: IEEE, 2012: 35-40. [15] XUE Lixia, WANG Linlin, WANG Zuocheng, et al. Spatial Clustering in the Presence of Obstacles Based on Voronoi Diagram[J]. Computer Science, 2007, 34(2): 189-191. (薛丽霞, 汪林林, 王佐成, 等. 基于Voronoi图的有障碍物空间聚类[J]. 计算机科学, 2007, 34(2): 189-191.) [16] LUO Jiancheng, YEE Leung, ZHOU Chenghu. Scale Space Based Hierarchical Clustering Method and Its Application to Remotely Sensed Data Classification[J]. Acta Geodaetica et Cartographica Sinica, 1999, 28(4): 319-324. (骆剑承, 梁怡, 周成虎. 基于尺度空间的分层聚类方法及其在遥感影像分类中的应用[J]. 测绘学报, 1999, 28(4): 319-324.) [17] CHEN Xiaoyu. A Study of Economic Regionalization Based on Multiscale Spatial Clustering[J]. Journal of Chongqing Normal University:Natural Science, 2011, 28(5): 81-84, 92. (陈小瑜. 基于多尺度空间聚类的经济区域划分研究[J]. 重庆师范大学学报: 自然科学版, 2011, 28(5): 81-84, 92.) [18] ESTIVILL-CASTROV, LEEI. Amoeba: Hierarchical Clustering Based on Spatial Proximity Using Delaunay Diagram[C]//Proceedings of the 9thInternational Symposium on Spatial Data Handling. [S.l.]: IGU Study Group on Geographical Information Science, 2000: 1-16. [19] SHI Peibei, GUO Yutang, HU Yujuan, et al.Multiscale Spectral Clustering Algorithm[J]. Computer Engineering and Applications, 2011, 47(8): 128-130. (施培蓓, 郭玉堂, 胡玉娟, 等. 多尺度的谱聚类算法[J]. 计算机工程与应用, 2011, 47(8): 128-130.) [20] FU Lihua, LI Hongwei, ZHANG Meng, et al. Multi-scale Radial Basis Function Networks with Multiple Kernels[J]. Journal of Huazhong University of Science & Technology:Nature Science Edition, 2010, 38(1): 39-42. (付丽华, 李宏伟, 张猛, 等. 带多个核函数的多尺度径向基函数网络[J]. 华中科技大学学报: 自然科学版, 2010, 38(1): 39-42.) [21] CHEN Jun. Voronoi Dynamic Spatial Data Model[M]. Beijing: Surveying and Mapping Press, 2002. (陈军. Voronoi动态空间数据模型[M]. 北京: 测绘出版社, 2002.) [22] GOLD C M. Dynamic Spatial Data Structures:The Voronoi Approach[C]//Proceedings of the 1992 Canadian Conference on GIS.Ottawa:CIG, 1992: 245-255. [23] MACQUEEN J. Some Methods for Classification and Analysis of Multivariate Observations[C]//Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. Berkley:[s.n.],1967: 281-297. [24] HAN J W, KAMBER M. Data Mining: Concepts and Techniques[M]. Beijing: China Machine Press, 2001. (HAN J W, KAMBER M. 数据挖掘: 概念与技术[M]. 北京: 机械工业出版社, 2001.) [25] ZHANGT, RAMAKRISHNAN R, LIVNY M. BIRCH: An Efficient Data Clustering Method for Very Large Databases[C]//Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data.New York: ACM, 1996: 103-114."
http://dx.doi.org/10.11947/j.AGCS.2015.20150136

0

#### 文章信息

YU Li, GAN Shu, YUAN Xiping, YANG Minglong

Multi-scale Clustering of Points Synthetically Considering Lines and Polygons Distribution

Acta Geodaeticaet Cartographica Sinica, 2015, 44(10): 1152-1159.
http://dx.doi.org/10.11947/j.AGCS.2015.20150136