Research and Implementation of Remote Sensing Big Data Distributed Technology

Cite this article

Luo Jingning, Liu Liwei. Research and Implementation of Remote Sensing Big Data Distributed Technology[J]. Journal of Applied Meteorological Science, 2017, 28(5): 621-631.

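Luo Jingning, Liu Liwei

National Satellite Meteorological Center, Beijing 100081

Received 2017-04-06; revised manuscript received 2017-07-27.

Funding: China Special Fund for Meteorological Research in the Public Interest (GYHY201306068)

Corresponding author: Luo Jingning, email: luojn@cma.gov.cn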

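Abstract: The volume of satellite remote sensing data is growing rapidly, which poses new challenges for data analysis and value mining. To address this, the distributed approach that drives big data applications is introduced, and a grid model suited to satellite remote sensing big data is established. The model removes the spatiotemporal fragmentation and constraints of the data, so that the data can be stored, computed on and applied as a whole. Its basic structure of grid, time slice and physical layer can support future cloud computing deployment. This paper proposes a grid hashing algorithm based on the Hilbert curve; the distributed system built on it has excellent parallel read-write performance and good load balancing. The remote sensing big data distributed system achieves high-speed distributed parallel reading and writing, supports precise spatiotemporal matching and dynamic retrieval of data, and its capacity can scale linearly. The system is implemented on general-purpose hardware and software platforms, enabling flexible, on-demand and simple use of satellite remote sensing big data.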

Keywords: satellite remote sensing; spatial grid; big data; Hilbert curve; distributed system

Research and Implementation of Remote Sensing Big Data Distributed Technology

Luo Jingning, Liu Liwei

National Satellite Meteorological Center, Beijing 100081

Abstract: Over the past ten years, digital information of all kinds has grown explosively worldwide, opening a big data era of massive data production, sharing and application. During this decade, distributed storage and computing technologies have developed rapidly to cope with this growth, and the associated knowledge system and technical reserves have gradually been established. In China, research on big data and distributed computing is being carried out widely. For satellite remote sensing data, which are large in volume and growing rapidly, the traditional archive-callback-application workflow cannot meet the demands of data analysis and data mining in the big data era. The traditional file-based approach has many limitations and is very difficult to use, especially for cloud computing and intelligent services. The big data grid model and the distributed model are the key to removing this bottleneck, enabling real-time computing and on-demand services, and are therefore of important reference value. The grid model overcomes temporal and spatial fragmentation, making it possible to store, compute on and apply remote sensing data as a whole. Based on a Hilbert curve grid hash algorithm, a distributed system with a fundamental structure of grid, time slice and physical layer is established, demonstrating excellent parallel read-write performance. The Hilbert hash algorithm has a stable degree of dispersion, which is the key for the grid model to maintain spatial correlation while mapping two-dimensional space to a one-dimensional sequence. By using the distributed system instead of the traditional file-based organization and management of data, flexible and intuitive data acquisition is realized. Users can experience a new way of working in which what you see is what you get, and what you get is what you need.
A future system based on this data model will greatly increase work efficiency and shift the focus from the data themselves to data applications. Internet-based cloud computing on grid cells can be realized, and the capacity of the whole system can scale linearly on general-purpose hardware and software platforms. Implementing this system provides high-speed parallel data reading and writing and makes on-demand data application smoother.

Key words: satellite remote sensing; spatial grid; big data; Hilbert curve; distributed system
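
The abstract describes a Hilbert curve grid hash that maps two-dimensional grid cells to a one-dimensional sequence while preserving spatial correlation. A minimal Python sketch of that idea follows. It uses the standard iterative conversion from cell coordinates to a Hilbert index. The function names `hilbert_index` and `assign_node`, the square power-of-two grid, and the modulo placement rule are illustrative assumptions, not the authors' implementation, which this page does not detail.

```python
def hilbert_index(n, x, y):
    """Return the position of cell (x, y) along a Hilbert curve.

    n is the side length of the square grid and is assumed to be a
    power of two; x and y are integer cell coordinates in [0, n).
    """
    d = 0
    s = n // 2
    while s > 0:
        # Which quadrant of the current sub-square contains the cell?
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        # Each quadrant contributes s*s curve positions per step.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip so the sub-square has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def assign_node(d, num_nodes):
    """Hypothetical placement rule: spread consecutive Hilbert indices
    round-robin across storage nodes (an assumption for illustration)."""
    return d % num_nodes


if __name__ == "__main__":
    n, num_nodes = 4, 4
    for y in range(n):
        row = [hilbert_index(n, x, y) for x in range(n)]
        print(" ".join(f"{d:2d}" for d in row))
    print("node of cell (1, 1):", assign_node(hilbert_index(n, 1, 1), num_nodes))
```

Because consecutive Hilbert indices always belong to neighbouring cells, a round-robin rule like the one assumed here would place the cells of a compact region on different nodes, which is one plausible way to obtain parallel reads and balanced load.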

[1]	Mayer-Schönberger V, Cukier K. Big Data: A Revolution That Will Transform How We Live, Work, and Think. Houghton Mifflin Harcourt Publishing Company, 2013: 9–23.
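[2]	Zhou Zhengrong, Wang Cheng, He Wenchun. Exploratory research on a distributed meteorological metadata synchronization system (in Chinese). Journal of Applied Meteorological Science, 2010, 21, (1): 121–128. DOI:10.11898/1001-7313.20100117
[3]	He Wenchun, Gao Feng, Xu Yan, et al. Design and implementation of a thematic data service system for climate monitoring operations (in Chinese). Journal of Applied Meteorological Science, 2012, 23, (5): 624–630. DOI:10.11898/1001-7313.20120514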
[4]	Foster I, Zhao Y, Raicu I. Cloud Computing and Grid Computing 360-Degree Compared. Grid Computing Environments Workshop, 2008: 1–10.
[5]	Armbrust M, Fox A, Griffith R. Above the clouds: A Berkeley view of cloud computing. Communications of the ACM, 2010, 53, (4): 50–58. DOI:10.1145/1721654
[6]	Ghemawat S, Gobioff H, Leung S T. The Google File System. SOSP '03: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, 2003: 29–43.
[7]	Chang F, Dean J, Ghemawat S. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems, 2008, 26, (2): 205–218.
[8]	Dean J, Ghemawat S. MapReduce: Simplified data processing on large clusters. Communications of the ACM (50th Anniversary Issue: 1958-2008), 2008, 51, (1): 107–113.
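[9]	Qin Xiongpai, Wang Huiju, Du Xiaoyong, et al. Big data analysis: competition and symbiosis of RDBMS and MapReduce (in Chinese). Journal of Software, 2012, 23, (1): 32–45.
[10]	Chen Kang, Zheng Weimin. Cloud computing: system instances and current research (in Chinese). Journal of Software, 2009, 20, (5): 1337–1348.
[11]	Wang Yijie, Sun Weidong, Zhou Song, et al. Key technologies of distributed storage for cloud computing (in Chinese). Journal of Software, 2012, 23, (4): 962–986.
[12]	Sun Yong, Lin Fei, Wang Baojun. Research on key-value distributed storage systems for cloud computing (in Chinese). Acta Electronica Sinica, 2013, 41, (7): 1406–1411.
[13]	Wu Jiyi, Fu Jianqing, Ping Lingdi, et al. Research on a cloud storage system with a peer-to-peer structure (in Chinese). Acta Electronica Sinica, 2011, 38, (5): 1100–1107.
[14]	Li Yongsheng, Zeng Qin, Xu Meihong, et al. Design and implementation of a Hadoop-based numerical weather prediction product service platform (in Chinese). Journal of Applied Meteorological Science, 2015, 26, (1): 122–128.
[15]	Gao Feng, Wang Guofu, Sun Chao, et al. Application of the back-end management mode in a data sharing platform (in Chinese). Journal of Applied Meteorological Science, 2011, 22, (3): 367–374. DOI:10.11898/1001-7313.20110314
[16]	Jin Zhiyan, Wang Dingxing. A method for load balancing in heterogeneous systems (in Chinese). Journal of Applied Meteorological Science, 2003, 14, (4): 410–418.
[17]	Chen Zuan, Li Haisheng. Design and implementation of a new typhoon and marine network meteorological information system (in Chinese). Journal of Applied Meteorological Science, 2012, 23, (2): 245–250. DOI:10.11898/1001-7313.20120214
[18]	Zhao Licheng, Wang Sujuan, Shi Jinming. Research and technical implementation of the information sharing mechanism of the National Satellite Meteorological Center (in Chinese). Journal of Applied Meteorological Science, 2002, 13, (5): 627–632.
[19]	Qian Jianmei, Zheng Xudong. The meteorological satellite data archiving system of the National Satellite Meteorological Center (in Chinese). Journal of Applied Meteorological Science, 2003, 14, (6): 756–762.
[20]	Li Deren. On generalized and specialized spatial information grids (in Chinese). Journal of Remote Sensing, 2005, 9, (5): 513–519.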
[21]	Baer R. Linear Algebra and Projective Geometry. New York: Academic Press, 1952: 39–70.
[22]	Hilbert D. Ueber die reellen Züge algebraischer Curven. Mathematische Annalen, 1891, 38, (1): 115–138. DOI:10.1007/BF01212696
[23]	Duvall P, Keesling J, Vince A. The Hausdorff dimension of the boundary of a self-similar tile. Journal of the London Mathematical Society, 2000, 61, (3): 748–760. DOI:10.1112/jlms.2000.61.issue-3
[24]	Mandelbrot B B. How long is the coast of Britain? Statistical self-similarity and fractional dimension. Science, 1967, 156: 636–638. DOI:10.1126/science.156.3775.636
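[25]	Liu Runtao, Chen Linlin, Tian Guangyue. Nearest neighbor query based on Z-curve grid partitioning (in Chinese). Computer Engineering and Applications, 2013, 49, (22): 123–126. DOI:10.3778/j.issn.1002-8331.1208-0249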


Fig. 1 Basic structure of the big data grid model

Fig. 2 Hilbert curve fractal (from reference [22])

Fig. 3 Dispersion analysis of three subspace hashing schemes: (a) row-order curve hash, (b) Z-order curve hash, (c) Hilbert curve hash

Fig. 4 Architecture of the satellite remote sensing distributed data system

Fig. 5 Data and control flow for grid writing

Fig. 6 Grid read performance test with different data volumes

Fig. 7 Read performance test under node expansion

Fig. 8 Interactive interface of the data retrieval and processing system based on spatial correlation

Fig. 9 Interactive interface of the satellite remote sensing database and data service system in Inner Mongolia