Uncertain measure for schema matching based on the aggregation of uncertain factors
HU Wenbin, PAN Zhushan, JI Zhaohui
School of Computer Engineering, Huaihai Institute of Technology, Lianyungang 222005, China
Abstract: To measure the uncertainty of schema matching efficiently, a measure model covering all uncertain factors is proposed, together with an aggregation operator derived from the relations among those factors. A method based on the all-known entropy uncertainty ratio is designed to measure the uncertainty of semantic matching and attribute matching. A process-uncertainty algorithm is introduced to measure the uncertainty of the matching decision process. The aggregation operator, built on the relationships between uncertain factors, determines the influence degree of each factor and merges the individual measure values. Real-world examples show that the proposed model and methods reflect all three uncertainty factors of schema matching, and that the methods are efficient and scalable.
Key words: schema definition; schema analysis; schema matching; uncertainty analysis; measured data uncertainty; measurement method; decision analysis; entropy; aggregation estimation method

1 Uncertainty in schema matching

2 Uncertainty measurement for schema matching

2.1 Uncertainty measure model

Fig. 1 Uncertainty measure model for schema matching

2.2 Schema object cleaning

2.3 Uncertainty measurement based on all-known entropy

2.3.1 All-known entropy uncertainty ratio of schema matching

1) $0 \le \frac{H(C) + H(D \mid C) - H(D)}{\log|O| - H(D)} \le 1$, so $0 \le \mu_{all} \le 1$ and $\mu_{all}$ satisfies non-negativity;

2) If $R_1$ and $R_2$ are equivalent, then $H(C_1) = H(C_2)$ and $H(D_1) = H(D_2)$, so $\mu_{all}$ satisfies invariance;

3) If $R_1 < R_2$, then by Theorem 7 of [22], $H(C_1) < H(C_2)$ and $H(D_1) < H(D_2)$, so $\mu_{all}$ satisfies monotonicity.
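
The ratio in property 1) can be explored numerically with a small sketch. It assumes that $H$ is the Shannon entropy (base 2) of the partition induced by an attribute set, that $O$ is the set of objects, and that each object is given as a pair of condition values and a decision value; the definitions of the entropies are not shown in this section, so these choices and the function names `partition_entropy` and `uncertainty_ratio` are illustrative only.

```python
import math
from collections import Counter


def partition_entropy(labels):
    """Shannon entropy (base 2) of the partition induced by `labels`."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def uncertainty_ratio(rows):
    """Return (H(C) + H(D|C) - H(D)) / (log|O| - H(D)).

    `rows` is a list of (condition_values, decision_value) pairs, one per
    object in O; condition_values must be hashable (e.g. a tuple).
    """
    if not rows:
        raise ValueError("need at least one object")
    h_d = partition_entropy([d for _, d in rows])
    # Chain rule: H(C) + H(D|C) equals the joint entropy H(C, D).
    h_cd = partition_entropy([(c, d) for c, d in rows])
    denom = math.log2(len(rows)) - h_d
    if math.isclose(denom, 0.0, abs_tol=1e-12):
        raise ValueError("ratio undefined when H(D) equals log|O|")
    return (h_cd - h_d) / denom


# Condition and decision induce the same partition: the numerator vanishes.
consistent = [((0,), 0), ((0,), 0), ((1,), 1), ((1,), 1)]
# All four (condition, decision) pairs are distinct: H(C, D) reaches log|O|.
scattered = [((0,), 0), ((0,), 1), ((1,), 0), ((1,), 1)]
print(uncertainty_ratio(consistent), uncertainty_ratio(scattered))
```

Under these assumptions the ratio stays in $[0, 1]$ because $H(D) \le H(C, D) \le \log|O|$; the sketch raises an error in the degenerate case where the denominator is zero rather than choosing a convention for it.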

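2.3.2 Uncertainty measurement of semantic matching

2.3.3 Uncertainty measurement of attribute matching

2.4 Uncertainty measurement of the matching decision process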

Fig. 2 Process model of schema matching

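Here $t_i$ denotes a task block, $B_1$ is the execution of SOC, $B_2$ is the sequential execution of semantic matching and attribute matching, $B_3$ is the execution of attribute matching, and $B_4$ is the merging of the matching results. Fig. 2 can be converted into the form shown in Fig. 3.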

Fig. 3 Converted process model

3 Aggregated measurement of uncertain factors

3.1 Determining the influence degree of uncertain factors

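1) Growth: if $e_2$ is a positive factor, i.e., $z_2 > 0$, the aggregated influence $z$ of $e_1$ and $e_2$ on the uncertainty of SM should satisfy $z > z_1$; if $e_1$ is also a positive factor, then $z > z_2$ holds as well.

2) Invariance: if $e_2$ is a neutral factor, i.e., $z_2 = 0$, the aggregated influence $z$ of $e_1$ and $e_2$ on the uncertainty of SM should satisfy $z = z_1$, that is, the uncertainty of SM is governed by $e_1$; if $e_1$ is also a neutral factor, the uncertainty of SM is unaffected by the added factors.

3) Weakening: if $e_2$ is a negative factor, i.e., $z_2 < 0$, the aggregated influence $z$ of $e_1$ and $e_2$ on the uncertainty of SM should satisfy $z < z_1$; if $e_1$ is also a negative factor, then $z < z_2$ holds as well. The aggregated influence of one positive and one negative factor is determined by the factor with the larger absolute value, and the aggregated value is smaller in magnitude than that larger absolute value.

4) Boundedness: $z_1 \oplus z_2 \in [-1, 1]$, which ensures that the aggregated influence of several factors can be obtained by aggregating them pairwise.

5) Commutativity: $z_1 \oplus z_2 = z_2 \oplus z_1$, which guarantees that the influence of two given uncertain factors on the uncertainty of SM is the same whichever factor is taken first.

6) Associativity: $(z_1 \oplus z_2) \oplus z_3 = z_1 \oplus (z_2 \oplus z_3)$, which shows that the aggregated influence of more than two factors does not depend on the order in which the factors enter the computation.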
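
One operator that meets these six requirements is the combination rule known from MYCIN certainty factors. The sketch below uses that rule purely as an illustration: the paper's own operator is not shown in this section, and the names `aggregate` and `aggregate_all`, as well as the convention of returning 0 for the contradictory pair $+1$ and $-1$, are assumptions.

```python
from functools import reduce


def aggregate(z1, z2):
    """Combine two influence degrees in [-1, 1] (MYCIN-style rule, assumed)."""
    if not (-1.0 <= z1 <= 1.0 and -1.0 <= z2 <= 1.0):
        raise ValueError("influence degrees must lie in [-1, 1]")
    if z1 >= 0 and z2 >= 0:
        # Two non-negative factors reinforce each other, capped at 1.
        return z1 + z2 - z1 * z2
    if z1 <= 0 and z2 <= 0:
        # Two non-positive factors reinforce each other, capped at -1.
        return z1 + z2 + z1 * z2
    # Opposite signs: the factor with the larger magnitude decides the sign.
    denom = 1.0 - min(abs(z1), abs(z2))
    if denom == 0.0:
        # Only reached for +1 and -1; treated as neutral here (assumption).
        return 0.0
    return (z1 + z2) / denom


def aggregate_all(values):
    """Fold the factors pairwise; 0 is the neutral starting value."""
    return reduce(aggregate, values, 0.0)


print(aggregate(0.6, 0.5), aggregate(0.8, -0.5), aggregate_all([0.5, -0.3, 0.4]))
```

Because the rule is commutative and associative away from the $\pm 1$ endpoints, `aggregate_all` gives the same result for any ordering of its inputs there, which is what properties 4) to 6) are meant to secure.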

3.2 Total uncertainty ratio

4 Experiments and analysis

4.1 Experiments

Fig. 4 Screenshot of the schema storing data of enrolled students

Fig. 5 Screenshot of the schema storing data of graduates

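| No. | Year | Number of objects |
| --- | --- | --- |
| 1 | 2005sp | 17 |
| 2 | 2006au | 17 |
| 3 | 2007au | 7 |
| 4 | 2008sp | 9 |
| 5 | 2009sp | 7 |

Note: the data come from the Jiangsu Province VFP Level-2 examination, spring 2005 to spring 2009.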

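| No. | Number of schemas $\lvert S\rvert$ | Number of schema objects $\lvert U\rvert$ | Condition attributes $\lvert C_1\rvert$ | Condition attributes $V_{C_1}$ | Decision attributes $\lvert D_1\rvert$ | Decision attributes $V_{D_1}$ |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 2 | 2 | 6 | {0,1,2} | 1 | {0,1,2} |
| 2 | 5 | 52 | 6 | {0,1,2} | 1 | {0,1,2} |

Note: in {0,1,2}, 0 = yes, 1 = no, 2 = uncertain.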

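| Matching type | Scheme 1 $\lvert S\rvert$ | Scheme 1 $\lvert U\rvert$ | Scheme 2 $\lvert S\rvert$ | Scheme 2 $\lvert U\rvert$ |
| --- | --- | --- | --- | --- |
| ANM | 2 | 12 | 5 | 301 |
| ATM | 2 | 16 | 5 | 301 |
| KRM | 2 | 2 | 5 | 5 |
| DIM | 2 | 300 | 5 | 3501 |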

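| | Scheme 1 | Scheme 2 |
| --- | --- | --- |
| USM | $\mu_1 = 0$ | $\mu_1 = 0.34$ |
| UAM | $\mu_2 = 0.21$ | $\mu_2 = 0.95$ |
| DP | $\mu_3 = 0.17$ | $\mu_3 = 0.17$ |
| Total | $\mu_{whole} = 0.19$ | $\mu_{whole} = 0.72$ |

4.2 Analysis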

5 Conclusion

References

[1] SHVAIKO P, EUZENAT J. A survey of schema-based matching approaches[J]. Journal on Data Semantics IV, 2005(3730): 146-171.
[2] MAGNANI M, RIZOPOULOS N, MCBRIEN P, et al. Schema integration based on uncertain semantic mappings[J]. Lecture Notes in Computer Science, 2005(3716): 31-46.
[3] HALEVY A, RAJARAMAN A, ORDILLE J. Data integration: the teenage years[Z]. Seoul, 2006: 9-16.
[4] WENG Nianfeng, DIAO Xingchun, CAO Jianjun, et al. Survey of uncertain schema matching[J]. Computer Science, 2011, 38(12): 1-5. (in Chinese)
[5] JIANG Fangjiao, MENG Xiaofeng, JIA Linlin. Uncertain schema matching in deep web integration service[J]. Chinese Journal of Computers, 2008, 31(8): 1412-1421. (in Chinese)
[6] MAGNANI M, MONTESI D. Probabilistic data integration[R]. Bologna (Italy): UBLCS, 2009.
[7] DONG X L, HALEVY A, YU Cong. Data integration with uncertainty[J]. The VLDB Journal, 2009, 18: 469-500.
[8] GAL A. Managing uncertainty in schema matching with top-k schema mappings[J]. Journal on Data Semantics, 2006, 6: 90-114.
[9] LIU Baoding. Uncertainty theory[M]. Berlin: Springer-Verlag, 2007: 3-12.
[10] LIU Baoding. Some research problems in uncertainty theory[J]. Journal of Uncertain Systems, 2009, 3(1): 3-10.
[11] WANG Yongli, QIAN Jiangbo, SUN Shurong. AMUR: an adaptive measuring algorithm of underlying uncertainty for RFID data[J]. Acta Electronica Sinica, 2011, 39(3): 579-584. (in Chinese)
[12] PAWLAK Z. Rough sets[J]. International Journal of Computer and Information Sciences, 1982, 11(5): 341-356.
[13] QIU Taorong, YOU Min, GE Hanjuan, et al. A method of uncertainty measure based on rough set[Z]. 2008: 544-547.
[14] JIANG Feng, SUI Yuefei, CAO Cungen. An information entropy-based approach to outlier detection in rough sets[J]. Expert Systems with Applications, 2010, 37(9): 6338-6344.
[15] LIANG Jiye, WANG Junhong, QIAN Yuhua. A new measure of uncertainty based on knowledge granulation for rough sets[J]. Information Sciences, 2009, 179(4): 458-470.
[16] SIKDER I U, GANGOPADHYAY A. Managing uncertainty in location services using rough set and evidence theory[J]. Expert Systems with Applications, 2007, 32(2): 386-396.
[17] HU Wenbin, LI Qianmu, ZHANG Hong. Uncertain relation schema integration based on domain knowledge[J]. Journal of Nanjing University of Science and Technology: Natural Science, 2010, 34(4): 409-414. (in Chinese)
[18] HU Wenbin, ZHANG Hong, LI Qianmu. Uncertainty measure model of schema integration based on all known entropy[J]. Journal of Nanjing University of Aeronautics & Astronautics, 2012, 44(4): 575-579. (in Chinese)
[19] WANG J G, MENG G Y, ZHENG X L. The attribute reduce based on rough sets and sat algorithm[Z]. 2008: 98-102.
[20] LIANG Jiye, QIAN Yuhua. Information granules and entropy theory in information systems[J]. Science in China Series F: Information Sciences, 2008, 51(10): 1427-1444.
[21] ZHAO Jun, ZHOU Yinghua. Study on system uncertainty measures based on rough set theory[J]. Journal of Chinese Computer Systems, 2010, 31(2): 354-359. (in Chinese)
[22] YU Daren, HU Qinghua, WU Congxin. Uncertainty measures for fuzzy relations and their applications[J]. Applied Soft Computing, 2007, 7(3): 1135-1143.
[23] HU Jun, WANG Guoyin. Uncertainty measure rule sets of rough sets[J]. Pattern Recognition and Artificial Intelligence, 2010, 23(5): 606-615. (in Chinese)
[24] MAGNANI M, MONTESI D. Uncertainty in data integration: current approaches and open problems[M]. Enschede, The Netherlands: the Centre for Telematics and Information Technology, 2007: 26-32.
[25] JUNG J Y, CHIN C H, CARDOSO J. An entropy-based uncertainty measure of process models[J]. Information Processing Letters, 2011, 111(3): 135-141.
[26] YUE Kun, LIU Weiyi, WANG Xiaoling. An approach for measuring quality of web services based on the superposition of uncertain factors[J]. Journal of Computer Research and Development, 2009, 46(5): 841-849. (in Chinese)
DOI: 10.3969/j.issn.1673-4785.201405061

#### Article information

HU Wenbin, PAN Zhushan, JI Zhaohui

Uncertain measure for schema matching based on the aggregation of uncertain factors

CAAI Transactions on Intelligent Systems, 2015, 10(02): 286-292.