A discernibility matrix-based attribute reduction method for continuous data

CAAI Transactions on Intelligent Systems (智能系统学报), 2017, Vol. 12, Issue (3): 371-376. DOI: 10.11992/tis.201704032

### Cite this article

FENG Dan, HUANG Yang, SHI Yunpeng, et al. A discernibility matrix-based attribute reduction for continuous data[J]. CAAI Transactions on Intelligent Systems, 2017, 12(3): 371-376. DOI: 10.11992/tis.201704032.


A discernibility matrix-based attribute reduction for continuous data
FENG Dan1,2, HUANG Yang2, SHI Yunpeng2, WANG Changzhong2
1. Information and Communication Branch, State Grid Power Supply Company of Huludao, Huludao 125000, China;
2. College of Mathematics and Physics, Bohai University, Jinzhou 121000, China
Abstract: Attribute reduction is an important application of rough set theory in data processing. Existing methods for continuous data are mainly greedy algorithms based on the positive region. Such methods consider only the discernibility between consistent samples and the remaining samples, while ignoring the discernibility among boundary samples. To overcome this shortcoming of positive-region-based algorithms, this paper proposes a new attribute reduction method based on a discernibility matrix. The model considers not only the consistency of samples in the positive region but also the separability of boundary samples. On this basis, the structure of attribute reduction is analyzed and a discernibility matrix is defined to characterize the discernibility ability of an attribute subset. An attribute reduction algorithm is then designed based on the discernibility matrix. The validity of the proposed algorithm is verified by theoretical analysis and experiments on UCI standard data sets.
Key words: neighborhood relation    rough set    attribute reduction    discernibility matrix    heuristic algorithm

1 Attribute reduction of neighborhood relation decision tables

 ${{M}_{B}}=\{({{x}_{i}}, {{x}_{j}})\in U\times U:\text{ }\left| {{f}_{a}}({{x}_{i}})-{{f}_{a}}({{x}_{j}}) \right|\le \varepsilon, \forall a\in B\}$

 ${{\delta }_{B}}({{x}_{i}})=\{{{x}_{j}}\in U:({{x}_{i}}, {{x}_{j}})\in {{M}_{B}}\}$

 $\begin{matrix} {{M}_{D}}=\left\{ ({{x}_{i}}, {{x}_{j}})\in U\times U\left| d({{x}_{i}})=d({{x}_{j}}) \right. \right\} \\ {{[{{x}_{i}}]}_{D}}=\{{{x}_{j}}\in U\left| d({{x}_{i}})=d({{x}_{j}}) \right.\} \\ \end{matrix}$
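The neighborhood relation above admits a direct matrix computation. The sketch below builds M_B as a Boolean matrix and reads off δ_B(x_i); the data matrix `X` is an illustrative example of my own, not the paper's decision table:

```python
import numpy as np

def neighborhood_matrix(X, attrs, eps):
    """M_B: (x_i, x_j) are related iff |f_a(x_i) - f_a(x_j)| <= eps
    for every attribute a in the subset `attrs`."""
    sub = X[:, attrs]
    diff = np.abs(sub[:, None, :] - sub[None, :, :])  # pairwise attribute gaps
    return (diff <= eps).all(axis=2)

def neighborhood(M, i):
    """delta_B(x_i): all samples related to x_i under M."""
    return set(np.flatnonzero(M[i]))

# Illustrative data (three samples, two attributes), not from the paper.
X = np.array([[0.1, 0.8],
              [0.2, 0.7],
              [0.9, 0.1]])
M = neighborhood_matrix(X, [0, 1], eps=0.25)
print(neighborhood(M, 0))  # x_0 is related to itself and to x_1 only
```

Note that M_B is reflexive and symmetric but not necessarily transitive, so δ_B induces a covering of U rather than a partition.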

For ai ∈ B ⊆ A, if POSB(D) = POSB−{ai}(D), then ai is said to be unnecessary in B relative to D; otherwise, ai is necessary in B. If POSB(D) = POSA(D) and every attribute in B is necessary relative to D, then B is called an attribute reduct of A.

1) xi ∈ POSA(D), xj ∉ POSA(D) ∧ d(xi) ≠ d(xj);

2) xi, xj ∈ POSA(D) ∧ [xi]D ∩ [xj]D = Ø; then xj ∉ δA(xi) ⇒ xj ∉ δB(xi).

 $\begin{matrix} \text{DIS}=\{({{x}_{i}}, {{x}_{j}})\left| {{x}_{i}} \right.\in \text{PO}{{\text{S}}_{A}}\left( D \right), {{x}_{j}}\notin ~\text{PO}{{\text{S}}_{A}}\left( D \right)\wedge \\ d({{x}_{i}})\ne d({{x}_{j}})\}\cup \{({{x}_{i}}, {{x}_{j}})\text{ }\left| {{x}_{i}}, {{x}_{j}} \right.\in \\ \text{PO}{{\text{S}}_{A}}\left( D \right)\wedge \text{ }{{[{{x}_{i}}]}_{D}}\cap {{[{{x}_{j}}]}_{D}}=\varnothing \} \\ \end{matrix}$
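Under these definitions, POS_A(D) and DIS can be computed directly from a relation matrix and the decision labels. A minimal sketch (0-based indices; the matrix is N_A from the worked example in this section, with the labels encoding its two decision classes):

```python
import numpy as np
from itertools import product

def positive_region(M, d):
    """POS_A(D): samples whose whole neighborhood shares their decision."""
    return {i for i in range(len(d))
            if all(d[j] == d[i] for j in np.flatnonzero(M[i]))}

def dis_pairs(M, d):
    """DIS: the sample pairs that any reduct must still discern."""
    pos = positive_region(M, d)
    dis = set()
    for i, j in product(range(len(d)), repeat=2):
        if i in pos and j not in pos and d[i] != d[j]:
            dis.add((i, j))  # consistent vs. inconsistent sample, different classes
        elif i in pos and j in pos and d[i] != d[j]:
            dis.add((i, j))  # two consistent samples, disjoint decision classes
    return dis

# N_A and the decision classes of the worked example (0-based indices).
NA = np.array([[1, 1, 0, 0, 0, 0],
               [1, 1, 0, 1, 1, 0],
               [0, 0, 1, 0, 0, 1],
               [0, 1, 0, 1, 0, 0],
               [0, 1, 0, 0, 1, 0],
               [0, 0, 1, 0, 0, 1]], dtype=bool)
d = [0, 0, 1, 1, 0, 1]
print(sorted(positive_region(NA, d)))  # [0, 2, 4, 5], i.e. x1, x3, x5, x6
```

For equivalence classes, [xi]D ∩ [xj]D = Ø is equivalent to d(xi) ≠ d(xj), which is what the second branch checks.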

 $\boldsymbol{\text{DM}}\left( i,\text{ }j \right)=\left\{ \begin{align} & \{{{a}_{l}}\in A:{{x}_{j}}\notin {{\delta }_{{{a}_{l}}}}({{x}_{i}})\},\text{ }({{x}_{i}},{{x}_{j}})\in \text{DIS} \\ & A,\text{ }({{x}_{i}},{{x}_{j}})\notin \text{DIS} \\ \end{align} \right.$

 $f\left( U, A, F, D \right)=\underset{i, j=1}{\overset{n}{\mathop{\wedge }}}\, \left( \vee \boldsymbol{\text{DM}}\left( i, j \right) \right)$

 $f\left( U, A, F, D \right)=\underset{k=1}{\overset{L}{\mathop{\vee }}}\, (\wedge {{B}_{k}})$

The family of all reducts of A is denoted by REDD(A) = {Bk : k ≤ L}.

Let ε = 0.25. By Definitions 1 and 2, the relation matrices Nl (l ≤ 4), NA, and the decision relation matrix ND are computed as

 ${{\boldsymbol{N}}_{1}}=\left[ \begin{matrix} 1&1&0&0&0&0 \\ 1&1&0&1&1&0 \\ 0&0&1&0&0&1 \\ 0&1&0&1&1&0 \\ 0&1&0&1&1&0 \\ 0&0&1&0&0&1 \\ \end{matrix} \right], {{\boldsymbol{N}}_{2}}=\left[ \begin{matrix} 1&1&0&1&1&0 \\ 1&1&0&1&1&0 \\ 0&0&1&0&0&1 \\ 1&1&0&1&0&0 \\ 1&1&0&0&1&0 \\ 0&0&1&0&0&1 \\ \end{matrix} \right], {{\boldsymbol{N}}_{3}}=\left[ \begin{matrix} 1&1&1&1&1&1 \\ 1&1&0&1&1&1 \\ 1&0&1&0&1&1 \\ 1&1&0&1&1&1 \\ 1&1&1&1&1&1 \\ 1&1&1&1&1&1 \\ \end{matrix} \right]$
 ${{\boldsymbol{N}}_{4}}=\left[ \begin{matrix} 1&1&0&0&1&0 \\ 1&1&0&1&1&0 \\ 0&0&1&0&0&1 \\ 0&1&0&1&1&0 \\ 1&1&0&1&1&0 \\ 0&0&1&0&0&1 \\ \end{matrix} \right], {{\boldsymbol{N}}_{A}}=\left[ \begin{matrix} 1&1&0&0&0&0 \\ 1&1&0&1&1&0 \\ 0&0&1&0&0&1 \\ 0&1&0&1&0&0 \\ 0&1&0&0&1&0 \\ 0&0&1&0&0&1 \\ \end{matrix} \right], {{\boldsymbol{N}}_{D}}=\left[ \begin{matrix} 1&1&0&0&1&0 \\ 1&1&0&0&1&0 \\ 0&0&1&1&0&1 \\ 0&0&1&1&0&1 \\ 1&1&0&0&1&0 \\ 0&0&1&1&0&1 \\ \end{matrix} \right]$

 $\begin{matrix} f=({{a}_{1}}\vee {{a}_{4}})\wedge ({{a}_{1}}\vee {{a}_{2}}\vee {{a}_{4}})\wedge {{a}_{2}}= \\ ({{a}_{1}}\wedge {{a}_{2}})\vee ({{a}_{2}}\wedge {{a}_{4}}) \\ \end{matrix}$
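The simplification above can be checked mechanically: a reduct is exactly a minimal attribute subset that intersects every entry of the discernibility matrix. A brute-force sketch over the example's three distinct entries {a1, a4}, {a1, a2, a4}, and {a2}:

```python
from itertools import combinations

clauses = [{'a1', 'a4'}, {'a1', 'a2', 'a4'}, {'a2'}]
attrs = ['a1', 'a2', 'a3', 'a4']

def hits_all(B):
    """B discerns every pair iff it intersects every matrix entry."""
    return all(B & c for c in clauses)

# Keep the subsets that hit all clauses and are minimal w.r.t. removal.
reducts = [set(B) for r in range(1, len(attrs) + 1)
           for B in combinations(attrs, r)
           if hits_all(set(B))
           and not any(hits_all(set(B) - {a}) for a in B)]
print(sorted(sorted(B) for B in reducts))  # [['a1', 'a2'], ['a2', 'a4']]
```

The two minimal hitting sets agree with the disjunctive normal form f = (a1 ∧ a2) ∨ (a2 ∧ a4) derived above.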

2 Attribute reduction algorithm

1) Compute the positive region POSA(D).

2) Compute the relation matrix ND: for any samples xi, xj ∈ U, set ND(i, j) = 0 if d(xi) ≠ d(xj), and ND(i, j) = 1 otherwise.

3) For each al ∈ A, compute the discernibility matrix Nl: for any xi ∈ POSA(D) and any xj ∈ U, if ND(i, j) = 0, set Nl(i, j) = 1 when |fl(xi) − fl(xj)| > ε and Nl(i, j) = 0 otherwise; if ND(i, j) = 1, set Nl(i, j) = 0.

4) Compute NA = ∪Nl and initialize RED ← A.

5) For each al ∈ RED, compute N{RED−al} and sum(N{RED−al}), where sum(N{RED−al}) denotes the sum of all entries of the matrix N{RED−al}.

6) Choose the ak satisfying sum(N{RED−ak}) = $\underset{i}{\mathop{\max }}\,$(sum(N{RED−ai})), and set RED ← RED − {ak}.

7) If (sum(NA) − sum(NRED))/|U| < θ, output the reduct RED; otherwise, go to 5).
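Steps 1)–7) can be sketched as follows. This is a minimal NumPy reading of the algorithm, not the authors' implementation; since the stopping rule in step 7) is ambiguous as written, this sketch keeps removing attributes while the discernibility loss stays below θ and stops, keeping the current RED, as soon as the next removal would exceed it.

```python
import numpy as np

def attribute_reduce(X, d, eps, theta):
    """Greedy discernibility-matrix attribute reduction, steps 1)-7) (sketch)."""
    d = np.asarray(d)
    n, m = X.shape
    # 1)-2): decision relation N_D and positive region POS_A(D)
    ND = d[:, None] == d[None, :]
    MA = (np.abs(X[:, None, :] - X[None, :, :]) <= eps).all(axis=2)
    pos = np.array([ND[i, MA[i]].all() for i in range(n)])
    # 3): per-attribute discernibility matrices N_l
    N = np.zeros((m, n, n), dtype=bool)
    for l in range(m):
        far = np.abs(X[:, None, l] - X[None, :, l]) > eps
        N[l] = far & ~ND & pos[:, None]
    # 4): N_A is the union of the N_l
    NA = N.any(axis=0)
    red = list(range(m))
    # 5)-7): drop the attribute whose removal preserves the most discernibility
    while len(red) > 1:
        sums = {a: N[[b for b in red if b != a]].any(axis=0).sum() for a in red}
        a_k = max(sums, key=sums.get)
        if (NA.sum() - sums[a_k]) / n < theta:
            red.remove(a_k)
        else:
            break
    return red
```

On a toy table whose decision is determined by the first attribute alone, the sketch keeps only that attribute, since dropping the redundant ones costs no discernibility.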

Fig. 1 Flow chart of the attribute reduction algorithm
3 Experimental analysis

4 Conclusion
