Research on ensemble significance based attribute reduction approach

 CAAI Transactions on Intelligent Systems  2018, Vol. 13 Issue (3): 414-421  DOI: 10.11992/tis.201706080

Cite this article

LI Jingzheng, YANG Xibei, DOU Huili, et al. Research on ensemble significance based attribute reduction approach[J]. CAAI Transactions on Intelligent Systems, 2018, 13(3), 414-421. DOI: 10.11992/tis.201706080.



Research on ensemble significance based attribute reduction approach
LI Jingzheng1, YANG Xibei1,2, DOU Huili1, WANG Pingxin3, CHEN Xiangjian1
1. School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212003, China;
2. School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China;
3. School of Mathematics and Physics, Jiangsu University of Science and Technology, Zhenjiang 212003, China
Abstract: When a reduct is computed with a heuristic algorithm, the attribute with the highest significance is added to the reduct step by step. This approach, however, ignores the fluctuation in the significance computation that is caused by data perturbation, and such fluctuation may lead to an unstable reduct. To address this problem, a heuristic framework based on ensemble attribute significance is proposed. First, the raw data are sampled multiple times; second, in each iteration, the significance of each attribute is computed on every sample and the resulting values are fused; finally, the attribute with the highest fused significance is added to the reduct. Experimental results obtained with the neighborhood rough set show that the new approach not only yields more stable reducts but also produces more consistent classification results.
Key words: attribute reduction; classification; clustering; data perturbation; ensemble; heuristic algorithm; neighborhood rough set; stability

1 Neighborhood rough set

 ${\rm{Int}}({x_i}) = \mathop {\min }\limits_{1 \leqslant j \leqslant n,j \ne i} {r_{ij}} + \delta \times (\mathop {\max }\limits_{1 \leqslant j \leqslant n,j \ne i} {r_{ij}} - \mathop {\min }\limits_{1 \leqslant j \leqslant n,j \ne i} {r_{ij}})$ (1)

 $\delta ({x_i}) = \{ {x_j} \in U|{x_j} \ne {x_i},{r_{ij}} \leqslant {\rm{Int}}({x_i})\}$ (2)
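Equations (1) and (2) can be illustrated with a minimal Python sketch. It reads Eq. (2) as the set of samples x_j, other than x_i, whose distance r_ij does not exceed the radius Int(x_i). The page does not say how r_ij is measured, so the Euclidean distance over the selected attributes is an assumption here, and the names `distance` and `neighborhood` are hypothetical helpers rather than the authors' code.

```python
import math


def distance(x, y, attrs):
    """Euclidean distance restricted to the attribute indices in attrs (assumed metric)."""
    return math.sqrt(sum((x[a] - y[a]) ** 2 for a in attrs))


def neighborhood(i, data, attrs, delta):
    """Return the indices j != i with r_ij <= Int(x_i); needs at least two samples."""
    r = {j: distance(data[i], data[j], attrs)
         for j in range(len(data)) if j != i}
    lo, hi = min(r.values()), max(r.values())
    radius = lo + delta * (hi - lo)                  # Int(x_i), Eq. (1)
    return {j for j, d in r.items() if d <= radius}  # delta(x_i), Eq. (2)


data = [[0.0], [1.0], [3.0], [7.0]]
print(neighborhood(0, data, [0], 0.5))  # {1, 2}
```

For the first sample the distances are 1, 3 and 7, so the radius is 1 + 0.5 × (7 − 1) = 4 and the neighborhood holds the samples at 1 and 3. With δ = 0 only the nearest sample remains.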

 $\underline {{N_B}} D = \bigcup\limits_{i{\rm{ = }}1}^N {\underline {{N_B}} {X_i}}$ (3)
 $\overline {{N_B}} D = \bigcup\limits_{i{\rm{ = }}1}^N {\overline {{N_B}} {X_i}}$ (4)

 $\underline {{N_B}} {X_i} = \{ {x_j} \in U|{\delta _B}({x_j}) \subseteq {X_i}\}$ (5)
 $\overline {{N_B}} {X_i} = \{ {x_j} \in U|{\delta _B}({x_j}) \cap {X_i} \ne \varnothing \}$ (6)

 $\gamma (B,D) = \frac{{{\rm{|}}\underline {{N_B}} D{\rm{|}}}}{{|U|}}$ (7)
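The dependency degree of Eq. (7) can be sketched as follows, following Eqs. (2), (3) and (5) literally: a sample belongs to the lower approximation when its whole neighborhood, which excludes the sample itself, lies inside a single decision class. The Euclidean distance and the helper names `neighborhood` and `dependency` are assumptions made for illustration only.

```python
import math


def neighborhood(i, data, attrs, delta):
    """Indices j != i with r_ij <= Int(x_i), Eqs. (1) and (2); Euclidean r_ij assumed."""
    r = {j: math.sqrt(sum((data[i][a] - data[j][a]) ** 2 for a in attrs))
         for j in range(len(data)) if j != i}
    lo, hi = min(r.values()), max(r.values())
    return {j for j, d in r.items() if d <= lo + delta * (hi - lo)}


def dependency(data, labels, attrs, delta):
    """gamma(B, D) = |lower approximation of D| / |U|, Eq. (7)."""
    lower = set()
    for i in range(len(data)):
        nb = neighborhood(i, data, attrs, delta)
        # The neighborhood is never empty, so "contained in some X" is the
        # same as "all neighbors carry one single decision label".
        if len({labels[j] for j in nb}) == 1:
            lower.add(i)
    return len(lower) / len(data)


data = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
labels = [0, 0, 0, 1, 1, 1]
print(dependency(data, labels, [0], 0.1))  # 1.0
```

With two well separated groups and a small δ every neighborhood stays within one class, so the dependency is 1. With δ = 1 each neighborhood reaches into both classes and the dependency drops to 0.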

2 Attribute reduction

2.1 Attribute significance and the heuristic algorithm

 ${\rm{Sig}}(a,B,D) = \gamma (B \cup \{ a\} ,D) - \gamma (B,D)$ (8)

1) seq ← Ø, γ(seq, D) = 0;

2) If AT − seq = Ø, go to 5); otherwise go to 3);

3) $\forall$ ai ∈ AT − seq, compute Sig(ai, seq, D);

4) Select aj such that Sig(aj, seq, D) = max{Sig(ai, seq, D): $\forall$ ai ∈ AT − seq}, let seq = seq ∪ {aj}, and return to 2);

5) Output seq.
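The five steps above amount to a forward greedy search that orders all attributes by their marginal significance. The sketch below is one possible rendering, assuming the dependency degree is supplied as a callable `gamma(B)` (a hypothetical parameter); ties are broken in favor of the first attribute encountered, a choice the listed steps leave open.

```python
def heuristic_sequence(attributes, gamma):
    """Order attributes by repeatedly adding the one with the largest Sig, Eq. (8)."""
    seq = []                     # step 1: seq is empty and gamma(seq, D) = 0
    base = 0.0
    remaining = list(attributes)
    while remaining:             # step 2: stop once AT - seq is empty
        # step 3: Sig(a, seq, D) = gamma(seq + {a}, D) - gamma(seq, D)
        sig = {a: gamma(seq + [a]) - base for a in remaining}
        best = max(remaining, key=sig.get)   # step 4: first maximum wins a tie
        seq.append(best)
        remaining.remove(best)
        base = gamma(seq)
    return seq                   # step 5


# Toy dependency function, purely illustrative: each attribute adds a fixed amount.
toy = {"a": 0.2, "b": 0.5, "c": 0.3}
print(heuristic_sequence("abc", lambda B: sum(toy[x] for x in B)))  # ['b', 'c', 'a']
```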

2.2 Ensemble attribute significance

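1) ${U'} = \varnothing$;

2) Apply k-means clustering to U to obtain the clusters C = $\{ {C_1},{C_2}, \cdots ,{C_N}\}$, where N is the number of decision classes;

3) for j = 1 to N

① compute the average distance ${\overline d _j}$ from the samples in cluster Cj to the cluster center;

② select the samples in Cj whose distance to the cluster center is greater than ${\overline d _j}$ and add them to ${U'}$;

end for

4) Output ${\rm{D}}{{\rm{S}}'}$.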
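The sampling procedure above can be sketched in plain Python. The k-means routine here is a bare Lloyd iteration written out for self-containment, and the initial centers are passed in by the caller so that the example is reproducible; both are assumptions, and any k-means implementation could be substituted. The function returns the indices of the selected samples, from which the sampled decision system would be built.

```python
import math


def dist(x, y):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def nearest(points, centers):
    """Index of the closest center for every point."""
    return [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
            for p in points]


def kmeans(points, init_centers, iters=20):
    """Plain Lloyd iterations; returns (assignment, centers)."""
    centers = [list(c) for c in init_centers]
    for _ in range(iters):
        assign = nearest(points, centers)
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return nearest(points, centers), centers


def sample_far_from_center(points, init_centers):
    """Indices of the samples lying farther than average from their cluster center."""
    assign, centers = kmeans(points, init_centers)     # step 2
    selected = []                                      # step 1: U' is empty
    for j in range(len(centers)):                      # step 3
        members = [i for i, a in enumerate(assign) if a == j]
        if not members:
            continue
        d = {i: dist(points[i], centers[j]) for i in members}
        mean_d = sum(d.values()) / len(d)              # step 3, part 1
        selected += [i for i in members if d[i] > mean_d]  # step 3, part 2
    return sorted(selected)                            # indices that make up U'


points = [[0.0], [1.0], [5.0], [10.0], [11.0], [15.0]]
print(sample_far_from_center(points, [[0.0], [10.0]]))  # [2, 5]
```

In this toy run the clusters settle at the centers 2 and 12, the average distance in each cluster is 2, and only the samples at 5 and 15 lie strictly farther away than that.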

1) seq ← Ø, γ(seq, D) = 0;

2) for r = 1 to k

sample the raw data to obtain the decision system DSr;

end for

3) If AT − seq = Ø, go to 7); otherwise go to 4);

4) for r = 1 to k

$\forall$ ai ∈ AT − seq, compute the significance Sigr(ai, seq, D) of attribute ai on the decision system DSr;

end for

5) $\forall$ ai ∈ AT − seq, fuse the significances of attribute ai over all the decision systems:

 ${\rm{Sig}}({a_i},{\rm{seq}},D) = \frac{1}{k}\sum\limits_{r = 1}^k {{\rm{Si}}{{\rm{g}}_r}({a_i},{\rm{seq}},D)};$

6) Select aj such that Sig(aj, seq, D) = max{Sig(ai, seq, D): $\forall$ ai ∈ AT − seq}, let seq = seq ∪ {aj}, and return to 3);

7) Output seq.
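The ensemble variant differs from the single-sample heuristic only in how the significance is obtained: it is computed on each of the k sampled decision systems and then averaged. The sketch below assumes each sampled decision system is represented by its own dependency function `gamma_r(B)` (hypothetical callables) and evaluates every significance relative to the current sequence, in line with the selection step.

```python
def ensemble_sequence(attributes, gammas):
    """Order attributes by the significance averaged over k decision systems."""
    seq, remaining = [], list(attributes)
    k = len(gammas)
    while remaining:                              # stop once AT - seq is empty
        fused = {}
        for a in remaining:
            # Sig_r(a, seq, D) on each decision system; gamma of the empty set is 0
            sigs = [g(seq + [a]) - (g(seq) if seq else 0.0) for g in gammas]
            fused[a] = sum(sigs) / k              # arithmetic mean over the k systems
        best = max(remaining, key=fused.get)      # first maximum wins a tie
        seq.append(best)
        remaining.remove(best)
    return seq


# Two toy dependency functions standing in for two samplings (illustrative only).
w1 = {"a": 0.6, "b": 0.3, "c": 0.1}
w2 = {"a": 0.1, "b": 0.5, "c": 0.4}
g1 = lambda B: sum(w1[x] for x in B)
g2 = lambda B: sum(w2[x] for x in B)
print(ensemble_sequence("abc", [g1, g2]))  # ['b', 'a', 'c']
```

On its own the first toy function would pick a first and the second would pick b; after averaging, b leads with 0.4 against 0.35 for a, which shows how fusion can smooth out the disagreement between samplings.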

3 Experimental analysis

3.1 Comparison of the stability of attribute sequences

 ${\rm{Sta}} = \frac{{2\displaystyle\sum\limits_{i = 1}^{d - 1} {\displaystyle\sum\limits_{j = i + 1}^d {{\rm{Sim}}({\rm{se}}{{\rm{q}}_i},{\rm{se}}{{\rm{q}}_j})} } }}{{d \times (d - 1)}}$ (9)

 ${\rm{Sim}}({\rm{se}}{{\rm{q}}_i},{\rm{se}}{{\rm{q}}_j}) = 1 - 6 \times \sum\limits_{l = 1}^n {\frac{{{{({\rm{seq}}_i^l - {\rm{seq}}_j^l)}^2}}}{{n \times ({n^2} - 1)}}}$ (10)
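Equations (9) and (10) can be worked through with a short sketch. Eq. (10) has the form of Spearman's rank correlation, so the code assumes that seq_i^l denotes the position of attribute l in the i-th sequence and that all sequences order the same n ≥ 2 attributes; the function names `sim` and `sta` are hypothetical.

```python
from itertools import combinations


def sim(seq_i, seq_j):
    """Similarity of two attribute sequences, Eq. (10)."""
    n = len(seq_i)
    pos_j = {a: p for p, a in enumerate(seq_j)}
    # Sum of squared position differences over all attributes
    sq = sum((p - pos_j[a]) ** 2 for p, a in enumerate(seq_i))
    return 1 - 6 * sq / (n * (n ** 2 - 1))


def sta(seqs):
    """Average pairwise similarity over d sequences, Eq. (9)."""
    d = len(seqs)
    total = sum(sim(a, b) for a, b in combinations(seqs, 2))
    return 2 * total / (d * (d - 1))


print(sim(["a", "b", "c"], ["a", "b", "c"]))  # 1.0
print(sim(["a", "b", "c"], ["c", "b", "a"]))  # -1.0
```

Identical sequences score 1 and fully reversed ones score −1, so a stability value close to 1 indicates that the sequences obtained under perturbation largely agree.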

3.2 Comparison of the consistency of classification results

 $Q = \frac{{{a_{uv}}{d_{uv}} - {b_{uv}}{c_{uv}}}}{{{a_{uv}}{d_{uv}} + {b_{uv}}{c_{uv}}}}$ (11)
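Equation (11) can be evaluated as in the sketch below. The page does not define the four counts, so the code assumes the usual convention for the Q statistic between two classification results u and v: a counts the samples both classify correctly, d those both misclassify, b those only u gets right, and c those only v gets right. The function name `q_statistic` is hypothetical.

```python
def q_statistic(truth, pred_u, pred_v):
    """Q statistic of Eq. (11) under the assumed contingency-table convention."""
    a = b = c = d = 0
    for t, pu, pv in zip(truth, pred_u, pred_v):
        if pu == t and pv == t:
            a += 1      # both correct
        elif pu == t:
            b += 1      # only u correct
        elif pv == t:
            c += 1      # only v correct
        else:
            d += 1      # both wrong
    denom = a * d + b * c
    if denom == 0:
        raise ValueError("Q is undefined when a*d + b*c = 0")
    return (a * d - b * c) / denom


truth = [1, 1, 0, 0]
print(q_statistic(truth, [1, 1, 0, 1], [1, 1, 0, 1]))  # 1.0
print(q_statistic(truth, [1, 1, 1, 1], [0, 0, 0, 0]))  # -1.0
```

Q lies in [−1, 1]: values near 1 mean the two results tend to be right and wrong on the same samples, and values near −1 mean they err on different samples.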

3.3 Comparison of classification accuracy