
CAAI Transactions on Intelligent Systems, 2019, Vol. 14, Issue (6): 1163-1169. DOI: 10.11992/tis.201904050

### Cite this article

HE Qiang, ZHANG Jiaoyang. Kernel-target alignment multi-kernel fuzzy support vector machine[J]. CAAI Transactions on Intelligent Systems, 2019, 14(6): 1163-1169. DOI: 10.11992/tis.201904050.


Kernel-target alignment multi-kernel fuzzy support vector machine
HE Qiang , ZHANG Jiaoyang
School of Science, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
Abstract: Support vector machines (SVMs) are widely used machine learning techniques. They construct an optimal separating hyperplane and offer strong generalization capability and good performance. However, SVMs are sensitive to noise, and selecting an appropriate kernel for them is difficult. In this paper, we introduce a kernel-target alignment-based multi-kernel learning method into the fuzzy support vector machine (FSVM) and propose the kernel-target alignment-based multi-kernel fuzzy support vector machine (MFSVM). First, we assign a membership degree to each sample point by the fuzzy rough set method and compute the kernel weights by multi-kernel learning based on kernel alignment. The combined kernel is then introduced into the FSVM. The proposed method not only improves the noise resistance of the SVM but also effectively avoids the kernel-selection problem. Experiments on nine datasets from the UCI database show that the proposed method achieves high classification accuracy, verifying its feasibility and effectiveness.
Key words: kernels    support vector machines    rough set theory    supervised learning    fuzzy classification    fuzzy set membership functions    robustness    noise

1 Related work

1.1 Classical support vector machine

Given training samples $\left( {{{{x}}_i},{y_i}} \right)$, $i = 1,2, \cdots ,l$, ${y_i} \in \{ - 1,1\}$, the soft-margin SVM solves the primal problem

 $\begin{array}{l} \displaystyle \min \;\;\frac{1}{2}{\left\| {{\omega }} \right\|^2} + C\sum\limits_{i = 1}^l {{\xi _i}} \\ \displaystyle {\rm{s.t.}}\;\;{y_i}\left( {{{{\omega }}^{\rm T}} \cdot {{{x}}_i} + b} \right) \geqslant 1 - {\xi _i} \\ \displaystyle {\xi _i} \geqslant 0,i = 1,2, \cdots ,l \\ \end{array}$ (1)

The Lagrangian dual of Eq. (1) is

 $\begin{gathered} \max \;\;\sum\limits_{i = 1}^l {{\alpha _i}} - \frac{1}{2}\sum\limits_{i = 1}^l {\sum\limits_{j = 1}^l {{\alpha _i}{\alpha _j}{y_i}{y_j}\left\langle {{{{x}}_i},{{{x}}_j}} \right\rangle } } \\[-6pt] {\rm{s.t.}}\;\;\sum\limits_{i = 1}^l {{\alpha _i}{y_i} = 0,\;\;0 \leqslant {\alpha _i} \leqslant C} ,\;\;i,j = 1,2, \cdots ,l \\ \end{gathered}$ (2)

yielding the decision function

 $f({{x}}) = {\rm{sgn}}\left( {\sum\limits_{i = 1}^l {{\alpha _i}{y_i}\left\langle {{{{x}}_i},{{x}}} \right\rangle } + b} \right)$ (3)

For nonlinearly separable data, a kernel function implicitly evaluates the inner product in a feature space induced by a map $\varphi$ :

 $k({{x}},{{x'}}) = \left\langle {\varphi ({{x}}),\varphi ({{x'}})} \right\rangle$ (4)

Replacing the inner product in Eq. (2) with a kernel gives the dual

 $\begin{gathered} \max \;\;\sum\limits_{i = 1}^l {{\alpha _i}} - \frac{1}{2}\sum\limits_{i = 1}^l {\sum\limits_{j = 1}^l {{\alpha _i}{\alpha _j}{y_i}{y_j}k\left( {{{{x}}_i},{{{x}}_j}} \right)} } \\[-6pt] {\rm{s.t.}}\;\;\sum\limits_{i = 1}^l {{\alpha _i}{y_i}} = 0,\;0 \leqslant {\alpha _i} \leqslant C,\;i,j = 1,2, \cdots ,l \\ \end{gathered}$ (5)

with the decision function

 $f\left( {{x}} \right) = {\rm{sgn}}\left( {\sum\limits_{i = 1}^l {{y_i}{\alpha _i}k\left( {{{{x}}_i},{{x}}} \right)} +b} \right)$ (6)
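The dual (5) and decision function (6) can be checked numerically. The sketch below is illustrative only: it trains scikit-learn's `SVC` (whose `dual_coef_` stores the products $\alpha_i y_i$ over the support vectors) on synthetic two-class data and reconstructs the decision values of Eq. (6) by hand.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Synthetic two-class data; C and the RBF kernel are illustrative choices.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(40, 2) - 1, rng.randn(40, 2) + 1])
y = np.hstack([-np.ones(40), np.ones(40)])

# SVC solves the dual problem and exposes the support vectors.
clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)

# Reconstruct f(x) = sum_i alpha_i y_i k(x_i, x) + b from the support
# vectors; dual_coef_ holds the products alpha_i * y_i.
K = rbf_kernel(X, clf.support_vectors_, gamma=0.5)
manual = K @ clf.dual_coef_.ravel() + clf.intercept_
```

The values in `manual` coincide with `clf.decision_function(X)`, confirming the form of Eq. (6).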

1.2 Fuzzy support vector machine

The FSVM assigns each sample a membership degree ${s_i}$ that rescales its misclassification penalty, giving the primal problem

 $\begin{array}{l}\displaystyle \min \;\;\frac{1}{2}{\left\| {{\omega }} \right\|^2} + C\sum\limits_{i = 1}^l {{s_i}{\xi _i}} \\ \displaystyle {\rm{s.t.}}\;\;{y_i}\left( {{{{\omega }}^{\rm T}} \cdot {{{x}}_i} + b} \right) \geqslant 1 - {\xi _i} \\ \displaystyle {\xi _i} \geqslant 0,i = 1,2, \cdots ,l \\ \end{array}$ (7)

Its dual is

 $\begin{array}{l}\displaystyle \qquad \max \;\;\sum\limits_{i = 1}^l {{\alpha _i}} - \frac{1}{2}\sum\limits_{i = 1}^l {\sum\limits_{j = 1}^l {{\alpha _i}{\alpha _j}{y_i}{y_j}\left\langle {{{{x}}_i},{{{x}}_j}} \right\rangle } } \\[-2pt] \displaystyle {\rm{s.t.}}\;\;\;\sum\limits_{i = 1}^l {{\alpha _i}{y_i}} = 0,\;0 \leqslant {\alpha _i} \leqslant C{s_i},\;i,j = 1,2, \cdots ,l \\ \end{array}$ (8)

 $f\left( {{x}} \right) = {\rm{sgn}} \left( {\sum\limits_{i = 1}^l {{\alpha _i}{y_i}\left\langle {{{{x}}_i},{{x}}} \right\rangle } + b} \right)$ (9)
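In practice, the per-sample box constraint $0 \leqslant \alpha_i \leqslant C s_i$ of Eq. (8) can be emulated with scikit-learn's `sample_weight` argument, which rescales $C$ for each sample. The membership degrees below are a simple distance-to-centre heuristic for illustration; the paper derives them from fuzzy rough sets.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)
X = np.vstack([rng.randn(40, 2) - 1, rng.randn(40, 2) + 1])
y = np.hstack([-np.ones(40), np.ones(40)])

# Illustrative memberships: samples far from their class centre get a
# lower degree, down-weighting likely noise (a stand-in for the fuzzy
# rough set memberships used in the paper).
centers = {c: X[y == c].mean(axis=0) for c in (-1, 1)}
d = np.array([np.linalg.norm(x - centers[c]) for x, c in zip(X, y)])
s = 1.0 - 0.9 * d / d.max()

# sample_weight multiplies C per sample, so the dual box constraint
# becomes 0 <= alpha_i <= C * s_i, matching the FSVM dual (8).
fsvm = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y, sample_weight=s)
acc = fsvm.score(X, y)
```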

2 Multi-kernel learning

Multi-kernel learning combines several base kernels ${k_m}$, $m = 1,2, \cdots ,p$. Three combination strategies are commonly used:

1) Linear combination[9], currently the most widely used approach, such as a simple sum of single kernels or a weighted linear combination:

 ${k_{{\eta }}}\left( {{{{x}}_i},{{{x}}_j}} \right) = \sum\limits_{m = 1}^p {{\eta _m}{k_m}\left( {{{x}}_i^m,{{x}}_j^m} \right)}$ (10)

2) Nonlinear combination[10], such as products, dot products, and powers:

 ${k_{{\eta }}}\left( {{{{x}}_i},{{{x}}_j}} \right) = \prod\limits_{m = 1}^p {{k_m}\left( {{{x}}_i^m,{{x}}_j^m} \right)}$ (11)

3) Data-dependent combination[11], which assigns a specific weight to each data sample. This captures the local distribution of the data, so that a suitable combination rule can be learned for each region.
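The weighted linear combination of Eq. (10) amounts to a weighted sum of Gram matrices. A minimal numpy sketch, with illustrative base kernels and weights:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel, polynomial_kernel

rng = np.random.RandomState(2)
X = rng.randn(30, 4)

# Base Gram matrices K_m and non-negative weights eta_m (illustrative).
grams = [rbf_kernel(X, gamma=0.5),
         linear_kernel(X),
         polynomial_kernel(X, degree=2)]
eta = np.array([0.5, 0.3, 0.2])

# Eq. (10): K_eta = sum_m eta_m * K_m. A non-negative combination of
# positive semidefinite matrices is again positive semidefinite.
K_eta = sum(w * K for w, K in zip(eta, grams))
```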

The kernel alignment between two kernel matrices ${{{K}}_1}$ and ${{{K}}_2}$ over a training set ${{T}}$ is defined as

 $A\left( {{{T}},{{{K}}_1},{{{K}}_2}} \right) = \frac{{{{\left\langle {{{{{ K}}}_1},{{{K}}_2}} \right\rangle }_F}}}{{\sqrt {{{\left\langle {{{{{ K}}}_1},{{{K}}_1}} \right\rangle }_F}{{\left\langle {{{{K}}_2},{{{K}}_2}} \right\rangle }_F}} }}$ (12)
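Eq. (12) is a cosine similarity under the Frobenius inner product. A small sketch (the function name is mine):

```python
import numpy as np

def alignment(K1, K2):
    """Kernel alignment of Eq. (12): the Frobenius cosine of two Gram matrices."""
    num = np.sum(K1 * K2)                                # <K1, K2>_F
    den = np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))
    return num / den

# Alignment with the ideal target kernel y y^T is maximal (equal to 1)
# when the kernel itself is y y^T.
y = np.array([1.0, 1.0, -1.0, -1.0])
K_ideal = np.outer(y, y)
```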

With the combined feature map ${\varPhi _{{\eta }}}$, the multi-kernel SVM primal problem is

 $\begin{array}{l}\displaystyle \min \;\;\frac{1}{2}\left\| {{{{\omega }}_{{\eta }}}} \right\|_2^2 + C\sum\limits_{i = 1}^l {{\xi _i}} \\ \displaystyle {\rm{s.t.}}\;\;{y_i}\left( {{{{\omega }}_{{\eta }}}^{\rm T} \cdot {\varPhi _{{\eta }}}\left( {{{{x}}_i}} \right) + b} \right) \geqslant 1 - {\xi _i} \\ \displaystyle {\xi _i} \geqslant 0,i = 1,2, \cdots ,l \\ \end{array}$ (13)

 $f\left( {{x}} \right) = {\rm{sgn}} \left( \sum\limits_{i = 1}^l {{\alpha _i}{y_i}{k_{{\eta }}}\left( {{{{x}}_i},{{x}}} \right) + b} \right)$ (14)
3 Multi-kernel fuzzy support vector machine

The combined kernel matrix is the weighted sum

 ${{{K}}_{{\eta }}}\left( {{{{x}}_i},{{{x}}_j}} \right) = \sum\limits_{m = 1}^p {{\eta _m}{{{K}}_m}\left( {{{{x}}_i},{{{x}}_j}} \right)}$ (15)

The kernel weights ${{\eta }}$ are chosen to maximize the alignment between ${{{K}}_{{\eta }}}$ and the ideal kernel ${{y}}{{{y}}^{\rm T}}$:

 $\begin{gathered} \max \;A \left( {{{T}},{{{K}}_{{\eta }}},{{y}}{{{y}}^{\rm T}}} \right)= \\[-6pt] \displaystyle\frac{{{{\left\langle {\displaystyle\sum\limits_{m = 1}^p {{\eta _m}{{{K}}_m}} ,{{y}}{{{y}}^{\rm T}}} \right\rangle }_F}}}{{\sqrt {{{\left\langle {\displaystyle\sum\limits_{m = 1}^p {{\eta _m}{{{K}}_m}} ,\displaystyle\sum\limits_{m = 1}^p {{\eta _m}{{{K}}_m}} } \right\rangle }_F}{{\left\langle {{{y}}{{{y}}^{\rm T}},{{y}}{{{y}}^{\rm T}}} \right\rangle }_F}} }}= \\[-6pt] \displaystyle\frac{{\displaystyle\sum\limits_{m = 1}^p {{\eta _m}{{\left\langle {{{{K}}_m},{{y}}{{{y}}^{\rm T}}} \right\rangle }_F}} }}{{l\sqrt {\displaystyle\sum\limits_{m = 1}^p {\displaystyle\sum\limits_{j = 1}^p {{\eta _m}{\eta _j}{{\left\langle {{{{K}}_m},{{{K}}_j}} \right\rangle }_F}} } } }} \\[-6pt] {\rm{s.t.}}\;\;{{\eta }} \in {\rm{R}} _ + ^{p} \\ \end{gathered}$ (16)

Since the alignment is invariant to a positive scaling of ${{\eta }}$, maximizing Eq. (16) is equivalent to maximizing its numerator with the denominator fixed:

 $\begin{gathered} \displaystyle\mathop {\max }\limits_{{\eta }} \;\sum\limits_{m = 1}^p {{\eta _m}{{\left\langle {{{{K}}_m},{{y}}{{{y}}^{\rm T}}} \right\rangle }_F}} \\ \displaystyle {\rm{s.t.}}\;\sum\limits_{m = 1}^p {\sum\limits_{j = 1}^p {{\eta _m}{\eta _j}{{\left\langle {{{{K}}_m},{{{K}}_j}} \right\rangle }_F}} } = C, \\ \displaystyle{\eta _m} \geqslant 0,m = 1,2, \cdots ,p \\ \end{gathered}$ (17)

 $\begin{gathered} \mathop {\max }\limits_{{\eta }} \;\sum\limits_{m = 1}^p {{\eta _m}{{\left\langle {{{{K}}_m},{{y}}{{{y}}^{\rm T}}} \right\rangle }_F}} - \\ \lambda \left( {\sum\limits_{m = 1}^p {\sum\limits_{j = 1}^p {{\eta _m}{\eta _j}{{\left\langle {{{{K}}_m},{{{K}}_j}} \right\rangle }_F}} } - C} \right)\; \\ {\rm{s.t.}}\;{\eta _m} \geqslant 0,m = 1,2, \cdots ,p \\ \end{gathered}$ (18)

As Eq. (18) shows, $\lambda$ varies with $C$. By the definition of kernel alignment, the alignment value is unchanged when the kernel weights ${\eta _m}$ are scaled linearly, so we may set $\lambda = 1$, and the optimization problem (18) becomes

 $\begin{gathered} \mathop {\max }\limits_{{\eta }} \;\sum\limits_{m = 1}^p {{\eta _m}{{\left\langle {{{{K}}_m},{{y}}{{{y}}^{\rm T}}} \right\rangle }_F}} - \sum\limits_{m = 1}^p {\sum\limits_{j = 1}^p {{\eta _m}{\eta _j}{{\left\langle {{{{K}}_m},{{{K}}_j}} \right\rangle }_F}} } \\ {\rm{s.t.}}\;{\eta _m} \geqslant 0,m = 1,2, \cdots ,p \\ \end{gathered}$ (19)

To prevent the kernel from overfitting the training set, a constraint on ${\eta _m}$ is added to the optimization problem, which becomes

 $\begin{gathered} \mathop {\max }\limits_{{\eta }} \;\sum\limits_{m = 1}^p {{\eta _m}{{\left\langle {{{{K}}_m},{{y}}{{{y}}^{\rm T}}} \right\rangle }_F}} - \\[-6pt] \sum\limits_{m = 1}^p {\sum\limits_{j = 1}^p {{\eta _m}{\eta _j}{{\left\langle {{{{K}}_m},{{{K}}_j}} \right\rangle }_F}} } - \delta \sum\limits_{m = 1}^p {\eta _m^2} \\[-4pt] {\rm{s.t.}}\;{\eta _m} \geqslant 0,m = 1,2, \cdots ,p \\ \end{gathered}$ (20)

Folding the regularization term into the quadratic part, with ${{O}}$ the $p \times p$ identity matrix, yields

 $\begin{gathered} \mathop {\max }\limits_{{\eta }} \;\sum\limits_{m = 1}^p {{\eta _m}{{\left\langle {{{{K}}_m},{{y}}{{{y}}^{\rm T}}} \right\rangle }_F}} - \\[-6pt] \sum\limits_{m = 1}^p {\sum\limits_{j = 1}^p {{\eta _m}{\eta _j}} } \left( {{{\left\langle {{{{K}}_m},{{{K}}_j}} \right\rangle }_F} + \delta {O_{mj}}} \right) \\[-4pt] {\rm{s.t.}}\;{\eta _m} \geqslant 0,m = 1,2, \cdots ,p \\ \end{gathered}$ (21)
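Problem (21) is a quadratic program in $\eta$ with non-negativity constraints. The sketch below solves it by projected gradient ascent on toy data; the function name, step size, and final normalisation are my own illustrative choices (a dedicated QP solver would work equally well), and ${{O}}$ is taken as the identity matrix.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

def solve_eta(grams, y, delta=0.1, lr=1e-3, steps=2000):
    """Projected gradient ascent for (21):
    maximise a^T eta - eta^T M eta subject to eta >= 0, where
    a_m = <K_m, y y^T>_F and M = [<K_m, K_j>_F] + delta * I."""
    yyT = np.outer(y, y)
    a = np.array([np.sum(K * yyT) for K in grams])
    M = np.array([[np.sum(Ki * Kj) for Kj in grams] for Ki in grams])
    M += delta * np.eye(len(grams))
    eta = np.full(len(grams), 1.0 / len(grams))
    for _ in range(steps):
        # Gradient step, then projection onto the feasible set eta >= 0.
        eta = np.maximum(eta + lr * (a - 2.0 * M @ eta), 0.0)
    return eta / eta.sum()  # alignment is scale-invariant, so normalise

# Toy usage: two base kernels on four labelled points.
y = np.array([1.0, 1.0, -1.0, -1.0])
X = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]])
eta = solve_eta([rbf_kernel(X, gamma=1.0), linear_kernel(X)], y)
```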

Introducing the membership degrees ${s_i}$ into the multi-kernel SVM gives the MFSVM primal problem

 $\begin{gathered} \min \;\;\frac{1}{2}{\left\| {{{{\omega }}_{{\eta }}}} \right\|^2} + C\sum\limits_{i = 1}^l {{s_i}{\xi _i}} \\[-4pt] {\rm{s.t.}}\;\;{y_i}\left( {{{{\omega }}_{{\eta }}}^{\rm T} \cdot {\varPhi _{{\eta }}}\left( {{{{x}}_i}} \right) + b} \right) \geqslant 1 - {\xi _i} \\[-4pt] \;\;\;\;\;\;\;{\xi _i} \geqslant 0,i = 1,2, \cdots ,l \\ \end{gathered}$ (22)

with dual

 $\begin{gathered} \max \;\;\sum\limits_{i = 1}^l {{\alpha _i}} - \frac{1}{2}\sum\limits_{i = 1}^l {\sum\limits_{j = 1}^l {{\alpha _i}{\alpha _j}{y_i}{y_j}{k_{{\eta }}}\left( {{{{x}}_i},{{{x}}_j}} \right)} } \\[-6pt] {\rm{s.t.}}\;\;\sum\limits_{i = 1}^l {{\alpha _i}{y_i}} = 0,0 \leqslant {\alpha _i} \leqslant C{s_i},\;\;i,j = 1,2, \cdots ,l \\ \end{gathered}$ (23)

 $\begin{gathered} f\left( {{x}} \right) = {\rm{sgn}} \left( {\sum\limits_{i = 1}^l {{\alpha _i}{y_i}{k_{{\eta }}}\left( {{{{x}}_i},{{x}}} \right)} + b} \right)= \\[-4pt] {\rm{sgn}} \left( {\sum\limits_{i = 1}^l {\sum\limits_{m = 1}^p {{\alpha _i}{y_i}{\eta _m}{k_m}\left( {{{{x}}_i},{{x}}} \right)} } + b} \right) \\ \end{gathered}$ (24)
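Once $\eta$ and the memberships $s_i$ are available, training reduces to a fuzzy SVM on the precomputed combined kernel, and prediction follows Eq. (24). A sketch with fixed illustrative weights and uniform memberships (the paper computes both from the data):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

rng = np.random.RandomState(3)
X = np.vstack([rng.randn(30, 2) - 1, rng.randn(30, 2) + 1])
y = np.hstack([-np.ones(30), np.ones(30)])

eta = np.array([0.6, 0.4])   # illustrative kernel weights
s = np.ones(len(y))          # illustrative (uniform) memberships

def combined_kernel(A, B):
    # k_eta of Eq. (15): weighted sum of the base kernels.
    return eta[0] * rbf_kernel(A, B, gamma=0.5) + eta[1] * linear_kernel(A, B)

# Fuzzy SVM on the precomputed combined Gram matrix; predictions follow
# the decision function (24).
clf = SVC(kernel="precomputed", C=1.0).fit(combined_kernel(X, X), y,
                                           sample_weight=s)
acc = (clf.predict(combined_kernel(X, X)) == y).mean()
```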

4 Experimental simulation

The UCI database, maintained by the University of California, Irvine, is a standard collection of benchmark datasets for machine learning. To verify the feasibility and effectiveness of the proposed method, nine UCI datasets were selected; their details are listed in Table 1. Since only binary classification is considered in this paper, classes 2 and 3 of the wine dataset are merged into a single class.
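For reference, the wine preprocessing step can be sketched with scikit-learn's copy of the UCI wine data, where UCI classes 1/2/3 are encoded as 0/1/2; merging classes 2 and 3 gives a binary problem:

```python
import numpy as np
from sklearn.datasets import load_wine

# UCI wine class 1 (label 0 here) vs. the merged classes 2 and 3.
X, y = load_wine(return_X_y=True)
y_bin = np.where(y == 0, 1, -1)
```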

5 Conclusion

References

[1] VAPNIK V N. The nature of statistical learning theory[M]. 2nd ed. New York: Springer-Verlag, 1999: 69-110.
[2] DENG Naiyang, TIAN Yingjie. Support vector machines: theory, algorithms and extensions[M]. Beijing: Science Press, 2009: 115-132.
[3] LIN Chunfu, WANG Shengde. Fuzzy support vector machines[J]. IEEE transactions on neural networks, 2002, 13(2): 464-471. DOI: 10.1109/72.991432.
[4] SIAHROUDI S K, MOODI P Z, BEIGY H. Detection of evolving concepts in non-stationary data streams: a multiple kernel learning approach[J]. Expert systems with applications, 2018, 91: 187-197. DOI: 10.1016/j.eswa.2017.08.033.
[5] GONEN M, ALPAYDIN E. Multiple kernel learning algorithms[J]. Journal of machine learning research, 2011, 12: 2211-2268.
[6] CRISTIANINI N, SHAWE-TAYLOR J, ELISSEEFF A, et al. On kernel-target alignment[C]//Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic. Vancouver, Canada, 2001: 367-373.
[7] HE Qiang, WU Congxin. Membership evaluation and feature selection for fuzzy support vector machine based on fuzzy rough sets[J]. Soft computing, 2011, 15(6): 1105-1114. DOI: 10.1007/s00500-010-0577-z.
[8] LIN Chunfu, WANG Shengde. Training algorithms for fuzzy support vector machines with noisy data[J]. Pattern recognition letters, 2004, 25(14): 1647-1656. DOI: 10.1016/j.patrec.2004.06.009.
[9] CRISTIANINI N, SHAWE-TAYLOR J. An introduction to support vector machines and other kernel-based learning methods[M]. Cambridge: Cambridge University Press, 2000: 1-28.
[10] DE DIEGO I M, MUÑOZ A, MOGUERZA J M. Methods for the combination of kernel matrices within a support vector framework[J]. Machine learning, 2010, 78(1/2): 137-174.
[11] CHRISTOUDIAS C M, URTASUN R, DARRELL T. Bayesian localized multiple kernel learning. Technical Report No. UCB/EECS-2009-96[R]. Berkeley: University of California, 2009: 1531-1565.
[12] BEN-HUR A, NOBLE W S. Kernel methods for predicting protein-protein interactions[J]. Bioinformatics, 2005, 21(S1).
[13] QIU Shibin, LANE T. A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction[J]. IEEE/ACM transactions on computational biology and bioinformatics, 2009, 6(2): 190-199. DOI: 10.1109/TCBB.2008.139.
[14] VARMA M, RAY D. Learning the discriminative power-invariance trade-off[C]//Proceedings of 2007 IEEE 11th International Conference on Computer Vision. Rio de Janeiro, Brazil, 2007: 1-8.
[15] DU Haiyang. Research of reduced multiple kernel support vector machine[D]. Beijing: Beijing Jiaotong University, 2015: 13-25.
[16] WANG Tinghua, ZHAO Dongyan, TIAN Shengfeng. An overview of kernel alignment and its applications[J]. Artificial intelligence review, 2015, 43(2): 179-192. DOI: 10.1007/s10462-012-9369-4.
[17] ZHONG Shangping, CHEN Daya, XU Qianfen, et al. Optimizing the Gaussian kernel function with the formulated kernel target alignment criterion for two-class pattern classification[J]. Pattern recognition, 2013, 46(7): 2045-2054. DOI: 10.1016/j.patcog.2012.12.012.
[18] LIU Xiangdong, LUO Bin, CHEN Zhaoqian. Optimal model selection for support vector machines[J]. Journal of computer research and development, 2005, 42(4): 576-581.
[19] ZHOU Wei, ZHOU Chuangming, SHI Zhaohui, et al. Rough set theory and applications[M]. Beijing: Tsinghua University Press, 2015: 57-64.
[20] DUBOIS D, PRADE H. Rough fuzzy sets and fuzzy rough sets[J]. International journal of general systems, 1990, 17(2/3): 191-209.
[21] DUBOIS D, PRADE H. Putting rough sets and fuzzy sets together[M]//SŁOWIŃSKI R. Intelligent decision support: handbook of applications and advances of the rough sets theory. Dordrecht, Netherlands: Springer, 1992: 203-232.
[22] CHEN Degang, YANG Yongping, WANG Hui. Granular computing based on fuzzy similarity relations[J]. Soft computing, 2011, 15(6): 1161-1172. DOI: 10.1007/s00500-010-0583-1.
[23] CHEN Linlin, CHEN Degang, WANG Hui. Alignment based kernel selection for multi-label learning[J]. Neural processing letters, 2019, 49(3): 1157-1177. DOI: 10.1007/s11063-018-9863-z.