﻿ 基于能量的结构化最小二乘孪生支持向量机
«上一篇
 文章快速检索 高级检索

 智能系统学报  2020, Vol. 15 Issue (5): 1013-1019  DOI: 10.11992/tis.201906030 0

### 引用本文

SHI Songhui, DING Shifei. Energy-based structural least square twin support vector machine[J]. CAAI Transactions on Intelligent Systems, 2020, 15(5): 1013-1019. DOI: 10.11992/tis.201906030.

### 文章历史

1. 中国矿业大学 计算机科学与技术学院，江苏 徐州 221116;
2. 矿山数字化教育部工程研究中心，江苏 徐州 221116

Energy-based structural least square twin support vector machine
SHI Songhui 1,2, DING Shifei 1,2
1. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China;
2. Engineering Research Center (Ministry of Education) of Mine Digitization, Xuzhou 221116, China
Abstract: The least square twin support vector machine (TWSVM) is very sensitive to noise and outlier. To solve this problem, we propose an energy-based structured least square TWSVM (ES-LSTWSVM). First, we perform a cluster analysis on each class, then compute the covariance matrix of each cluster in the classes, and introduce it into the objective function. To reduce the influence of noises and outliers on the algorithm, an energy factor is introduced to each hyperplane, and the equality constraint is converted into an energy-based form on the basis of least squares. Finally, we adopt the “all-versus-one” strategy to apply the proposed algorithm in solving a multi-class classification problem. The experimental results show that ES-LSTWSVM has good classification performance.
Key words: twin support vector machine    least square    structural information    cluster    covariance matrix    energy factor    “all-versus-one” strategy    multi-class classification problem

2007年，Jayadeva等[2]提出了孪生支持向量机(twin support vector machine, TWSVM)，它为每一类构造一个超平面，使每一类样本尽可能距离本类的超平面近，而尽可能地远离另一类的超平面。TWSVM的2个非平行的超平面是通过求解2个二次规划问题(quadratic programming problems,QPPs)得到的，每个QPP的约束条件都只与一类样本有关。TWSVM不但保持了SVM的优点，而且训练速度比传统SVM要快4倍。为了进一步提升TWSVM的性能，学者们已经提出了不少改进算法，例如：孪生有界支持向量机[3](twin bounded support vector machine, TBSVM)、最小二乘孪生支持向量机[4](least square twin support vector machine, LSTWSVM)、非平行支持向量机[5](nonparallel support vector machine，NPSVM)。具体来说，LSTWSVM使用平方损失函数将TWSVM中的凸二次规划问题转化为解线性方程组，从而实现极快的训练速度。但是，LSTWSVM对噪声很敏感，因为它要求超平面与其他类的点距离恰好为1。近些年来出现了许多改进的算法。Nasiri等[6]提出了一种基于能量的LSTWSVM(energy-based least square twin support vector machine, ELS-TWSVM)模型，该模型为每一类引入一个能量因子以减少噪声的影响。马跃峰等[7]提出了一种基于全局代表点的快速最小二乘支持向量机稀疏化算法(global-representation-based sparse least squares support vector machine, GRS-LSSVM)，利用数据全局代表性指标，在全部数据中，一次性地选择出其中最具有全局代表性的数据并构成稀疏化后的支持向量集，然后在此基础上求解决策超平面。吴青等[8]提出了一种最小二乘大间隔孪生支持向量机(least squares large margin twin support vector machine, LSLMTSVM)，该方法在TWSVM的目标函数中引入间隔均值和间隔方差，通过优化间隔分布来获得具有更强泛化能力的模型。

1) 提出了一种基于能量的结构化最小二乘孪生支持向量机；

2) 采用“多对一”的策略将提出的算法扩展到多分类问题中；

3) 选择几种有代表性的相关算法进行对比实验，在UCI数据集上的实验结果表明提出的算法具有良好的分类性能。

1 相关工作

1.1 TWSVM

TWSVM通过训练数据集寻找2个不平行的分离超平面：

 ${{{x}}^{\rm{T}}}{{{w}}_1}{\rm{ + }}{b_1} = 0,{{{x}}^{\rm{T}}}{{{w}}_2}{\rm{ + }}{b_2} = 0$

 $\begin{array}{c} \min\limits_{{w_1},{b_1},{\xi _2}} \;\;\;\;\dfrac{1}{2}{({{A}}{{{w}}_1} + {{{e}}_1}{b_1})^{\rm{T}}}({{A}}{{{w}}_1} + {{{e}}_1}{b_1}) + {c_1}{{e}}_1^{\rm{T}}{\xi _2} - \\ {\rm{s}}{\rm{.t}}{\rm{.}}\;\;\;\;\;\;\;\;\; ({{B}}{{{w}}_1} + {{{e}}_2}{b_1}) \geqslant {{{e}}_2} - {\xi _2},{\xi _2} \geqslant 0 \end{array}$
 $\begin{array}{c} \min\limits_{{w_2},{b_2},{\xi _1}} \;\;\;\;\;\;\dfrac{1}{2}{({{B}}{{{w}}_2} + {{{e}}_2}{b_2})^{\rm{T}}}({{B}}{{{w}}_2} + {{{e}}_2}{b_2}) + {c_2}{{e}}_1^{\rm{T}}{\xi _1}\\ {\rm{s}}{\rm{.t}}{\rm{.}}\;\;\;\;\;\;\;({{A}}{{{w}}_2} + {{{e}}_1}{b_2}) \geqslant {{{e}}_1} - {\xi _1},{\xi _1} \geqslant 0 \end{array}$

TWSVM通过如下的决策函数判定新样本的类别标签：

 ${\rm{Label}}({{x}}) = \arg\min\limits_{l = 1,2} \left| {{{{x}}^{\rm{T}}}{{{w}}_l} + {b_l}} \right|$
1.2 S-TWSVM

 $\begin{array}{c} \min\limits_{{w_ + },{b_ + },\xi } \begin{array}{*{20}{c}} {}&{\dfrac{1}{2}} \end{array}\left\| {{{A}}{{{w}}_ + } + {{{e}}_ + }{b_ + }} \right\|_2^2 + {c_1}{{e}}_ - ^{\rm{T}}\xi + \frac{1}{2}{c_2}(\left\| {{{{w}}_ + }} \right\|_2^2 + b_ + ^2)+ \\ \dfrac{1}{2}{c_3}{{w}}_{\rm{ + }}^{\rm{T}}{\displaystyle\sum\nolimits_ + }{{{w}}_ + } \\ {\rm{s}}{\rm{.t}}{\rm{.}}\begin{array}{*{20}{c}} {}&{ - ({{B}}{w_ + } + {{{e}}_ - }{b_ + }) + \xi \geqslant e_ -} ,\xi \geqslant 0 \end{array} \end{array}$
 $\begin{array}{c} \min\limits_{{w_ - },{b_ - },\eta } \begin{array}{*{20}{c}} {}&{\dfrac{1}{2}} \end{array}\left\| {{{B}}{{{w}}_ - } + {{{e}}_ - }{b_ - }} \right\|_2^2 + {c_4}{{e}}_ + ^{\rm{T}}\eta + \frac{1}{2}{c_5}(\left\| {{{{w}}_ - }} \right\|_2^2 + b_ - ^2) + \\ \dfrac{1}{2}{c_6}{{w}}_ - ^{\rm{T}}{\displaystyle\sum\nolimits_ - }{{{w}}_ - } \\ {\rm{s}}{\rm{.t}}{\rm{.}}{\begin{array}{*{20}{c}} {}&{({{A}}{{{w}}_ - } + {{{e}}_ + }{b_ - }) + \eta \geqslant {{e}}_ +} ,\eta \geqslant 0 \end{array} } \end{array}$

 ${\rm{Label}}({{x}}) = \arg\min\limits_{ + , - } \left\{ {\left| {{{w}}_ + ^{\rm{T}}{{x}} + {b_ + }} \right|,\left| {{{w}}_ - ^{\rm{T}}{{x}} + {b_ - }} \right|} \right\}$
2 ES-LSTWSVM

2.1 线性 ES-LSTWSVM

 $\begin{array}{c} \min\limits_{{w_ + },{b_ + },\xi } \begin{array}{*{20}{c}} {}&{\dfrac{1}{2}} \end{array}{\left\| {{{A}}{{{w}}_ + } + {{{e}}_ + }{b_ + }} \right\|^2} + \dfrac{{{c_1}}}{2}{{{\xi }}^{\rm{T}}}{{\xi }} + \\ \dfrac{{{c_2}}}{2}(\left\| {{{{w}}_ + }} \right\|_2^2 + b_ + ^2) + \dfrac{{{c_3}}}{2}{{w}}_ + ^{\rm{T}}{\displaystyle\sum\nolimits_ + }{{{w}}_ + } \\ \begin{array}{*{20}{c}} {{\rm{s}}{\rm{.t}}{\rm{.}}}&{}&{ - ({{B}}{{{w}}_ + } + {{{e}}_ - }{b_ + })} \end{array} + {{\xi }} = {{{E}}_ - } \\ \end{array}$ (1)
 $\begin{array}{c} \min\limits_{{w_ - },{b_ - },\eta } \begin{array}{*{20}{c}} {}&{\dfrac{1}{2}} \end{array}{\left\| {{{B}}{{{w}}_ - } + {{{e}}_ - }{b_ - }} \right\|^2} + \dfrac{{{c_4}}}{2}{{{\eta }}^{\rm{T}}}{{\eta }} + \\ \dfrac{{{c_5}}}{2}(\left\| {{{{w}}_ - }} \right\|_2^2 + b_ - ^2) + \dfrac{{{c_6}}}{2}{{w}}_ - ^{\rm{T}}{\displaystyle\sum\nolimits_ - }{{{w}}_ - } \\ \begin{array}{*{20}{c}} {{\rm{s}}{\rm{.t}}{\rm{.}}}&{}&{{{A}}{{{w}}_ - } + {{{e}}_ + }{b_ - }} + {{\eta }} = {{{E}}_ + } \end{array} \\ \end{array}$ (2)

 $\begin{array}{c} \min\limits_{{w_ + },{b_ + }} \begin{array}{*{20}{c}} {}&{\dfrac{1}{2}} \end{array}{\left\| {{{A}}{{{w}}_ + } + {{{e}}_ + }{b_ + }} \right\|^2} + \dfrac{{{c_1}}}{2}{\left\| {{{B}}{{{w}}_ + } + {{{e}}_ - }{b_ + } + {{{E}}_ - }} \right\|^2} + \\ \dfrac{{{c_2}}}{2}(\left\| {{{{w}}_ + }} \right\|_2^2 + b_ + ^2) + \dfrac{{{c_3}}}{2}{{w}}_ + ^{\rm{T}}{\displaystyle\sum\nolimits_ + }{{{w}}_ + } \\ \end{array}$ (3)

 $\begin{array}{c} {{{A}}^{\rm{T}}}({{A}}{{{w}}_ + } + {{{e}}_ + }{b_ + }) + {c_1}{{{B}}^{\rm{T}}}({{B}}{{{w}}_ + } + {{{e}}_ - }{b_ + } + {{{E}}_ - }) + \\ {c_2}{{{w}}_ + } + {c_3}{\displaystyle\sum\nolimits_ + }{{{w}}_ + } = 0 \end{array}$ (4)
 ${{e}}_ + ^{\rm{T}}({{A}}{{{w}}_ + } + {{{e}}_ + }{b_ + }) + {c_1}{{e}}_ - ^{\rm{T}}({{B}}{{{w}}_ + } + {{{e}}_ - }{b_ + } + {{{E}}_ - }) + {c_2}{b_ + } = 0$ (5)

 \begin{aligned} & \left[ {\begin{array}{*{20}{c}} {{{{A}}^{\rm{T}}}{{A}}}&{{{{A}}^{\rm{T}}}{{{e}}_ + }} \\ {{{e}}_ + ^{\rm{T}}{{A}}}&{{{e}}_ + ^T{{{e}}_ + }} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {{{{w}}_ + }} \\ {{b_ + }} \end{array}} \right] + {c_1}\left[ {\begin{array}{*{20}{c}} {{{{B}}^{\rm{T}}}{{{B}}}}&{{{{B}}^{\rm{T}}}{{{e}}_ - }} \\ {{{e}}_ - ^{\rm{T}}{{B}}}&{{{e}}_ - ^{\rm{T}}{{{e}}_ - }} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {{{{w}}_ + }} \\ {{b_ + }} \end{array}} \right] + \\ &\quad \quad {c_1}\left[ {\begin{array}{*{20}{c}} {{{{B}}^{\rm{T}}}} \\ {{{e}}_ - ^{\rm{T}}} \end{array}} \right]{E_ - } + {c_2}\left[ {\begin{array}{*{20}{c}} {{{{w}}_ + }} \\ {{b_ + }} \end{array}} \right] + {c_3}\left[ {\begin{array}{*{20}{c}} {{\displaystyle\sum\nolimits_ + }{{{w}}_ + }} \\ 0 \end{array}} \right] = 0 \end{aligned}
 \begin{aligned} & \left[ {\begin{array}{*{20}{c}} {{{{A}}^{\rm{T}}}} \\ {{{e}}_ + ^{\rm{T}}} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {{A}}&{{{{e}}_ + }} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {{{{w}}_ + }} \\ {{b_ + }} \end{array}} \right] + {c_1}\left[ {\begin{array}{*{20}{c}} {{{{B}}^{\rm{T}}}} \\ {{{e}}_ - ^{\rm{T}}} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {{{B}}}&{{{{e}}_ - }} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {{{{w}}_ + }} \\ {{b_ + }} \end{array}} \right] + \\ & \quad \quad {c_2}\left[ {\begin{array}{*{20}{c}} {{{{w}}_ + }} \\ {{b_ + }} \end{array}} \right] + {c_3}\left[ {\begin{array}{*{20}{c}} {{\displaystyle\sum\nolimits_ + } {{{w}}_ + }} \\ 0 \end{array}} \right] = - {c_1}\left[ {\begin{array}{*{20}{c}} {{{{B}}^{\rm{T}}}} \\ {{{e}}_ - ^{\rm{T}}} \end{array}} \right]{{{E}}_ - } \end{aligned}

 ${z_1} = \left[ {\begin{array}{*{20}{c}} {{{{w}}_ + }} \\ {{b_ + }} \end{array}} \right] = - {c_1}{({{{P}}^{\rm{T}}}{{P}} + {c_1}{{{Q}}^{\rm{T}}}{{Q}} + {c_2}I + {c_3}{\displaystyle\sum\nolimits_1})^{ - 1}}{{{Q}}^{\rm{T}}}{{{E}}_ - }$

 ${z_2} = \left[ {\begin{array}{*{20}{c}} {{{{w}}_ - }} \\ {{b_ - }} \end{array}} \right] = {c_4}{({{{Q}}^{\rm{T}}}{{Q}} + {c_4}{{{P}}^{\rm{T}}}{{P}} + {c_5}I + {c_6}{\displaystyle\sum\nolimits_2})^{ - 1}}{{{P}}^{\rm{T}}}{{{E}}_ + }$

ES-LSTWSVM通过如下的决策函数判定新样本的类别标签：

 ${\rm{Label}}({{x}}) = \left\{ {\frac{{\left| {{{{x}}^{\rm{T}}}{{{w}}_ + } + {b_ + }} \right|}}{{\left\| {{{{w}}_ + }} \right\|}}} \right.,\left. {\frac{{\left| {{{{x}}^{\rm{T}}}{{{w}}_ - } + {b_ - }} \right|}}{{\left\| {{{{w}}_ - }} \right\|}}} \right\}$
2.2 非线性 ES-LSTWSVM

 ${{{w}}_ + } = \sum\limits_{i = 1}^{{m_1} + {m_2}} {{{({\lambda _ + })}_i}} \varPhi ({x_i}) = \varPhi (M){\lambda _ + }$
 ${{{w}}_ - } = \sum\limits_{i = 1}^{{m_1} + {m_2}} {{{({\lambda _ - })}_i}} \varPhi ({x_i}) = \varPhi (M){\lambda _ - }$

ES-LSTWSVM构造如下2个基于核函数的超平面：

$K({{{x}}^{\rm{T}}},{{{M}}^{\rm{T}}}){\lambda _ + } + {b_ + } = 0$ $K({{{x}}^{\rm{T}}},{{{M}}^{\rm{T}}}){\lambda _ - } + {b_ - } = 0$

 $\begin{array}{l} \min\limits_{{w_ + },{b_ + },\xi } \begin{array}{*{20}{c}} {}&{\dfrac{1}{2}} \end{array}{\left\| {K({{A}},{{{M}}^{\rm{T}}}){\lambda _ + } + {{{e}}_ + }{b_ + }} \right\|^2} + \dfrac{{{c_1}}}{2}{{{\xi }}^{\rm{T}}}{{\xi }} + \\ \dfrac{{{c_2}}}{2}(\left\| {{{{\lambda }}_ + }} \right\|_2^2 + b_ + ^2) + \dfrac{{{c_3}}}{2}{{\lambda }}_ + ^{\rm{T}}\varPhi {({{M}})^{\rm{T}}}\displaystyle\sum\nolimits_ + ^\varPhi \varPhi ({{M}}){{{\lambda }}_ + } \\ \begin{array}{*{20}{c}} {{\rm{s}}{\rm{.t}}{\rm{.}}}&{}&{ - (K({{B}},{{{M}}^{\rm{T}}}){\lambda _ + } + {{{e}}_ - }{b_ + })}+ {{\xi }} = {{{E}}_ - } \end{array} \end{array}$ (6)
 $\begin{array}{l} \min\limits_{{w_ - },{b_ - },\eta } \begin{array}{*{20}{c}} {}&{\dfrac{1}{2}} \end{array}{\left\| {K({{B}},{{{M}}^{\rm{T}}}){\lambda _ - } + {{{e}}_ - }{b_ - }} \right\|^2} + \dfrac{{{c_4}}}{2}{{{\eta }}^{\rm{T}}}{{\eta }} + \\ \dfrac{{{c_5}}}{2}(\left\| {{{{\lambda }}_ - }} \right\|_2^2 + b_ - ^2) + \dfrac{{{c_6}}}{2}{{\lambda }}_ - ^{\rm{T}}\varPhi {({{M}})^{\rm{T}}}\displaystyle\sum\nolimits_ - ^\varPhi \varPhi ({{M}}){{{\lambda }}_ - } \\ \begin{array}{*{20}{c}} {{\rm{s}}{\rm{.t}}.}&{}&{(K({{A}},{{{M}}^{\rm{T}}}){{{\lambda }}_ - } + {{{e}}_ + }{b_ - })} \end{array} + {{\eta }} = {{{E}}_ + } \end{array}$ (7)

 $\begin{array}{c} {z_1} = \left[ {\begin{array}{*{20}{c}} {{w_ + }} \\ {{b_ + }} \end{array}} \right] = \\ - {c_1}{\left( {{{P}}_\varPhi ^{\rm{T}}{{{P}}_\varPhi } + {c_1}{{{Q}}_\varPhi }^{\rm{T}}{{{Q}}_\varPhi } + {c_2}{{I}} + {c_3}\displaystyle\sum\nolimits_1^\varPhi } \right)^{ - 1}}{{{Q}}_\varPhi }^{\rm{T}}{{{E}}_ - } \end{array}$ (8)
 $\begin{array} {c}{z_2} = \left[ {\begin{array}{*{20}{c}} {{w_ - }} \\ {{b_ - }} \end{array}} \right]=\\ {c_4}{\left( {{{{Q}}_\varPhi }^{\rm{T}}{{{Q}}_\varPhi } + {c_4}{{{P}}_\varPhi }^{\rm{T}}{{{P}}_\varPhi } + {c_5}{{I}} + {c_6}\displaystyle\sum\nolimits_2^\varPhi } \right)^{ - 1}}{{{P}}_\varPhi }^{\rm{T}}{{{E}}_ + } \end{array}$

 $\displaystyle\sum\nolimits_1^\varPhi = \left[ {\begin{array}{*{20}{c}} {\varPhi {{({{M}})}^{\rm{T}}}\displaystyle\sum\nolimits_ + ^\varPhi \varPhi ({{M}})}&0 \\ 0&0 \end{array}} \right]$
 $\displaystyle\sum\nolimits_2^\varPhi = \left[ {\begin{array}{*{20}{c}} {\varPhi {{({{M}})}^{\rm{T}}}\displaystyle\sum\nolimits_ - ^\varPhi \varPhi ({{M}})}&0 \\ 0&0 \end{array}} \right]$
3 实验结果与分析

3.1 实验结果

3.2 能量参数 ${E_ + }$ ${E_ - }$ 对算法性能的影响
 Download: 图 2 能量参数 ${E_ + }$ 和 ${E_ - }$ 对ES-LSTWSVM性能的影响 Fig. 2 Effect of energy parameters ( ${E_ + }$ and ${E_ - }$ ) on the performance of the ES-LSTWSVM