Graph-Regularized Sparse Discriminant Non-negative Matrix Factorization

 CAAI Transactions on Intelligent Systems, 2019, Vol. 14, Issue (6): 1217-1224. DOI: 10.11992/tis.201811021

### Cite this article

XU Huimin, CHEN Xiuhong. Graph-regularized, sparse discriminant, non-negative matrix factorization[J]. CAAI Transactions on Intelligent Systems, 2019, 14(6): 1217-1224. DOI: 10.11992/tis.201811021.

### Foundation item

Postgraduate Research and Practice Innovation Program of Jiangsu Province in 2018 (KYCX18_1871).


Graph-regularized, sparse discriminant, non-negative matrix factorization
XU Huimin, CHEN Xiuhong
School of Digital Media, Jiangnan University, Wuxi 214000, China
Abstract: Non-negative matrix factorization (NMF) is a popular data representation method, and graph regularization constraints can effectively reveal the local manifold structure of the data. To better extract image features, this paper proposes a graph-regularized sparse discriminant non-negative matrix factorization algorithm (GSDNMF-L2,1). Sparse linear representations among same-class samples are used to construct the corresponding graph and weight matrix. The objective function combines the maximum margin criterion with an L2,1-norm constraint and uses the label information of the dataset to preserve both the manifold structure of the samples and the discriminability of the features; iterative update rules for the algorithm are derived. Experiments on the ORL, AR, and COIL20 datasets show that GSDNMF-L2,1 achieves higher classification accuracy in feature extraction than competing algorithms.
Key words: non-negative matrix factorization; feature extraction; dimensionality reduction; manifold learning; maximum margin criterion; discriminant information; sparse constraints; linear representation

1 Related work
1.1 Non-negative matrix factorization (NMF)

 $\min\limits_{W,H} \left\| X - WH \right\|_F^2 \quad {\rm s.t.}\; W \geqslant 0,\; H \geqslant 0$ (1)

 $W_{ik} \leftarrow W_{ik}\dfrac{\left( XH^{\rm T} \right)_{ik}}{\left( WHH^{\rm T} \right)_{ik}},\quad H_{kj} \leftarrow H_{kj}\dfrac{\left( W^{\rm T}X \right)_{kj}}{\left( W^{\rm T}WH \right)_{kj}}$ $\left( i = 1,2,\cdots,m;\; k = 1,2,\cdots,r;\; j = 1,2,\cdots,n \right)$ (2)
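The multiplicative update rules in (2) are straightforward to implement. Below is a minimal NumPy sketch; the small constant `eps` added to each denominator to avoid division by zero is an implementation detail, not part of the original rules:

```python
import numpy as np

def nmf(X, r, n_iter=200, eps=1e-10, seed=0):
    """Multiplicative-update NMF (Lee & Seung): X ~= W H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        # Rule (2): elementwise multiply by a ratio of nonnegative terms,
        # which keeps W and H nonnegative throughout.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return W, H
```

Because both factors stay nonnegative, each update can only rescale entries, never flip their sign, and the Frobenius reconstruction error is non-increasing over iterations.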
1.2 Graph-regularized non-negative matrix factorization (GNMF)

 $\min\limits_{W,H} \left\| X - WH \right\|_F^2 + \lambda\,{\rm tr}\left( HLH^{\rm T} \right) \quad {\rm s.t.}\; W \geqslant 0,\; H \geqslant 0$ (3)

 $W_{ik} \leftarrow W_{ik}\dfrac{\left( XH^{\rm T} \right)_{ik}}{\left( WHH^{\rm T} \right)_{ik}},\quad H_{kj} \leftarrow H_{kj}\dfrac{\left( W^{\rm T}X + \lambda HS_H \right)_{kj}}{\left( W^{\rm T}WH + \lambda HD_H \right)_{kj}}$ $\left( i = 1,2,\cdots,m;\; k = 1,2,\cdots,r;\; j = 1,2,\cdots,n \right)$ (4)
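Rule (4) differs from (2) only in the extra graph terms $\lambda HS_H$ and $\lambda HD_H$. A sketch of one GNMF iteration, assuming `S` is a given symmetric weight matrix and its degree matrix is computed on the fly:

```python
import numpy as np

def gnmf_update(X, W, H, S, lam=0.1, eps=1e-10):
    """One GNMF iteration following rule (4).

    S is the (symmetric, nonnegative) graph weight matrix over the n samples;
    D is its degree matrix, so L = D - S is the graph Laplacian of (3).
    """
    D = np.diag(S.sum(axis=1))
    W = W * (X @ H.T) / (W @ H @ H.T + eps)
    H = H * (W.T @ X + lam * H @ S) / (W.T @ W @ H + lam * H @ D + eps)
    return W, H
```

Iterating this update drives down the regularized objective (3), i.e. reconstruction error plus $\lambda\,{\rm tr}(HLH^{\rm T})$.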
1.3 Maximum margin criterion (MMC)

 $\max {\rm{tr}}\left( {{{{P}}^{\rm{T}}}({{{S}}_{{b}}} - {{{S}}_{{w}}}){{P}}} \right)$ (5)
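Criterion (5) is maximized by projecting onto the leading eigenvectors of $S_b - S_w$. The sketch below uses the standard within-/between-class scatter definitions; `mmc_projection` is an illustrative helper name, not from the paper:

```python
import numpy as np

def mmc_projection(X, y, d):
    """Maximum margin criterion: columns of X are samples, y their labels.

    Returns the top-d eigenvectors of S_b - S_w, which maximize
    tr(P^T (S_b - S_w) P) over orthonormal P, as in (5).
    """
    mu = X.mean(axis=1, keepdims=True)
    Sb = np.zeros((X.shape[0], X.shape[0]))
    Sw = np.zeros_like(Sb)
    for c in np.unique(y):
        Xc = X[:, y == c]
        mc = Xc.mean(axis=1, keepdims=True)
        Sb += Xc.shape[1] * (mc - mu) @ (mc - mu).T   # between-class scatter
        Sw += (Xc - mc) @ (Xc - mc).T                 # within-class scatter
    vals, vecs = np.linalg.eigh(Sb - Sw)  # eigenvalues in ascending order
    return vecs[:, -d:]                   # top-d eigenvectors maximize the trace
```

Unlike Fisher discriminant analysis, this difference-of-scatters form needs no matrix inversion, so it avoids the small-sample singularity problem.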

2 Proposed algorithm

2.1 Constructing the weight matrix

 Fig. 1 Neighbor samples determined by the k-nearest neighbor method and by sparse representation

 $\min \left\| s_i^c \right\|_1 \quad {\rm s.t.}\; x_i = X_c s_i^c,\; c\left( x_i \right) = c$ (6)

 $S_H = \left( S_{ij} \right)_{n \times n} = {\rm diag}\left\{ S^1, S^2, \cdots, S^C \right\}$ (7)
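Eqs. (6) and (7) can be sketched as follows. Two caveats: `ista_lasso` solves the $\ell_1$-relaxed (lasso) version of (6) rather than the exact equality-constrained problem, and both helper names are illustrative, not from the paper:

```python
import numpy as np

def ista_lasso(A, b, lam=0.05, n_iter=300):
    """Sparse code s with b ~= A s, via ISTA on the l1-relaxed form of (6)."""
    L = np.linalg.norm(A, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    s = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = s - (A.T @ (A @ s - b)) / L            # gradient step
        s = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return s

def class_weight_matrix(X, y):
    """Block-diagonal S_H as in (7): columns of X are samples, y their labels.

    Each sample is sparsely coded over the other samples of its own class
    (the sample itself is excluded to rule out the trivial solution), so all
    cross-class entries of S_H are zero.
    """
    n = X.shape[1]
    S = np.zeros((n, n))
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        for i in idx:
            others = idx[idx != i]
            S[i, others] = ista_lasso(X[:, others], X[:, i])
    return S
```

The resulting weight matrix is block-diagonal by class, which is exactly the structure $\operatorname{diag}\{S^1,\cdots,S^C\}$ in (7).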

2.2 Regularization constraints

 $\dfrac{1}{2}\displaystyle\sum\limits_{i,j = 1}^n \left\| h_i - h_j \right\|_2^2 S_{ij} = {\rm tr}\left( HD_H H^{\rm T} \right) - {\rm tr}\left( HS_H H^{\rm T} \right) = {\rm tr}\left( HL_H H^{\rm T} \right)$ (8)
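Identity (8) can be verified numerically for a random symmetric weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 8, 3
H = rng.random((r, n))                     # columns h_j are the new representations
S = rng.random((n, n)); S = (S + S.T) / 2  # symmetric weight matrix S_H
D = np.diag(S.sum(axis=1))                 # degree matrix D_H
L = D - S                                  # graph Laplacian L_H

# Left side of (8): weighted pairwise squared distances.
lhs = 0.5 * sum(S[i, j] * np.sum((H[:, i] - H[:, j]) ** 2)
                for i in range(n) for j in range(n))
# Right side of (8): trace form over the Laplacian.
rhs = np.trace(H @ L @ H.T)
assert np.isclose(lhs, rhs)
```

Minimizing either side forces samples joined by large weights $S_{ij}$ to stay close in the low-dimensional representation, which is how the graph term preserves the manifold structure.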

 ${\rm tr}\left( W^{\rm T}\left( S_B - S_W \right)W \right)$ (9)

 ${\rm tr}\left( W^{\rm T}\left( \left( S_B^+ - S_{|B|}^- \right) - \left( S_W^+ - S_{|W|}^- \right) \right)W \right)$ (10)
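The split used in (10) is assumed here to be the standard one from the NMF literature, $S^+ = (|S| + S)/2$ and $S^- = (|S| - S)/2$: it decomposes an indefinite scatter matrix into two nonnegative parts, which is what lets the multiplicative updates stay nonnegative. A minimal sketch:

```python
import numpy as np

def pos_neg_split(S):
    """Split S into nonnegative parts with S = S_plus - S_minus, as in (10)."""
    S_plus = (np.abs(S) + S) / 2   # keeps positive entries, zeros out negatives
    S_minus = (np.abs(S) - S) / 2  # magnitudes of negative entries
    return S_plus, S_minus
```

Moving the negative part to the other side of the update ratio keeps both numerator and denominator nonnegative.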

 $x_i \approx w_1 h_{1,i} + w_2 h_{2,i} + \cdots + w_r h_{r,i}$ (11)

 $\left\| H \right\|_1 = \displaystyle\sum\limits_{i = 1}^r \sum\limits_{j = 1}^n \left| h_{ij} \right|$ (12)

 $\left\| H^{\rm T} \right\|_{2,1} = \displaystyle\sum\limits_{j = 1}^n \left\| h_j \right\|_2 = \sum\limits_{j = 1}^n \sqrt{\sum\limits_{i = 1}^r h_{ij}^2}$ (13)
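The difference between (12) and (13) in code, assuming as in the paper that rows of $H$ index features and columns $h_j$ index samples:

```python
import numpy as np

def l1_norm(H):
    """Eq. (12): sum of absolute values of all entries of H."""
    return np.abs(H).sum()

def l21_norm(H):
    """Eq. (13): ||H^T||_{2,1}, the sum of the l2 norms of H's columns."""
    return np.linalg.norm(H, axis=0).sum()
```

The L1 norm sparsifies individual entries, while the L2,1 norm drives entire columns (whole sample codes) toward zero jointly, which is why it yields row-structured sparsity on $H^{\rm T}$.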

2.3 Optimization model and solution

 $\begin{split} & \min\; \left\| X - WH \right\|_F^2 + \lambda\,{\rm tr}\left( HL_H H^{\rm T} \right) + \\ & \beta\,{\rm tr}\left( W^{\rm T}\left( \left( S_W^+ - S_{|W|}^- \right) - \left( S_B^+ - S_{|B|}^- \right) \right)W \right) + \mu \left\| H^{\rm T} \right\|_{2,1} \\ & {\rm s.t.}\; W \geqslant 0,\; H \geqslant 0 \end{split}$ (14)

where $\varPhi_{ik}$ and $\varPsi_{kj}$ are Lagrange multipliers. The Lagrangian function is constructed as:

 $\begin{gathered} L = \left\| X - WH \right\|_F^2 + \lambda\,{\rm tr}\left( HL_H H^{\rm T} \right) + \\ \beta\,{\rm tr}\left( W^{\rm T}\left( \left( S_W^+ - S_{|W|}^- \right) - \left( S_B^+ - S_{|B|}^- \right) \right)W \right) + \\ \mu\,{\rm tr}\left( HQH^{\rm T} \right) - {\rm tr}\left( \varPhi W^{\rm T} \right) - {\rm tr}\left( \varPsi H \right) \end{gathered}$ (15)

 $\dfrac{\partial L}{\partial W} = -2XH^{\rm T} + 2WHH^{\rm T} + 2\beta\left( \left( S_W^+ - S_{|W|}^- \right) - \left( S_B^+ - S_{|B|}^- \right) \right)W - \varPhi = 0$ (16)

 $\left( -XH^{\rm T} + WHH^{\rm T} + \beta\left( \left( S_W^+ - S_{|W|}^- \right) - \left( S_B^+ - S_{|B|}^- \right) \right)W \right)_{ik} W_{ik} = 0$ (17)

 $W_{ik}^{t+1} \leftarrow W_{ik}^t \dfrac{\left( XH^{\rm T} + \beta\left( S_{|W|}^- + S_B^+ \right)W \right)_{ik}}{\left( WHH^{\rm T} + \beta\left( S_W^+ + S_{|B|}^- \right)W \right)_{ik}}$ (18)

 $H_{kj}^{t+1} \leftarrow H_{kj}^t \dfrac{\left( W^{\rm T}X + \lambda HS_H + \beta \nabla_H\,{\rm tr}\left( W\left( S_{|W|}^- + S_B^+ \right)W^{\rm T} \right) \right)_{kj}}{\left( W^{\rm T}WH + \lambda HD_H + \beta \nabla_H\,{\rm tr}\left( W\left( S_W^+ + S_{|B|}^- \right)W^{\rm T} \right) + \mu HQ \right)_{kj}}$ (19)
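One iteration of (18)-(19) can be sketched as below. Two assumptions are made explicit in the code: the scatter terms are taken to multiply $W$, as in the stationarity condition (17); and the $\nabla_H$ trace terms are dropped, since a trace over $W$ alone has zero gradient with respect to $H$. $Q$ is taken as the usual diagonal reweighting matrix of the $L_{2,1}$ term, $Q_{jj} = 1/(2\|h_j\|_2)$:

```python
import numpy as np

def gsdnmf_update(X, W, H, S_graph, Sw, Sb, lam=0.1, beta=0.1, mu=0.1, eps=1e-10):
    """One GSDNMF-L2,1 iteration following (18)-(19); a sketch under the
    assumptions stated above, not the paper's verbatim implementation."""
    pos = lambda M: (np.abs(M) + M) / 2   # S^+ of the split used in (10)
    neg = lambda M: (np.abs(M) - M) / 2   # S^- of the split used in (10)
    D = np.diag(S_graph.sum(axis=1))      # degree matrix D_H
    # Diagonal reweighting of the L2,1 term: Q_jj = 1 / (2 ||h_j||_2).
    Q = np.diag(1.0 / (2.0 * np.linalg.norm(H, axis=0) + eps))
    W = W * (X @ H.T + beta * (neg(Sw) + pos(Sb)) @ W) / \
            (W @ H @ H.T + beta * (pos(Sw) + neg(Sb)) @ W + eps)
    H = H * (W.T @ X + lam * H @ S_graph) / \
            (W.T @ W @ H + lam * H @ D + mu * H @ Q + eps)
    return W, H
```

Every numerator and denominator entry is nonnegative, so the updates preserve $W \geqslant 0$ and $H \geqslant 0$.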

The main steps of GSDNMF-L2,1 are as follows:

1) Compute the weight matrix $S_H$ using Eqs. (6) and (7);

2) Compute the within-class scatter $S_W$ and between-class scatter $S_B$ using Eq. (9);

3) Randomly initialize $W_0 \in {\bf R}_+^{m \times k}$ and $H_0 \in {\bf R}_+^{k \times n}$;

4) For $t = 1:T_0$, update $W$ and $H$ by Eqs. (18) and (19); End.

2.4 Computational complexity

Let $m$ be the sample dimension, $n$ the number of samples, and $k$ the number of features. GSDNMF-L2,1 consists mainly of steps 1), 2), and 4). Step 1) builds the weight matrix of the data graph, with complexity $O\left( n^2 m \right)$; step 2) computes the within-class scatter $S_W$ and between-class scatter $S_B$, with complexity $O\left( m^2 \right)$; in step 4), one iteration of updating $W$ and $H$ costs $O\left( mnk \right)$, so if the algorithm converges after $T_0$ iterations, step 4) costs $O\left( T_0 mnk \right)$ in total. The overall time complexity of GSDNMF-L2,1 is therefore $O\left( T_0 mnk + n^2 m + m^2 \right)$.

3 Experimental analysis

3.1 Datasets

The ORL face database contains face images of 40 subjects, with 10 images per subject, taken under varying facial expressions (open/closed eyes, smiling/not smiling), facial details (glasses/no glasses), and lighting conditions.

The AR dataset contains face images of 120 subjects, with 26 images per subject, covering different facial expressions, illumination conditions, and occlusions (sunglasses/scarf).

The COIL20 dataset contains 20 different objects (a toy duck, a fortune cat, a cup, etc.). Each object was photographed at 5° intervals on a horizontal turntable, giving 72 images per object and 1 440 images in total.

 Fig. 2 Examples from the datasets
3.2 Parameter selection

The GSDNMF-L2,1 algorithm has three key parameters: the graph regularization parameter $\lambda$, the discriminant parameter $\beta$, and the sparsity parameter $\mu$. To illustrate their influence on the recognition rate, two parameters were fixed while the third was varied, on each dataset in turn. The parameters range over [0, 1]; the number of basis images was set to the number of classes (40 for ORL, 120 for AR, and 20 for COIL20), and the maximum number of iterations was 300. The relationship between the average recognition rate and the regularization parameters is shown in Fig. 3.

 Fig. 3 Influence of the parameters on recognition rate
3.3 Experimental results and analysis

 Fig. 4 Training time as a function of the number of basis images

 Fig. 5 Convergence curve of the loss function

 $s(x) = \dfrac{\sqrt{n} - \left( \displaystyle\sum\limits_i x_i \Big/ \sqrt{\displaystyle\sum\limits_i x_i^2} \right)}{\sqrt{n} - 1}$ (20)
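This sparseness measure evaluates to 1 for a one-hot vector and 0 for a constant vector. A sketch (absolute values are used so the measure is well defined for general input; for the nonnegative basis images here they change nothing):

```python
import numpy as np

def sparseness(x):
    """Hoyer-style sparseness of a length-n vector: 1 = one-hot, 0 = constant."""
    n = x.size
    return (np.sqrt(n) - np.abs(x).sum() / np.linalg.norm(x)) / (np.sqrt(n) - 1)
```

Applying it to each column of $W$ quantifies how part-based (localized) the learned basis images are.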

 Fig. 6 Comparison of basis images computed by the NDMF, GSDNMF-L1, and GSDNMF-L2,1 algorithms on the ORL, AR, and COIL20 datasets
4 Conclusion
