融合图模糊信息的受限玻尔兹曼机

黄晓珂 刘海涛 汪培庄

引用本文: 黄晓珂, 刘海涛, 汪培庄. 融合图模糊信息的受限玻尔兹曼机 [J]. 智能系统学报, 2025, 20(5): 1103-1111. doi: 10.11992/tis.202412008
Citation: HUANG Xiaoke, LIU Haitao, WANG Peizhuang. The restricted Boltzmann machine fuses picture fuzzy information [J]. CAAI Transactions on Intelligent Systems, 2025, 20(5): 1103-1111. doi: 10.11992/tis.202412008

基金项目: 国家自然科学基金项目(61350003);辽宁省教育厅高等学校基本科研项目重点攻关项目(LJKZZ20220047).
    作者简介:

    黄晓珂,硕士研究生,主要研究方向为智能数学理论与应用。E-mail:2806153271@qq.com;

    刘海涛,副教授,博士,主要研究方向为模糊数学、因素空间理论。E-mail:haitao641@163.com;

    汪培庄,教授,博士生导师,中国人工智能学会会士。主要研究方向为模糊数学、因素空间理论。发表学术论文113篇。E-mail:peizhuangw@126.com.

    通讯作者:

    刘海涛. E-mail:haitao641@163.com.

  • 中图分类号: TP391.4

The restricted Boltzmann machine fuses picture fuzzy information

  • 摘要: 为了解决受限玻尔兹曼机表示能力不足的问题,提出融合图模糊信息的受限玻尔兹曼机模型。首先将限制经典受限玻尔兹曼机学习能力的精确值参数,扩展为可以对信息进行多维度刻画的图模糊数。其次结合精确度函数的思想对图模糊自由能量函数去模糊化,进而构建了新的优化目标及学习算法。最后,基于多个基准数据集上的多角度对比分析,证明了新模型可以有效地提升经典模型及多种扩展模型的表示能力与泛化能力。

     

    Abstract: To solve the problem of insufficient representation ability of the restricted Boltzmann machine (RBM), a novel RBM model incorporating picture fuzzy information is proposed. First, the exact value parameter that limits the learning ability of the classical RBM is extended by the picture fuzzy numbers, which allow a multidimensional representation of information. Second, combined with the idea of precision function, the picture fuzzy free energy function is defuzzified, and then a new optimization target and learning algorithm are constructed. Finally, based on the multi-perspective comparative analysis using multiple benchmark datasets, it is demonstrated that the new model can effectively improve the representation and generalization capabilities of the classical model as well as various extended versions.

     

  • 受限玻尔兹曼机(restricted Boltzmann machine, RBM) [1] 作为一种生成式随机神经网络,是现代人工智能大模型、多模态大模型等生成式算法的理论基础[2]。同时,也是众多热点深度神经网络的基础模块[3],如深度玻尔兹曼机[4]、深度信念网络[5]和深度自编码器[6]等,广泛应用于图像识别与分类[7-8]、降维[9]、特征学习[10]和协同过滤[11]等领域。

    目前,针对RBM的研究主要分为两大类。第一类是针对模型结构[12]与快速训练算法的改进研究,如高斯受限玻尔兹曼机[13]、条件受限玻尔兹曼机[14]和卷积受限玻尔兹曼机[15]等变体改进,扩大了RBM的应用场景;对比散度[16-17]、吉布斯采样[18]等算法,有效地提高了RBM的训练速度。第二类是针对模型参数的扩展研究。如文献[19]认为,由于经典RBM可见单元和隐层单元之间的关系被限定为精确数,从而限制了其表示能力,导致其鲁棒性不强。针对此问题,将精确数扩展为模糊数构建了模糊受限玻尔兹曼机,有效地弥补了RBM刻画变量之间不确定的关系的不足,从而提高模型表达能力。文献[20]认为一型模糊集对模糊关系的刻画不够充分,进而提出了基于区间二型模糊集的RBM扩展模型,并应用在主题建模、过滤等领域。文献[21]又将参数扩展为非对称三角模糊数和高斯模糊数,进一步进行了扩展研究。上述扩展模型均对经典RBM的表示能力进行了有效的提升。但是,无论一型、二型模糊集,还是非对称三角及高斯模糊数,均只从隶属度一个维度对信息进行刻画,而图模糊数同时包括隶属度、非隶属度和中立度等多维度,对信息的刻画更加全面。故本文基于图模糊数对受限玻尔兹曼机进行扩展研究,提出了融合图模糊信息的受限玻尔兹曼机(picture fuzzy restricted Boltzmann machine, PicFRBM)。

    本文的主要安排如下:第一章对经典RBM、图模糊数的基础知识进行介绍;第二章提出PicFRBM模型及其学习算法;第三章在多个基准数据集上,同其他热点模型进行多角度的对比分析,验证新模型的有效性和优越性;第四章给出本文结论。

    RBM是一个两层对称的随机网络模型[22],包含可见层与隐层,神经元之间满足层内无连接、层间全连接,网络结构如图1所示。

    图  1  RBM网络结构
    Fig.  1  Structure of RBM network

    图1中,${\boldsymbol{x}} = {\left[ {{x_1}\,{x_2}\, \cdots \,{x_n}} \right]^{\text{T}}}$、${\boldsymbol{h}} = {\left[ {{h_1}\,{h_2}\, \cdots \,{h_m}} \right]^{\text{T}}}$分别为可见层与隐藏层的状态向量,分量为1或0,表示激活或未激活两种状态;${\boldsymbol{\theta}} = \left\{ {{\boldsymbol{w}},{\boldsymbol{a}},{\boldsymbol{b}}} \right\}$为RBM的参数,${\boldsymbol{w}}$、${\boldsymbol{a}}$、${\boldsymbol{b}}$分别为可见层与隐藏层之间的权重矩阵、可见层及隐层的偏置向量。RBM的能量函数为

    $$ E\left( {{\boldsymbol x},{\boldsymbol h},{\boldsymbol \theta} } \right) = - \sum\limits_{i = 1}^n {{a_i}{x_i}} - \sum\limits_{j = 1}^m {{b_j}{h_j}} - \sum\limits_{i = 1}^n {\sum\limits_{j = 1}^m {{x_i}{w_{ij}}{h_j}} } $$ (1)

    按照该能量函数,RBM的联合概率与自由能量函数定义为

    $$ P\left( {{\boldsymbol x},{\boldsymbol h},{\boldsymbol \theta} } \right) = \frac{{{\text{exp}}\left( { - E\left( {{\boldsymbol x},{\boldsymbol h},{\boldsymbol \theta} } \right)} \right)}}{{\displaystyle\sum\limits_{\boldsymbol{x}} {\displaystyle\sum\limits_{\boldsymbol{h}} {{\text{exp}}\left( { - E\left( {{\boldsymbol x},{\boldsymbol h},{\boldsymbol \theta} } \right)} \right)} } }} $$ (2)
    $$ F\left( {{\boldsymbol{x}},{\boldsymbol \theta} } \right) = - \ln \sum\limits_{\boldsymbol{h}} {\exp \left( { - E\left( {{\boldsymbol x},{\boldsymbol h},{\boldsymbol \theta} } \right)} \right)} $$ (3)
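
    为便于理解式(1)、(3),下面给出一个示意性的数值实现(Python/NumPy,非原文代码)。其中自由能量采用对隐层二值状态解析求和后得到的softplus形式(这是二值RBM的标准结果,原文未显式给出),代码末尾与式(3)的逐项求和作了数值核对;变量名与模型规模均为假设。

```python
import numpy as np

def energy(x, h, w, a, b):
    """式(1): E(x,h,θ) = -Σ_i a_i x_i - Σ_j b_j h_j - Σ_i Σ_j x_i w_ij h_j"""
    return -a @ x - b @ h - x @ w @ h

def free_energy(x, w, a, b):
    """式(3)对隐层二值状态求和后的解析形式(标准结果):
    F(x,θ) = -Σ_i a_i x_i - Σ_j ln(1 + exp(b_j + Σ_i w_ij x_i))"""
    return -a @ x - np.sum(np.logaddexp(0.0, b + x @ w))

# 小规模示例: n=3 个可见单元, m=2 个隐单元
rng = np.random.default_rng(0)
n, m = 3, 2
w = rng.normal(0, 0.1, size=(n, m))
a, b = np.zeros(n), np.zeros(m)
x = np.array([1.0, 0.0, 1.0])

# 数值核对: exp(-F(x,θ)) 应等于 Σ_h exp(-E(x,h,θ))
z_h = sum(np.exp(-energy(x, np.array(h, dtype=float), w, a, b))
          for h in np.ndindex(2, 2))
assert np.isclose(np.exp(-free_energy(x, w, a, b)), z_h)
```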

    由上述定义推导出RBM边缘概率,即似然函数为

    $$ P\left( {{\boldsymbol{x}},{\boldsymbol \theta} } \right) = \frac{{\exp \left( { - F\left( {{\boldsymbol{x}},{\boldsymbol \theta} } \right)} \right)}}{{\displaystyle\sum\limits_x {\exp \left( { - F\left( {{\boldsymbol{x}},{\boldsymbol \theta} } \right)} \right)} }} $$ (4)

    当给定可见层(或隐层)上所有神经单元的状态时,隐层(或可见层)某一单元被激活的概率为

    $$ P\left( {\left. {{h_j} = 1} \right|{\boldsymbol{x}},{\boldsymbol \theta} } \right) = \sigma \left( {{b_j} + \sum\limits_{i = 1}^n {{w_{ij}}{x_i}} } \right) $$
    $$ P\left( {\left. {{x_i} = 1} \right|{\boldsymbol{h}},{\boldsymbol \theta} } \right) = \sigma \left( {{a_i} + \sum\limits_{j = 1}^m {{w_{ij}}{h_j}} } \right) $$

    式中$ \sigma (x) = {\text{sigmoid}}(x) = \dfrac{1}{{1 + \exp \left( { - x} \right)}} $
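
    下面给出上述两个条件概率及一步吉布斯采样的示意实现(NumPy,非原文代码),随机数生成方式、函数名与参数规模均为假设;采样得到的重构样本将用于后文介绍的CD算法。

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_h_given_x(x, w, b):
    """P(h_j = 1 | x, θ) = σ(b_j + Σ_i w_ij x_i)"""
    return sigmoid(b + x @ w)

def p_x_given_h(h, w, a):
    """P(x_i = 1 | h, θ) = σ(a_i + Σ_j w_ij h_j)"""
    return sigmoid(a + w @ h)

def gibbs_step(x, w, a, b, rng):
    """一步吉布斯采样: 先由 x 采样 h, 再由 h 采样重构的 x'"""
    h = (rng.random(b.shape) < p_h_given_x(x, w, b)).astype(float)
    x_new = (rng.random(a.shape) < p_x_given_h(h, w, a)).astype(float)
    return h, x_new

# 用法示例(假设的小规模参数)
rng = np.random.default_rng(0)
n, m = 6, 4
w, a, b = rng.normal(0, 0.1, (n, m)), np.zeros(n), np.zeros(m)
x = rng.integers(0, 2, size=n).astype(float)
h, x_rec = gibbs_step(x, w, a, b, rng)
```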

    给定训练集为$ {\boldsymbol{S}} = \left\{ {{{\boldsymbol{x}}^1},{{\boldsymbol{x}}^2}, \cdots ,{{\boldsymbol{x}}^N}} \right\} $($N$为训练样本数目,以区别于可见单元个数$n$),负对数似然函数构造为

    $$ L\left( {{\boldsymbol{S}},{\boldsymbol \theta} } \right) = - \ln \prod\limits_{t = 1}^N {P\left( {{{\boldsymbol{x}}^t},{\boldsymbol \theta} } \right)} = - \sum\limits_{t = 1}^N {\ln P\left( {{{\boldsymbol{x}}^t},{\boldsymbol \theta} } \right)} $$

    $L({\boldsymbol{S}},{\boldsymbol \theta} )$最小时,参数${\boldsymbol{\theta}} $得到最优解,通常采用随机梯度下降法进行求解。
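
    下面用一个可见单元数很小的玩具模型演示式(4)与负对数似然$L({\boldsymbol{S}},{\boldsymbol{\theta}})$的精确计算(示意代码,非原文实现):由于配分和需要枚举全部$2^n$个可见状态,这种做法只在$n$很小时可行,这也正是后文需要采用近似算法的原因;其中模型规模与样本均为假设的随机数据。

```python
import numpy as np
from itertools import product

def free_energy(X, w, a, b):
    """自由能 F(x,θ) 的softplus解析形式, X 为按行排列的样本矩阵"""
    return -X @ a - np.sum(np.logaddexp(0.0, X @ w + b), axis=1)

def exact_nll(S, w, a, b):
    """按式(4)精确计算负对数似然 L(S,θ): 仅在可见单元数 n 很小时可行"""
    n = len(a)
    all_x = np.array(list(product([0.0, 1.0], repeat=n)))   # 全部 2^n 个可见状态
    log_Z = np.log(np.sum(np.exp(-free_energy(all_x, w, a, b))))
    return np.sum(free_energy(S, w, a, b) + log_Z)           # Σ_t [F(x^t) + ln Z]

# 玩具示例: n=4 个可见单元, m=3 个隐单元, 5 个训练样本
rng = np.random.default_rng(1)
n, m = 4, 3
w, a, b = rng.normal(0, 0.1, (n, m)), np.zeros(n), np.zeros(m)
S = rng.integers(0, 2, size=(5, n)).astype(float)
print(exact_nll(S, w, a, b))
```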

    为了更准确地刻画模糊信息,在直觉模糊集的基础上,Cuong等[23]提出了图模糊集,定义如下。

    定义1[23] 设$X$为一个非空集合,则称

    $$ P = \left\{ {\left\langle {x,{\mu _P}(x),{\eta _P}(x),{\nu _P}(x)} \right\rangle \left| {x \in X} \right.} \right\} $$

    为图模糊集。其中,${\mu _P}\left( x \right)$、$ {\eta _P}\left( x \right) $、${\nu _P}\left( x \right) \in \left[ {0,1} \right]$分别为隶属度、中立度和非隶属度,且满足$0 \leqslant {\mu _P}\left( x \right) + {\eta _P}\left( x \right) + {\nu _P}\left( x \right) \leqslant 1$。对于$\forall x \in X$,$ {\pi _P}\left( x \right) = 1 - {\mu _P}\left( x \right) - {\eta _P}\left( x \right) - {\nu _P}\left( x \right) $称为弃权度。为方便,记$ \alpha=\left\{\alpha_{\mu},\alpha_{\eta},\alpha_{\nu}\right\} $为图模糊数。

    定义2[24] 设$ \alpha=\left\{\alpha_{\mu},\alpha_{\eta},\alpha_{\nu}\right\} $是一个图模糊数,它的得分函数$S\left( \alpha \right)$和精确度函数$H\left( \alpha \right)$分别为

    $$ S\left( \alpha \right) = {\alpha _\mu } + {\alpha _\eta } - {\alpha _\nu } $$
    $$ H\left( \alpha \right) = {\alpha _\mu } + {\alpha _\eta } + {\alpha _\nu } $$ (5)
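
    以下给出图模糊数及其得分函数、精确度函数的一个简单实现示例(示意代码,非原文内容),类名与字段名均为假设。

```python
from dataclasses import dataclass

@dataclass
class PictureFuzzyNumber:
    """图模糊数 α = {α_μ, α_η, α_ν}: 各分量属于[0,1]且 α_μ+α_η+α_ν ≤ 1"""
    mu: float    # 隶属度 α_μ
    eta: float   # 中立度 α_η
    nu: float    # 非隶属度 α_ν

    def refusal(self) -> float:
        """弃权度 π = 1 - α_μ - α_η - α_ν"""
        return 1.0 - self.mu - self.eta - self.nu

    def score(self) -> float:
        """得分函数 S(α) = α_μ + α_η - α_ν"""
        return self.mu + self.eta - self.nu

    def accuracy(self) -> float:
        """精确度函数 H(α) = α_μ + α_η + α_ν, 即式(5)"""
        return self.mu + self.eta + self.nu

alpha = PictureFuzzyNumber(mu=0.5, eta=0.2, nu=0.1)
print(alpha.score(), alpha.accuracy(), alpha.refusal())   # 约为 0.6、0.8、0.2
```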

    针对引言中提到的不足,为进一步提升RBM的表示能力,本文基于图模糊数对经典RBM进行扩展研究。首先,将经典RBM中的参数${\boldsymbol{\theta}} $扩展为$\tilde {\boldsymbol{\theta}} = \left\{ {\tilde {\boldsymbol{w}},\tilde {\boldsymbol{a}},\tilde {\boldsymbol{b}}} \right\}$,其中$ \tilde {\boldsymbol{w}} $、$\tilde {\boldsymbol{a}}$、$\tilde {\boldsymbol{b}}$的元素均为图模糊数。

    结合式(1),可得PicFRBM的能量函数:

    $$ \tilde E\left( {{\boldsymbol{x}},{\boldsymbol{h}},\tilde {\boldsymbol{\theta}} } \right) = - \sum\limits_{i = 1}^n {{{\tilde a}_i}{x_i}} - \sum\limits_{j = 1}^m {{{\tilde b}_j}{h_j}} - \sum\limits_{i = 1}^n {\sum\limits_{j = 1}^m {{x_i}{{\tilde w}_{ij}}{h_j}} } $$

    同理,图模糊自由能函数可推导为

    $$ \tilde F\left( {{\boldsymbol{x}},\tilde {\boldsymbol{\theta}} } \right) = - \ln \sum\limits_{\boldsymbol{h}} {\exp \left( { - \tilde E\left( {{\boldsymbol{x}},{\boldsymbol{h}},\tilde {\boldsymbol{\theta}} } \right)} \right)} $$

    当图模糊数不存在模糊性时,其能量函数、自由能函数均与RBM保持一致,RBM为PicFRBM的特例。如果将上述图模糊自由能量函数直接代入式(2)~ (4)中,将面临模糊概率和模糊最大似然的求解问题。但由于模糊函数的非线性和模糊运算的遍历性,上述求解过程难以实现。故这里采用式(5),即图模糊数精确度函数的思想,进行去模糊化,将该问题转化为经典最大似然问题。设去模糊化后的自由能量函数为$ {F_{\text{T}}}\left( {{\boldsymbol{x}},\tilde {\boldsymbol{\theta}} } \right) $,将其定义为

    $$ {F_{\text{T}}}\left( {{\boldsymbol{x}},\tilde {\boldsymbol{\theta}} } \right) = F\left( {{\boldsymbol{x}},{{\boldsymbol{\theta}} _\mu }} \right) + F\left( {{\boldsymbol{x}},{{\boldsymbol{\theta}} _\eta }} \right) + F\left( {{\boldsymbol{x}},{{\boldsymbol{\theta}} _\nu }} \right) $$ (6)

    式中:${{\boldsymbol{\theta}} _\mu } = \left\{ {{{\boldsymbol{w}}_\mu },{{\boldsymbol{a}}_\mu },{{\boldsymbol{b}}_\mu }} \right\}$${{\boldsymbol{\theta}} _\eta } = \left\{ {{{\boldsymbol{w}}_\eta },{{\boldsymbol{a}}_\eta },{{\boldsymbol{b}}_\eta }} \right\}$${{\boldsymbol{\theta}} _\nu } = \left\{ {{{\boldsymbol{w}}_\nu },{{\boldsymbol{a}}_\nu },{{\boldsymbol{b}}_\nu }} \right\}$
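
    由式(6)可见,去模糊化后的自由能量是$\mu$、$\eta$、$\nu$三组经典RBM自由能之和。下面给出式(6)的一个示意实现(非原文代码),参数用字典组织只是假设的表示方式,规模亦为假设。

```python
import numpy as np

def free_energy(x, w, a, b):
    """经典RBM自由能 F(x,θ) 的softplus解析形式"""
    return -a @ x - np.sum(np.logaddexp(0.0, b + x @ w))

def free_energy_T(x, theta):
    """式(6): F_T(x,θ̃) = F(x,θ_μ) + F(x,θ_η) + F(x,θ_ν)
    theta 为 {'mu': (w,a,b), 'eta': (w,a,b), 'nu': (w,a,b)} 形式的字典(示意表示)"""
    return sum(free_energy(x, *theta[k]) for k in ('mu', 'eta', 'nu'))

# 示例: 4 个可见单元、3 个隐单元的三组参数
rng = np.random.default_rng(2)
n, m = 4, 3
theta = {k: (rng.normal(0, 0.1, (n, m)), np.zeros(n), np.zeros(m))
         for k in ('mu', 'eta', 'nu')}
x = np.array([1.0, 0.0, 1.0, 1.0])
print(free_energy_T(x, theta))
```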

    结合式(4)、(6),可得似然函数:

    $$ {P_{\text{T}}}\left( {{\boldsymbol{x}}{\text{,}}\tilde {\boldsymbol{\theta}} } \right) = \;\frac{{\exp \left( { - {F_{\text{T}}}\left( {{\boldsymbol{x}}{\text{,}}\tilde {\boldsymbol{\theta}} } \right)} \right)}}{{\displaystyle\sum\limits_{\boldsymbol{x}} {\exp \left( { - {F_{\text{T}}}\left( {{\boldsymbol{x}}{\text{,}}\tilde {\boldsymbol{\theta}} } \right)} \right)} }} $$

    由此可得PicFRBM的目标函数:

    $$ L\left( {{\boldsymbol{S}},\tilde {\boldsymbol{\theta}} } \right) = - \ln \prod\limits_{t = 1}^N {{P_{\text{T}}}\left( {{{\boldsymbol{x}}^t},\tilde {\boldsymbol{\theta}} } \right)} = - \sum\limits_{t = 1}^N {\ln {P_{\text{T}}}\left( {{{\boldsymbol{x}}^t},\tilde {\boldsymbol{\theta}} } \right)} $$

    即求解使$ L\left( {{\boldsymbol{S}},\tilde {\boldsymbol{\theta}} } \right) $最小化的最优参数$ \tilde {\boldsymbol{\theta}} $。

    定理1 设${\boldsymbol{x}} \in {\boldsymbol{S}}$,则负对数似然函数$ - \ln {P_{\text{T}}}\left( {{\boldsymbol{x}},\tilde {\boldsymbol{\theta}} } \right)$关于参数${\boldsymbol{\theta}} $的导数为

    $$ - \frac{{\partial \ln {P_{\text{T}}}\left( {{\boldsymbol{x}},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial {\boldsymbol{\theta}} }} = \frac{{\partial {F_{\text{T}}}\left( {{\boldsymbol{x}},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial {\boldsymbol{\theta}} }} - {E_{{P_{\text{T}}}}}\left[ {\frac{{\partial {F_{\text{T}}}\left( {{\boldsymbol{x}},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial {\boldsymbol{\theta}} }}} \right] $$

    式中${\boldsymbol{\theta}} \in \left\{ {{{\boldsymbol{\theta}} _\mu },{{\boldsymbol{\theta}} _\eta },{{\boldsymbol{\theta}} _\nu }} \right\}$

    证明

    $$ \begin{gathered} -\frac{\partial \ln P_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})}{\partial \boldsymbol{\theta}}=\frac{\partial F_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})}{\partial \boldsymbol{\theta}}-\left[\sum_{\boldsymbol{x}} \exp \left(-F_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})\right)\right]^{-1} \times \\ \sum_{\boldsymbol{x}}\left[\exp \left(-F_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})\right) \frac{\partial F_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})}{\partial {\boldsymbol{\theta}}}\right]= \\ \frac{\partial F_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})}{\partial \boldsymbol{\theta}}-\sum_{\boldsymbol{x}} \frac{\exp \left(-F_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})\right)}{\displaystyle\sum_{\boldsymbol{x}} \exp \left(-F_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})\right)} \frac{\partial F_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})}{\partial \boldsymbol{\theta}}= \\ \frac{\partial F_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})}{\partial \boldsymbol{\theta}}-E_{P_{\mathrm{T}}}\left[\frac{\partial F_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})}{\partial \boldsymbol{\theta}}\right] \end{gathered} $$

    推论1 设${\boldsymbol{\theta}} = {{\boldsymbol{\theta}} _\mu }$,则$ - \ln {P_{\text{T}}}({\boldsymbol{x}},\tilde {\boldsymbol{\theta}} )$关于参数$w_{ij}^\mu $, $a_i^\mu $, $b_j^\mu $的导数分别为

    $$ \left\{ \begin{gathered} -\dfrac{\partial \ln P_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})}{\partial w_{i j}^\mu}=E_{P_{\mathrm{T}}}\left[P_\mu\left(h_j=1 \mid \boldsymbol{x}\right) x_i\right]- \\ P_\mu\left(h_j=1 \mid \boldsymbol{x}\right) x_i \\ -\dfrac{\partial \ln P_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})}{\partial a_i^\mu}=E_{P_{\mathrm{T}}}\left[x_i\right]-x_i \\ -\dfrac{\partial \ln P_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})}{\partial b_j^\mu}=E_{P_{\mathrm{T}}}\left[P_\mu\left(h_j=1 \mid \boldsymbol{x}\right)\right]-P_\mu\left(h_j=1 \mid \boldsymbol{x}\right) \end{gathered}\right. $$ (7)

    证明 由于

    $$ \begin{gathered} \frac{\partial F_{\mathrm{T}}(\boldsymbol{x}, \tilde{\boldsymbol{\theta}})}{\partial w_{i j}^\mu}=\frac{\partial\left[F\left(\boldsymbol{x}, \boldsymbol{\theta}_\mu\right)+F\left(\boldsymbol{x}, \boldsymbol{\theta}_\eta\right)+F\left(\boldsymbol{x}, \boldsymbol{\theta}_v\right)\right]}{\partial w_{i j}^\mu}= \\ \frac{\partial F\left(\boldsymbol{x}, \boldsymbol{\theta}_\mu\right)}{\partial w_{i j}^\mu}=\frac{\partial\left[-\ln \displaystyle\sum_h \exp \left(-E\left(\boldsymbol{x}, \boldsymbol{h}, \boldsymbol{\theta}_\mu\right)\right)\right]}{\partial w_{i j}^\mu}= \\ \frac{1}{\displaystyle\sum_{\boldsymbol{h}} \exp \left(-E\left(\boldsymbol{x}, \boldsymbol{h}, \boldsymbol{\theta}_\mu\right)\right)} \displaystyle\sum_{\boldsymbol{h}}\\ \left[\exp \left(-E\left(\boldsymbol{x}, \boldsymbol{h}, \boldsymbol{\theta}_\mu\right)\right) \frac{\partial E\left(\boldsymbol{x}, \boldsymbol{h}, \boldsymbol{\theta}_\mu\right)}{\partial w_{i j}^\mu}\right]= \\ \displaystyle\sum_{\boldsymbol{h}} \frac{\exp \left(-E\left(\boldsymbol{x}, \boldsymbol{h}, \boldsymbol{\theta}_\mu\right)\right)}{\displaystyle\sum_{\boldsymbol{h}} \exp \left(-E\left(\boldsymbol{x}, \boldsymbol{h}, \boldsymbol{\theta}_\mu\right)\right)} \frac{\partial E\left(\boldsymbol{x}, \boldsymbol{h}, \boldsymbol{\theta}_\mu\right)}{\partial w_{i j}^\mu}= \\ -\displaystyle\sum_{\boldsymbol{h}} P\left(\boldsymbol{h} \mid \boldsymbol{x}, \boldsymbol{\theta}_\mu\right) h_j x_i=\\ -P\left(h_j=1 \mid \boldsymbol{x}, \boldsymbol{\theta}_\mu\right) x_i=-P_\mu\left(h_j=1 \mid \boldsymbol{x}\right) x_i \end{gathered} $$

    $ - \ln {P_{\text{T}}}({\boldsymbol{x}},\tilde {\boldsymbol{\theta}} )$关于参数$w_{ij}^\mu $的导数为

    $$ \begin{gathered} - \frac{{\partial \ln {P_{\text{T}}}\left( {{\boldsymbol{x}},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial w_{ij}^\mu }} = \frac{{\partial {F_{\text{T}}}\left( {{\boldsymbol{x}},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial w_{ij}^\mu }} - {E_{{P_{\text{T}}}}}\left[ {\frac{{\partial {F_{\text{T}}}\left( {{\boldsymbol{x}},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial w_{ij}^\mu }}} \right] = \\ {E_{{P_{\text{T}}}}}\left[ {{P_\mu }\left( {\left. {{h_j} = 1} \right|{\boldsymbol{x}}} \right){x_i}} \right] - {P_\mu }\left( {\left. {{h_j} = 1} \right|{\boldsymbol{x}}} \right){x_i} \end{gathered} $$

    类似地,可以推出式(7)的第二、三条结论。当${\boldsymbol{\theta}} = {{\boldsymbol{\theta}} _\eta }$${\boldsymbol{\theta}} = {{\boldsymbol{\theta}} _\nu }$时,对参数求导过程同上。
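
    下面给出式(7)第一条梯度公式的一个数值验证示例(示意代码,非原文内容):在可见单元数很小、可精确枚举分布$P_{\text{T}}$的玩具模型上,用中心差分核对解析梯度,二者应在数值精度内一致;模型规模、随机种子与分量下标均为假设。

```python
import numpy as np
from itertools import product

def free_energy(X, w, a, b):
    """经典RBM自由能 F(x,θ) 的softplus解析形式, X 为按行排列的样本矩阵"""
    return -X @ a - np.sum(np.logaddexp(0.0, X @ w + b), axis=1)

def F_T(X, theta):
    """式(6): F_T(x,θ̃) = F(x,θ_μ) + F(x,θ_η) + F(x,θ_ν)"""
    return sum(free_energy(X, *theta[k]) for k in ('mu', 'eta', 'nu'))

def P_T(all_x, theta):
    """在可枚举的小模型上精确计算分布 P_T(x,θ̃)"""
    e = np.exp(-F_T(all_x, theta))
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
n, m = 3, 2
theta = {k: [rng.normal(0, 0.2, (n, m)), rng.normal(0, 0.2, n), rng.normal(0, 0.2, m)]
         for k in ('mu', 'eta', 'nu')}
all_x = np.array(list(product([0.0, 1.0], repeat=n)))   # 全部 2^n 个可见状态
x = np.array([1.0, 0.0, 1.0])
i, j = 0, 1

# 解析梯度: -∂lnP_T/∂w_ij^μ = E_{P_T}[P_μ(h_j=1|x)x_i] - P_μ(h_j=1|x)x_i
w_mu, _, b_mu = theta['mu']
stat = sigmoid(all_x @ w_mu + b_mu)[:, j] * all_x[:, i]
analytic = P_T(all_x, theta) @ stat - sigmoid(x @ w_mu + b_mu)[j] * x[i]

# 中心差分核对
def neg_log_PT(x, theta):
    idx = int(x @ 2 ** np.arange(n)[::-1])   # x 在 all_x 枚举中的行号
    return -np.log(P_T(all_x, theta)[idx])

eps = 1e-6
theta['mu'][0][i, j] += eps
f_plus = neg_log_PT(x, theta)
theta['mu'][0][i, j] -= 2 * eps
f_minus = neg_log_PT(x, theta)
theta['mu'][0][i, j] += eps
print(analytic, (f_plus - f_minus) / (2 * eps))   # 二者应近似相等
```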

    上述模型中期望的计算复杂度为$O\left( {{2^{m + n}}} \right)$。Hinton等[16]提出了高效训练概率模型的对比散度(contrastive divergence, CD)算法,通过衡量近似分布与真实分布之间的差异给出对数似然梯度的近似值,变分论证可以为学习过程的收敛性提供理论证明[17]。故采用CD算法近似求解上述期望。大量的实验表明,吉布斯采样步数$k = 1$(即CD-1)时近似效果已经很好[18],此时式(7)可近似为

    $$ \left\{ \begin{gathered} - \frac{{\partial \ln {P_{\text{T}}}\left( {{\boldsymbol{x}},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial w_{ij}^\mu }} \approx {P_\mu }\left( {\left. {{h_j} = 1} \right|{{\boldsymbol{x}}^{\left( 1 \right)}}} \right)x_i^{\left( 1 \right)} - \\ {P_\mu }\left( {\left. {{h_j} = 1} \right|{{\boldsymbol{x}}^{\left( 0 \right)}}} \right)x_i^{\left( 0 \right)} \\ - \frac{{\partial \ln {P_{\text{T}}}\left( {{\boldsymbol{x}},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial a_i^\mu }} \approx x_i^{\left( 1 \right)} - x_i^{\left( 0 \right)} \\ - \frac{{\partial \ln {P_{\text{T}}}\left( {{\boldsymbol{x}},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial b_j^\mu }} \approx {P_\mu }\left( {\left. {{h_j} = 1} \right|{{\boldsymbol{x}}^{\left( 1 \right)}}} \right) - {P_\mu }\left( {\left. {{h_j} = 1} \right|{{\boldsymbol{x}}^{\left( 0 \right)}}} \right) \\ \end{gathered} \right. $$ (8)

    由式(8)可得目标函数$ L\left( {{\boldsymbol{S}},\tilde {\boldsymbol{\theta}} } \right) $关于${{\boldsymbol{\theta}} _\mu }$的导数:

    $$ \left\{ \begin{gathered} \frac{{\partial L\left( {{\boldsymbol{S}},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial w_{ij}^\mu }} = - \sum\limits_{t = 1}^N {\frac{{\partial \ln {P_{\text{T}}}\left( {{{\boldsymbol{x}}^t},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial w_{ij}^\mu }}} \\ \frac{{\partial L\left( {{\boldsymbol{S}},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial a_i^\mu }} = - \sum\limits_{t = 1}^N {\frac{{\partial \ln {P_{\text{T}}}\left( {{{\boldsymbol{x}}^t},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial a_i^\mu }}} \\ \frac{{\partial L\left( {{\boldsymbol{S}},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial b_j^\mu }} = - \sum\limits_{t = 1}^N {\frac{{\partial \ln {P_{\text{T}}}\left( {{{\boldsymbol{x}}^t},\tilde {\boldsymbol{\theta}} } \right)}}{{\partial b_j^\mu }}} \\ \end{gathered} \right. $$

    PicFRBM学习算法的伪代码如算法1所示。

    算法1 PicFRBM学习算法的伪代码

    输入 训练样本${{\boldsymbol{x}}^{\left( 0 \right)}}$,学习率$ \varepsilon $,连接权重$ w_{ij}^\mu $$w_{ij}^\eta $$w_{ij}^\nu $,可见层偏置$a_i^\mu $$a_i^\eta $$a_i^\nu $,隐层偏置$b_j^\mu $$b_j^\eta $$b_j^\nu $

    输出 更新后的参数$ w_{ij}^\mu $, $w_{ij}^\eta $, $w_{ij}^\nu $, $a_i^\mu $, $a_i^\eta $, $a_i^\nu $, $b_j^\mu $, $b_j^\eta $, $b_j^\nu $

    1) For $j = 1:m$ do

    2)计算$ {P_\mu }\left( {\left. {h_j^{\mu \left( 0 \right)} = 1} \right|{{\boldsymbol{x}}^{\left( 0 \right)}}} \right) $, $ {P_\eta }\left( {\left. {h_j^{\eta \left( 0 \right)} = 1} \right|{{\boldsymbol{x}}^{\left( 0 \right)}}} \right) $, $ {P_\nu } \left( {\left. {h_j^{\nu \left( 0 \right)} = 1} \right|{{\boldsymbol{x}}^{\left( 0 \right)}}} \right) $

    3)采样$h_j^{\mu \left( 0 \right)} \in \left\{ {0,1} \right\} \sim {P_\mu }\left( {\left. {h_j^{\mu \left( 0 \right)}} \right|{{\boldsymbol{x}}^{\left( 0 \right)}}} \right)$; $h_j^{\eta \left( 0 \right)} \in \left\{ {0,1} \right\} \sim {P_\eta }\left( {\left. {h_j^{\eta \left( 0 \right)}} \right|{{\boldsymbol{x}}^{\left( 0 \right)}}} \right)$; $ h_j^{\nu \left( 0 \right)} \in \left\{ {0,1} \right\} \sim {P_\nu }\left( {\left. {h_j^{\nu \left( 0 \right)}} \right|{{\boldsymbol{x}}^{\left( 0 \right)}}} \right) $

    4) End for

    5) For $i = 1:n$ do

    6)计算$ {P_\mu }\left( {\left. {x_i^{\mu \left( 1 \right)} = 1} \right|{{\boldsymbol{h}}^{\mu \left( 0 \right)}}} \right) $, $ {P_\eta }\left( {\left. {x_i^{\eta \left( 1 \right)} = 1} \right|{{\boldsymbol{h}}^{\eta \left( 0 \right)}}} \right) $, $ {P_\nu } \left( {\left. {x_i^{\nu \left( 1 \right)} = 1} \right|{{\boldsymbol{h}}^{\nu \left( 0 \right)}}} \right) $

    7)采样$x_i^{\mu \left( 1 \right)} \in \left\{ {0,1} \right\} \sim {P_\mu }\left( {\left. {x_i^{\mu \left( 1 \right)}} \right|{{\boldsymbol{h}}^{\mu \left( 0 \right)}}} \right)$; $x_i^{\eta \left( 1 \right)} \in \left\{ {0,1} \right\} \sim {P_\eta }\left( {\left. {x_i^{\eta \left( 1 \right)}} \right|{{\boldsymbol{h}}^{\eta \left( 0 \right)}}} \right)$; $x_i^{\nu \left( 1 \right)} \in \left\{ {0,1} \right\} \sim {P_\nu }\left( {\left. {x_i^{\nu \left( 1 \right)}} \right|{{\boldsymbol{h}}^{\nu \left( 0 \right)}}} \right)$

    8) End for

    9) For $j = 1:m$ do

    10)计算$ {P_\mu }\left( {\left. {h_j^{\mu \left( 1 \right)} = 1} \right|{{\boldsymbol{x}}^{\mu \left( 1 \right)}}} \right) $, $ {P_\eta }\left( {\left. {h_j^{\eta \left( 1 \right)} = 1} \right|{{\boldsymbol{x}}^{\eta \left( 1 \right)}}} \right) $, $ {P_\nu } \left( {\left. {h_j^{\nu \left( 1 \right)} = 1} \right|{{\boldsymbol{x}}^{\nu \left( 1 \right)}}} \right) $

    11)采样$h_j^{\mu \left( 1 \right)} \in \left\{ {0,1} \right\} \sim {P_\mu }\left( {\left. {h_j^{\mu \left( 1 \right)}} \right|{{\boldsymbol{x}}^{\mu \left( 1 \right)}}} \right)$; $h_j^{\eta \left( 1 \right)} \in \left\{ {0,1} \right\} \sim {P_\eta }\left( {\left. {h_j^{\eta \left( 1 \right)}} \right|{{\boldsymbol{x}}^{\eta \left( 1 \right)}}} \right)$; $h_j^{\nu \left( 1 \right)} \in \left\{ {0,1} \right\} \sim {P_\nu }\left( {\left. {h_j^{\nu \left( 1 \right)}} \right|{{\boldsymbol{x}}^{\nu \left( 1 \right)}}} \right)$

    12) End for

    13)$a_i^\mu = a_i^\mu + \varepsilon \left( {x_i^{\left( 0 \right)} - x_i^{\mu \left( 1 \right)}} \right)$; $b_j^\mu = b_j^\mu + \varepsilon \left( {{P_\mu }\left( {\left. {h_j^{\mu \left( 0 \right)} = 1} \right|{{\boldsymbol{x}}^{\left( 0 \right)}}} \right) - {P_\mu }\left( {\left. {h_j^{\mu \left( 1 \right)} = 1} \right|{{\boldsymbol{x}}^{\mu \left( 1 \right)}}} \right)} \right)$; $w_{ij}^\mu = w_{ij}^\mu + \varepsilon \left( {x_i^{\left( 0 \right)}{P_\mu }\left( {\left. {h_j^{\mu \left( 0 \right)} = 1} \right|{{\boldsymbol{x}}^{\left( 0 \right)}}} \right) - x_i^{\mu \left( 1 \right)}{P_\mu }\left( {\left. {h_j^{\mu \left( 1 \right)} = 1} \right|{{\boldsymbol{x}}^{\mu \left( 1 \right)}}} \right)} \right)$

    14)$a_i^\eta = a_i^\eta + \varepsilon \left( {x_i^{\left( 0 \right)} - x_i^{\eta \left( 1 \right)}} \right)$; $b_j^\eta = b_j^\eta + \varepsilon \left( {{P_\eta }\left( {\left. {h_j^{\eta \left( 0 \right)} = 1} \right|{{\boldsymbol{x}}^{\left( 0 \right)}}} \right) - {P_\eta }\left( {\left. {h_j^{\eta \left( 1 \right)} = 1} \right|{{\boldsymbol{x}}^{\eta \left( 1 \right)}}} \right)} \right)$; $w_{ij}^\eta = w_{ij}^\eta + \varepsilon \left( {x_i^{\left( 0 \right)}{P_\eta }\left( {\left. {h_j^{\eta \left( 0 \right)} = 1} \right|{{\boldsymbol{x}}^{\left( 0 \right)}}} \right) - x_i^{\eta \left( 1 \right)}{P_\eta }\left( {\left. {h_j^{\eta \left( 1 \right)} = 1} \right|{{\boldsymbol{x}}^{\eta \left( 1 \right)}}} \right)} \right)$

    15)$a_i^\nu = a_i^\nu + \varepsilon \left( {x_i^{\left( 0 \right)} - x_i^{\nu \left( 1 \right)}} \right)$; $b_j^\nu = b_j^\nu + \varepsilon \left( {{P_\nu }\left( {\left. {h_j^{\nu \left( 0 \right)} = 1} \right|{{\boldsymbol{x}}^{\left( 0 \right)}}} \right) - {P_\nu }\left( {\left. {h_j^{\nu \left( 1 \right)} = 1} \right|{{\boldsymbol{x}}^{\nu \left( 1 \right)}}} \right)} \right)$; $w_{ij}^\nu = w_{ij}^\nu + \varepsilon \left( {x_i^{\left( 0 \right)}{P_\nu }\left( {\left. {h_j^{\nu \left( 0 \right)} = 1} \right|{{\boldsymbol{x}}^{\left( 0 \right)}}} \right) - x_i^{\nu \left( 1 \right)}{P_\nu }\left( {\left. {h_j^{\nu \left( 1 \right)} = 1} \right|{{\boldsymbol{x}}^{\nu \left( 1 \right)}}} \right)} \right)$

    16)返回$ w_{ij}^\mu $, $w_{ij}^\eta $, $w_{ij}^\nu $, $a_i^\mu $, $a_i^\eta $, $a_i^\nu $, $b_j^\mu $, $b_j^\eta $, $ b_j^\nu $
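
    结合式(8)与算法1,下面给出PicFRBM一步学习的向量化示意实现(NumPy,非原文代码):对$\mu$、$\eta$、$\nu$三组参数分别执行一次CD-1更新;采样得到的$h^{(1)}$在更新式中只用到其激活概率,故未显式保留;模型规模、学习率与初始化方式均为假设。

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(x0, w, a, b, eps, rng):
    """对某一组参数(θ_μ、θ_η 或 θ_ν)执行一次CD-1更新, 对应算法1中的一条链"""
    p_h0 = sigmoid(b + x0 @ w)                        # P(h_j=1 | x^(0))
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    p_x1 = sigmoid(a + w @ h0)                        # P(x_i=1 | h^(0))
    x1 = (rng.random(p_x1.shape) < p_x1).astype(float)
    p_h1 = sigmoid(b + x1 @ w)                        # P(h_j=1 | x^(1))
    # 按式(8)的近似梯度更新参数(对应算法1第13)~15)行)
    w = w + eps * (np.outer(x0, p_h0) - np.outer(x1, p_h1))
    a = a + eps * (x0 - x1)
    b = b + eps * (p_h0 - p_h1)
    return w, a, b

def picfrbm_cd1_step(x0, theta, eps, rng):
    """PicFRBM的一步学习: 对 μ、η、ν 三组参数分别执行CD-1"""
    return {k: cd1_update(x0, *theta[k], eps, rng) for k in ('mu', 'eta', 'nu')}

# 用法示例(规模与学习率为假设值)
rng = np.random.default_rng(0)
n, m, eps = 784, 300, 0.01
theta = {k: (rng.normal(0, 0.01, (n, m)), np.zeros(n), np.zeros(m))
         for k in ('mu', 'eta', 'nu')}
x0 = (rng.random(n) < 0.3).astype(float)
theta = picfrbm_cd1_step(x0, theta, eps, rng)
```

    实际训练时通常以小批量遍历训练集并迭代若干轮,将上述单样本更新替换为批内平均即可,这与经典RBM的CD-1训练方式一致。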

    为检验PicFRBM的效果,将其与经典RBM及另外5种扩展模型在多个基准数据集上进行多角度的对比分析。表1给出了各个数据集的统计特征,其中样本总量(训练集与测试集之和)最少的为20 000个,最多的为280 000个;除Cifar_10的特征个数为3 072外,其余数据集的特征个数均为784;类别个数的区间为10~37,应用领域涵盖数字、字母、服饰及自然图像等识别问题,所选数据集的类型较为丰富。参与对比实验的模型包括经典RBM,基于一型模糊集的扩展模型FRBM (fuzzy restricted Boltzmann machine),基于对称和非对称三角模糊数的扩展模型FRBM-STFN(FRBM with symmetric triangular fuzzy numbers)、FRBM-ATFN[21](FRBM with asymmetric triangular fuzzy numbers),区间二型模糊受限玻尔兹曼机[20](interval type-2 fuzzy restricted Boltzmann machine, IT2FRBM),以及模糊去冗余受限玻尔兹曼机[25](fuzzy removing redundancy restricted Boltzmann machine, F3RBM)。为保证公平,所有实验均在Windows 10(64位)操作系统下进行,内存为32 GB,显存为1 981 MB,CPU工作频率为2.50 GHz。

    表  1  实验数据集
    Table  1  Datasets for experiment
    数据集 训练集 测试集 特征数 类别数
    MNIST[26] 60 000 10 000 784 10
    FashionMNIST[27] 60 000 10 000 784 10
    EMNIST_letters[28] 88 800 14 800 784 37
    EMNIST_digits[28] 240 000 40 000 784 10
    Kuzushiji-MNIST[29] 50 000 10 000 784 10
    notMNIST[30] 60 000 10 000 784 10
    Cifar_10[31] 50 000 10 000 3 072 10
    Persian HD[32] 100 000 50 000 784 10
    Devanagari[33] 17 000 3 000 784 10
    3.2.1   重构效果对比

    将重构后的样本数据和原始样本数据之间的偏差称作重构误差[34-36],它可以反映模型的学习能力,因此以其为标准对各种模型进行评估。首先,利用每个数据集的训练集进行无监督训练,对比各种模型的重构误差。同时,为避免随机性的影响,将每种模型在每个数据集下运行10次取平均值作为实验结果。隐层单元个数分别为1 000、800、500、300,结果如表2所示。
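
    重构误差的具体计算方式以文献[34-36]为准,这里给出一种常见度量的示意实现(非原文代码,函数名与数据规模为假设):对样本做一次确定性的"可见层到隐层再到可见层"重构,取重构偏差平方和在样本上的平均。

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reconstruction_error(X, w, a, b):
    """一种常见的重构误差度量(示意): X 为按行排列的样本矩阵"""
    p_h = sigmoid(X @ w + b)          # 隐层激活概率
    x_rec = sigmoid(p_h @ w.T + a)    # 重构得到的可见层概率
    return np.mean(np.sum((X - x_rec) ** 2, axis=1))

# 用法示例(随机数据, 仅演示接口)
rng = np.random.default_rng(0)
X = (rng.random((100, 784)) < 0.3).astype(float)
w, a, b = rng.normal(0, 0.01, (784, 300)), np.zeros(784), np.zeros(300)
print(reconstruction_error(X, w, a, b))
```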

    表  2  不同隐层单元下各模型的平均重构误差
    Table  2  Average reconstruction errors of each model under different hidden units
    隐层单元数 数据集 RBM FRBM FRBM-STFN FRBM-ATFN F3RBM IT2FRBM PicFRBM
    1000 MNIST 61.36±0.04 51.42±0.03 50.61±0.03 71.96±0.97 52.32±0.31 51.06±0.03 47.05±0.03
    FaMNIST 79.19±0.09 72.43±0.13 72.63±0.10 92.73±5.43 77.65±1.15 72.82±0.11 66.66±0.06
    KMNIST 88.20±0.09 77.94±0.34 73.13±0.12 83.05±0.21 79.81±2.48 73.45±0.06 66.98±0.04
    notMNIST 89.54±0.19 78.65±0.34 74.07±0.07 90.30±0.83 94.35±0.92 73.99±0.06 63.72±0.05
    1000 Cifar_10 150.55±0.08 128.13±0.06 196.15±0.55 240.49±4.61 223.81±3.59 195.22±0.60 132.7±0.02
    Persian HD 37.72±0.01 40.29±0.05 45.44±0.11 53.03±1.12 40.58±0.07 45.35±0.12 35.09±0.02
    Devanagari 43.86±0.14 54.48±0.15 36.82±0.11 43.76±2.15 68.35±0.12 36.90±0.12 30.39±0.03
    800 MNIST 61.61±0.03 51.37±0.02 50.69±0.03 67.15±1.68 51.78±0.09 50.85±0.02 47.36±0.03
    FaMNIST 79.33±0.08 72.60±0.09 72.62±0.18 84.08±2.81 76.69±0.48 72.85±0.18 67.13±0.06
    KMNIST 89.13±0.06 78.17±0.34 74.09±0.07 83.69±0.41 79.53±0.06 74.36±0.06 68.01±0.04
    notMNIST 91.08±0.13 80.55±0.64 76.04±0.04 86.30±0.30 97.11±0.14 77.78±0.13 65.93±0.06
    Cifar_10 149.51±0.07 129.25±0.01 195.64±0.32 241.43±4.30 223.97±2.04 196.00±0.22 135.6±0.04
    Persian HD 39.39±0.02 40.82±0.06 44.16±0.12 55.13±1.54 40.99±0.08 44.11±0.11 37.14±0.03
    Devanagari 44.33±0.16 56.62±0.23 37.16±0.13 47.11±1.25 76.89±0.30 37.36±0.14 31.63±0.02
    500 MNIST 62.72±0.03 51.81±0.06 51.83±0.06 71.84±1.00 51.94±0.05 51.97±0.07 48.38±0.03
    FaMNIST 79.65±0.09 71.58±0.12 73.83±0.14 81.43±1.04 79.85±0.25 73.87±0.15 68.08±0.04
    KMNIST 93.02±0.11 80.09±0.43 78.07±0.08 98.04±1.50 79.33±0.19 78.03±0.09 72.46±0.02
    notMNIST 96.04±0.19 85.01±0.62 83.63±0.16 96.77±0.78 110.25±0.68 84.14±0.12 74.09±0.05
    Cifar_10 150.99±0.13 134.02±0.05 217.83±3.16 241.43±4.30 228.75±1.41 215.10±2.63 146.25±0.15
    Persian HD 44.53±0.05 44.16±0.08 44.45±0.02 54.07±0.74 44.10±0.14 44.48±0.03 42.71±0.04
    Devanagari 46.48±0.16 59.28±0.20 39.59±0.08 50.69±0.39 68.65±1.67 39.45±0.09 36.21±0.04
    300 MNIST 64.75±0.07 53.05±0.03 54.95±0.07 79.35±0.91 57.67±1.08 55.06±0.08 51.87±0.06
    FaMNIST 81.79±0.09 73.72±0.05 75.89±0.06 97.19±1.33 77.05±0.02 75.96±0.07 71.21±0.05
    KMNIST 99.8±0.11 85.17±0.13 87.90±0.16 123.90±1.54 87.02±0.52 87.85±0.16 83.70±0.08
    notMNIST 103.32±0.12 90.72±0.37 96.39±0.23 107.91±1.10 139.48±0.51 96.99±0.19 89.15±0.24
    Cifar_10 158.68±0.15 143.80±0.12 233.71±1.03 239.14±5.17 248.06±1.95 234.96±1.29 169.52±0.12
    Persian HD 51.52±0.04 47.64±0.07 49.94±0.06 55.92±0.09 47.59±0.01 49.88±0.07 50.47±0.04
    Devanagari 50.94±0.14 54.86±0.15 46.51±0.11 239.14±5.17 85.24±0.56 46.71±0.09 44.69±0.04
    注:加粗表示最优结果。

    图2以一个样本为例,给出了不同模型的重构效果,其中图2(a)为原图,图2(b)~(h)分别为经典RBM及其扩展模型的重构效果图,对比上述8张图片,PicFRBM的重构图边缘最清晰,整体重构效果最接近原图,说明其重构能力优于经典RBM及其他扩展模型。

    图  2  各模型重构效果
    Fig.  2  Reconstruction result of each algorithm
    3.2.2   重构效率对比

    图3给出了7种模型在不同数据集上,经过20轮的训练与学习,重构误差的减小过程(以隐层单元数800为例)。观察该图可知,由于随机性的影响,PicFRBM在迭代初期重构误差较大,但经过6轮左右的迭代,误差会迅速减小,低于其他6种模型,该结果说明PicFRBM的重构效率明显优于经典RBM及另外5种模型。

    图  3  平均重构误差下降过程对比
    Fig.  3  Comparison of the decline of average reconstruction error

    为进一步比较各种模型的泛化性能,将它们分别与支持向量机(support vector machines, SVM)结合,检验其在测试集上的泛化能力。首先利用每个数据集的训练集部分,对RBM等7种模型进行无监督训练;然后将其分别与SVM结合,再进行微调;最后在测试集上比较它们的泛化性能。为保证公平性,SVM的核函数统一设置为高斯核函数,惩罚参数$ c = 10 $,核函数参数$g = 0.05$。7种模型的隐层单元数统一设置为300,利用十折交叉验证法得到实验结果。采用准确率(accuracy)、精确率(precision)、召回率(recall)、F1值(F1-score) 4个评价标准对泛化能力进行评价,具体结果见表3~6。
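
    下面给出上述"特征提取+SVM+十折交叉验证"评估流程的一个简化示意(scikit-learn,非原文代码):假设features为某一RBM类模型提取的隐层特征,惩罚参数与核参数对应SVC中的C与gamma,多分类指标采用宏平均(原文未说明平均方式,此处为假设),并且略去了微调环节。

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

def evaluate_with_svm(features, labels):
    """用高斯核SVM(C=10, gamma=0.05)和十折交叉验证评估特征的泛化性能,
    返回准确率、精确率、召回率与F1值的均值(多分类指标取宏平均, 为假设)"""
    clf = SVC(kernel='rbf', C=10, gamma=0.05)
    scoring = ('accuracy', 'precision_macro', 'recall_macro', 'f1_macro')
    res = cross_validate(clf, features, labels, cv=10, scoring=scoring)
    return {s: res[f'test_{s}'].mean() for s in scoring}

# 用法示例: features 假设为提取的隐层特征(此处用随机数代替, 仅演示接口)
rng = np.random.default_rng(0)
features = rng.random((500, 300))          # 500个样本、300维隐层特征
labels = rng.integers(0, 10, size=500)     # 10类标签
print(evaluate_with_svm(features, labels))
```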

    表  3  7种模型在8个数据集下的准确率
    Table  3  Accuracy of seven algorithms on eight datasets %
    数据集 RBM FRBM FRBM-STFN FRBM-ATFN F3RBM IT2FRBM PicFRBM
    MNIST 99.58±0.01 99.66±0.01 99.70±0.01 99.62±0.02 99.63±0.02 99.69±0.09 99.71±0.01
    FaMNIST 99.37±0.02 99.50±0.01 99.54±0.01 99.32±0.05 99.13±0.03 99.53±0.14 99.63±0.02
    EMNIST-letter 99.22±0.01 99.30±0.01 99.34±0.01 99.35±0.00 99.31±0.09 99.33±0.05 99.36±0.01
    EMNIST-digits 99.76±0.00 99.80±0.00 99.82±0.01 99.81±0.01 99.69±0.05 99.81±0.04 99.84±0.00
    KMNIST 97.48±0.05 98.06±0.05 98.21±0.02 97.90±0.10 98.13±0.02 98.18±0.26 98.31±0.03
    Not-MNIST 98.68±0.02 98.79±0.01 98.84±0.02 98.89±0.02 96.69±0.08 98.82±0.15 98.94±0.01
    Persian HD 99.95±0.00 99.96±0.00 99.97±0.00 99.81±0.00 99.96±0.01 99.96±0.00 99.97±0.00
    Devanagari 99.74±0.01 99.76±0.03 99.81±0.02 99.82±0.01 99.53±0.02 99.77±0.01 99.83±0.02
    注:加粗表示最优结果。
    表  4  7种模型在8个数据集下的精确率
    Table  4  Precision of seven algorithms on eight datasets %
    数据集 RBM FRBM FRBM-STFN FRBM-ATFN F3RBM IT2FRBM PicFRBM
    MNIST 97.91±0.10 98.29±0.06 98.50±0.05 98.09±0.11 98.12±0.10 98.45±0.05 98.52±0.06
    FaMNIST 96.81±0.11 97.48±0.10 97.69±0.06 96.55±0.29 95.62±0.12 97.67±0.07 98.15±0.11
    EMNIST-letter 89.97±0.13 90.99±0.10 91.39±0.10 91.65±0.06 91.10±0.01 91.54±0.07 91.73±0.07
    EMNIST-digits 98.81±0.01 99.00±0.02 99.11±0.02 99.07±0.04 98.47±0.02 99.11±0.02 99.18±0.01
    KMNIST 87.63±0.26 90.42±0.27 91.12±0.11 89.61±0.50 90.25±0.10 91.03±0.10 91.66±0.16
    Not-MNIST 93.41±0.13 93.99±0.05 94.22±0.11 94.46±0.14 84.19±0.02 94.11±0.07 94.73±0.08
    Persian HD 99.77±0.16 99.81±0.17 99.84±0.01 99.81±0.13 99.82±0.01 99.82±0.004 99.83±0.11
    Devanagari 98.71±0.06 98.83±0.16 99.07±0.10 99.13±0.07 98.27±0.01 98.87±0.02 99.16±0.05
    注:加粗表示最优结果。
    表  5  7种模型在8个数据集下的召回率
    Table  5  Recall of seven algorithms on eight datasets %
    数据集 RBM FRBM FRBM-STFN FRBM-ATFN F3RBM IT2FRBM PicFRBM
    MNIST 97.90±0.09 98.28±0.06 98.49±0.05 98.08±0.12 98.11±0.10 98.43±0.05 98.51±0.06
    FaMNIST 96.80±0.11 97.47±0.10 97.67±0.06 96.53±0.29 95.60±0.12 97.66±0.08 98.14±0.11
    EMNIST-letter 89.91±0.13 90.94±0.10 91.36±0.10 91.61±0.06 91.04±0.01 91.50±0.07 91.70±0.07
    EMNIST-digits 98.80±0.01 99.00±0.02 99.11±0.02 99.06±0.04 98.47±0.02 99.07±0.02 99.18±0.01
    KMNIST 87.42±0.26 90.30±0.27 91.03±0.12 89.51±0.50 90.14±0.10 90.94±0.13 91.57±0.16
    Not-MNIST 93.38±0.13 93.97±0.05 94.19±0.12 94.44±0.14 83.47±0.03 94.10±0.07 94.71±0.08
    Persian HD 99.77±0.16 99.81±0.17 99.84±0.01 99.81±0.13 99.82±0.01 99.82±0.04 99.83±0.11
    Devanagari 98.71±0.06 98.82±0.16 99.06±0.10 99.13±0.07 98.26±0.01 99.87±0.02 99.16±0.06
    注:加粗表示最优结果。
    表  6  7种模型在8个数据集下的F1值
    Table  6  F1 scores of seven algorithms on eight datasets %
    数据集 RBM FRBM FRBM-STFN FRBM-ATFN F3RBM IT2FRBM PicFRBM
    MNIST 97.91±0.10 98.29±0.06 98.50±0.05 98.08±0.12 98.12±0.10 98.43±0.05 98.52±0.06
    FaMNIST 96.80±0.11 97.48±0.10 97.68±0.06 96.54±0.29 95.60±0.12 97.66±0.07 98.14±0.11
    EMNIST-letter 89.92±0.13 90.96±0.10 91.37±0.10 91.62±0.06 91.06±0.01 91.51±0.07 91.71±0.07
    EMNIST-digits 98.80±0.01 99.00±0.02 99.11±0.02 99.06±0.04 98.47±0.02 99.07±0.02 99.18±0.01
    KMNIST 87.44±0.26 90.31±0.27 91.04±0.12 89.52±0.50 90.15±0.09 90.95±0.12 91.57±0.16
    Not-MNIST 93.36±0.13 93.97±0.05 94.20±0.11 94.44±0.14 83.61±0.31 94.10±0.07 94.72±0.08
    Persian HD 99.77±0.16 99.81±0.17 99.84±0.01 99.81±0.13 99.82±0.01 99.82±0.04 99.83±0.11
    Devanagari 98.71±0.06 98.82±0.16 99.06±0.10 99.13±0.07 98.26±0.01 99.87±0.02 99.16±0.06
    注:加粗表示最优结果。

    对比表3~6的第2列和第8列可知,PicFRBM相比经典RBM,泛化能力有了显著的提升,在准确率上最高提升0.83百分点,F1值上的提升最高达4.13百分点。同时,对比第8列与其余5列,准确率也有较大提升,最大提升了2.13百分点;F1值提升效果明显,最高提升11.11百分点。上述结果表明PicFRBM的泛化性能优于经典RBM及其他扩展模型。

    针对受限玻尔兹曼机表示能力不足的问题,本文结合图模糊数多维度表示信息的优点,对经典RBM进行扩展研究,该模型将RBM中的精确参数扩展为图模糊数,进而提出图模糊自由能函数,并结合精确度函数的思想对其进行去模糊化,最终得到融合图模糊信息的受限玻尔兹曼机模型及其学习算法。在多个基准数据集上,对比了PicFRBM与经典RBM及其他5种扩展模型的重构误差和平均重构误差下降过程,证明了PicFRBM的重构效果与重构效率均优于其他模型。将各模型与SVM结合,对比了它们的泛化性能,实验结果表明PicFRBM-SVM在准确率、F1值等评价准则下基本都优于其他扩展模型,证明了PicFRBM具备更强的泛化性能。未来考虑将PicFRBM作为基础框架应用于其他深度学习模型,以进一步提高深度模型的性能。

  • [1] 马世龙, 乌尼日其其格, 李小平. 大数据与深度学习综述[J]. 智能系统学报, 2016, 11(6): 728−742.

    MA Shilong, WUNIRI Qiqige, LI Xiaoping. Deep learning with big data: state of the art and development[J]. CAAI transactions on intelligent systems, 2016, 11(6): 728−742.
    [2] HINTON G E. On early inspiration[Z/OL]. (2024−10−09) [2024−10−20]. https://ting.eudic.net/webting/desktopplay?id=dc3354a2-85e1-11ef-8108005056866eda.
    [3] 胡铭菲, 左信, 刘建伟. 深度生成模型综述[J]. 自动化学报, 2022, 48(1): 40−74.

    HU Mingfei, ZUO Xin, LIU Jianwei. Survey on deep generative model[J]. Acta automatica sinica, 2022, 48(1): 40−74.
    [4] ALBERICI D, CONTUCCI P, MINGIONE E. Deep Boltzmann machines: rigorous results at arbitrary depth[J]. Annales Henri poincaré, 2021, 22(8): 2619−2642.
    [5] WANG Zhendong, ZENG Yong, LIU Yaodi, et al. Deep belief network integrating improved kernel-based extreme learning machine for network intrusion detection[J]. IEEE access, 2021, 9: 16062−16091. doi: 10.1109/ACCESS.2021.3051074
    [6] HAMMOUCHE R, ATTIA A, AKHROUF S, et al. Gabor filter bank with deep autoencoder based face recognition system[J]. Expert systems with applications, 2022, 197: 116743. doi: 10.1016/j.eswa.2022.116743
    [7] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[C]//International Conference on Learning Representations. NewOrleans: ICLR, 2021: 1−22.
    [8] CHEN Leiyu, LI Shaobo, BAI Qiang, et al. Review of image classification algorithms based on convolutional neural networks[J]. Remote sensing, 2021, 13(22): 4712. doi: 10.3390/rs13224712
    [9] VRÁBEL J, POŘÍZKA P, KAISER J. Restricted Boltzmann machine method for dimensionality reduction of large spectroscopic data[J]. Spectrochimica acta part B: atomic spectroscopy, 2020, 167: 105849. doi: 10.1016/j.sab.2020.105849
    [10] TYAGI P K, AGRAWAL D. Automatic detection of sleep apnea from single-lead ECG signal using enhanced-deep belief network model[J]. Biomedical signal processing and control, 2023, 80: 104401. doi: 10.1016/j.bspc.2022.104401
    [11] KOREN Y, RENDLE S, BELL R. Advances in collaborative filtering[M]//Recommender Systems Handbook. New York: Springer US, 2021: 91−142.
    [12] 汪强龙, 高晓光, 吴必聪, 等. 受限玻尔兹曼机及其变体研究综述[J]. 系统工程与电子技术, 2024, 46(7): 2323−2345.

    WANG Qianglong, GAO Xiaoguang, WU Bicong, et al. Review of research on restricted Boltzmann machine and its variants[J]. Systems engineering and electronics, 2024, 46(7): 2323−2345.
    [13] GU Linyan, YANG Lihua, ZHOU Feng. Approximation properties of Gaussian-binary restricted Boltzmann machines and Gaussian-binary deep belief networks[J]. Neural networks, 2022, 153: 49−63. doi: 10.1016/j.neunet.2022.05.020
    [14] CHEN Zixiang, MA Wanqi, DAI Wei, et al. Conditional restricted Boltzmann machine for item recommendation[J]. Neurocomputing, 2020, 385: 269−277. doi: 10.1016/j.neucom.2019.12.088
    [15] 李晓慧, 汪西莉. 结合卷积受限玻尔兹曼机的CV图像分割模型[J]. 激光与光电子学进展, 2020, 57(4): 041018.

    LI Xiaohui, WANG Xili. CV image segmentation model combining convolutional restricted Boltzmann machine[J]. Laser & optoelectronics progress, 2020, 57(4): 201−212.
    [16] HINTON G E. Training products of experts by minimizing contrastive divergence[J]. Neural computation, 2002, 14(8): 1771−1800. doi: 10.1162/089976602760128018
    [17] LUO Weijian, JIANG Hao, HU Tianyang, et al. Training energy-based models with diffusion contrastive divergences[EB/OL]. (2023−07−04)[2024−10−20]. https://arxiv.org/abs/2307.01668.
    [18] TERENIN A, SIMPSON D, DRAPER D. Asynchronous Gibbs sampling[C]//International Conference on Artificial Intelligence and Statistics. [S. l. ]: PMLR, 2020: 144−154.
    [19] CHEN C L P, ZHANG Chunyang, CHEN Long, et al. Fuzzy restricted Boltzmann machine for the enhancement of deep learning[J]. IEEE transactions on fuzzy systems, 2015, 23(6): 2163−2173. doi: 10.1109/TFUZZ.2015.2406889
    [20] JANMAIJAYA M, SHUKLA A K, SETH T, et al. Interval type-2 fuzzy restricted Boltzmann machine for the enhancement of deep learning[C]//2019 IEEE International Conference on Fuzzy Systems. New Orleans: IEEE, 2019: 1−6.
    [21] FENG Shuang, CHEN C L P. A fuzzy restricted Boltzmann machine: novel learning algorithms based on the crisp possibilistic mean value of fuzzy numbers[J]. IEEE transactions on fuzzy systems, 2018, 26(1): 117−130. doi: 10.1109/TFUZZ.2016.2639064
    [22] 张健, 丁世飞, 张楠, 等. 受限玻尔兹曼机研究综述[J]. 软件学报, 2019, 30(7): 2073−2090.

    ZHANG Jian, DING Shifei, ZHANG Nan, et al. Restricted Boltzmann machines: a review[J]. Journal of software, 2019, 30(7): 2073−2090.
    [23] CUONG B C, KREINOVICH V. Picture fuzzy sets-a new concept for computational intelligence problems[C]//2013 Third World Congress on Information and Communication Technologies. Hanoi: IEEE, 2013: 1−6.
    [24] ZHU Sijia, LIU Zhe. Distance measures of picture fuzzy sets and interval-valued picture fuzzy sets with their applications[J]. AIMS mathematics, 2023, 8(12): 29817−29848. doi: 10.3934/math.20231525
    [25] LYU Xueqin, MENG Lingzheng, CHEN Chao, et al. Fuzzy removing redundancy restricted Boltzmann machine: improving learning speed and classification accuracy[J]. IEEE transactions on fuzzy systems, 2020, 28(10): 2495−2509.
    [26] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278−2324. doi: 10.1109/5.726791
    [27] XIAO Han, RASUL K, VOLLGRAF R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms[EB/OL]. (2017−08−25)[2024−10−20]. https://arxiv.org/abs/1708.07747.
    [28] COHEN G, AFSHAR S, TAPSON J, et al. EMNIST: Extending MNIST to handwritten letters[C]//2017 International Joint Conference on Neural Networks. Anchorage: IEEE, 2017: 2921−2926.
    [29] CLANUWAT T, BOBER-IRIZAR M, KITAMOTO A, et al. Deep learning for classical Japanese literature[EB/OL]. (2018−12−03)[2024−10−20]. https://arxiv.org/abs/1812.01718.
    [30] HSIEH P C, CHEN C P. Multi-task learning on MNIST image datasets[C]//International Conference on Learning Representations. Vancouver: OpenReview.net, 2018: 1−7.
    [31] KRIZHEVSKY A, HINTON G. Learning multiple layers of features from tiny images[EB/OL]. (2009−04−08)[2024−12−11]. https://www.cs.utoronto.ca/~kriz/learning-features-2009-TR.pdf.
    [32] SOLTANZADEH H, RAHMATI M. Recognition of Persian handwritten digits using image profiles of multiple orientations[J]. Pattern recognition letters, 2004, 25(14): 1569−1576. doi: 10.1016/j.patrec.2004.05.014
    [33] ACHARYA S, PANT A K, GYAWALI P K. Deep learning based large scale handwritten Devanagari character recognition[C]//2015 9th International Conference on Software, Knowledge, Information Management and Applications. Kathmandu: IEEE, 2015: 1−6.
    [34] 张艳霞. 基于受限玻尔兹曼机的深度学习模型及其应用[D]. 成都: 电子科技大学, 2016.

    ZHANG Yanxia. Deep learning models and applications based on the restricted Boltzmann machine[D]. Chengdu: University of Electronic Science and Technology, 2016.
    [35] GLASER P, HUANG K H, GRETTON A. Near-optimality of contrastive divergence algorithms[C]//Proceedings of the 38th International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2024: 91036−91090.
    [36] ZHENG Yujun, SHENG Weiguo, SUN Xingming, et al. Airline passenger profiling based on fuzzy deep machine learning[J]. IEEE transactions on neural networks and learning systems, 2017, 28(12): 2911−2923.
出版历程
  • 收稿日期:  2024-12-11
  • 网络出版日期:  2025-08-05
