
CAAI Transactions on Intelligent Systems, 2021, Vol. 16, Issue 3: 449-458. DOI: 10.11992/tis.202001027

Cite this article

HUANG Yuting, XU Yuanyuan, ZHANG Hengru, et al. Label distribution learning based on triangular distance correlation[J]. CAAI Transactions on Intelligent Systems, 2021, 16(3): 449-458. DOI: 10.11992/tis.202001027.


Label distribution learning based on triangular distance correlation
HUANG Yuting , XU Yuanyuan , ZHANG Hengru , MIN Fan
College of Computer Science, Southwest Petroleum University, Chengdu 610500, China
Abstract: To address the problem of representing label correlation, this paper proposes a label distribution learning algorithm based on triangular distance correlation. First, a distance-mapping matrix is constructed to describe the mapping relationship between the label distribution and the feature matrix. Then, a new triangular distance is designed to characterize the correlation between labels. Finally, an objective function based on the Kullback-Leibler divergence is designed on top of this label correlation. Results on eight datasets show that the proposed algorithm outperforms eight mainstream algorithms in accuracy on six evaluation measures.
Key words: label distribution learning    label correlation    triangular distance    distance mapping matrix    multi-label learning    maximum entropy model    Kullback-Leibler divergence    L-BFGS method

Geng et al.[1] proposed the SA-IIS (specialized algorithm based on improved iterative scaling) algorithm, which converts single-label data into distribution data but does not consider label correlation. Jia et al.[16] proposed the LDLLC (label distribution learning by exploiting label correlation) algorithm, which describes the correlation between labels with the Pearson correlation coefficient. Zheng et al.[17] proposed the LDL-SCL (label distribution learning by exploiting sample correlation locally) algorithm, which considers the correlation between instances. The latter two methods significantly improve the model's ability to predict label distributions.

1 Related work

1.1 LDL problem description

1.2 Running example

The mapping relationship between X and D can be described by the distance-mapping matrix θ. Given the training set, the goal of LDL is to learn this distance-mapping matrix θ[16] and then use θ to compute the predicted label-distribution matrix P = {p1, p2, …, pn}, where pi = [pi1 pi2 … pic] and pij is the predicted description degree of label yj for xi. This description degree is expressed with the maximum entropy model[25], as shown in Eq. (1):

 $p({y_j}|{x_i};\theta) = \dfrac{\exp\left(\displaystyle\sum\limits_{r = 1}^q {\theta _{jr}}{x_{ir}}\right)}{\displaystyle\sum\limits_{k = 1}^c \exp\left(\displaystyle\sum\limits_{r = 1}^q {\theta _{kr}}{x_{ir}}\right)}$ (1)
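As an illustration, the maximum entropy model of Eq. (1) is a row-wise softmax over the label scores. A minimal sketch (the function name `predict_distribution` is ours, not from the paper):

```python
import numpy as np

def predict_distribution(theta, x):
    """Predicted description degrees p(y_j | x; theta) via the maximum
    entropy (softmax) model of Eq. (1).

    theta: (c, q) distance-mapping matrix, one row per label
    x:     (q,) feature vector
    returns: (c,) predicted label distribution summing to 1
    """
    scores = theta @ x            # sum_r theta_jr * x_r for each label j
    scores -= scores.max()        # subtract max for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()
```

Subtracting the maximum score before exponentiating leaves the softmax unchanged but avoids overflow for large scores.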

 $\theta^{*} = \arg\min\limits_{\theta} \sum\limits_{i = 1}^n \sum\limits_{j = 1}^c {d_{ij}}\ln \dfrac{{d_{ij}}}{p\left({y_j}|{x_i};\theta\right)}$ (2)
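The objective of Eq. (2) sums the Kullback-Leibler divergence between each true distribution di and its prediction. A minimal sketch (the helper name `kl_objective` and the `eps` smoothing constant are our assumptions):

```python
import numpy as np

def kl_objective(theta, X, D):
    """Sum over all instances of KL(d_i || p_i), the objective of Eq. (2).

    theta: (c, q) distance-mapping matrix
    X:     (n, q) feature matrix
    D:     (n, c) true label distributions (rows sum to 1)
    """
    scores = X @ theta.T                       # (n, c) label scores
    scores -= scores.max(axis=1, keepdims=True)
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)          # row-wise softmax, Eq. (1)
    eps = 1e-12                                # guard against log(0)
    return float(np.sum(D * np.log((D + eps) / (P + eps))))
```

With θ = 0 every prediction is uniform, so the objective reduces to the KL divergence of each true distribution from the uniform one, which is zero exactly when the true distributions are themselves uniform.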
1.3 Existing LDL algorithms

LDLLC[16] adds a regularization term and a label-correlation term to the objective function of the IIS-LLD algorithm. As shown in the second row of Table 4, the second term on the right-hand side is the F-norm of the distance-mapping matrix θ, which prevents overfitting. The third term is a label-correlation term jointly determined by a sign function and a distance, where the sign function is derived from the Pearson correlation coefficient. However, the Pearson correlation coefficient requires a linear relationship between its two input vectors, a condition that any two vectors in the distance-mapping matrix θ can hardly satisfy.

EDL (emotion distribution learning from texts)[26] uses a new divergence formula to characterize the difference between the true and predicted distributions over all instances, and adds two constraint terms. As shown in the third row of Table 4, the second term on the right-hand side is the 1-norm of the distance-mapping matrix θ, which prevents overfitting. The third term characterizes the relationship between different labels by the 2-norm of the difference between the labels' feature vectors, multiplied by a weight derived from Plutchik's wheel of emotions. This algorithm performs well in emotion classification scenarios.

2 Our work

 $T(\theta) = \sum\limits_{i = 1}^n \sum\limits_{j = 1}^c {d_{ij}}\ln \dfrac{{d_{ij}}}{p\left({y_j}|{x_i};\theta\right)} + {\lambda _1}\sum\limits_{i = 1}^c \sum\limits_{j = 1}^c \eta\left({\theta _i},{\theta _j}\right)$ (3)

2.1 Label correlation

 $\eta\left({\theta _i},{\theta _j}\right) = \operatorname{sgn}\left(\operatorname{triangle}\left({\theta _i},{\theta _j}\right)\right) \cdot \operatorname{Dis}\left({\theta _i},{\theta _j}\right)$ (4)

 $\operatorname{triangle}\left({\theta _i},{\theta _j}\right) = 1 - \dfrac{2\sqrt{\displaystyle\sum\limits_{k = 1}^m {({\theta _{ik}} - {\theta _{jk}})^2}}}{\sqrt{\displaystyle\sum\limits_{k = 1}^m {\theta _{ik}^2}} + \sqrt{\displaystyle\sum\limits_{k = 1}^m {\theta _{jk}^2}}}$ (5)

 $\operatorname{sgn}\left(\operatorname{triangle}\left({\theta _i},{\theta _j}\right)\right) = \begin{cases} 1, & 0 < \operatorname{triangle}\left({\theta _i},{\theta _j}\right) \leqslant 1 \\ 0, & \operatorname{triangle}\left({\theta _i},{\theta _j}\right) = 0 \\ -1, & -1 \leqslant \operatorname{triangle}\left({\theta _i},{\theta _j}\right) < 0 \end{cases}$ (6)

 $\operatorname{Dis}\left({\theta _i},{\theta _j}\right) = \sqrt{\sum\limits_{k = 1}^m {\left({\theta _{ik}} - {\theta _{jk}}\right)^2}}$ (7)
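Eqs. (4)-(7) translate almost directly into code. A minimal sketch (the function names `triangle` and `eta` are ours):

```python
import numpy as np

def triangle(ti, tj):
    """Triangular similarity of Eq. (5): one minus twice the Euclidean
    distance of the two rows over the sum of their norms."""
    return 1.0 - 2.0 * np.linalg.norm(ti - tj) / (
        np.linalg.norm(ti) + np.linalg.norm(tj))

def eta(ti, tj):
    """Signed correlation term of Eq. (4): sign of the triangular
    similarity (Eq. (6)) times the Euclidean distance Dis (Eq. (7))."""
    return np.sign(triangle(ti, tj)) * np.linalg.norm(ti - tj)
```

Identical rows give triangle = 1 and η = 0, while rows whose distance exceeds half the sum of their norms give a negative triangle value and hence a negative η.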
2.2 The proposed T-LDL algorithm

T-LDL is described in Algorithm 1. First, the distance-mapping matrix θ(0) and the inverse quasi-Hessian matrix B(0) are initialized to identity matrices, and the initial gradient ${{\nabla}}$ T(θ(0)) of the objective function is computed via Eq. (3). The iteration then begins, with convergence condition || ${{\nabla}}$ T(θ(l))||2 < ξ. While the convergence condition is not satisfied, the L-BFGS method[27] is used to optimize and update θ and B. Once the condition is satisfied, the predicted description degree p(yj|xi;θ) of label yj for xi is computed.

1) Initialize the distance-mapping matrix θ(0) and the inverse quasi-Hessian matrix B(0);

2) Compute the gradient ${{\nabla}}$ T(θ(0)) via Eq. (3);

3) While || ${{\nabla}}$ T(θ(l))||2 > ξ, use the L-BFGS method[27] to optimize and update θ and B;

4) l ← l + 1;

5) end while;

6) Compute p(yj|xi;θ) via Eq. (1).
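The steps above can be sketched with SciPy's L-BFGS-B routine standing in for the L-BFGS update of step 3: the paper maintains θ and B explicitly, whereas `scipy.optimize.minimize` keeps the quasi-Hessian internally and, with no analytic gradient supplied, approximates ${{\nabla}}$ T numerically. The function name `train_tldl` and the default parameter values are our assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def train_tldl(X, D, lam1=0.01, xi=1e-6):
    """Sketch of T-LDL training: minimize the Eq. (3) objective
    (KL term plus the lambda_1-weighted pairwise correlation term)
    with SciPy's L-BFGS-B as a stand-in for L-BFGS."""
    n, q = X.shape
    c = D.shape[1]

    def objective(flat_theta):
        theta = flat_theta.reshape(c, q)
        # KL term of Eq. (3): row-wise softmax predictions, Eq. (1)
        scores = X @ theta.T
        scores -= scores.max(axis=1, keepdims=True)
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)
        kl = np.sum(D * np.log((D + 1e-12) / (P + 1e-12)))
        # correlation term: sum of eta(theta_i, theta_j), Eqs. (4)-(7)
        corr = 0.0
        for i in range(c):
            for j in range(c):
                dist = np.linalg.norm(theta[i] - theta[j])
                denom = np.linalg.norm(theta[i]) + np.linalg.norm(theta[j]) + 1e-12
                corr += np.sign(1.0 - 2.0 * dist / denom) * dist
        return kl + lam1 * corr

    theta0 = np.eye(c, q).ravel()          # step 1: theta(0) as identity
    res = minimize(objective, theta0,      # steps 2-5: quasi-Newton loop
                   method="L-BFGS-B", options={"gtol": xi})
    return res.x.reshape(c, q)             # step 6 applies Eq. (1) to this theta
```

Note that the sign function makes the correlation term non-smooth, so a production implementation would supply the analytic gradient derived in the paper rather than rely on finite differences.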

3 Experiments and result analysis

3.1 Datasets

The Alpha dataset records yeast gene expression during mitosis under the influence of the α factor; the Cdc dataset records the expression of cdc-15-arrested genes during yeast cell division; the Elu dataset records yeast gene expression after centrifugal elutriation; the Diau dataset records yeast gene expression during the diauxic shift; the Heat dataset records yeast gene expression after heat shock; the Spo dataset records yeast gene expression during sporulation; the Cold dataset records yeast gene expression after cold treatment; and the Dtt dataset records yeast gene expression after treatment with a reducing agent[28].

3.2 Evaluation measures

3.3 Experimental results

3.4 Discussion

4 Conclusion

[1] GENG Xin. Label distribution learning[J]. IEEE transactions on knowledge and data engineering, 2016, 28(7): 1734-1748. DOI: 10.1109/TKDE.2016.2545658

[2] JIA Xiuyi, ZHENG Xiang, LI Weiwei, et al. Facial emotion distribution learning by exploiting low-rank label correlations locally[C]//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA, 2019: 9841−9850.

[3] YANG Xu, GAO Binbin, XING Chao, et al. Deep label distribution learning for apparent age estimation[C]//Proceedings of 2015 IEEE International Conference on Computer Vision Workshops. Santiago, Chile, 2015: 102−108.

[4] ZHANG Hengru, HUANG Yuting, XU Yuanyuan, et al. COS-LDL: label distribution learning by cosine-based distance-mapping correlation[J]. IEEE access, 2020, 8: 63961-63970. DOI: 10.1109/ACCESS.2020.2984622

[5] SHAO Dongheng, YANG Wenyuan, ZHAO Hong. Label distribution learning based on k-means algorithm[J]. CAAI transactions on intelligent systems, 2017, 12(3): 325-332.

[6] LIU Yujie, TANG Shunjing, GAO Yongbiao, et al. Label distribution learning for video summarization[J]. Journal of computer-aided design & computer graphics, 2019, 31(1): 104-110.

[7] WANG Yibin, TIAN Wenquan, CHENG Yusheng. Heterogeneous ensemble learning algorithm based on label distribution learning[J]. Pattern recognition and artificial intelligence, 2019, 32(10): 945-954.

[8] GENG Xin, XU Ning. Label distribution learning and label enhancement[J]. Scientia sinica informationis, 2018, 48(5): 521-530. DOI: 10.1360/N112018-00029

[9] ZHANG Mingling, ZHANG Kun. Multi-label learning by exploiting label dependency[C]//Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington, USA, 2010: 999−1007.

[10] BI Wei, KWOK J T. Multilabel classification with label correlations and missing labels[C]//Proceedings of the 28th AAAI Conference on Artificial Intelligence. Québec City, Canada, 2014: 1680−1686.

[11] HUANG Shengjun, ZHOU Zhihua. Multi-label learning by exploiting label correlations locally[C]//Proceedings of the 26th AAAI Conference on Artificial Intelligence. Toronto, Canada, 2012: 949−955.

[12] GENG Xin, WANG Qin, XIA Yu. Facial age estimation by adaptive label distribution learning[C]//Proceedings of the 22nd International Conference on Pattern Recognition. Stockholm, Sweden, 2014: 4465−4470.

[13] ZHANG Zhaoxiang, WANG Mo, GENG Xin. Crowd counting in public video surveillance by label distribution learning[J]. Neurocomputing, 2015, 166: 151-163. DOI: 10.1016/j.neucom.2015.03.083

[14] GENG Xin, YIN Chao, ZHOU Zhihua. Facial age estimation by learning from label distributions[J]. IEEE transactions on pattern analysis and machine intelligence, 2013, 35(10): 2401-2412. DOI: 10.1109/TPAMI.2013.51

[15] GENG Xin, LING Miaogen. Soft video parsing by label distribution learning[C]//Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco, USA, 2017: 1331−1337.

[16] JIA Xiuyi, LI Weiwei, LIU Junyu, et al. Label distribution learning by exploiting label correlations[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA, 2018: 3310−3317.

[17] ZHENG Xiang, JIA Xiuyi, LI Weiwei. Label distribution learning by exploiting sample correlations locally[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA, 2018: 4556−4563.

[18] KULLBACK S, LEIBLER R A. On information and sufficiency[J]. The annals of mathematical statistics, 1951, 22(1): 79-86. DOI: 10.1214/aoms/1177729694

[19] DANIELSSON P E. Euclidean distance mapping[J]. Computer graphics and image processing, 1980, 14(3): 227-248. DOI: 10.1016/0146-664X(80)90054-4

[20] SØRENSEN T. A method of establishing groups of equal amplitude in plant sociology based on similarity of species content, and its application to analyses of the vegetation on Danish commons[J]. Kongelige danske videnskabernes selskab biologiske skrifter, 1948, 5(4): 1-34.

[21] GAVIN D G, OSWALD W W, WAHL E R, et al. A statistical approach to evaluating distance metrics and analog assignments for pollen records[J]. Quaternary research, 2003, 60(3): 356-367. DOI: 10.1016/S0033-5894(03)00088-7

[22] DUDA R O, HART P E, STORK D G. Pattern classification[M]. 2nd ed. New York: Wiley, 2000.

[23] DEZA E, DEZA M M. Dictionary of distances[M]. Amsterdam: Elsevier, 2006.

[24] JEGOU H, DOUZE M, SCHMID C. Hamming embedding and weak geometric consistency for large scale image search[C]//Proceedings of the 10th European Conference on Computer Vision. Marseille, France, 2008: 304−317.

[25] BERGER A L, PIETRA V J D, PIETRA S A D. A maximum entropy approach to natural language processing[J]. Computational linguistics, 1996, 22(1): 39-71.

[26] ZHOU Deyu, ZHANG Xuan, ZHOU Yin, et al. Emotion distribution learning from texts[C]//Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, USA, 2016: 638−647.

[27] YUAN Yaxiang. A modified BFGS algorithm for unconstrained optimization[J]. IMA journal of numerical analysis, 1991, 11(3): 325-332. DOI: 10.1093/imanum/11.3.325

[28] EISEN M B, SPELLMAN P T, BROWN P O, et al. Cluster analysis and display of genome-wide expression patterns[J]. Proceedings of the national academy of sciences of the United States of America, 1998, 95(25): 14863-14868. DOI: 10.1073/pnas.95.25.14863

[29] CHA Su H. Comprehensive survey on distance/similarity measures between probability density functions[J]. International journal of mathematical models and methods in applied sciences, 2007, 1(4): 300-307.