Two-pass AUC optimization

CAAI Transactions on Intelligent Systems, 2018, Vol. 13, Issue 3: 395-398. DOI: 10.11992/tis.201706079

### Cite this article

LUAN Xun, GAO Wei. Two-pass AUC optimization[J]. CAAI Transactions on Intelligent Systems, 2018, 13(3): 395-398. DOI: 10.11992/tis.201706079.


LUAN Xun, GAO Wei
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
Abstract: The area under the ROC curve (AUC) is an important performance measure for class-imbalanced learning, cost-sensitive learning, learning to rank, etc. Because AUC is defined over pairs of positive and negative instances, traditional AUC optimization requires the entire dataset to be stored. To address this, the one-pass AUC (OPAUC) algorithm was previously introduced; it scans the data only once and stores first- and second-order statistics. In many real applications, however, the second-order statistics are expensive to store and compute, especially for high-dimensional datasets. We introduce two-pass AUC (TPAUC) optimization, which calculates the means of the positive and negative instances in the first pass and then applies stochastic gradient descent in the second pass. The new algorithm stores only the first-order statistics, not the second-order statistics, and is therefore more efficient. Finally, experiments verify the effectiveness of the proposed algorithm.
Key words: machine learning; AUC; ROC; one-pass learning; online learning; ranking; stochastic gradient descent; statistics

1 The TPAUC learning method

Given a training set ${{S}} = \{ ({x_1},{y_1}),({x_2},{y_2}), \cdots ,({x_n},{y_n})\}$ with $y_i \in \{+1, -1\}$, the AUC of a scoring function $f$ is defined as

 $\mathrm{AUC}(f, S) = \sum\limits_{i \ne j} \frac{\left(I[f(x_i) > f(x_j)] + \displaystyle\frac{1}{2} I[f(x_i) = f(x_j)]\right) I[y_i > y_j]}{T_{S^+} T_{S^-}}$ (1)

where $T_{S^+}$ and $T_{S^-}$ denote the numbers of positive and negative instances in $S$.
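Definition (1) can be checked numerically. The sketch below (our own illustration, not code from the paper) scores every positive-negative pair and counts ties as 1/2:

```python
import numpy as np

def auc_from_definition(scores, labels):
    """AUC as in Eq. (1): the fraction of positive-negative pairs that a
    scoring function f ranks correctly, with ties counted as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]    # f(x_i) for positive instances
    neg = scores[labels == -1]   # f(x_j) for negative instances
    diff = pos[:, None] - neg[None, :]           # all T+ x T- pairs
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return correct / (len(pos) * len(neg))
```

For example, scores (0.9, 0.8, 0.3, 0.1) with labels (+1, +1, -1, -1) rank every positive above every negative and give an AUC of 1.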

Maximizing Eq. (1) directly is hard because the indicator function is non-convex, so a surrogate loss is minimized instead:

 $L(f, S) = \sum\limits_{i \ne j} \frac{l(f(x_i) - f(x_j)) I[y_i > y_j]}{T_{S^+} T_{S^-}}$ (2)

where $l$ is taken to be the squared surrogate loss

 $l(f(x_i) - f(x_j)) = (1 - f(x_i) + f(x_j))^2$

For a linear scoring function $f(x) = \langle w, x \rangle$, the loss becomes

 $L(w) = \sum\limits_{i \ne j} \frac{l(\langle w, x_i \rangle - \langle w, x_j \rangle) I[y_i > y_j]}{T_{S^+} T_{S^-}}$

Expanding the square and writing $T_+$ and $T_-$ for the numbers of positive and negative instances gives

 $\begin{aligned} L(w) = {} & 1 + \frac{1}{T_+} \sum\limits_{i:y_i = 1} \langle w, x_i \rangle^2 + \frac{1}{T_-} \sum\limits_{i:y_i = -1} \langle w, x_i \rangle^2 \\ & - \frac{2}{T_+ T_-} \sum\limits_{i:y_i = 1} \langle w, x_i \rangle \sum\limits_{i:y_i = -1} \langle w, x_i \rangle \\ & - \frac{2}{T_+} \sum\limits_{i:y_i = 1} \langle w, x_i \rangle + \frac{2}{T_-} \sum\limits_{i:y_i = -1} \langle w, x_i \rangle \end{aligned}$

The OPAUC algorithm must maintain the second-order statistics

 ${{S}}_T^+ = \frac{1}{T_+} \sum\limits_{i:y_i = 1} x_i x_i^{\rm T}, \; {{S}}_T^- = \frac{1}{T_-} \sum\limits_{i:y_i = -1} x_i x_i^{\rm T}$

which cost $O(d^2)$ storage for $d$-dimensional data, whereas TPAUC only needs the first-order statistics (the class means)

 ${{c}}_T^+ = \frac{1}{T_+} \sum\limits_{i:y_i = 1} x_i, \; {{c}}_T^- = \frac{1}{T_-} \sum\limits_{i:y_i = -1} x_i$
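Because only the class means are needed, the first pass can run in $O(d)$ memory with a streaming mean update. A minimal sketch (function and variable names are ours):

```python
import numpy as np

def first_pass_means(stream, d):
    """First pass of TPAUC: maintain running class means c^+ and c^- in
    O(d) memory, instead of the O(d^2) second-order matrices of OPAUC."""
    c_pos, c_neg = np.zeros(d), np.zeros(d)
    t_pos = t_neg = 0
    for x, y in stream:
        if y == 1:
            t_pos += 1
            c_pos += (x - c_pos) / t_pos   # incremental mean update
        else:
            t_neg += 1
            c_neg += (x - c_neg) / t_neg
    return c_pos, c_neg, t_pos, t_neg
```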

In the second pass, $L(w)$ is rewritten as an average of per-instance losses:

 $L({{w}}) = \sum\limits_t {{{{L_t}({{w}})} / T}}$ (3)

When ${y_t} = 1$, we have

 ${L_t}({{w}}) = 1 + {{{{\langle {{w}},{{{x}}_t} - {{c}}_T^ + \rangle }^2}} / {{T_ + }}} + {\langle {{w}},{{c}}_T^ + - {{c}}_T^ - \rangle ^2} - 2\langle {{w}},{{c}}_T^ + - {{c}}_T^ - \rangle$

When ${y_t} = - 1$, we have

 ${L_t}({{w}}) = 1 + {{{{\langle {{w}},{{{x}}_t} - {{c}}_T^ - \rangle }^2}} / {{T_ - }}} + {\langle {{w}},{{c}}_T^ - - {{c}}_T^ + \rangle ^2} + 2\langle {{w}},{{c}}_T^ - - {{c}}_T^ + \rangle$

TPAUC then updates $w$ by stochastic gradient descent:

 ${{{w}}_t} = {{{w}}_{t - 1}} - \eta \nabla {L_t}({{{w}}_{t - 1}})$
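Putting the two passes together gives the following minimal sketch (our own illustration, not the paper's code; variable names, the learning-rate default, and the optional $\lambda$ term are assumptions, and the gradient is taken from the per-instance losses above, in which $c_T^+ - c_T^-$ enters both cases by the symmetry of the derivation):

```python
import numpy as np

def tpauc_sketch(X, y, eta=0.01, lam=0.0):
    """A minimal TPAUC sketch. Pass 1 stores only the class means;
    pass 2 runs stochastic gradient descent on the per-instance losses."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # --- first pass: first-order statistics (class means) only ---
    t_pos, t_neg = int((y == 1).sum()), int((y == -1).sum())
    c_pos = X[y == 1].mean(axis=0)
    c_neg = X[y == -1].mean(axis=0)
    d = c_pos - c_neg   # mean difference, pushed up by the loss
    # --- second pass: SGD, w_t = w_{t-1} - eta * grad L_t(w_{t-1}) ---
    w = np.zeros(X.shape[1])
    for x_t, y_t in zip(X, y):
        c, t = (c_pos, t_pos) if y_t == 1 else (c_neg, t_neg)
        # gradient of L_t(w) = 1 + <w, x_t - c>^2 / t + <w, d>^2 - 2<w, d>
        grad = (2.0 / t) * np.dot(w, x_t - c) * (x_t - c) \
             + 2.0 * np.dot(w, d) * d - 2.0 * d \
             + lam * w   # optional l2 regularization (lambda in Section 2)
        w -= eta * grad
    return w
```

On linearly separable toy data, the learned $w$ should score every positive instance above every negative one.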

2 Experimental validation

Both the learning rate $\eta$ and the regularization parameter $\lambda$ of TPAUC are selected from $\{ {2^{ - 10}},{2^{ - 9}},{2^{ - 8}}, \cdots ,2,4\}$. The dataset is first split into a training set and a test set, and the parameters are chosen by five-fold cross-validation on the training set. With the parameters fixed, five runs of five-fold cross-validation are then performed, and the average of these 25 test results is reported as the final result.

TPAUC is compared with the following methods:

1) OPAUC: one-pass AUC optimization [3];

2) OAMseq: an online AUC optimization algorithm [14];

3) OAMgra: an online AUC optimization algorithm [14];

4) Online Uni-Exp: optimizes a weighted univariate exponential loss [20];

5) Online Uni-Squ: optimizes a weighted univariate squared loss [20].
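The parameter-selection protocol above can be sketched as follows (a sketch with hypothetical names; `evaluate` stands in for training TPAUC on the training folds and returning validation AUC, and is not from the paper):

```python
import numpy as np
from itertools import product

# Candidate grid for eta and lambda: {2^-10, 2^-9, ..., 2, 4}
grid = [2.0 ** k for k in range(-10, 3)]

def five_fold_indices(n, seed=0):
    """Shuffle indices and split them into five folds for cross-validation."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), 5)

def select_params(evaluate, n):
    """Pick (eta, lam) maximizing mean five-fold validation score.
    `evaluate(eta, lam, train_idx, valid_idx)` is user-supplied."""
    folds = five_fold_indices(n)
    best, best_score = None, -np.inf
    for eta, lam in product(grid, grid):
        scores = []
        for k in range(5):
            valid = folds[k]
            train = np.concatenate([folds[j] for j in range(5) if j != k])
            scores.append(evaluate(eta, lam, train, valid))
        if np.mean(scores) > best_score:
            best, best_score = (eta, lam), np.mean(scores)
    return best
```

With the chosen pair fixed, the five repetitions of five-fold cross-validation described above are averaged for the final test result.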

3 Conclusion

The area under the ROC curve (AUC) is an important performance measure in machine learning. Because AUC is defined over pairs of positive and negative instances, traditional methods must store the entire dataset and thus do not scale to big data. To address this, Gao et al. proposed the one-pass AUC optimization algorithm, which scans the data only once and optimizes AUC by storing first- and second-order statistics. This paper aims to reduce the computation and storage cost of the second-order statistics and proposes a new two-pass AUC optimization algorithm, TPAUC. The proposed algorithm only needs to compute and store first-order statistics, without storing second-order statistics, which effectively improves efficiency. Finally, experiments verify the effectiveness of the algorithm.

[1] HSIEH F, TURNBULL B W. Nonparametric and semiparametric estimation of the receiver operating characteristic curve[J]. The Annals of Statistics, 1996, 24(1): 25-40. DOI: 10.1214/aos/1033066197.
[2] ELKAN C. The foundations of cost-sensitive learning[C]//Proceedings of the 17th International Joint Conference on Artificial Intelligence. Seattle, WA, 2001: 973-978.
[3] GAO Wei, JIN Rong, ZHU Shenghuo, et al. One-pass AUC optimization[C]//Proceedings of the 30th International Conference on Machine Learning. Atlanta, GA, 2013: 906-914.
[4] HAND D J. Measuring classifier performance: a coherent alternative to the area under the ROC curve[J]. Machine Learning, 2009, 77(1): 103-123. DOI: 10.1007/s10994-009-5119-5.
[5] EGAN J P. Signal detection theory and ROC analysis, series in cognition and perception[M]. New York: Academic Press, 1975.
[6] WU Jianxin, BRUBAKER S C, MULLIN M D, et al. Fast asymmetric learning for cascade face detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008, 30(3): 369-382. DOI: 10.1109/TPAMI.2007.1181.
[7] BREFELD U, SCHEFFER T. AUC maximizing support vector learning[C]//Proceedings of ICML 2005 Workshop on ROC Analysis in Machine Learning. Bonn, Germany, 2005.
[8] JOACHIMS T. A support vector method for multivariate performance measures[C]//Proceedings of the 22nd International Conference on Machine Learning. Bonn, Germany, 2005: 377-384.
[9] FREUND Y, IYER R, SCHAPIRE R, et al. An efficient boosting algorithm for combining preferences[J]. Journal of Machine Learning Research, 2003, 4: 933-969.
[10] RUDIN C, SCHAPIRE R E. Margin-based ranking and an equivalence between AdaBoost and RankBoost[J]. Journal of Machine Learning Research, 2009, 10: 2193-2232.
[11] HERSCHTAL A, RASKUTTI B. Optimising area under the ROC curve using gradient descent[C]//Proceedings of the 21st International Conference on Machine Learning. Banff, Alberta, Canada, 2004.
[12] AGARWAL S, ROTH D. Learnability of bipartite ranking functions[C]//Proceedings of the 18th Annual Conference on Learning Theory. Bertinoro, Italy, 2005: 16-31.
[13] GAO Wei, ZHOU Zhihua. Uniform convergence, stability and learnability for ranking problems[C]//Proceedings of the 23rd International Joint Conference on Artificial Intelligence. Beijing, China, 2013: 1337-1343.
[14] ZHAO Peilin, HOI S C H, JIN Rong, et al. Online AUC maximization[C]//Proceedings of the 28th International Conference on Machine Learning. Bellevue, WA, 2011: 233-240.
[15] GAO Wei, WANG Lu, JIN Rong, et al. One-pass AUC optimization[J]. Artificial Intelligence, 2016, 236: 1-29. DOI: 10.1016/j.artint.2016.03.003.
[16] SHALEV-SHWARTZ S, SINGER Y, SREBRO N, et al. Pegasos: primal estimated sub-gradient solver for SVM[J]. Mathematical Programming, 2011, 127(1): 3-30. DOI: 10.1007/s10107-010-0420-4.
[17] CESA-BIANCHI N, LUGOSI G. Prediction, learning, and games[M]. New York: Cambridge University Press, 2006.
[18] HAZAN E, KALAI A, KALE S, et al. Logarithmic regret algorithms for online convex optimization[C]//Proceedings of the 19th Annual Conference on Learning Theory. Pittsburgh, PA, 2006: 499-513.
[19] DEVROYE L, GYÖRFI L, LUGOSI G. A probabilistic theory of pattern recognition[M]. New York: Springer, 1996.
[20] KOTŁOWSKI W, DEMBCZYŃSKI K, HÜLLERMEIER E. Bipartite ranking through minimization of univariate loss[C]//Proceedings of the 28th International Conference on Machine Learning. Bellevue, WA, 2011: 1113-1120.