Ship Science and Technology, 2022, Vol. 44, Issue 23: 117-122. DOI: 10.3404/j.issn.1672-7649.2022.23.023
Application of the CatBoost algorithm in fault diagnosis of ship bearings
XING Zhi-kai, LIU Yong-bao, WANG Qiang, LI Jun
College of Power Engineering, Naval University of Engineering, Wuhan 430032, China
Abstract: Bearings are a common failure location on ships. To address the poor multi-class accuracy and low computational efficiency of existing machine learning methods in ship bearing fault diagnosis, a bearing diagnosis technique based on the CatBoost (category boosting) algorithm is proposed. First, the vibration signals are analyzed in the time domain, in the frequency domain, and by EMD, and features of the selected vibration signal segments are extracted. Second, the CatBoost algorithm is used to screen the extracted features, with the Gini index used to quickly build the tree structure and rank the features. Finally, feature inputs of different dimensions are used to evaluate the model, and the classification accuracy is compared with that of traditional methods. Experimental results show that the proposed method extracts fault features more effectively for multi-class classification of rolling bearing faults, and that its recognition performance is clearly better than that of the traditional algorithms compared.
Key words: rolling bearings; CatBoost; Gini index; feature extraction; fault diagnosis
0 Introduction

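CatBoost (category boosting) is a gradient boosting machine learning library. It does not require extensive data training as other models do [5], and it has the advantage of handling categorical features efficiently and sensibly. This paper applies the CatBoost algorithm to intelligent diagnosis of rolling bearings and compares it with the BP (back propagation) neural network [6] and the support vector machine (SVM) [7] to verify the feasibility of the method.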

1 Multi-feature extraction
1.1 Time-domain feature extraction

 $L = \dfrac{x_{\max}}{\left(\dfrac{1}{N}\sum\limits_{i=1}^{N} \sqrt{\left|x_i\right|}\right)^{2}}.$
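The formula above (commonly called the clearance or margin factor) can be evaluated directly on a signal segment. The following is a minimal sketch using only the Python standard library; the function name `clearance_factor` and the sample values are illustrative assumptions, not taken from the paper.

```python
import math

def clearance_factor(x):
    """L = x_max / (mean of sqrt(|x_i|))**2 for one signal segment x."""
    if not x:
        raise ValueError("signal segment must not be empty")
    # The formula in the text uses x_max; some references use max(|x_i|) instead.
    x_max = max(x)
    root_mean = sum(math.sqrt(abs(v)) for v in x) / len(x)
    if root_mean == 0.0:
        raise ValueError("clearance factor is undefined for an all-zero segment")
    return x_max / root_mean ** 2

# Toy segment (made-up values, for illustration only).
segment = [1.0, -4.0, 9.0, -16.0]
print(clearance_factor(segment))  # 1.44
```

For this toy segment the mean of the square roots is 2.5, so the result is 9 / 6.25.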

1.2 Frequency-domain feature extraction

 $F_{msf} = \dfrac{\sum\limits_{k=1}^{K} f_k^2 s(k)}{\sum\limits_{k=1}^{K} s(k)}.$
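As a small illustration of the mean square frequency, the sketch below evaluates the ratio for a given list of frequencies $f_k$ and spectrum values $s(k)$. How the spectrum is obtained (for example from an FFT of the segment) is left outside the sketch; the function name and the sample numbers are assumptions made for illustration.

```python
def mean_square_frequency(freqs, spectrum):
    """F_msf = sum(f_k^2 * s(k)) / sum(s(k)) over all spectral lines."""
    if not freqs or len(freqs) != len(spectrum):
        raise ValueError("freqs and spectrum must be non-empty and equally long")
    total = sum(spectrum)
    if total == 0:
        raise ValueError("spectrum must have non-zero total amplitude")
    return sum(f * f * s for f, s in zip(freqs, spectrum)) / total

# Toy spectrum with three lines (made-up values, for illustration only).
freqs = [10.0, 20.0, 30.0]
spectrum = [1.0, 2.0, 1.0]
print(mean_square_frequency(freqs, spectrum))  # 450.0
```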

1.3 Time-frequency-domain feature extraction

 $h_1(t) = x(t) - m_1,$

If $h_1(t)$ satisfies the IMF conditions above, it is taken as the first-order IMF component obtained by EMD and denoted $c_1(t)$. If $h_1(t)$ does not satisfy the IMF conditions, it is treated as the new original vibration signal and step 1 is repeated until a component that satisfies the conditions is obtained.

Physically, the original vibration signal is represented as the sum of $k$ IMF components and one residual signal:

 $x_1(t) = \sum\limits_{i=1}^{k} c_i(t) + r_{k+1}(t),$

 $Corr(c_i, X_j) = \dfrac{\sum\limits_{n=1}^{N} [c_i(n) - \widehat{c_i}][X_j(n) - \widehat{X_j}]}{\sqrt{\sum\limits_{n=1}^{N} [c_i(n) - \widehat{c_i}]^2 \cdot \sum\limits_{n=1}^{N} [X_j(n) - \widehat{X_j}]^2}},$

 $EnergyRatio_j(i) = \dfrac{\sum\limits_{n=1}^{N} \left|c_i(n)\right|^2}{\sum\limits_{n=1}^{N} \left|X_j(n)\right|^2}.$
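The two IMF screening quantities can be sketched as follows, given one IMF component $c_i$ and the signal segment $X_j$ as equally long sample lists. The correlation uses the standard Pearson form, in which the denominator is the square root of the product of the two sums of squared deviations. The function names and toy data are illustrative assumptions, and the EMD decomposition itself is not part of this sketch.

```python
import math

def correlation(c, x):
    """Pearson correlation between an IMF component c and a signal segment x."""
    n = len(c)
    if n == 0 or n != len(x):
        raise ValueError("inputs must be non-empty and equally long")
    c_mean = sum(c) / n
    x_mean = sum(x) / n
    num = sum((a - c_mean) * (b - x_mean) for a, b in zip(c, x))
    den = math.sqrt(sum((a - c_mean) ** 2 for a in c)
                    * sum((b - x_mean) ** 2 for b in x))
    if den == 0.0:
        raise ValueError("correlation is undefined for a constant input")
    return num / den

def energy_ratio(c, x):
    """Energy of the IMF component c divided by the energy of the segment x."""
    signal_energy = sum(b * b for b in x)
    if signal_energy == 0.0:
        raise ValueError("signal segment has zero energy")
    return sum(a * a for a in c) / signal_energy

# Toy data: the "component" is simply half of the signal (illustration only).
x = [1.0, -2.0, 3.0, -4.0]
c = [0.5 * v for v in x]
print(correlation(c, x))   # close to 1.0, since c is a scaled copy of x
print(energy_ratio(c, x))  # 0.25
```

In a screening step, components with a low correlation or a small energy share would be the natural candidates to discard; the thresholds are a design choice not specified here.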

2 Fault diagnosis method

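CatBoost [12] is a boosting-based gradient boosting library that handles categorical features well. It has two main advantages. First, it automatically selects the better-performing features during model training, which avoids manual feature screening. Second, its algorithm for computing leaf node values helps avoid overfitting [13]. While the decision trees are built, the Gini index [14] can be used as the evaluation metric for feature importance.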

The Gini index is calculated as:

 $GI_m = \sum\limits_{k=1}^{|K|} \sum\limits_{k' \ne k} p_{mk} p_{mk'} = 1 - \sum\limits_{k=1}^{|K|} p_{mk}^2.$

 $VIM_{jm}^{(\mathrm{Gini})} = GI_m - GI_l - GI_r,$

$GI_l$ and $GI_r$ denote the Gini indices of the two new nodes after the split.
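A minimal sketch of these two quantities, computed from the class labels at a node and at its two child nodes, is given below. It follows the formula as written in the text, where the child terms are not weighted by sample share (many tree implementations do apply such weights). The function names and the label strings are illustrative assumptions.

```python
from collections import Counter

def gini_index(labels):
    """GI = 1 - sum_k p_k^2 for the class labels that fall into one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_decrease(parent, left, right):
    """VIM_jm = GI_m - GI_l - GI_r (child terms unweighted, as in the text)."""
    return gini_index(parent) - gini_index(left) - gini_index(right)

# Toy node with two classes that a split separates perfectly (illustration only).
parent = ["normal", "normal", "inner_race", "inner_race"]
left = ["normal", "normal"]
right = ["inner_race", "inner_race"]
print(gini_index(parent))                  # 0.5
print(gini_decrease(parent, left, right))  # 0.5
```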

 $VIM_{ij}^{(\mathrm{Gini})} = \sum\limits_{m \in M} VIM_{jm}^{(\mathrm{Gini})},$

 $VIM_j^{(\mathrm{Gini})} = \sum\limits_{i=1}^{n} VIM_{ij}^{(\mathrm{Gini})}.$

 $VIM_j = \dfrac{VIM_j}{\sum\nolimits_{i=1}^{c} VIM_i}.$

The larger $VIM_j$ is, the higher the importance score of that feature.
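The aggregation over nodes and trees, followed by normalization, can be sketched as below. The input layout (one list per tree, holding a `(feature_index, vim_jm)` pair for every split node) is an assumption chosen for illustration, as are the function name and the toy scores; it is not the internal data structure of any particular library.

```python
def normalized_importance(per_tree_node_scores, n_features):
    """Sum the per-node scores over all nodes and trees, then normalize to sum 1."""
    totals = [0.0] * n_features
    for tree in per_tree_node_scores:      # sum over the n trees
        for feature, score in tree:        # sum over the split nodes of one tree
            totals[feature] += score
    grand_total = sum(totals)              # sum over the c features
    if grand_total == 0.0:
        raise ValueError("no split contributed any importance")
    return [t / grand_total for t in totals]

# Two toy trees over three features (made-up scores, for illustration only).
trees = [
    [(0, 0.5), (1, 0.25)],
    [(0, 0.25), (2, 1.0)],
]
print(normalized_importance(trees, 3))  # [0.375, 0.125, 0.5]
```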

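3) Construct the gradient-based loss function $loss(T_c)$ and select the tree structure that minimizes the loss as the tree model $Tree_t$ for the current round.

4) Update the $Mode_r$ corresponding to each ordering strategy $\sigma_r$.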

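The advantage of using the CatBoost algorithm for feature importance analysis is that no separate feature selection step is needed. Feature selection is already completed during model training, which yields an evaluation score for each feature, and the magnitudes of these scores give the contribution of each feature to the model.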
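Once a score is available for every feature, reducing the input dimension amounts to keeping the highest-scoring features. The sketch below shows one simple way to do this; the feature names, scores, and helper names are hypothetical and chosen only for illustration, and they are not results from the paper.

```python
def select_top_features(scores, names, k):
    """Return the names of the k features with the highest importance scores."""
    if len(scores) != len(names):
        raise ValueError("scores and names must be equally long")
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]

def reduce_samples(samples, names, kept):
    """Keep only the columns of each sample that belong to the selected features."""
    indices = [names.index(name) for name in kept]
    return [[row[i] for i in indices] for row in samples]

# Hypothetical feature names and scores (illustration only).
names = ["kurtosis", "clearance_factor", "mean_square_frequency", "imf1_energy_ratio"]
scores = [0.1, 0.4, 0.2, 0.3]
kept = select_top_features(scores, names, 2)
print(kept)  # ['clearance_factor', 'imf1_energy_ratio']
print(reduce_samples([[3.1, 1.4, 450.0, 0.25]], names, kept))  # [[1.4, 0.25]]
```

The reduced samples can then be fed to each classifier to compare accuracy at different feature dimensions.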

3 Engineering test
3.1 Dataset source

3.2 Multi-feature extraction

3.3 Simulation analysis
3.3.1 Accuracy evaluation and comparison of different classification schemes

3.3.2 Feature evaluation and dimensionality reduction based on the CatBoost model

 Fig. 1 Importance ranking of the top 20 features

 Fig. 2 Accuracy curves of the three algorithms for different input feature dimensions

 Fig. 3 Confusion matrix of CatBoost classification with a feature dimension of 10
4 Conclusion

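 [1] WU Guowen, TIAN Yangyang, MAO Wentao. Online fault diagnosis of rolling bearings based on feature fusion[J]. Control and Decision, 2018: 2-7. (in Chinese)
 [2] LI Jun, LIU Yongbao, YU Youhong. Application of convolutional neural network and kurtosis in bearing fault diagnosis[J]. Journal of Aerospace Power, 2019, 34(11): 2423-2431. DOI:10.13224/j.cnki.jasp.2019.11.014 (in Chinese)
 [3] LI Zhiqin, DU Jianqiang, NIE Bin, et al. A survey of feature selection methods[J]. Computer Engineering and Applications, 2019, 55(24): 10-19. DOI:10.3778/j.issn.1002-8331.1909-0066 (in Chinese)
 [4] CUI Yujia, ZHANG Yidi, WANG Peizhi, et al. Feature selection algorithm for medical data based on fusion of multiple evaluation criteria[J]. Journal of Fudan University (Natural Science), 2019, 58(2): 250-255+268. DOI:10.15943/j.cnki.fdxb-jns.2019.02.015 (in Chinese)
 [5] SU Qing, LIN Huazhi, HUANG Jianfeng, et al. Malicious Android application detection model combining CNN and CatBoost[J]. Computer Engineering and Applications, 2021, 57(15): 140-146. DOI:10.3778/j.issn.1002-8331.2004-0385 (in Chinese)
 [6] MCCLELLAND J L, RUMELHART D E, HINTON G E. Une nouvelle approche de la cognition: le connexionnisme[J]. Le Débat, 1987, 47(5).
 [7] CORTES C, VAPNIK V. Support-vector networks[J]. Machine Learning, 1995, 20(3).
 [8] CUI Liming. Research on life prediction methods and life testing of intershaft bearings[D]. Dalian: Dalian University of Technology, 2016. (in Chinese)
 [9] HONG Zhenqi. Research on remaining life prediction of rolling bearings based on multi-feature fusion[D]. Shenyang: Shenyang University, 2021. (in Chinese)
 [10] ZHOU Xiaolong, YANG Gongyong, LIANG Xiuxia. Fault diagnosis of rolling bearings based on EMD reconstruction and SVM[J]. Journal of Northeast Electric Power University, 2016, 36(6): 71-6. DOI:10.3969/j.issn.1005-2992.2016.06.014 (in Chinese)
 [11] VEER L J, DAI H, VIJVER M J, et al. Gene expression profiling predicts clinical outcome of breast cancer[J]. Nature, 2002, 415(6871): 530-536. DOI:10.1038/415530a
 [12] PROKHORENKOVA L, GUSEV G, VOROBEV A. CatBoost: unbiased boosting with categorical features[C]//Advances in Neural Information Processing Systems, 2018: 6638-6648.
 [13] JIANG Qigang, YANG Xiuyan, YANG Changbao, et al. Object-oriented land use classification based on the CatBoost algorithm[J]. Journal of Jilin University (Information Science Edition), 2020, 38(2): 185-191. DOI:10.19292/j.cnki.jdxxp.2020.02.028 (in Chinese)
 [14] GOLDSTEIN B A, POLLEY E C, BRIGGS F B S. Random forests for genetic association studies[J]. Statistical Applications in Genetics and Molecular Biology, 2011, 10(1).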