Acta Automatica Sinica, 2017, Vol. 43, Issue (2): 248-258

A Single-channel Speech Enhancement Approach Based on Perceptual Masking Deep Neural Network
HAN Wei$^{1}$, ZHANG Xiong-Wei$^{1}$, MIN Gang$^{1,2}$, ZHANG Qi-Ye$^{3}$
1. PLA University of Science and Technology, Nanjing 210007;
2. Xi'an Communications Institute, Xi'an 710106;
3. Unit 96637 of PLA, Beijing 102101
Foundation Item: Supported by National Natural Science Foundation of China (61471394, 61402519), Natural Science Foundation of Jiangsu Province (BK20140071, BK20140074)
Corresponding author: ZHANG Xiong-Wei, Professor at the College of Command Information System, PLA University of Science and Technology. He received his Ph.D. degree from Nanjing Institute of Communication Engineering in 1992. His research interests cover intelligence information processing, speech and image signal processing, and telecommunication systems.
Recommended by Associate Editor KE Deng-Feng
Abstract: A new deep neural network (DNN) for single-channel speech enhancement is proposed that incorporates the perceptual masking properties of psychoacoustic models. First, the proposed DNN is trained to learn both the clean speech magnitude spectrum and the noise magnitude spectrum from the noisy magnitude spectrum. Second, the estimated clean speech magnitude spectrum is used to calculate the noise masking threshold. Then, the noise masking threshold and the estimated noise magnitude spectrum are combined to calculate a perceptual gain function. Finally, the enhanced speech magnitude spectrum is obtained by jointly training the perceptual gain function and the noisy speech magnitude spectrum. Experimental results on TIMIT with 20 noise types at various signal-to-noise ratio (SNR) levels demonstrate that the proposed perceptual masking DNN effectively removes noise while keeping speech distortion small, and thus outperforms conventional DNN methods and the nonnegative matrix factorization (NMF) method, regardless of whether the noise conditions are included in the training set.
Key words: Speech enhancement; deep neural network; perceptual gain function; masking threshold

1 DNN-based speech enhancement and the noise masking threshold

1.1 DNN-based speech enhancement

1.1.1 DNN architecture

Figure 1 Speech enhancement based on DNN

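A DNN generally consists of three parts: an input layer, hidden layers, and an output layer. The input layer receives the feature parameters of the noisy speech. The hidden part is usually a stack of several layers in which nodes of adjacent layers are connected, while nodes within the same layer, or in non-adjacent layers, are not. Data are propagated from the input layer through the hidden layers via activation functions, with the output computed by each layer serving as the input to the next, as shown in Eq. (1):

 $\begin{eqnarray} \begin{array}{*{20}{c}} {\pmb{h}^l} = \sigma ({W^l}{\pmb{h}^{l - 1}} + {\pmb{b}^l}) \end{array} \end{eqnarray}$ (1)

where $\pmb{h}^l$ is the output of layer $l$ ($\pmb{h}^0$ being the input feature vector), $W^l$ and $\pmb{b}^l$ are the weight matrix and bias vector of layer $l$, and $\sigma(\cdot)$ is the activation function.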
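As a rough illustration of Eq. (1), the following Python sketch propagates an input vector through a stack of fully connected layers. The logistic sigmoid chosen for $\sigma$, the toy layer sizes and the weight values are all hypothetical, since the text does not fix them here; regression DNNs for enhancement often use a linear output layer, which the `out_act` argument allows.

```python
import math

def sigmoid(x):
    # Logistic activation; an assumed choice for sigma in Eq. (1).
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(W, h_prev, b, act=sigmoid):
    # One layer of Eq. (1): h^l = sigma(W^l h^{l-1} + b^l).
    # W is a list of rows (one row per output node); h_prev and b are lists.
    return [act(sum(w * x for w, x in zip(row, h_prev)) + bias)
            for row, bias in zip(W, b)]

def dnn_forward(layers, x, out_act=sigmoid):
    # Apply Eq. (1) layer by layer; `layers` is a list of (W, b) pairs.
    # The last layer uses out_act (e.g. identity for a linear output layer).
    h = x
    for idx, (W, b) in enumerate(layers):
        act = out_act if idx == len(layers) - 1 else sigmoid
        h = layer_forward(W, h, b, act)
    return h

# Hypothetical 2-3-2 network with hand-picked weights.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5], [0.2, 0.3, -0.4]], [0.05, 0.0]),
]
print(dnn_forward(layers, [1.0, 2.0]))                        # sigmoid output
print(dnn_forward(layers, [1.0, 2.0], out_act=lambda z: z))   # linear output
```

Keeping each layer as a separate `(W, b)` pair mirrors the per-layer parameters $W^l$, $\pmb{b}^l$ that Eqs. (3) and (4) later update.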

1.1.2 DNN training

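 $\begin{eqnarray} \begin{array}{*{20}{c}} {J_{{\rm{MSE}}}}(W, \pmb{b}) = \dfrac{1}{N}\sum\limits_{n = 1}^N {\dfrac{1}{2}{{\left\| {{\hat{\pmb{S}}_n}(W, \pmb{b}) - {\pmb{S}_n}} \right\|}^2}} \end{array} \end{eqnarray}$ (2)

where $N$ is the number of training samples, $\hat{\pmb{S}}_n(W, \pmb{b})$ is the DNN output for the $n$-th sample, and $\pmb{S}_n$ is the corresponding target vector.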
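Eq. (2) averages half the squared Euclidean error over the $N$ training samples. A minimal Python version, assuming each sample's output and target are plain lists of equal length, is:

```python
def mse_objective(estimates, targets):
    # Eq. (2): J = (1/N) * sum_n 0.5 * ||S_hat_n - S_n||^2
    # estimates, targets: N samples, each a list of feature values.
    if len(estimates) != len(targets) or not targets:
        raise ValueError("need the same, non-zero number of samples")
    total = 0.0
    for s_hat, s in zip(estimates, targets):
        total += 0.5 * sum((a - b) ** 2 for a, b in zip(s_hat, s))
    return total / len(targets)

# Two hypothetical 2-bin samples: 0.5*4 + 0.5*2 = 3, averaged over N = 2.
print(mse_objective([[1.0, 2.0], [0.0, 0.0]], [[1.0, 0.0], [1.0, 1.0]]))  # 1.5
```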

 $\begin{eqnarray} {W^l} = {W^l} - \varepsilon \dfrac{\partial J(W, \pmb{b})}{\partial {W^l}}, \quad 1 \le l \le L + 1 \end{eqnarray}$ (3)
 $\begin{eqnarray} {\pmb{b}^l} = {\pmb{b}^l} - \varepsilon \dfrac{\partial J(W, \pmb{b})}{\partial {\pmb{b}^l}}, \quad 1 \le l \le L + 1 \end{eqnarray}$ (4)

where $\varepsilon$ is the learning rate and $L$ is the number of hidden layers.
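Eqs. (3) and (4) are plain gradient-descent updates with learning rate $\varepsilon$. The sketch below applies them to a deliberately simplified, hypothetical model with a single linear layer $\hat{\pmb{S}} = W\pmb{x} + \pmb{b}$ trained on the MSE objective of Eq. (2). For that model the gradients are $\frac{1}{N}\sum_n(\hat{\pmb{S}}_n - \pmb{S}_n)\pmb{x}_n^{\rm T}$ and $\frac{1}{N}\sum_n(\hat{\pmb{S}}_n - \pmb{S}_n)$; a real multi-layer DNN would obtain its gradients by back-propagation instead.

```python
def predict(W, b, x):
    # Hypothetical single linear layer: S_hat = W x + b.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def loss(W, b, inputs, targets):
    # MSE objective of Eq. (2) for this model.
    return sum(0.5 * sum((p - t) ** 2 for p, t in zip(predict(W, b, x), s))
               for x, s in zip(inputs, targets)) / len(inputs)

def sgd_step(W, b, inputs, targets, eps):
    # One update of Eqs. (3)-(4): W <- W - eps*dJ/dW, b <- b - eps*dJ/db.
    n = len(inputs)
    grad_W = [[0.0] * len(W[0]) for _ in W]
    grad_b = [0.0] * len(b)
    for x, s in zip(inputs, targets):
        err = [p - t for p, t in zip(predict(W, b, x), s)]  # S_hat_n - S_n
        for k, e in enumerate(err):
            grad_b[k] += e / n
            for j, xj in enumerate(x):
                grad_W[k][j] += e * xj / n
    new_W = [[w - eps * g for w, g in zip(row, grow)]
             for row, grow in zip(W, grad_W)]
    new_b = [bk - eps * g for bk, g in zip(b, grad_b)]
    return new_W, new_b

# Hypothetical toy data: two inputs, one output value each.
inputs = [[1.0, 0.0], [0.0, 1.0]]
targets = [[2.0], [-1.0]]
W, b = [[0.0, 0.0]], [0.0]
print("initial loss:", loss(W, b, inputs, targets))
for _ in range(100):
    W, b = sgd_step(W, b, inputs, targets, eps=0.5)
print("loss after 100 steps:", loss(W, b, inputs, targets))
```

A learning rate that is too large makes the updates diverge; for this toy problem any $\varepsilon$ below about 1.3 converges.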

1.2 Psychoacoustic model and computation of the noise masking threshold

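Johnston proposed a general method for estimating the masking threshold of the background noise in each speech frame [19]. The method is built on critical-band analysis and can be summarized in the following four steps: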

 $\begin{eqnarray} \begin{array}{*{20}{c}} \pmb{P}(\omega) = {{\mathop{\rm Re}\nolimits} ^2}(\pmb{S}(\omega )) + {{\mathop{\rm Im}\nolimits} ^2}(\pmb{S}(\omega )) \end{array} \end{eqnarray}$ (5)
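Eq. (5) is the per-bin power of the short-time spectrum. The Python sketch below computes it from a naive DFT; it omits any analysis window, and the 8-sample cosine frame is hypothetical. Because the cosine lies exactly on bin 2, its power appears only in bins 2 and 6 (the mirrored bin).

```python
import cmath
import math

def dft(x):
    # Naive O(N^2) DFT of a real frame; adequate for a short illustration.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def power_spectrum(spec):
    # Eq. (5): P(w) = Re^2(S(w)) + Im^2(S(w)) for each frequency bin.
    return [z.real ** 2 + z.imag ** 2 for z in spec]

# Hypothetical 8-sample frame: a cosine sitting exactly on bin 2.
frame = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
P = power_spectrum(dft(frame))
print([round(p, 6) for p in P])
```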

 $\begin{eqnarray} \begin{array}{*{20}{c}} {B_i} = \sum\limits_{\omega = b{l_i}}^{b{h_i}} {\pmb{P}(\omega )} \end{array} \end{eqnarray}$ (6)
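Eq. (6) sums the power spectrum over the DFT bins $bl_i$ to $bh_i$ of the $i$-th critical band. The band edges are not listed in this section, so the sketch below derives hypothetical edges by grouping bins that share the same integer Bark value under Zwicker's approximation $z = 13\arctan(0.00076f) + 3.5\arctan((f/7500)^2)$; the paper's actual band table may differ, and the 8 kHz / 256-point setting is likewise an assumption.

```python
import math

def hz_to_bark(f):
    # Zwicker's Bark approximation (assumed; not specified in the text).
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def bark_band_edges(n_bins, fs, n_fft):
    # Hypothetical helper: group bins 0..n_bins-1 into consecutive bands,
    # one band per integer Bark index; returns (bl_i, bh_i) pairs.
    edges, start = [], 0
    current = int(hz_to_bark(0.0))
    for k in range(1, n_bins):
        z = int(hz_to_bark(k * fs / n_fft))
        if z != current:
            edges.append((start, k - 1))
            start, current = k, z
    edges.append((start, n_bins - 1))
    return edges

def band_energies(power, edges):
    # Eq. (6): B_i = sum of P(w) for w = bl_i .. bh_i (inclusive).
    return [sum(power[lo:hi + 1]) for lo, hi in edges]

# Example: 8 kHz sampling, 256-point FFT, 129 non-negative-frequency bins.
edges = bark_band_edges(129, 8000, 256)
B = band_energies([1.0] * 129, edges)   # flat unit spectrum
print(len(edges), B[:4])                # with unit power, B_i = bins per band
```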

 $\begin{eqnarray} \begin{array}{*{20}{c}} {C_i} = {S_{ij}} * {B_i} \end{array} \end{eqnarray}$ (8)
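Eq. (8) spreads each band's energy into its neighbouring bands. The spreading function of Eq. (7) does not appear in this section, so the sketch below assumes the Schroeder spreading function commonly used with Johnston's model, $10\lg S(\Delta) = 15.81 + 7.5(\Delta + 0.474) - 17.5\sqrt{1 + (\Delta + 0.474)^2}$ with $\Delta = i - j$ in Bark, and reads the $*$ in Eq. (8) as the convolution $C_i = \sum_j S_{ij} B_j$.

```python
import math

def spreading_db(delta):
    # Assumed Schroeder spreading function in dB; delta = i - j (Bark),
    # i = band receiving the spread energy, j = masker band.
    d = delta + 0.474
    return 15.81 + 7.5 * d - 17.5 * math.sqrt(1.0 + d * d)

def spread_band_energies(B):
    # Eq. (8) read as C_i = sum_j S_ij * B_j, with S_ij = 10^(SF_dB(i-j)/10).
    n = len(B)
    return [sum(10.0 ** (spreading_db(i - j) / 10.0) * B[j] for j in range(n))
            for i in range(n)]

# Energy in a single (hypothetical) band spreads more toward higher bands.
C = spread_band_energies([0.0, 0.0, 1.0, 0.0, 0.0])
print([round(c, 4) for c in C])
```

The asymmetry (about -4.3 dB one Bark above the masker versus about -7.9 dB one Bark below) reflects the stronger upward spread of masking.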

 $\begin{eqnarray} \begin{array}{*{20}{c}} {T_{\rm{N}}} = {C_i} - 14.5 - i \end{array} \end{eqnarray}$ (9)

 $\begin{eqnarray} \begin{array}{*{20}{c}} {T_{\rm{T}}} = {C_i} - 5.5 \end{array} \end{eqnarray}$ (10)

 $\begin{eqnarray} {\rm{SFM}}_{\rm{dB}}= 10{\rm{lg}}\left( \dfrac {G_{\rm m}} {A_{\rm m} } \right) \end{eqnarray}$ (11)
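Eqs. (9) to (11) give the two threshold offsets (in dB) and the spectral flatness measure, where $G_{\rm m}$ and $A_{\rm m}$ are the geometric and arithmetic means of the power spectrum. How the paper combines them is not shown in this section; the sketch below assumes Johnston's original rule, a tonality coefficient $\min({\rm SFM}_{\rm dB}/{\rm SFM}_{\rm dBmax}, 1)$ with ${\rm SFM}_{\rm dBmax} = -60$ dB, used to interpolate between $T_{\rm N}$ and $T_{\rm T}$. The band-index convention for $i$ is also an assumption. This tonality coefficient is unrelated to the objective-function weight $\alpha$ of Figure 4, so the code calls it `tone`.

```python
import math

def sfm_db(power, floor=1e-12):
    # Eq. (11): SFM_dB = 10*lg(G_m / A_m); G_m, A_m are the geometric and
    # arithmetic means of the power spectrum. The floor avoids log(0).
    vals = [max(p, floor) for p in power]
    g_m = math.exp(sum(math.log(p) for p in vals) / len(vals))
    a_m = sum(vals) / len(vals)
    return 10.0 * math.log10(g_m / a_m)

def tonality(sfm, sfm_max_db=-60.0):
    # Assumed Johnston-style tonality: 1 for tone-like, 0 for flat noise.
    return min(max(0.0, sfm / sfm_max_db), 1.0)

def masking_threshold_db(c_db, i, tone):
    # Eqs. (9)-(10): T_N = C_i - 14.5 - i and T_T = C_i - 5.5 (C_i in dB),
    # blended by the assumed tonality coefficient `tone`.
    t_n = c_db - 14.5 - i
    t_t = c_db - 5.5
    return tone * t_n + (1.0 - tone) * t_t

flat = [1.0] * 8                       # hypothetical noise-like frame
peaky = [1e6] + [1e-6] * 7             # hypothetical tone-like frame
for name, frame in (("flat", flat), ("peaky", peaky)):
    tone = tonality(sfm_db(frame))
    print(name, tone, masking_threshold_db(60.0, 5, tone))
```

By the AM-GM inequality ${\rm SFM}_{\rm dB} \le 0$, so the coefficient lies in $[0, 1]$; a tone-like frame receives the larger offset and hence a lower threshold.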

 $\begin{eqnarray} \begin{array}{*{20}{c}} \pmb{Y}(\omega ) = {F^{\rm H}}\pmb{y} = {F^{\rm H}}\pmb{s} + {F^{\rm H}}\pmb{n} = \pmb{S}(\omega ) + \pmb{N}(\omega ) \end{array} \end{eqnarray}$ (16)
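Eq. (16) carries the additive model $\pmb{y} = \pmb{s} + \pmb{n}$ into the frequency domain, which works because the transform is linear. Assuming $F^{\rm H}\pmb{y}$ is simply the DFT of the frame (the definition of $F$ is not shown in this section), the check below confirms numerically, for hypothetical clean and noise frames, that the spectrum of the sum equals the sum of the spectra.

```python
import cmath
import math

def dft(x):
    # Naive DFT, standing in for F^H y in Eq. (16) (assumed to be the DFT).
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

s = [math.sin(0.3 * t) for t in range(16)]   # hypothetical clean frame
n = [0.1 * ((-1) ** t) for t in range(16)]   # hypothetical noise frame
y = [a + b for a, b in zip(s, n)]            # time-domain additive model

Y, S, N = dft(y), dft(s), dft(n)
max_err = max(abs(Yk - (Sk + Nk)) for Yk, Sk, Nk in zip(Y, S, N))
print(max_err)  # on the order of floating-point round-off
```

Linearity holds for the complex spectra only: for magnitudes the triangle inequality gives $|\pmb{Y}(\omega)| \le |\pmb{S}(\omega)| + |\pmb{N}(\omega)|$, which in general is not an equality.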

2.3 PM-DNN speech enhancement procedure

Figure 3 The framework of speech enhancement based on PM-DNN

3 Performance evaluation of the PM-DNN speech enhancement method

3.1 Experimental data and setup

3.2 Baseline methods and evaluation metrics

 $\begin{eqnarray} \begin{array}{*{20}{c}} \rm{IRM} = \dfrac{{|\pmb{S}(\omega ){|^2}}}{{|\pmb{S}(\omega ){|^2} + |\pmb{N}(\omega ){|^2}}} \end{array} \end{eqnarray}$ (26)
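Eq. (26) defines the ideal ratio mask per time-frequency bin. The sketch below computes it from hypothetical clean and noise magnitudes and, as an assumed usage, scales a noisy magnitude spectrum by it; the mask value of 0 for bins where both magnitudes are zero is also an assumed convention.

```python
def ideal_ratio_mask(s_mag, n_mag):
    # Eq. (26): IRM = |S|^2 / (|S|^2 + |N|^2) per time-frequency bin.
    # Bins where both magnitudes are zero get a mask of 0 (assumed convention).
    mask = []
    for s, n in zip(s_mag, n_mag):
        denom = s * s + n * n
        mask.append(s * s / denom if denom > 0.0 else 0.0)
    return mask

def apply_mask(y_mag, mask):
    # Hypothetical use: scale the noisy magnitude by the mask, bin by bin.
    return [y * m for y, m in zip(y_mag, mask)]

S = [3.0, 0.0, 1.0, 0.0]   # hypothetical clean magnitudes
N = [4.0, 2.0, 0.0, 0.0]   # hypothetical noise magnitudes
mask = ideal_ratio_mask(S, N)
print(mask)                # values lie in [0, 1]
print(apply_mask([5.0, 2.0, 1.0, 0.0], mask))
```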

3.3 Experimental results and analysis

Figure 4 Effect of the weights $\alpha$ and $\beta$ in the PM-DNN objective function on the PESQ score (each value is the mean over all 20 noise types)

Figure 5 PESQ scores of the four enhancement methods for the 20 noise types (for each noise type, the value is the mean over the four input SNRs of -5, 0, 5, and 10 dB)

Figure 6 LSD values of the four enhancement methods for the 20 noise types (for each noise type, the value is the mean over the four input SNRs of -5, 0, 5, and 10 dB)

Figure 7 fwSNRseg values of the four enhancement methods for the 20 noise types (for each noise type, the value is the mean over the four input SNRs of -5, 0, 5, and 10 dB)

Figure 8 Spectrograms

4 Conclusions and future work

