Acta Automatica Sinica (自动化学报), 2017, Vol. 43, Issue (7): 1160-1168


Parameter Optimization of Leaky Integrator Echo State Network with Internal-point Penalty Function Method
LUN Shu-Xian1, HU Hai-Feng2
1. College of New Energy, Bohai University, Jinzhou 121013;
2. College of Engineering, Bohai University, Jinzhou 121013
Manuscript received: July 22, 2016; accepted: October 9, 2016.
Foundation Item: Supported by National Natural Science Foundation of China (61573072, 21506014), Natural Science Foundation of Liaoning Province (2014020143), the First Batch of Science and Technology Projects of Liaoning Province (2011402001), and Science and Technology Research Projects of the Department of Education of Liaoning Province (L2015008)
Corresponding author. LUN Shu-Xian, Professor at Bohai University. She received her Ph.D. degree from Northeastern University in 2005 and did her postdoctoral research at the Institute of Automation, Chinese Academy of Sciences in 2011. Her research interests cover neural networks, the energy internet, and modeling and energy management of photovoltaic power generation systems. Corresponding author of this paper. E-mail: jzlunzi@163.com
Recommended by Associate Editor WEI Qing-Lai
Abstract: To improve the performance of the leaky integrator echo state network (Leaky-ESN), the internal-point penalty function (IPF) method is used to optimize its global parameters, such as the leakage rate, the spectral radius of the internal connection weight matrix, and the input scaling. This overcomes the loss of performance that results when these parameter values are selected by trial and error. The global parameters of a Leaky-ESN must guarantee that the network satisfies the echo state property, so inequality constraints exist among them. Some researchers have proposed using stochastic gradient descent (GD) to optimize the leakage rate, the spectral radius of the internal connection weight matrix, and the input scaling, which improves the approximation precision of the Leaky-ESN to some extent. However, stochastic gradient descent is a basic algorithm for unconstrained optimization: because it does not consider the constraints that the echo state property imposes on the parameters (inequality constraints), the parameter values it produces are not optimal solutions of the constrained problem. The internal-point penalty function method can solve optimization problems with inequality constraints and offers a wide scope of application, fast convergence, and strong global optimization ability. Therefore, in this paper, the internal-point penalty function method is used to optimize the global parameters of Leaky-ESN, and time series prediction is selected as an example to examine the performance of the optimized Leaky-ESN. Simulation results show the effectiveness of the proposed approach.
Key words: Echo state network (ESN), time series prediction, constrained optimization, internal-point penalty function (IPF) method, Newton method

1 Leaky Integrator Echo State Network

The Leaky-ESN is an improved variant of the ESN that is able to learn slowly varying dynamical systems. It has the same topology as the ESN, but its reservoir consists of leaky integrator neurons, as shown in Figure 1 [11]. In Figure 1, the input layer has $K$ input nodes, the reservoir consists of $N$ internal nodes with sparse interconnection weights, and the output layer has $L$ output nodes. Solid lines denote the necessary connections of the network, while dashed lines denote connections that may exist in particular settings. Suppose the network input is $u(n)=[u_1(n), u_2(n), \cdots, u_K(n)]^{\rm T}$, the reservoir state is $x(n)=[x_1(n), x_2(n), \cdots, x_N(n)]^{\rm T}$, and the network output is $y(n)=[y_1(n), y_2(n), \cdots, y_L(n)]^{\rm T}$. Then the state update equation and output equation of the Leaky-ESN are

Figure 1 Structure of the echo state network
\begin{align} x\left( n+1 \right)=\left( 1-a \right)x\left( n \right)+f\big( {s}^{\rm in}W^{\rm in}u\left( n+1 \right)+\left( \rho W \right)x\left( n \right)+W^{\rm fb}y\left( n \right) \big) \end{align} (1)
$y(n)=g\left( W^{\rm out}\left[ x(n);u(n) \right] \right)$ (2)

The output weight matrix is computed offline by ridge regression,
$W^{\rm out}=YX^{\rm T}\left( XX^{\rm T}+\theta I \right)^{-1}$ (3)
where $X$ collects the extended states $[x(n);u(n)]$ as columns, $Y$ collects the corresponding teacher outputs, and $\theta$ is the regularization coefficient.
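To make equations (1)-(3) concrete, the following is a minimal NumPy sketch of the Leaky-ESN state update and the ridge-regression computation of $W^{\rm out}$, with no output feedback ($W^{\rm fb}=0$). The reservoir sizes, random initialization, fixed global parameter values, and placeholder input signal are illustrative assumptions, not values from the paper.

```python
# Minimal Leaky-ESN sketch for equations (1)-(3); all sizes and values assumed.
import numpy as np

rng = np.random.default_rng(0)
K, N = 1, 100                                # input and reservoir sizes (output dim L = 1)
a, rho, s_in, theta = 0.5, 0.8, 1.0, 1e-6    # leakage rate, spectral radius, input scaling, ridge term

W_in = rng.uniform(-1, 1, (N, K))
W = rng.uniform(-0.5, 0.5, (N, N))
W /= np.max(np.abs(np.linalg.eigvals(W)))    # normalize W to unit spectral radius

def update(x, u):
    """State equation (1) with no output feedback (W_fb = 0)."""
    return (1 - a) * x + np.tanh(s_in * W_in @ u + rho * W @ x)

T, washout = 1000, 100
U = np.sin(0.2 * np.arange(T))[:, None]      # placeholder input signal
Y_teach = np.roll(U, -1, axis=0).T           # one-step-ahead teacher output
x = np.zeros(N)
states = []
for n in range(T):
    x = update(x, U[n])
    states.append(np.concatenate([x, U[n]])) # extended state [x(n); u(n)]
X = np.array(states).T[:, washout:]          # discard the washout transient
Y = Y_teach[:, washout:]

# Output weights by ridge regression, equation (3)
W_out = Y @ X.T @ np.linalg.inv(X @ X.T + theta * np.eye(N + K))
```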

For the internal-point penalty function method, the feasible region of the constrained problem is
\begin{align}S=\left\{ {\zeta \,\vert\, \bar g_i \left( \zeta \right)\ge 0,\ i=1, 2, \cdots, m,\ \zeta \in \overline X } \right\} \nonumber\end{align}

2.2 Construction of the Objective Function and Constraints

The goal of optimizing the global parameters of the Leaky-ESN is to make the network output as close as possible to the desired output. Therefore, the following objective function is constructed in this paper:

 \begin{align}L\left( n \right)=\frac{1}{2}\left\| {y_{\rm teach} \left( n\right)-y\left( n \right)} \right\|^2 \label{eq6}\end{align} (6)

According to [11], the Leaky-ESN possesses the echo state property if the following sufficient condition holds:

1) $f$ is the tanh function;

2) $g$ is a bounded function (e.g., the tanh function), or there is no output feedback ($W^{\rm fb}=0$);

3) $\left| {1-(a-\sigma (W))} \right| < 1$, where $\sigma (\cdot)$ denotes the largest singular value of a matrix;

Conversely, the network does not possess the echo state property if:

1) $f$ is the tanh function;

2) the spectral radius satisfies $\left| \lambda \right|_{\max} (\overline W ) > 1$, where $\overline W =\left( {\rho W} \right)+\left( {1-a} \right)I$ and $I$ is the identity matrix.
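These two conditions can be checked numerically for a candidate parameter pair. The following is a small illustrative sketch; the reservoir matrix, its size, and the parameter values are placeholder assumptions.

```python
# Checking the echo-state-property conditions above for given (a, rho, W);
# W and the parameter values here are placeholder assumptions.
import numpy as np

rng = np.random.default_rng(1)
N = 100
W = rng.uniform(-0.5, 0.5, (N, N))
W /= np.max(np.abs(np.linalg.eigvals(W)))          # normalize W to unit spectral radius
a, rho = 0.8, 0.6

sigma_W = np.linalg.norm(W, 2)                     # largest singular value of W
sufficient = abs(1 - (a - sigma_W)) < 1            # item 3) of the sufficient condition

W_bar = rho * W + (1 - a) * np.eye(N)
spec_radius = np.max(np.abs(np.linalg.eigvals(W_bar)))
esp_lost = spec_radius > 1                         # echo state property lost if True
```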

Accordingly, the global parameters $a$ and $\rho$ must satisfy the inequality constraints
$a-\rho \ge 0$ (7)
$1-a>0$ (8)
$\rho >0$ (9)
2.3 Construction of the Barrier Function for the Internal-point Penalty Function Method

The barrier function augments the objective with a weighted penalty term:
\begin{align}\varphi \left( {\zeta, M^{\left( k \right)}} \right)=\overline f \left(\zeta \right)+M^{\left( k \right)}\sum\limits_{i=1}^m {G\left[{\bar {g}_i\left( \zeta \right)} \right]}\end{align} (10)

where the barrier term $G$ is typically chosen as the inverse barrier (11) or the logarithmic barrier (12):
$G\left[ {{{\bar{g}}}_{i}}(\zeta ) \right]=\dfrac{1}{{{{\bar{g}}}_{i}}(\zeta )}$ (11)
$G\left[ {{{\bar{g}}}_{i}}(\zeta ) \right]=-\ln \left( {{{\bar{g}}}_{i}}(\zeta ) \right)$ (12)

With the inverse barrier, (10) becomes
$\varphi \left( \zeta ,{{M}^{\left( k \right)}} \right)=\bar{f}\left( \zeta \right)+{{M}^{\left( k \right)}}\sum\limits_{i=1}^{m}{\dfrac{1}{{{{\bar{g}}}_{i}}(\zeta )}}$ (13)

The penalty factor is reduced at each outer iteration by
${{M}^{(k+1)}}=c{{M}^{(k)}}, \quad 0<c<1$ (14)

and its initial value can be chosen as
$M^{\left( 0 \right)}=\left| {\dfrac{\overline f \left( {\zeta \left( 0\right)} \right)}{\sum\limits_{i=1}^m {\dfrac{1}{\bar {g}_i \left( {\zeta\left( 0 \right)} \right)}} }} \right|$ (15)

The penalty term of the Leaky-ESN model can be expressed as:

 $G[g_i \left( {a, \rho } \right)]=\left\{ {a, \rho \vert a-\rho \ge0, 1-a>0, \rho >0} \right\}$ (16)

so that the barrier function of the Leaky-ESN optimization problem over a training run of length $T$ is
\begin{align}\label{eq17} \varphi \left( {X, M^{\left( k \right)}}\right)=&\sum\limits_{n=1}^T {L\left( n \right)}+\nonumber\\&M^{\left( k \right)}\left( {\frac{1}{a-\rho}+\frac{1}{1-a}+\frac{1}{\rho }} \right)\end{align} (17)
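The following sketch evaluates the Leaky-ESN barrier (17) together with the initialization (15) and the reduction rule (14). The summed squared error $\sum_n L(n)$ of equation (6) is stood in for by an assumed constant `total_loss`; the starting point and the reduction factor $c$ are likewise placeholder assumptions.

```python
# Sketch of the composite barrier (17) with initialization (15) and update (14);
# total_loss, the starting point, and c are assumed values.
import numpy as np

def barrier(a, rho, M, total_loss):
    """phi(zeta, M) = sum_n L(n) + M*(1/(a-rho) + 1/(1-a) + 1/rho), eq. (17)."""
    # The interior-point method requires a strictly feasible point.
    assert a - rho > 0 and 1 - a > 0 and rho > 0, "point must be strictly feasible"
    return total_loss + M * (1 / (a - rho) + 1 / (1 - a) + 1 / rho)

def initial_M(a0, rho0, f0):
    """M(0) = |f(zeta(0)) / sum_i 1/g_i(zeta(0))|, eq. (15)."""
    return abs(f0 / (1 / (a0 - rho0) + 1 / (1 - a0) + 1 / rho0))

a0, rho0, total_loss = 0.9, 0.5, 3.2   # assumed feasible start and loss value
M = initial_M(a0, rho0, total_loss)
c = 0.1                                # reduction factor, 0 < c < 1
M_next = c * M                         # eq. (14): M(k+1) = c * M(k)
```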

2.4 Optimization Procedure and Steps of the Internal-point Penalty Function Method

Each unconstrained subproblem is solved by Newton's method, whose update is
\begin{align}\label{eq18}\zeta (n+1) =\zeta (n)-\left[{\nabla ^2\varphi (\zeta (n))}\right]^{-1}\nabla \varphi (\zeta (n))\end{align} (18)

The components of the gradient and Hessian of the barrier function with respect to $a$ and $\rho$ are
\begin{align}\varphi _1 \left( n \right)=&-\ell ^{\rm T}\left( n\right)W^{\rm out}\left[{\frac{\partial x\left( n\right)}{\partial a};{\rm 0}_u }\right]-\nonumber\\&M^{(k)}\left( {\frac{1}{\left( {\rho -a} \right)^2}-\frac{1}{\left({a-1} \right)^2}} \right)\label{eq19}\end{align} (19)
 \begin{align}\varphi _2 (n)=&-\ell ^{\rm T}(n)W^{\rm out}[\frac{\partial x(n)}{\partial \rho };{\rm 0}_u]+\nonumber\\&M^{(k)}\left( {\frac{1}{(\rho -a)^2}-\frac{1}{\rho^2}}\right)\label{eq20}\end{align} (20)
 \begin{align}\varphi _{11} (n)=&\left( {W^{\rm out}\bigg[\frac{\partial x(n)}{\partial a};{\rm 0}_u \bigg]} \right)^2-\nonumber\\&\ell^{\rm T} (n)W^{\rm out}\bigg[\frac{\partial ^2x(n)}{\partial a^2};{\rm 0}_u \bigg]-\nonumber\\&M^{(k)}\left( {\frac{2}{(\rho -a)^3}+\frac{1}{(a-1) ^3}}\right)\label{eq21}\end{align} (21)
 \begin{align}\varphi _{12} (n)\!=\!&\left( W^{\rm out}\bigg[\frac{\partial x(n)}{\partial \rho };{\rm 0}_u \bigg]\!\cdot\! W^{\rm out}\bigg[\frac{\partial x(n)}{\partial a};{\rm 0}_u \bigg] \right)-\nonumber\\&\ell ^{\rm T}(n)W^{\rm out}\bigg[\frac{\partial ^2x(n)}{\partial a\partial \rho };{\rm 0}_u \bigg]+M^{(k)} {\frac{3}{(\rho -a)^3}}\label{eq22}\end{align} (22)
 \begin{align}\varphi _{21} (n)\!=\!&\left( {W^{\rm out}\bigg[\frac{\partial x(n)}{\partial a};{\rm 0}_u \bigg]\!\cdot\! W^{\rm out}\bigg[\frac{\partial x(n)}{\partial \rho };{\rm 0}_u \bigg]} \right)-\nonumber\\&\ell^{\rm T} (n)W^{\rm out}\bigg[\frac{\partial ^2x(n)}{\partial\rho\partial a};{\rm 0}_u \bigg]+M^{(k)}{\frac{3}{(\rho -a)^3}} \label{eq23}\end{align} (23)
 \begin{align}\varphi _{22} (n)=&\left( {W^{\rm out}\bigg[\frac{\partial x(n)}{\partial \rho };{\rm 0}_u \bigg]} \right)^2-\nonumber\\&\ell^{\rm T} (n)W^{\rm out}\bigg[\frac{\partial ^2x(n)}{\partial\rho ^2};{\rm 0}_u \bigg]-\nonumber\\&M^{(k)}\left( {\frac{1}{(\rho -a)^3}-\frac{2}{\rho ^3}}\right)\label{eq24}\end{align} (24)

For $f=\tanh$, the first and second derivatives required in these expressions are
\begin{align}\label{eq31} {f}'\left( {\overline \zeta } \right)=\frac{4}{2+{\rm e}^{2\overline \zeta }+{\rm e}^{-2\overline \zeta }}\end{align} (31)
\begin{align}\label{eq32} {f}''\left( {\overline \zeta } \right)=\frac{8\left({{\rm e}^{-2\overline \zeta }-{\rm e}^{2\overline \zeta }}\right)}{\left( {2+{\rm e}^{2\overline \zeta }+{\rm e}^{-2\overline \zeta }} \right)^2}\end{align} (32)
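As a quick sanity check, the closed forms (31) and (32) can be verified against finite differences; the snippet below is purely illustrative.

```python
# Closed forms (31)-(32) for f' and f'' when f = tanh, checked numerically.
import numpy as np

def f1(z):   # eq. (31): f'(z) = 4 / (2 + e^{2z} + e^{-2z})
    return 4.0 / (2.0 + np.exp(2 * z) + np.exp(-2 * z))

def f2(z):   # eq. (32): f''(z) = 8(e^{-2z} - e^{2z}) / (2 + e^{2z} + e^{-2z})^2
    return 8.0 * (np.exp(-2 * z) - np.exp(2 * z)) / (2.0 + np.exp(2 * z) + np.exp(-2 * z)) ** 2

z, h = 0.3, 1e-6
assert abs(f1(z) - (np.tanh(z + h) - np.tanh(z - h)) / (2 * h)) < 1e-8
assert abs(f2(z) - (f1(z + h) - f1(z - h)) / (2 * h)) < 1e-8
```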

In terms of these components, the Newton update (18) for $\zeta =[a, \rho]^{\rm T}$ reads
\begin{align}\zeta (n)=\zeta (n-1) -\left[{\begin{matrix} {\varphi _{11} (n-1) }&{\varphi _{12} (n-1) } \\ {\varphi _{21} (n-1) }&{\varphi _{22} (n-1) } \end{matrix} } \right]^{-1}\left[{\begin{matrix} {\varphi _1 (n-1) } \\ {\varphi _2 (n-1) } \end{matrix} } \right]\label{eq33}\end{align} (33)
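The structure of the overall optimization, an inner Newton iteration (33) nested in an outer penalty-factor loop (14), can be sketched as follows. For brevity, the gradient and Hessian of $\varphi$ are approximated here by finite differences rather than the analytic expressions (19)-(24), so this is a structural sketch under that assumption, not the paper's exact algorithm; the toy barrier reuses the constant-loss form from the earlier sketch.

```python
# Structural sketch: inner Newton steps (33) within an outer M-reduction loop (14).
# Gradient/Hessian of phi are formed by finite differences, not eqs. (19)-(24).
import numpy as np

def newton_interior_point(phi, zeta0, M0, c=0.1, outer=10, inner=5, h=1e-4):
    zeta, M = np.asarray(zeta0, dtype=float), M0
    for _ in range(outer):
        for _ in range(inner):
            g = np.zeros(2)
            H = np.zeros((2, 2))
            for i in range(2):
                ei = np.zeros(2); ei[i] = h
                g[i] = (phi(zeta + ei, M) - phi(zeta - ei, M)) / (2 * h)
                for j in range(2):
                    ej = np.zeros(2); ej[j] = h
                    H[i, j] = (phi(zeta + ei + ej, M) - phi(zeta + ei - ej, M)
                               - phi(zeta - ei + ej, M) + phi(zeta - ei - ej, M)) / (4 * h * h)
            zeta = zeta - np.linalg.solve(H, g)   # Newton update, eq. (33)
        M = c * M                                  # penalty reduction, eq. (14)
    return zeta

# Toy barrier with a constant loss term, cf. eq. (17); converges near [2/3, 1/3].
phi = lambda z, M: 3.2 + M * (1 / (z[0] - z[1]) + 1 / (1 - z[0]) + 1 / z[1])
zeta_opt = newton_interior_point(phi, [0.9, 0.5], M0=1.0)
```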

3 Simulation and Analysis

3.1 The First Time Series

Figure 2 Training errors for the first time series
Figure 3 Predicted values for the first time series

Figure 4 Predicted errors for the first time series
3.2 The Second Time Series

The Mackey-Glass chaotic time series model is:

\begin{align}\label{eq34} x(n+1) =& x(n)+\nonumber\\& \Delta T\left(\frac{{\alpha x\left( {n-\dfrac{\tau }{\Delta T}}\right)}}{1+x\left( n-\dfrac{\tau }{\Delta T} \right)^{10}}+\gamma x( n ) \right)\end{align} (34)
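A sequence following the discretization (34) can be generated as below. The parameter values ($\alpha = 0.2$, $\gamma = -0.1$, $\tau = 17$, $\Delta T = 1$) and the initial condition are the ones commonly used for this benchmark and are assumptions here, not values taken from the paper.

```python
# Sketch generating a Mackey-Glass sequence with the discretization (34);
# parameter values are the common benchmark defaults, assumed here.
import numpy as np

def mackey_glass(n_steps, alpha=0.2, gamma=-0.1, tau=17, dT=1.0, x0=1.2):
    d = int(tau / dT)                  # delay expressed in samples
    x = np.full(n_steps + d, x0)       # constant initial history
    for n in range(d, n_steps + d - 1):
        x_tau = x[n - d]
        # eq. (34): x(n+1) = x(n) + dT*(alpha*x(n-tau/dT)/(1+x(n-tau/dT)^10) + gamma*x(n))
        x[n + 1] = x[n] + dT * (alpha * x_tau / (1 + x_tau ** 10) + gamma * x[n])
    return x[d:]

series = mackey_glass(2000)
```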

Figure 5 NRMSE of predicted values for the first time series
Figure 6 Training errors for the second time series
Figure 7 Predicted values for the second time series
Figure 8 Predicted errors for the second time series

Figure 9 NRMSE of predicted values for the second time series
4 Conclusion

References
1 Jaeger H. Tutorial on Training Recurrent Neural Networks, Covering BPTT, RTRL, EKF, and the "Echo State Network" Approach. Technical Report GMD Report 159, German National Research Center for Information Technology, Germany, 2002.
2 Jaeger H. The "Echo State" Approach to Analysing and Training Recurrent Neural Networks. Technical Report GMD Report 148, German National Research Center for Information Technology, Germany, 2001.
3 Jaeger H, Haas H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science, 2004, 304(5667): 78-80. DOI: 10.1126/science.1091277
4 Lun S X, Wang S, Guo T T, Du C J. An I-V model based on time warp invariant echo state network for photovoltaic array with shaded solar cells. Solar Energy, 2014, 105: 529-541. DOI: 10.1016/j.solener.2014.04.023
5 Skowronski M D, Harris J G. Noise-robust automatic speech recognition using a predictive echo state network. IEEE Transactions on Audio, Speech, and Language Processing, 2007, 15(5): 1724-1730. DOI: 10.1109/TASL.2007.896669
6 Han S I, Lee J M. Precise positioning of nonsmooth dynamic systems using fuzzy wavelet echo state networks and dynamic surface sliding mode control. IEEE Transactions on Industrial Electronics, 2013, 60(11): 5124-5136. DOI: 10.1109/TIE.2012.2218560
7 Li G Q, Niu P F, Zhang W P, Zhang Y. Control of discrete chaotic systems based on echo state network modeling with an adaptive noise canceler. Knowledge-Based Systems, 2012, 35(15): 35-40.
8 Song R Z, Xiao W D, Sun C Y. A new self-learning optimal control laws for a class of discrete-time nonlinear systems based on ESN architecture. Science China Information Sciences, 2014, 57(6): Article No. 068202.
9 Lun S X, Yao X S, Qi H Y, Hu H F. A novel model of leaky integrator echo state network for time-series prediction. Neurocomputing, 2015, 159(1): 58-66.
10 Bianchi F M, Scardapane S, Uncini A, Rizzi A, Sadeghian A. Prediction of telephone calls load using echo state network with exogenous variables. Neural Networks, 2015, 71(C): 204-213.
11 Jaeger H, Lukoševičius M, Popovici D, Siewert U. Optimization and applications of echo state networks with leaky-integrator neurons. Neural Networks, 2007, 20(3): 335-352. DOI: 10.1016/j.neunet.2007.04.016
12 Lukoševičius M. A practical guide to applying echo state networks. Neural Networks: Tricks of the Trade (Second Edition). Berlin Heidelberg: Springer-Verlag, 2012, 659-686.
13 Nocedal J, Wright S J. Numerical Optimization (Second Edition). New York: Springer, 2006, 30-31.