IEEE/CAA Journal of Automatica Sinica  2018, Vol. 5 Issue(5): 968-976   PDF    
Modeling of Energy Consumption and Effluent Quality Using Density Peaks-based Adaptive Fuzzy Neural Network
Junfei Qiao, Hongbiao Zhou     
Beijing University of Technology and Beijing Key Laboratory of Computational Intelligence and Intelligent System, Beijing 100124, China
Abstract: Modeling of energy consumption (EC) and effluent quality (EQ) are very essential problems that need to be solved for the multiobjective optimal control in the wastewater treatment process (WWTP). To address this issue, a density peaks-based adaptive fuzzy neural network (DP-AFNN) is proposed in this study. To obtain suitable fuzzy rules, a DP-based clustering method is applied to fit the cluster centers to process nonlinearity. The parameters of the extracted fuzzy rules are fine-tuned based on the improved Levenberg-Marquardt algorithm during the training process. Furthermore, the analysis of convergence is performed to guarantee the successful application of the DPAFNN. Finally, the proposed DP-AFNN is utilized to develop the models of EC and EQ in the WWTP. The experimental results show that the proposed DP-AFNN can achieve fast convergence speed and high prediction accuracy in comparison with some existing methods.
Key words: Density peaks clustering     effluent quality (EQ)     energy consumption (EC)     fuzzy neural network     improved Levenberg-Marquardt algorithm     wastewater treatment process (WWTP)    
Ⅰ. INTRODUCTION

Wastewater treatment processes (WWTPs) are complex and energy-intensive systems as they require continuous operation with effluent treatment [1]. In fact, to meet the effluent discharge standards and reduce the effluent quality (EQ), wastewater treatment plants always run at full load status. It is worth noting that EQ is used to represent the fines which are required to be paid due to the discharge of pollutants in to the receiving water bodies. Under full load status, both of the dissolved oxygen concentration ($S_\rm{O}$) in the aerobic reactors and the nitrate concentration ($S_\rm{NO}$) in the anaerobic reactors are maintained at a high level through aeration and pumping. It is inevitable to remarkably increase the energy consumption (EC). According to the biochemical reaction mechanism of the WWTP, only set-points of $S_\rm{O}$ and $S_\rm{NO}$ are set in a manner that the nitrification and denitrification can go smoothly. In addition, to meet the requirements of energy conservation and emission reduction, it is imperative to realize optimal control in the WWTP. The design of optimal control strategy mainly includes optimization objectives, optimization algorithm, and loop controllers. However, owing to the existing nonlinear characteristics and uncertain factors, it is difficult to establish optimization models of EC and EQ in the WWTP [2].

At present, the mechanism models of EC and EQ are mainly designed by the usage of process variables in the WWTP. Hernández et al. utilized a non-radial data envelopment analysis (DEA) method to calculate the energy index for a wastewater treatment plant located in Spain [3]. Yang et al. applied a computational fluid dynamic (CFD) model to optimize the operating condition by taking into consideration the flow field and dissolved oxygen concentration [4]. Vera et al. employed a statistical software, called INFOSTAT, to establish an EC model. However, these mechanism models have certain limitations, such as a single decision variable, complex structure, and many parameters [5]. Thus, the prediction accuracy of these mechanism-based models cannot satisfy the requirements of multi-objective optimal control in the WWTP.

The data-driven modeling method has the advantages of online detection, short time, and high accuracy. It is widely employed to identify the nonlinear dynamic systems. With the ability to approach any nonlinear system with arbitrary precision, neural networks have been widely applied to predict the critical parameters of water quality in WWTPs [6]. Noori et al. utilized a support vector machine to predict the biochemical oxygen demand (BOD) in order to determine the uncertainty of the data-driven method [7]. Qiao et al. introduced a soft-computing method for BOD, where the T-S fuzzy neural network with the error backpropagation (EBP) algorithm is utilized to design the prediction model [8]. Kusiak et al. adopted a multilayer perceptron-based data-mining method to establish the EC and wastewater flow rate models [9]. Han et al. applied an adaptive regressive kernel function (ARKF) approach to develop an EC model [10]. However, the research on the modeling of EC and EQ from the perspective of multiobjective optimal control is rare. In addition, since the initial network structure is generated randomly and the parameters are usually adjusted by the EBP algorithm during the learning process, the prediction accuracy of the neural network-based data-driven models needs to be further improved.

Recent studies have shown that fuzzy neural network (FNN), which combines the learning capability of a neural network with the interpretability of a fuzzy system, has tremendous advantage, and has an extremely extensive application in data-driven modeling [11]. To develop the EC and EQ models with the usage of FNN, two issues must be dealt with, including the determination of network structure and the learning of network parameters. To determine the network size, some flexible clustering methods, such as fuzzy c-means clustering [12], Gustafson-Kessel clustering [13], and K-nearest neighbors clustering [14], are applied to partition the input space. These methods are mainly used for offline modeling. Another class of previously proposed methods, considering that whole training samples might not be available at once, utilizes online modeling [15], [16]. However, for these online modeling approaches, the main problem regarding how to produce new rules or prune redundant rules based on some approximate criteria still remains open. To update the parameters of FNN, the EBP algorithm is adopted frequently. However, as widely reported in the literature, the EBP algorithm has some drawbacks, such as slow convergence, local optimum, and low precision [17]. To enhance the learning performance of FNN, various types of learning algorithms have been proposed for the training of FNN, such as hybrid learning algorithm [18], adaptive gradient descent approach [19], and evolutionary algorithm [20]. However, these algorithms have either poor convergence ability or long training time.

In this paper, a novel fuzzy clustering approach based on the density peaks (DP) is applied to determine the network size. Fuzzy rules with appropriate centers can enhance the generalization performance of FNN. Meanwhile, an adaptive learning algorithm based on the improved Levenberg-Marquardt approach is designed to optimize the parameters of FNN during the learning process. The convergence speed and approximation accuracy of FNN can be enhanced by the combination of density peaks clustering method and adaptive parameter learning algorithm.

The rest of this paper is outlined as follows. The architecture of FNN is introduced in Section Ⅱ. Section Ⅲ details the proposed DP-AFNN, including the DP-based clustering method and the improved LM parameter learning algorithm. Subsequently, the convergence of DP-AFNN is discussed in Section Ⅳ. In Section Ⅴ, the proposed DP-AFNN is utilized to develop the nonlinear dynamic models of EC and EQ. Conclusion of the paper is provided in Section Ⅵ.

Ⅱ. FUZZY NEURAL NETWORK

For a multi-input and single-output (MISO) system, the nonlinear continuous time form of the dynamical system can be defined as

$ \begin{equation} \label{eq1} y(t)=f(x(t)) \end{equation} $ (1)

where $x( t)= [x_{1}(t), x_{2}(t), \dots, x_{n}(t)]^{T} $ and $ y(t) $ are the input and output of the nonlinear system at time $ t $, respectively, $ n $ is the number of input variables, and $ f( \cdot ) $ denotes the unknown nonlinear function of the system.

To model the MISO nonlinear system, a four-layer MISO FNN is introduced. The structure of the MISO FNN is shown in Fig. 1. The four layers are, respectively, the input layer, the membership function (MF) layer, the rule layer, and the output layer [15]. The mathematical description of this FNN is expressed as follows.

Download:
Fig. 1 Architecture of the FNN.

The Input Layer: there are $ n $ neurons in this layer, and each neuron represents an input linguistic variable. The output values of input layer are

$ \begin{equation} \label{eq2} u_{i} (t)=x_{i} (t), \quad i=1, 2, \dots, n \end{equation} $ (2)

where $ u_{i}(t) $ is the output value of the ith neuron at time $ t $, and $x(t)= [x_{1}(t), x_{2}(t), \dots, x_{n}(t)] ^{T} $ is the input vector at time $ t $.

The MF Layer: there are $ n\times r $ neurons in this layer, and each neuron represents a membership function which is in the form of Gaussian function. The output values of MF layer can be expressed as

$ \begin{equation} \label{eq3} \mu_{ij} (t)=e^{-\frac{(x_{i} (t)-c_{ij} (t))^{2}}{\sigma_{ij}^{2} (t)}}, i=1, 2, \dots, n; j=1, 2, \dots, r \end{equation} $ (3)

where $ \mu_{ij}(t) $, $ c_{ij}(t) $, and $ \sigma_{ij}(t) $ are the output value, center, and width of the jth Gaussian MF of $ x_{i}(t) $ at time $ t $, respectively.

The Rule Layer: there are $ r $ neurons in this layer, and each neuron represents an antecedent part for fuzzy rules. For the output of MF layer, the product and normalization operators are executed sequentially. The output values of rule layer denote the spatial firing strength of fuzzy rules. For the jth rule neuron, its output value is

$ \begin{eqnarray} v_{j} (t)=\prod\limits_{i=1}^n {\mu_{ij} (t)} =e^{-\sum\limits_{i=1}^n {\frac{(x_{i} (t)-c_{ij} (t))^{2}}{\sigma_{ij}^{2} (t)}} } \nonumber\\ i=1, 2, \dots, n; ~j=1, 2, \dots, r \end{eqnarray} $ (4)

and

$ \begin{eqnarray} h_{j} (t)=\frac{v_{j} (t)}{\sum\limits_{j=1}^r {v_{j} (t)} }=\frac{e^{-\sum\limits_{i=1}^n {\frac{(x_{i} (t)-c_{ij} (t))^{2}}{\sigma _{ij}^{2} (t)}} }}{\sum\limits_{j=1}^r {e^{-\sum\limits_{i=1}^n {\frac{(x_{i} (t)-c_{ij} (t))^{2}}{\sigma_{ij}^{2} (t)}} }} } \nonumber\\ i=1, 2, \dots, n; ~j=1, 2, \dots, r \end{eqnarray} $ (5)

where $ {\pmb h}(t)= [h_{1}(t), h_{2}(t), \dots, h_{r}(t)]^{T} $ is the normalized output vector of rule layer at time $ t $.

The Output Layer: the output value of this layer is the summation of incoming signals

$ \begin{eqnarray} y(t)&={\pmb w}(t){\pmb h}(t)=\sum\limits_{j=1}^r {w_{j} (t)h_{j} (t)} \nonumber\\ & =\frac{\sum\limits_{j=1}^r {w_{j} (t)e^{-\sum\limits_{i=1}^n {\frac{(x_{i} (t)-c_{ij} (t))^{2}}{\sigma _{ij}^{2} (t)}} }} }{\sum\limits_{j=1}^r {e^{-\sum\limits_{i=1}^n {\frac{(x_{i} (t)-c_{ij} (t))^{2}}{\sigma_{ij}^{2} (t)}} }} } \end{eqnarray} $ (6)

where $ {\pmb w}(t)= [w_{1}(t), w_{2}(t), \dots, w_{r}(t)] $ is the weight vector between the rule neuron and the output neuron at time $ t $.

Ⅲ. DP-AFNN METHOD

In the following, the principle of the DP-based clustering method is given firstly. Then, the adaptive learning algorithm is described in detail. Finally, the procedure of DP-AFNN modeling is provided.

A. Density Peaks Clustering Method

Recently, a powerful clustering approach based on the density peaks (DP) has been proposed by Rodriguez and Laio in [21]. In this study, the DP-based clustering method is applied to partition the input space and to obtain a suitable classification of the training samples. Then, the centers and variances of the clusters are utilized as the parameters of FNN. In the DP algorithm, there are two assumptions: 1) the cluster centers are surrounded by neighbors which have a lower local density; 2) the cluster centers have a relatively large distance from any point which has a higher local density. For the convenience of the searching, two indicators, including the local density $ \rho_{i} $ and the minimum distance $ \delta_{i} $, are defined in the DP algorithm.

In the DP algorithm, the local density $ \rho_{i} $ is given as

$ \begin{equation} \label{eq7} \rho_{i} =\sum\limits_j {\chi (d_{ij} -d_{c} )} \end{equation} $ (7)

where $ d_{ij} $ is the Euclidean distance between the data point $ i $ and $ j $, and $ d_{c} $ is a predefined cutoff distance. If $ x <0 $, then $ \chi (x)= 1$, otherwise $ \chi (x)= 0$. Generally, the indicator $ \rho_{i} $ can be obtained through finding data points whose distance to data point $ i $ is less than $ d_{c} $. Additionally, $ d_{c} $ can also be obtained by setting a percentage parameter $ \alpha $% of all the distance values $ d_{ij} $ in an ascending order. The recommended $ \alpha $% is 0.5% ~ 5%. For example, if the number of training samples is 400 and the percentage parameter $ \alpha $% is set to be 4%, $ d_{c} $ is equal to the sixteenth value of all the distances $ d_{ij} $ in an ascending order.

In the DP algorithm, $ \delta_{i} $ is defined as the minimum distance between the data point $ i $ and any other data point with higher local density

$ \begin{equation} \label{eq8} \delta_{i} =\min\limits_{j: \rho_{j} >\rho_{i} } (d_{ij} ). \end{equation} $ (8)

When the point $ i $ has the highest local density, its $ \delta_{i} $ is set to be the maximum distance between the data point $ i $ and other data points. Thus, the data points with large $ \rho_{i} $ and $ \delta_{i} $ values are chosen as cluster centers. To avoid setting two thresholds for $ \rho_{i} $ and $ \delta_{i} $ simultaneously, a scheme to select the cluster center is given by

$ \begin{equation} \label{eq9} \gamma_{i} =\rho_{i} \delta_{i}. \end{equation} $ (9)

In the clustering process, data point will be chosen as cluster center if its $ \gamma_{i} $ is larger than the pre-specified threshold. To avoid the setting of threshold, $ \gamma_{i} $ can be sorted by its value in a descending order and the centers with a fixed number $ N $ can be selected. Afterwards, other non-central points are assigned to the cluster which has the highest local density and the minimum distance to them.

B. Improved LM Algorithm

To avoid trapping into local optimal, the EBP algorithm which has been widely used in FNNs is replaced by an improved LM algorithm in this study. Thereby, the modeling accuracy can be enhanced with the usage of the improved LM parameter optimization algorithm. For the pth sample, the error between the desired output and network output of FNN can be described as

$ \begin{equation} \label{eq10} e^{p}(t)=y_{d}^{p} (t)-y^{p}(t), \quad p=1, 2, \dots, P \end{equation} $ (10)

where $ P $ is the total number of the samples.

Derived from LM algorithm [22], all the parameters in the FNN, i.e., centers, widths, and output weights are adjusted by the following formula

$ \begin{equation} \label{eq11} \boldsymbol{\Theta }(t+1)=\boldsymbol{\Theta }(t)-(\boldsymbol{\Psi }(t)+\eta (t)\times {\pmb I})^{-1}\times \boldsymbol{\Omega }(t) \end{equation} $ (11)

where $ \boldsymbol{\Theta}(t)= [w(t), {{c}}(t), \sigma (t)] $ is the parameter vector, $ \boldsymbol{\Psi} (t) $ is the quasi-Hessian matrix, $ \boldsymbol{\Omega} (t) $ is the gradient vector, ${\pmb{I}} $ is the identity matrix which is adopted to circumvent the difficulty of singularity in reversing the matrix. And the learning rate $ \eta (t) $ is given by

$ \begin{equation} \label{eq12} \eta (t)=2\left\| {\pmb{e}(t)} \right\|^{2} \end{equation} $ (12)

where $ \pmb{{e}}(t)= [e_{1}(t), e_{2}(t), \dots, e_{P}(t)]^{T} $ is the error vector. The adaptive learning rate can accelerate the learning speed of FNN. Furthermore, the $ \boldsymbol{\Psi} (t) $ and $\boldsymbol{\Omega} (t) $ are the accumulation of submatrices and subvectors for all samples, respectively.

$ \boldsymbol{\Psi }(t)=\sum\limits_{p=1}^P {\pmb{\psi }_{p} (t)} $ (13)
$ \boldsymbol{\Omega }(t)=\sum\limits_{p=1}^P {\pmb{\omega}_{p} (t)} $ (14)

where the submatrix $ \pmb{\psi}_{p}(t) $ and the subvector $\pmb{\omega}_{p}(t) $ are defined as

$ \pmb{\psi }_{p} (t)={\pmb j}_{p}^{T} (t){\pmb j}_{p} (t) $ (15)
$ \pmb{\omega }_{p} (t)={\pmb j}_{p}^{T} (t)e_{p} (t) $ (16)

where ${\pmb j}_{p}(t) $ is the row vector of Jacobian matrix

$ \begin{equation} \label{eq17} \begin{aligned} {\pmb j}_{p} (t)&=\left[{\frac{\partial e_{p} (t)}{\partial w_{1} (t)}, \cdots, \frac{\partial e_{p} (t)}{\partial w_{r} (t)}, \frac{\partial e_{p} (t)}{\partial c_{11} (t)}, \dots \frac{\partial e_{p} (t)}{\partial c_{ij} (t)}, \dots, } \right. \\ &\quad~ \left. \frac{\partial e_{p} (t)}{\partial c_{nr} (t)}, \frac{\partial e_{p} (t)}{\partial \sigma_{11} (t)}, \dots, \frac{\partial e_{p} (t)}{\partial \sigma_{ij} (t)}, \dots, \frac{\partial e_{p} (t)}{\partial \sigma_{nr} (t)} \right]\!\!. \end{aligned} \end{equation} $ (17)

According to the update rule of gradient descent learning approach, the elements of the Jacobian row vector are presented as

$ \frac{\partial e_{p} (t)}{\partial w_{j} (t)} =\frac{\partial e_{p} (t)}{\partial y_{p} (t)}\; \frac{\partial y_{p} (t)}{\partial w_{j} (t)}\; =-h_{j} (t) $ (18)
$ \begin{align} \label{eq19}\nonumber \frac{\partial e_{p} (t)}{\partial c_{ij} (t)} & =\frac{\partial e_{p} (t)}{\partial y_{p} (t)}\frac{\partial y_{p} (t)}{\partial h_{j} (t)}\frac{\partial h_{j} (t)}{\partial v_{j} (t)}\frac{\partial v_{j} (t)}{\partial \mu_{ij} (t)}\frac{\partial \mu_{ij} (t)}{\partial c_{ij} (t)} \\ & =-w_{j} (t)\frac{\sum\limits_{i\ne j}^r {v_{i} (t)}}{(\sum\limits_{i=1}^r {v_{i} (t)} )^{2}}\prod\limits_{k\ne i} {\mu_{kj} (t)} \frac{\partial \mu_{ij} (t)}{\partial c_{ij} (t)} \\ \end{align} $ (19)
$ \begin{align} \label{eq20}\nonumber \frac{\partial e_{p} (t)}{\partial \sigma_{ij} (t)} & =\frac{\partial e_{p} (t)}{\partial y_{p} (t)}\frac{\partial y_{p} (t)}{\partial h_{j} (t)}\frac{\partial h_{j} (t)}{\partial v_{j} (t)}\frac{\partial v_{j} (t)}{\partial \mu_{ij} (t)}\frac{\partial \mu_{ij} (t)}{\partial \sigma _{ij} (t)} \\ & =-w_{j} (t)\frac{\sum\limits_{i\ne j}^r {v_{i} (t)}}{(\sum\limits_{i=1}^r {v_{i} (t)} )^{2}}\prod\limits_{k\ne i} {\mu_{kj} (t)} \frac{\partial \mu_{ij} (t)}{\partial \sigma_{ij} (t)} \end{align} $ (20)

where

$ \begin{align*} \frac{\partial \mu_{ij} (t)}{\partial c_{ij} (t)}& =\frac{2(x_{i} (t)-c_{ij} (t))\exp (-{(x_{i} (t)-c_{ij} (t))^{2}}/{\sigma_{ij}^{2} (t)})}{\sigma _{ij}^{2} (t)} \\ \frac{\partial \mu_{ij} (t)}{\partial \sigma_{ij} (t)}& =\frac{2(x_{i} (t)-c_{ij} (t))^{2}\exp (-{(x_{i} (t)-c_{ij} (t))^{2}}/{\sigma_{ij}^{2} }(t))}{\sigma_{ij}^{3} (t)}. \end{align*} $

The proposed improved adaptive LM algorithm is a second-order parameter estimation algorithm. While computing the submatrix $ \pmb{\psi} _{p}(t) $ and subvector $ \pmb{\omega}_{p}(t) $ for each pattern $ p $, only $(2 n+ 1) \times r $ elements of $ {\pmb j}_{p}(t) $ are needed. It should be noted that there is no need for storing and multiplying Jacobian matrix in the computation of quasi-Hessian matrix $ \boldsymbol{\Psi} (t) $ and gradient vector $ \boldsymbol{\Omega} (t) $. Thus, the memory cost and computational complexity can be significantly reduced. Furthermore, the improved LM algorithm with the adaptive learning rate can not only accelerate the learning speed, but also enhance the generalization capability.

C. Procedure of DP-AFNN Modeling

To reveal the proposed modeling method clearly, the design steps of the models of EC and EQ with the usage of DP-AFNN are summarized as follows.

Step 1: Generate enough process data about EC and EQ using BSM1 and execute data normalization operation.

Step 2: Calculate the local density $ \rho $ for each data point according to the predefined percentage parameter $ \alpha$%.

Step 3: Calculate the minimum distance $ \delta $ for each data point.

Step 4: Compute the cluster center selection parameter $\gamma $ for each data point, sort all data points by their $ \gamma$ values in a descending order, and select the first $ N $ data points as cluster centers.

Step 5: Create two initial four-layer FNN. The number of neurons in the input layer is the same as the number of input variables. The initial centers of fuzzy rules are the centers of clusters. The initial widths of all fuzzy rules are set to be the variances of clusters. The initial output weights of DP-AFNN are set to be the outputs of training samples which are selected as the centers of fuzzy rules. The maximum number of iterations is set to be $ T $.

Step 6: Calculate the network output $ y(t) $ for each training sample.

Step 7: Make the error $ e(t) $, submatrix $ \pmb{\psi} (t) $, and subvector $ \pmb{\omega} (t) $ for each training sample.

Step 8: Make the accumulation of submatrices $ \boldsymbol{\Psi}(t) $ and subvectors $ \boldsymbol{\Omega} (t) $ for all the training samples.

Step 9: Update the learning rate $ \eta (t) $ and the parameter vector $ \boldsymbol{\Theta} (t) $.

Step 10: Stop and evaluate the test samples if the stopping criteria is satisfied. Otherwise, go to Step 6.

Ⅳ. CONVERGENCE ANALYSIS

The convergence of the proposed DP-AFNN is of great importance for modeling EC and EQ. In this section, our analysis suggests that the convergence of DP-AFNN can be guaranteed.

We choose the Lyapunov function as follows

$ \begin{equation} \label{eq21} V(\boldsymbol{\Theta }(t))=\frac{1}{2}\pmb{e}^{T}(t)\pmb{e}(t) \end{equation} $ (21)

where $ \pmb{{e}}(t)= [e_{1}(t), e_{2}(t), \dots, e_{p}(t)]^{T} $ and $ V ( \boldsymbol{\Theta} (t))> 0 $. The difference between the Lyapunov function in the next time step and in the present time step is determined as follows

$ \begin{eqnarray} \Delta V(\boldsymbol{\Theta }(t))&\hspace{-0.2cm}=&\hspace{-0.2cm}V(\boldsymbol{\Theta }(t+1))-V(\boldsymbol{\Theta}(t)) \nonumber\\[2mm] &&\hspace{-0.6cm}=\frac{1}{2}e^{T}(t+1)e(t+1)-\frac{1}{2}e^{T}(t)e(t) \nonumber\\[2mm] &&\hspace{-0.6cm}=-\nabla E^{T}(\boldsymbol{\Theta }(t))\Delta \boldsymbol{\Theta }(t)\nonumber\\[2mm] &&\hspace{-0.6cm}\quad~ +\frac{1}{2}\Delta \boldsymbol{\Theta }^{T}(t)\nabla ^{2}E(\boldsymbol{\Theta }(t))\Delta \boldsymbol{\Theta }(t) \end{eqnarray} $ (22)

where $ \nabla {\pmb E}( \boldsymbol{\Theta} (t)) $ and $\nabla ^{2} {\pmb E}(\boldsymbol{\Theta} (t)) $ are the gradient vector and the matrix of second derivative (or Hessian matrix) of the cost function, respectively. In [22] and [23], when the parameters of DP-AFNN are updated by (11), if

$ \begin{equation} \label{eq23} \left\| {\Delta {\boldsymbol{\Theta }}(t)} \right\|\leq \min \left\{ {\left\| {\Delta {\boldsymbol{\Theta }}(t-1)} \right\|, \frac{\left\| {\boldsymbol{\Omega }({\boldsymbol{\Theta }}(t-1))} \right\|}{\left\| {\boldsymbol{\Psi }({\boldsymbol{\Theta }}(t-1))} \right\|}} \right\}. \end{equation} $ (23)

There will be

$ \begin{equation} \label{eq24} \begin{cases} \Delta {\boldsymbol{\Theta }}(t)=(\boldsymbol{\Psi }({\boldsymbol{\Theta }}(t))+\eta (t)\times {{\pmb I}})^{-1}\times \boldsymbol{\Omega }({\boldsymbol{\Theta}}(t)) \\[3mm] \nabla {{\pmb E}}({\boldsymbol{\Theta }}(t))=\boldsymbol{\Omega }({\boldsymbol{\Theta }}(t)) \\[3mm] \nabla^{2}{{\pmb E}}({\boldsymbol{\Theta }}(t))=\boldsymbol{\Psi }({\boldsymbol{\Theta }}(t))+\eta (t)\times {{\pmb I}}. \end{cases} \end{equation} $ (24)

It follows from (23) and (24) that $ \Delta V(\boldsymbol{\Theta} (t)) $ can be expressed as

$ \begin{equation} \label{eq25} \Delta V(\boldsymbol{\Theta }(t))=-\frac{1}{2}\Delta \boldsymbol{\Theta}^{T}(t)\nabla^{2}E(\boldsymbol{\Theta }(t))\Delta \boldsymbol{\Theta }(t). \end{equation} $ (25)

When the condition (21) is satisfied, the matrix $ \nabla^{2}{\pmb E}(\boldsymbol{\Theta }(t)) $ is positive definite, then we can conclude that $ \Delta V(\boldsymbol{\Theta} (t)) < 0 $. Therefore,

$ \begin{equation} \label{eq26} \lim\limits_{t\to +\infty } \pmb{e}(t)=\boldsymbol{0}. \end{equation} $ (26)

According to the Lyapunov stability theorem, we know that the DP-AFNN is convergent.

Ⅴ. SIMULATION STUDIES A. Wastewater Treatment Process

The most popular technology of wastewater treatment is the activated sludge process (ASP). To evaluate the performance of different control strategies, a benchmark simulation platform based on ASP has been developed with the framework of COST Actions 682 and 624-BSM1 [24].

In the BSM1, EC is the sum of aeration energy (AE) and pumping energy (PE). AE and PE are calculated as

$ AE=\frac{S_\rm{O, sat} }{T\times 1.8\times 1000}\int_{kT}^{(k+1)T} {\sum\limits_{i=1}^5 {V_{i} \cdot K_{Lai} (t)dt} } $ (27)
$ PE=\frac{1}{T\times 1000}\int_{kT}^{(k+1)T} {(4Q_{a} (t)+8Q_{r} (t)+50Q_{w} (t))dt} $ (28)

where $ K_{Lai} $ and $ V_{i} $ are the oxygen transfer rate and volume of the ith biological reactor, respectively. $S_\rm{O, sat} $ is the saturation concentration of dissolved oxygen and $ T $ is the optimal cycle. $ Q_{a} $, $ Q_{r} $, and $ Q_{w} $ are the internal recycle flow rate, external recycle flow rate, and waste sludge flow rate, respectively.

Moreover, EQ is defined as

$ \begin{align} \label{eq29}\nonumber &EQ=\frac{1}{T\cdot 1000} \\ &\times\!\!\int_{kT}^{(k+1)T} \begin{pmatrix} 2\cdot SS_{e} (t)\!+\!COD_{e} (t)\!+\!3\cdot S_{NKj, e} (t) \\ +10\cdot S_\rm{NO, e} (t)\!+\!2\cdot BOD_{5, e} (t) \\ \end{pmatrix} Q_{e} (t)dt \end{align} $ (29)

where $SS _{e} $ is the suspended solid concentration, $ COD_{e} $ is the chemical oxygen demand, $ S_{NKj, e} $ is the Kjeldahl nitrogen concentration, $ S_\rm{NO, e} $ is the nitrate and nitrite nitrogen concentration, $ BOD_{5, e} $ is the biochemical oxygen demand of 5 days, and $ Q_{e} $ is the effluent flow rate.

In the BSM1, there are two control loops. One is the control loop of the dissolved oxygen concentration $ ( S_\rm{O, 5}) $ in the fifth reactor and another is the control loop of the nitrate level $ (S_\rm{NO, 2}) $ in the second reactor. The manipulated variables for $ S_\rm{O, 5} $ and $ S_\rm{NO, 2} $ are $ K_{La5} $ and $Q_{a} $, respectively. Through the analysis of the reaction mechanism and optimal control, $ Q_{w} $, $ K_{La3} $, $ K_{La4} $, $ S_\rm{O, 5} $, and $ S_\rm{NO, 2} $ are selected as the decision variables. The aim of multiobjective optimal control in the WWTP is to acquire the balance between EC and EQ with the usage of the best set points of decision variables found by the optimization algorithm. Thus, any attempt to develop the models of EC and EQ is not trivial. In this study, the proposed DP-AFNN method is adopted to establish the relationship between EC, EQ and decision variables. The schematic diagram of the modeling of EC and EQ is shown in Fig. 2. The ranges of decision variables are listed in Table Ⅰ. And Fig. 3 demonstrates the changes of decision variables in an optimal cycle.

Download:
Fig. 2 Schematic diagram of the modeling of EC and EQ.
Table Ⅰ
THE RANGE OF DECISION VARIABLES
Download:
Fig. 3 The values of input variables in the predicting process.
B. Experimental Settings

In this section, the effectiveness of the proposed DP-AFNN data-driven modeling method is demonstrated by using the BSM1 platform. A total of 500 samples are collected from BSM1. The preceding 400 samples are chosen as the training set, and the remaining 100 samples are applied as the testing set. Through the practical test, the parameters of DP-AFNN are set as follows: $\alpha $% = 2.5%, $ N= 10 $, $ T= 500$. Regarding the number of input and output variables, two prediction models with the initial network structure of 5-50-10-1 are constructed to predict EC and EQ, respectively. In the EBP algorithm, the learning rate $ \eta$ is set to be 0.01.

To assess the performance of the proposed DP-AFNN, the root mean-square-error (RMSE) and the predicting accuracy are adopted as two performance criteria in the experiments. For comparative analysis, the obtained results of DP-AFNN are compared with some existing methods, including model-based and FNN-based methods. All the results are averaged on 30 independent runs. RMSE and Accuracy are defined as

$ RMSE=\sqrt{\frac{1}{P}\sum\limits_{p=1}^P {(y_{d}^{p} -y^{p})^{2}} } $ (30)
$ Accuracy=\frac{1}{P}\sum\limits_{p=1}^P {\Big(1-\frac{\left| {y_{d}^{p} -y^{p}} \right|}{\left| {y_{d}^{p} } \right|}\Big)} \times 100 \%. $ (31)
C. Experimental Results 1). Energy Consumption Modeling

For the EC modeling, Fig. 4 shows the plot of $ \gamma $ sorted in a descending order. From Fig. 4, ten data points with high $ \gamma$ are selected as clustering centers. Fig. 5 presents the variation of RMSE values in the training process. From Fig. 5, the RMSE values of DP-AFNN and FNN-EBP can converge to a steady state, but the convergence speed of DP-AFNN is obviously faster than FNN-EBP. One may notice that the integration of the DP-based structure identification and adaptive parameter learning algorithm can achieve better solutions. The predicting results of the EC values are shown in Fig. 6, and the graphs of predicting errors between the desired and predicted outputs of these two algorithms are displayed in Fig. 7. From Fig. 7, the predicting errors obtained by the proposed DP-AFNN model are smaller than that of FNN-EBP.

Download:
Fig. 4 The value of $ \gamma $ in decreasing order.
Download:
Fig. 5 RMSE values during training.
Download:
Fig. 6 The predicting values of EC.
Download:
Fig. 7 The predicting error of EC.

The performance of the proposed DP-AFNN for EC modeling is compared with the FNN-EBP, BSM-MBR [25], TCIM [26], MLR [27], ARKF [10], and DFNN [15]. The details of performance comparison are given in Table Ⅱ. As observed in Table Ⅱ, the proposed DP-AFNN achieves the smallest testing RMSE (0.97) and the best testing accuracy (99.39%). The performance of DP-AFNN is superior to that of FNN-EBP and other mechanism methods. Although the MLR gets the least training time (0.03 s), its testing accuracy (95.17%) is lower than that of DP-AFNN. The comparisons indicate that the proposed DP-AFNN algorithm can be utilized to address the EC modeling problem in the WWTP.

Table Ⅱ
PERFORMANCE MEASURE VALUES OBTAINED BY DIFFERENT ALGORITHMS FOR EC
2). Effluent Quality Modeling

For the EQ modeling, the variation of RMSE values in the training process is presented in Fig. 8. Based on the results shown in Fig. 8, the proposed DP-AFNN method achieves faster convergence speed than that of FNN-EBP. The predicting results of the EQ values are shown in Fig. 9, and the graphs of the predicting errors between the desired and predicted outputs are displayed in Fig. 10. The predicting values demonstrate that the proposed DP-AFNN has high modeling precision.

Download:
Fig. 8 RMSE values during training.
Download:
Fig. 9 The predicting values of EQ.
Download:
Fig. 10 The predicting error of EQ.

The performance of the DP-AFNN for EQ modeling is compared with the FNN-EBP, MLR [27], and DFNN [15]. The detailed results of different algorithms are exhibited in Table Ⅲ. The testing RMSE and accuracy of the proposed DP-AFNN model approach to 0.96 and 93.04%, respectively, which is remarkable in comparison with other results. The experimental results in Table Ⅲ demonstrate that the proposed DP-AFNN has high generalization performance.

Table Ⅲ
PERFORMANCE MEASURE VALUES OBTAINED BY DIFFERENT ALGORITHMS FOR EQ
Ⅵ. CONCLUSION

In this paper, the DP-AFNN approach for designing the models of EC and EQ in the WWTP has been presented. The proposed method is technically developed with the usage of the DP-based clustering algorithm and the improved LM parameter learning algorithm. Based on the results obtained from the modeling of EC and EQ, the main contributions of this paper include: 1) Since the DP-based clustering algorithm can partition the input space well, the initial fuzzy rules of the proposed DP-AFNN are suitable for all the training samples. The simulation results indicate that the DP-based fuzzy rules extracting method can also reduce the computational complexity and enhance generalization ability. 2) Since the parameters are optimized by using the improved LM algorithm, the convergence speed and learning ability of the proposed DP-AFNN are superior to that of the FNN-EBP. 3) The experimental results indicate that the proposed DP-AFNN achieves better modeling accuracy than the mechanism models (i.e., BSM-MBR and TCIM) and other FNNs (i.e., FNN-EBP and DFNN). 4) The experimental results also demonstrate that the EC and EQ models in the WWTP can be developed with acceptable accuracy utilizing $ Q_{w} $, $ K_{La3} $, $K_{La4} $, $ S_\rm{O, 5} $, and $ S_\rm{NO, 2} $ as decision variables.

This work, which has solved the modeling problems of EC and EQ, plays an extremely essential role in the subsequent study of the multiobjective optimal control in the WWTP. The following work of our team includes two aspects. One is to deal with the complicated multiobjective optimization problem with the usage of the multiobjective evolutionary algorithm. The traditional multiobjective evolutionary algorithm should be improved so that high-quality Pareto optimal solutions can be found. The other is to design two tracking controllers for the optimal set-points of $S_\rm{O, 5} $ and $ S_\rm{NO, 2} $. The built-in PID closed-loop controllers of BSM should be improved in order to achieve higher tracking accuracy and better control placidity.

REFERENCES
[1] J. F. Wan, J. Gu, Q. Zhao, and Y. Liu, "Cod capture: a feasible option towards energy self-sufficient domestic wastewater treatment, " Sci. Rep. , vol. 6, pp. Article No. 25054, Apr. 2016.
[2] D. Mamais, C. Noutsopoulos, A. Dimopoulou, A. Stasinakis, and T. D. Lekkas, "Wastewater treatment process impact on energy savings and greenhouse gas emissions, " Water Sci. Technol. , vol. 71, no. 2, pp. 303-308, Jan. 2015. http://www.ncbi.nlm.nih.gov/pubmed/25633956
[3] F. Hernández Sancho, M. Molinos-Senante, and R. Sala-Garrido, "Energy efficiency in Spanish wastewater treatment plants: a non-radial DEA approach, " Sci. Total Environ. , vol. 409, no. 14, pp. 2693-2699, Jun. 2011. http://europepmc.org/abstract/MED/21549411
[4] Y. Yang, J. K. Yang, J. L. Zuo, Y. Li, S. He, X. Yang, and K. Zhang, "Study on two operating conditions of a full-scale oxidation ditch for optimization of energy consumption and effluent quality by using CFD model, " Water Res. , vol. 45, no. 11, pp. 3439-3452, May 2011. http://www.ncbi.nlm.nih.gov/pubmed/21529877
[5] I. Vera, K. Sáez, and G. Vidal, "Performance of 14 full-scale sewage treatment plants: comparison between four aerobic technologies regarding effluent quality, sludge production and energy consumption, " Environ. Technol. , vol. 34, no. 15, pp. 2267-2275, Jan. 2013. http://europepmc.org/abstract/med/24350481
[6] D. J. Dürrenmatt and W. Gujer, "Data-driven modeling approaches to support wastewater treatment plant operation, " Environ. Model. Softw. , vol. 30, pp. 47-56, Apr. 2012. http://europepmc.org/abstract/med/2673599
[7] R. Noori, H. D. Yeh, M. Abbasi, F. T. Kachoosangi, and S. Moazami, "Uncertainty analysis of support vector machine for online prediction of five-day biochemical oxygen demand, " J. Hydrol. , vol. 527, pp. 833-843, Aug. 2015. http://www.sciencedirect.com/science/article/pii/S0022169415003960
[8] J. F. Qiao, W. Li, and H. G. Han, "Soft computing of biochemical oxygen demand using an improved T-S fuzzy neural network, " Chin. J. Chem. Eng. , vol. 22, no. 11-12, pp. 1254-1259, Nov. 2014. http://www.sciencedirect.com/science/article/pii/S100495411400130X
[9] A. Kusiak, Y. H. Zeng, and Z. J. Zhang, "Modeling and analysis of pumps in a wastewater treatment plant: A data-mining approach, " Eng. Appl. Artif. Intell. , vol. 26, no. 7, pp. 1643-1651, Aug. 2013. http://www.sciencedirect.com/science/article/pii/S0952197613000572
[10] H. G. Han, L. Zhang, and J. F. Qiao, "An energy consumption model of wastewater treatment process based on adaptive regressive kernel function, " CIESC J. , vol. 67, no. 3, pp. 947-953, Mar. 2016.
[11] M. Pratama, M. J. Er, X. Li, R. J. Oentaryo, E. Lughofer, and I. Arifin, "Data driven modeling based on dynamic parsimonious fuzzy neural network, " Neurocomputing, vol. 110, pp. 18-28, Jun. 2013. http://www.sciencedirect.com/science/article/pii/S0925231212008934
[12] R. J. Hathaway and Y. K. Hu, "Density-weighted fuzzy c-means clustering, " IEEE Trans. Fuzzy Syst. , vol. 17, no. 1, pp. 243-252, Feb. 2009.
[13] L. Teslic, B. Hartmann, O. Nelles, and I. Skrjanc, "Nonlinear system identification by gustafson-kessel fuzzy clustering and supervised local model network learning for the drug absorption spectra process, " IEEE Trans. Neural Netw. , vol. 22, no. 12, pp. 1941-1951, Dec. 2011. http://ieeexplore.ieee.org/document/6060917/
[14] H. Malek, M. M. Ebadzadeh, and M. Rahmati, "Three new fuzzy neural networks learning algorithms based on clustering, training error and genetic algorithm, " Appl. Intell. , vol. 37, no. 2, pp. 280-289, Sep. 2012.
[15] S. Q. Wu and M. J. Er, "Dynamic fuzzy neural networks -a novel approach to function approximation, " IEEE Trans. Syst. Man Cybern. B (Cybern. ), vol. 30, no. 2, pp. 358-364, Apr. 2000.
[16] H. G. Han, X. L. Wu, and J. F. Qiao, "Nonlinear systems modeling based on self-organizing fuzzy-neural-network with adaptive computation algorithm, " IEEE Trans. Cybern. , vol. 44, no. 4, pp. 554-564, Apr. 2014. http://europepmc.org/abstract/med/23782841
[17] B. M. Wilamowski and H. Yu, "Neural network learning without backpropagation, " IEEE Trans. Neural Netw. , vol. 21, no. 11, pp. 1793-1803, Nov. 2010.
[18] M. Davanipoor, M. Zekri, and F. Sheikholeslam, "Fuzzy wavelet neural network with an accelerated hybrid learning algorithm, " IEEE Trans. Fuzzy Syst. , vol. 20, no. 3, pp. 463-470, Jun. 2012. http://ieeexplore.ieee.org/document/6081924/
[19] W. Q. Zhao, K. Li, and G. W. Irwin, "A new gradient descent approach for local learning of fuzzy neural models, " IEEE Trans. Fuzzy Syst. , vol. 21, no. 1, pp. 30-44, Feb. 2013. http://ieeexplore.ieee.org/document/6203571/
[20] G. Leng, T. M. McGinnity, and G. Prasad, "Design for self-organizing fuzzy neural networks based on genetic algorithms, " IEEE Trans. Fuzzy Syst. , vol. 14, no. 6, pp. 755-766, Dec. 2006. http://ieeexplore.ieee.org/document/4016084/
[21] A. Rodriguez and A. Laio, "Clustering by fast search and find of density peaks, " Science, vol. 344, no. 6191, pp. 1492-1496, Jun. 2014. http://www.ncbi.nlm.nih.gov/pubmed/24970081
[22] B. M. Wilamowski and H. Yu, "Improved computation for levenbergmarquardt training, " IEEE Trans. Neural Netw. , vol. 21, no. 6, pp. 930-937, Jun. 2010. http://europepmc.org/abstract/MED/20409991
[23] H. G. Han, L. M. Ge, and J. F. Qiao, "An adaptive second order fuzzy neural network for nonlinear system modeling, " Neurocomputing, vol. 214, pp. 837-847, Nov. 2016. http://www.sciencedirect.com/science/article/pii/S0925231216307329
[24] J. F. Qiao and W. Zhang, "Dynamic multi-objective optimization control for wastewater treatment process, " Neural Comput. Appl. , vol. 29, no. 11, pp. 1261-1271, Jun. 2018.
[25] T. Maere, B. Verrecht, S. Moerenhout, S. Judd, and I. Nopens, "BSMMBR: a benchmark simulation model to compare control and operational strategies for membrane bioreactors, " Water Res. , vol. 45, no. 6, pp. 2181-2190, Mar. 2011. http://www.ncbi.nlm.nih.gov/pubmed/21329957
[26] P. Vanrolleghem and S. Gillot, "Robustness and economic measures as control benchmark performance criteria, " Water Sci. Technol. , vol. 45, no. 4-5, pp. 117-126, Feb. 2002. http://www.ncbi.nlm.nih.gov/pubmed/11936624
[27] E. Lee, S. Han, and H. Kim, "Development of software sensors for determining total phosphorus and total nitrogen in waters, " Int. J. Environ. Res. Public Health, vol. 10, no. 1, pp. 219-236, Jan. 2013. http://europepmc.org/articles/PMC3564139/