Journal of University of Chinese Academy of Sciences, 2025, Vol. 42, Issue (1): 20-25
Learning the parameters of a class of stochastic Lotka-Volterra systems with neural networks
WANG Zhanpeng, WANG Lijin     
School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
Abstract: In this paper, we propose a neural network approach to learn the parameters of a class of stochastic Lotka-Volterra systems. Approximations of the mean and covariance matrix of the observational variables are obtained from the Euler-Maruyama discretization of the underlying stochastic differential equations (SDEs), based on which the loss function is built. The stochastic gradient descent method is applied in the neural network training. Numerical experiments demonstrate the effectiveness of our method.
Keywords: stochastic Lotka-Volterra systems    neural networks    Euler-Maruyama scheme    parameter estimation    

The classical Lotka-Volterra (L-V) systems are differential equations modeling competition among multiple species, and they have been studied extensively from both theoretical and numerical points of view. Because of the uncertain nature of the surrounding environment, L-V systems are subject to random perturbations and thus become stochastic. Stochastic L-V systems have attracted particular attention in recent years, and studies with interesting results keep emerging (see Refs. [1-2] and the references therein). Related work includes theoretical analysis, numerical simulation, as well as model (parameter) estimation.

With the rapid development of big data science and artificial intelligence[3], machine learning methods are spreading into more and more research fields (see Refs. [4-11]), such as the regression of differential equations from observational data using neural networks.

The regression of differential equations can be classified into two categories: deterministic and stochastic. For deterministic ordinary differential equations (ODEs), Raissi et al.[4] proposed a multi-step network based on linear multi-step methods to learn the equations satisfied by the data points. Chen et al.[5] proposed the neural ODE method, which establishes the so-called adjoint equations to avoid memory-expensive backpropagation through the numerical steps of the ODE integrator. Gholami et al.[6] gave the ANODE method, an adjoint-based neural ODE framework designed to avoid problems caused by numerical instability while providing unconditionally accurate gradients. Kloppers et al.[12] discussed the integral and log-integral methods for estimating parameters of L-V models. Wu et al.[13-14] proposed a linear programming parameter estimation approach based on the grey direct modeling method. Xu et al.[15-16] applied regularization methods to invert parameters, and the inverse problem of estimating parameters from one solution of the system was studied by Shakurov et al.[17], who also gave existence and uniqueness conditions. Waniewski et al.[18] estimated parameters via individual-based computer simulations of L-V systems, while Luus[19] applied a direct search optimization method to parameter estimation of L-V systems.

For learning stochastic differential equations (SDEs), Urain et al.[7] proved Lyapunov stability for a class of SDEs and proposed ImitationFlow, which extends the normalizing-flows approach to learn stable SDEs from a set of observed trajectories. Kong et al.[8] gave a neural SDE model (SDE-Net), which consists of a drift network and a diffusion network learning the drift and diffusion coefficients, respectively. Yildiz et al.[9] introduced a novel paradigm for learning non-parametric drift and diffusion coefficients of SDEs; the proposed model learns to simulate path distributions that match observations with non-uniform time increments and arbitrary sparseness, in contrast with gradient matching, which does not optimize simulated responses. Bento et al.[10] considered linear models for SDEs; any such model can be associated with a network (i.e. a directed graph) describing which degrees of freedom interact under the dynamics, and they tackled the problem of learning such a network from observations of the system trajectory. Tzen et al.[11] established a connection between infinitely deep residual networks and solutions to SDEs.

The goal of this paper is to estimate learnable parameters of a class of stochastic L-V systems from given data using neural networks. Given observational data from a certain population dynamical system, we first derive the statistics, i.e., the approximate mean and covariance matrix of the population variable based on the Euler-Maruyama discretization scheme. We then design a loss function using the statistics of the data based on the Euler-Maruyama scheme. Finally, we apply the neural network with stochastic gradient descent (SGD) to learn the parameters of the underlying stochastic L-V equations.

1 The neural network algorithm

The underlying stochastic L-V systems that we attempt to learn are of the form[1]

\left\{\begin{array}{l} \mathrm{d} \boldsymbol{y}(t)=D(\boldsymbol{y}(t))\left[\left(\boldsymbol{b}-\boldsymbol{A} \boldsymbol{y}(t)\right) \mathrm{d} t+\boldsymbol{\sigma} \mathrm{d} W(t)\right], \\ \boldsymbol{y}\left(t_0\right)=\boldsymbol{y}_0, \end{array}\right. (1)

where \boldsymbol{y}(t)=\left(y_1, \cdots, y_n\right)^{\mathrm{T}}, and D(\boldsymbol{y})=\operatorname{diag}\left\{y_1, \cdots, y_n\right\} is a diagonal matrix. Here n is the number of species in the system, and y_i is the population size of species i. The elements of the vectors \boldsymbol{b}=\left(b_1, \cdots, b_n\right)^{\mathrm{T}} and \boldsymbol{\sigma}=\left(\sigma_1, \cdots, \sigma_n\right)^{\mathrm{T}}, as well as those of the symmetric matrix \boldsymbol{A}=\left(a_{ij}\right)_{n \times n}, are all positive learnable parameters. W(t) is a 1-dimensional standard Wiener process. For i, j=1, \cdots, n, a_{ii} denotes the coefficient of intraspecific competition within species i, a_{ij} measures the interaction between species i and j, b_i is the intrinsic growth rate of species i, and \sigma_i represents the intensity of the noise on species i.

The Euler-Maruyama scheme[20] applied to system (1) reads:

\boldsymbol{y}_1=\boldsymbol{y}_0+D(\boldsymbol{y}_0)\left[\left(\boldsymbol{b}-\boldsymbol{A} \boldsymbol{y}_0\right) h+\boldsymbol{\sigma} \sqrt{h}\, \xi\right], (2)

where h is the time step size and ξ~N(0, 1).
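For illustration, the following Python sketch performs one step of scheme (2) for an n-species system of the form (1); the function name and the numerical values are illustrative assumptions of ours (the values match the 2-species example of Section 2) and are not taken from the original implementation.

```python
import numpy as np

def euler_maruyama_step(y0, b, A, sigma, h, rng=None):
    """One Euler-Maruyama step (2) of system (1).

    y0, b, sigma: arrays of shape (n,); A: array of shape (n, n).
    All species are driven by the same scalar Wiener increment.
    """
    rng = np.random.default_rng() if rng is None else rng
    xi = rng.standard_normal()                                  # xi ~ N(0, 1)
    increment = (b - A @ y0) * h + sigma * np.sqrt(h) * xi      # (b - A y0) h + sigma sqrt(h) xi
    return y0 + y0 * increment                                  # element-wise product realizes D(y0)

# hypothetical 2-species example
b = np.array([1.0, 1.2])
A = np.array([[0.3, 0.2], [0.2, 0.4]])
sigma = np.array([0.4, 0.5])
y1 = euler_maruyama_step(np.array([1.0, 1.0]), b, A, sigma, h=0.01)
```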

Denote \boldsymbol{y}(t_0+h) by \hat{\boldsymbol{y}}_1. Suppose we are given the observation data \left\{\left(\boldsymbol{y}_0^{(m)}, \hat{\boldsymbol{y}}_1^{(m, k)}\right), m=1, \cdots, N_0, k=1, \cdots, N_1\right\}. Based on the Euler-Maruyama discretization (2), we can approximately regard \hat{\boldsymbol{y}}_1^{(m, k)} as being obtained by

\hat{\boldsymbol{y}}_1^{(m, k)} \approx \boldsymbol{y}_1^{(m, k)}=\boldsymbol{y}_0^{(m)}+D\left(\boldsymbol{y}_0^{(m)}\right)\left[\left(\boldsymbol{b}-\boldsymbol{A} \boldsymbol{y}_0^{(m)}\right) h+\boldsymbol{\sigma} \sqrt{h}\, \xi^{(m, k)}\right], (3)

where \left\{\xi^{(m, k)}, m=1, \cdots, N_0, k=1, \cdots, N_1\right\} are independent N(0, 1) random variables.

According to (3), for each m \in\{1, \cdots, N_0\}, \hat{\boldsymbol{y}}_1^{(m)} approximately obeys the normal distribution with mean \boldsymbol{\mu}^{(m)}=\boldsymbol{y}_0^{(m)}+D\left(\boldsymbol{y}_0^{(m)}\right)\left(\boldsymbol{b}-\boldsymbol{A} \boldsymbol{y}_0^{(m)}\right) h and covariance matrix \boldsymbol{\Sigma}^{(m)}=h D\left(\boldsymbol{y}_0^{(m)}\right) \boldsymbol{\sigma}\left(D\left(\boldsymbol{y}_0^{(m)}\right) \boldsymbol{\sigma}\right)^{\mathrm{T}}. The law of large numbers[21] implies that, for each m \in\{1, \cdots, N_0\}, the mean and covariance matrix of \left\{\hat{\boldsymbol{y}}_1^{(m, k)}, k=1, \cdots, N_1\right\} approach \boldsymbol{\mu}^{(m)} and \boldsymbol{\Sigma}^{(m)}, respectively, as the number of samples N_1 increases; namely, for sufficiently large N_1,

\begin{gathered} \boldsymbol{\mu}^{(m)} \approx \overline{\hat{\boldsymbol{y}}_1^{(m)}}=\frac{1}{N_1} \sum\limits_{k=1}^{N_1} \hat{\boldsymbol{y}}_1^{(m, k)}, \\ \boldsymbol{\Sigma}^{(m)} \approx \boldsymbol{\Sigma}_{\hat{\boldsymbol{y}}_1}^{(m)}=\frac{1}{N_1} \sum\limits_{k=1}^{N_1} \hat{\boldsymbol{y}}_1^{(m, k)}\left(\hat{\boldsymbol{y}}_1^{(m, k)}\right)^{\mathrm{T}}-\overline{\hat{\boldsymbol{y}}_1^{(m)}}\left(\overline{\hat{\boldsymbol{y}}_1^{(m)}}\right)^{\mathrm{T}}. \end{gathered} (4)

Thus, for our neural network, we set up the training data set \left\{\left(\boldsymbol{y}_0^{(m)}, \overline{\hat{\boldsymbol{y}}_1^{(m)}}, \boldsymbol{\Sigma}_{\hat{\boldsymbol{y}}_1}^{(m)}\right), m=1, \cdots, N_0\right\}.
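A minimal sketch of how the statistics in (4) can be computed from the repeated observations at one initial point is given below; the array shapes and the function name are our own assumptions.

```python
import numpy as np

def sample_statistics(y1_samples):
    """Empirical mean and covariance in (4) for one initial point y_0^(m).

    y1_samples: array of shape (N1, n) whose rows are y_hat_1^(m,k), k = 1,...,N1.
    """
    mean = y1_samples.mean(axis=0)
    # second-moment form of (4): (1/N1) sum y y^T - mean mean^T
    cov = y1_samples.T @ y1_samples / len(y1_samples) - np.outer(mean, mean)
    return mean, cov
```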

Due to the symmetry of the matrix \boldsymbol{A}, there exists a function \hat{f}(\boldsymbol{y})=\boldsymbol{b}^{\mathrm{T}} \boldsymbol{y}-\frac{1}{2} \boldsymbol{y}^{\mathrm{T}} \boldsymbol{A} \boldsymbol{y} such that \boldsymbol{b}-\boldsymbol{A} \boldsymbol{y}=\nabla \hat{f}(\boldsymbol{y}) (indeed, \nabla \hat{f}(\boldsymbol{y})=\boldsymbol{b}-\frac{1}{2}\left(\boldsymbol{A}+\boldsymbol{A}^{\mathrm{T}}\right) \boldsymbol{y}=\boldsymbol{b}-\boldsymbol{A} \boldsymbol{y}). Meanwhile, it holds that \boldsymbol{\sigma}=\nabla \hat{g}(\boldsymbol{y}) for \hat{g}(\boldsymbol{y})=\sum\limits_{i=1}^{n} \sigma_i y_i.

Then we construct the loss function L of our network as follows

L=\frac{1}{N_0} \sum\limits_{m=1}^{N_0}\left[\lambda_1\left\|\boldsymbol{y}_0^{(m)}+h D\left(\boldsymbol{y}_0^{(m)}\right) \nabla \hat{f}\left(\boldsymbol{y}_0^{(m)}\right)-\overline{\hat{\boldsymbol{y}}_1^{(m)}}\right\|_2^2+\lambda_2\left\|h D\left(\boldsymbol{y}_0^{(m)}\right) \nabla \hat{g}\left(\boldsymbol{y}_0^{(m)}\right)\left(D\left(\boldsymbol{y}_0^{(m)}\right) \nabla \hat{g}\left(\boldsymbol{y}_0^{(m)}\right)\right)^{\mathrm{T}}-\boldsymbol{\Sigma}_{\hat{\boldsymbol{y}}_1}^{(m)}\right\|_2^2\right], (5)

where the norm ‖·‖2 denotes the Euclidean norm.
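The sketch below shows one way to evaluate a loss of this form with PyTorch, obtaining \nabla \hat{f} and \nabla \hat{g} by automatic differentiation; the interfaces net_f and net_g (scalar-output networks), the batching, and the weighting defaults are our assumptions, not the paper's code.

```python
import torch

def predicted_statistics(net_f, net_g, y0, h):
    """Mean and covariance of y1 implied by (2)-(3) at a batch of points y0 of shape (N0, n)."""
    y0 = y0.clone().requires_grad_(True)
    grad_f = torch.autograd.grad(net_f(y0).sum(), y0, create_graph=True)[0]  # ≈ b - A y0
    grad_g = torch.autograd.grad(net_g(y0).sum(), y0, create_graph=True)[0]  # ≈ sigma
    mu = y0 + y0 * grad_f * h                          # y0 + D(y0)(b - A y0) h
    d_sig = (y0 * grad_g).unsqueeze(-1)                # D(y0) sigma, shape (N0, n, 1)
    cov = h * d_sig @ d_sig.transpose(-1, -2)          # h D(y0) sigma (D(y0) sigma)^T
    return mu, cov

def loss_fn(net_f, net_g, y0, mean_obs, cov_obs, h, lam1=0.5, lam2=0.5):
    """Squared mismatch between predicted and empirical statistics, averaged over the batch."""
    mu, cov = predicted_statistics(net_f, net_g, y0, h)
    return (lam1 * ((mu - mean_obs) ** 2).sum(dim=-1)
            + lam2 * ((cov - cov_obs) ** 2).sum(dim=(-2, -1))).mean()
```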

The functions \hat{f} and \hat{g} can then be learned by the network through performing SGD on L. Consequently, we obtain the estimates

\begin{gathered} \boldsymbol{A} \approx-\nabla^2 \hat{f}, \\ \boldsymbol{b} \approx \nabla \hat{f}\left(\boldsymbol{y}_0\right)-\nabla^2 \hat{f}\left(\boldsymbol{y}_0\right) \boldsymbol{y}_0, \quad \forall \boldsymbol{y}_0, \\ \boldsymbol{\sigma} \approx \nabla \hat{g} . \end{gathered} (6)
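The estimates (6) can be read off from the learned networks by automatic differentiation. A possible sketch is given below; the helper name and the assumption that net_f and net_g map a point of \mathbb{R}^n to a scalar are ours.

```python
import torch

def recover_parameters(net_f, net_g, y0):
    """Pointwise estimates (6) at a single point y0 of shape (n,)."""
    y0 = y0.clone().requires_grad_(True)
    grad_f = torch.autograd.grad(net_f(y0).sum(), y0)[0]                       # ∇ f_hat(y0)
    grad_g = torch.autograd.grad(net_g(y0).sum(), y0)[0]                       # ∇ g_hat(y0)
    hess_f = torch.autograd.functional.hessian(lambda y: net_f(y).sum(), y0)   # ∇² f_hat(y0)
    A_est = -hess_f                                   # A ≈ -∇² f_hat(y0)
    b_est = grad_f - hess_f @ y0                      # b ≈ ∇ f_hat(y0) - ∇² f_hat(y0) y0
    sigma_est = grad_g                                # σ ≈ ∇ g_hat(y0)
    return A_est.detach(), b_est.detach(), sigma_est.detach()
```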

We write the above process into Algorithm 1.

Algorithm 1: Learning process of stochastic L-V systems
 Input: Initial point set \left\{\boldsymbol{y}_0^{(m)}, m=1, \cdots, N_0\right\}
 Output: The estimates of \boldsymbol{A}, \boldsymbol{b}, \boldsymbol{\sigma}
1 Obtain the observed values \left\{\hat{\boldsymbol{y}}_1^{(m, k)}, m=1, \cdots, N_0, k=1, \cdots, N_1\right\};
2 for m=1: N_0 do
3   compute \overline{\hat{\boldsymbol{y}}_1^{(m)}} and \boldsymbol{\Sigma}_{\hat{\boldsymbol{y}}_1}^{(m)} according to formula (4);
4 end
5 \boldsymbol{D} \leftarrow\left\{\left(\boldsymbol{y}_0^{(m)}, \overline{\hat{\boldsymbol{y}}_1^{(m)}}, \boldsymbol{\Sigma}_{\hat{\boldsymbol{y}}_1}^{(m)}\right), m=1, \cdots, N_0\right\};
6 Train netNN(·) with data set \boldsymbol{D};
7 \hat{f} ← netNN[1];
8 \hat{g} ← netNN[2];
9 Compute the estimates according to equation (6).

2 Implementation

In this section, we apply our neural network algorithm to the following stochastic L-V system with 2 species[1], to illustrate the implementation of the algorithm and to test its ability of parameter estimation

\left\{\begin{array}{l} \mathrm{d} x(t)=x(t)\left[\left(b_1-a_{11} x(t)-a_{12} y(t)\right) \mathrm{d} t+\sigma_1 \mathrm{~d} W(t)\right], \\ \mathrm{d} y(t)=y(t)\left[\left(b_2-a_{21} x(t)-a_{22} y(t)\right) \mathrm{d} t+\sigma_2 \mathrm{~d} W(t)\right], \end{array}\right. (7)

where b_1=1, b_2=1.2, a_{11}=0.3, a_{12}=0.2, a_{21}=0.2, a_{22}=0.4, \sigma_1=0.4, and \sigma_2=0.5.

The experiments in this paper are performed on a Windows system, using the PyCharm development environment, with Python 3.7 as the programming language. In Algorithm 1, we use the PyTorch library[22] to build a feedforward neural network netNN(·) as shown in Fig. 1, with one input layer of 2 neurons, one hidden layer of 400 neurons, and one output layer of 2 neurons learning \hat{f} and \hat{g} respectively; the activation functions are all tanh. We set the proportion of the training set to the total data to Psplit=0.8, the learning rate to rlearn=0.001, the sample sizes to N0=1 000 and N1=1 000, and the number of training epochs to Ntrain=2 000.
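A minimal PyTorch sketch of the network just described follows; the class name is ours, and we keep the output layer linear as a simplification (the paper states that all activation functions are tanh). The two lambdas connect this network to the net_f/net_g interface assumed in the earlier sketches.

```python
import torch
import torch.nn as nn

class NetNN(nn.Module):
    """Feedforward network of Fig. 1: 2 -> 400 -> 2, tanh hidden activation.

    The two outputs play the roles of f_hat and g_hat.
    """
    def __init__(self, n_in=2, n_hidden=400):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Tanh())
        self.out = nn.Linear(n_hidden, 2)

    def forward(self, y):
        out = self.out(self.hidden(y))
        return out[..., 0:1], out[..., 1:2]        # (f_hat(y), g_hat(y))

net = NetNN()
net_f = lambda y: net(y)[0]
net_g = lambda y: net(y)[1]
optimizer = torch.optim.SGD(net.parameters(), lr=0.001)   # SGD with the stated learning rate
```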

Fig. 1 Feedforward neural network structure

Hereby we denote \boldsymbol{y}:=(x, y)^{\mathrm{T}}. Given the observation data \left\{\left(\boldsymbol{y}_0^{(m)}, \hat{\boldsymbol{y}}_1^{(m, k)}\right), m=1, \cdots, N_0, k=1, \cdots, N_1\right\}, we obtain the training data \left\{\left(\boldsymbol{y}_0^{(m)}, \overline{\hat{\boldsymbol{y}}_1^{(m)}}, \boldsymbol{\Sigma}_{\hat{\boldsymbol{y}}_1}^{(m)}\right), m=1, \cdots, N_0\right\} by using (4). Then we train the network with the loss function (5) to obtain the outputs \hat{f} and \hat{g}. The parameters of the system can be estimated afterwards via (6).

For numerical validation, we generate the observation data in the following way. First we select the data region \varOmega_{\mathrm{d}}=[0, \pi] \times[0, \pi] and sample uniformly in it to obtain \left\{\boldsymbol{y}_0^{(m)}, m=1, \cdots, N_0\right\}. Then for each m, we apply the Milstein scheme[20]

\begin{gathered} x_{l+1}^{(m, k)}=x_l^{(m, k)}+x_l^{(m, k)}\left(b_1-a_{11} x_l^{(m, k)}-a_{12} y_l^{(m, k)}\right) \delta t+ x_l^{(m, k)} \sigma_1 \xi_l^{(m, k)} \sqrt{\delta t}+\frac{1}{2} x_l^{(m, k)} \sigma_1^2\left(\left(\xi_l^{(m, k)}\right)^2 \delta t-\delta t\right), \\ x_0^{(m, k)}=x_0^{(m)}, \\ y_{l+1}^{(m, k)}=y_l^{(m, k)}+y_l^{(m, k)}\left(b_2-a_{21} x_l^{(m, k)}-a_{22} y_l^{(m, k)}\right) \delta t+ y_l^{(m, k)} \sigma_2 \xi_l^{(m, k)} \sqrt{\delta t}+\frac{1}{2} y_l^{(m, k)} \sigma_2^2\left(\left(\xi_l^{(m, k)}\right)^2 \delta t-\delta t\right), \\ y_0^{(m, k)}=y_0^{(m)}, \end{gathered} (8)

with the small time step \delta t=0.001 and l=0, \cdots, M-1 \left(M=\frac{h}{\delta t}\right) to system (7), to simulate the observations \hat{\boldsymbol{y}}_1^{(m, k)} \approx\left(x_M^{(m, k)}, y_M^{(m, k)}\right)^{\mathrm{T}} (m=1, \cdots, N_0, k=1, \cdots, N_1). Note that, for analyzing the statistics of the observation data, we applied the Euler-Maruyama scheme for simplicity, while for simulating these data we apply the Milstein scheme here for higher accuracy.
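The following sketch generates one observation \hat{\boldsymbol{y}}_1^{(m, k)} by the Milstein sub-stepping described above; the function name and the vectorized form over the two species are our own, and the parameter arrays are those of system (7).

```python
import numpy as np

def simulate_observation(y0, b, A, sigma, h, dt=1e-3, rng=None):
    """Integrate system (7) from y0 over [0, h] with the Milstein scheme (8), sub-step dt."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y0, dtype=float).copy()
    for _ in range(int(round(h / dt))):
        xi = rng.standard_normal()                               # shared Wiener increment
        drift = y * (b - A @ y) * dt
        diffusion = y * sigma * xi * np.sqrt(dt)
        correction = 0.5 * y * sigma**2 * (xi**2 * dt - dt)      # Milstein correction term
        y = y + drift + diffusion + correction
    return y                                                     # ≈ (x_M, y_M)^T
```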

In order to observe the impact of different h on the estimation accuracy, we test over the set h∈{0.01, 0.02, 0.03, 0.05, 0.1, 0.2, 0.4}. Results are given in Section 3.

Due to errors in the training, the estimated values of the parameters may depend on the sampling points. To alleviate this issue, we choose from the region \varOmega_{\mathrm{d}} the subregion \varOmega_{\mathrm{e}}=\left[\frac{\pi}{2}-\frac{\pi}{2 \sqrt{2}}, \frac{\pi}{2}+\frac{\pi}{2 \sqrt{2}}\right] \times\left[\frac{\pi}{2}-\frac{\pi}{2 \sqrt{2}}, \frac{\pi}{2}+\frac{\pi}{2 \sqrt{2}}\right], and take the average of the pointwise parameter estimates over points selected from \varOmega_{\mathrm{e}} as the final estimates (a sketch of this averaging is given after (9) below). The errors between the estimated parameters and the true parameters are measured by

\varepsilon_p=|p-\hat{p}|, (9)

where \hat{p} represents the estimation of the true parameter p.
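A sketch of the averaging over \varOmega_{\mathrm{e}} described above follows; it reuses the recover_parameters helper sketched after (6), and the number of evaluation points is an arbitrary choice of ours.

```python
import numpy as np
import torch

def average_estimates(net_f, net_g, n_points=100, rng=None):
    """Average the pointwise estimates (6) over points drawn uniformly from Omega_e."""
    rng = np.random.default_rng() if rng is None else rng
    lo = np.pi / 2 - np.pi / (2 * np.sqrt(2))
    hi = np.pi / 2 + np.pi / (2 * np.sqrt(2))
    A_list, b_list, s_list = [], [], []
    for _ in range(n_points):
        y0 = torch.tensor(rng.uniform(lo, hi, size=2), dtype=torch.float32)
        A_est, b_est, s_est = recover_parameters(net_f, net_g, y0)  # sketch given after (6)
        A_list.append(A_est); b_list.append(b_est); s_list.append(s_est)
    return (torch.stack(A_list).mean(0),
            torch.stack(b_list).mean(0),
            torch.stack(s_list).mean(0))
```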

Note that we take λ1=λ2=1/2 in the loss function (5) in our numerical test.

3 Numerical results

Figure 2 illustrates the training and testing losses against the training epoch for different h. It can be seen that both losses decrease rapidly within roughly the first 200 epochs. In addition, they decrease as h decreases, which may be due to the increasing closeness, for smaller h, between the true solution and the Euler-Maruyama scheme in (3), on which the statistics of the data are based.

Fig. 2 Training (solid) and testing (dashed) loss against the training epoch for different h

Table 1 lists the estimated values of the parameters for h=0.01 after 2 000 training epochs, where the errors reach the magnitude of 10^{-3}.

Table 1 Real and estimated values of the parameters

Figure 3 is the log-log plot of the estimation errors of the parameters against h, where the estimation error is defined by (9). The dashed line has slope 1, which indicates that the estimation errors of all the parameters are approximately of order 1 in h.

Fig. 3 Error of the estimated parameters against h for h∈{0.01, 0.02, 0.03, 0.05, 0.1, 0.2, 0.4}

In Fig. 4, we compare the x(t) and y(t) trajectories produced by the learned stochastic L-V equations with those produced by the true SDE system. We see that they coincide well for both x and y. Here we take (x0, y0)=(1.5, 10), t∈[0, 10], and the time step for the numerical evolution is h0=0.01.

Fig. 4 Trajectories produced by the learned SDE and by the true SDE, for x(t) and y(t)
4 Conclusion

We set up a neural network algorithm to learn the unknown parameters of a class of stochastic L-V systems. The Euler-Maruyama discretization enables the approximation of the mean and covariance matrix of the observational variable by data statistics, on the basis of which the loss function is established. Numerical tests show the implementation, validate the method, and indicate the possibility of improving the learning accuracy by adjusting the time step of the numerical scheme. Future research includes learning unknown parameters that are functions of time t, namely b_i(t), a_{ij}(t), and σ_i(t) (i, j∈{1, …, n}), for non-autonomous stochastic L-V systems.

References
[1]
Jiang D Q, Ji C Y, Li X Y, et al. Analysis of autonomous Lotka-Volterra competition systems with random perturbation[J]. Journal of Mathematical Analysis and Applications, 2012, 390(2): 582-595. DOI:10.1016/j.jmaa.2011.12.049
[2]
Mao X R, Marion G, Renshaw E. Environmental Brownian noise suppresses explosions in population dynamics[J]. Stochastic Processes and Their Applications, 2002, 97(1): 95-110. DOI:10.1016/S0304-4149(01)00126-0
[3]
Zhou L N, Pan S M, Wang J W, et al. Machine learning on big data: opportunities and challenges[J]. Neurocomputing, 2017, 237: 350-361. DOI:10.1016/j.neucom.2017.01.026
[4]
Raissi M, Perdikaris P, Karniadakis G E. Multistep neural networks for data-driven discovery of nonlinear dynamical systems[EB/OL]. 2018: arXiv: 1801.01236. (2018-01-04)[2023-01-11]. https://arxiv.org/abs/1801.01236.
[5]
Chen R T Q, Rubanova Y, Bettencourt J, et al. Neural ordinary differential equations[EB/OL]. 2018: arXiv: 1806.07366. (2019-12-14)[2023-01-11]. https://arxiv.org/abs/1806.07366.
[6]
Gholami A, Keutzer K, Biros G. ANODE: Unconditionally accurate memory-efficient gradients for neural ODEs[C]//Proceedings of the 28th International Joint Conference on Artificial Intelligence. August 10-16, 2019, Macao, China. New York: ACM, 2019: 730-736. DOI: 10.24963/ijcai.2019/103.
[7]
Urain J, Ginesi M, Tateo D, et al. ImitationFlow: learning deep stable stochastic dynamic systems by normalizing flows[C]//2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). October 24-January 24, 2021, Las Vegas, NV, USA. IEEE, 2021: 5231-5237. DOI: 10.1109/IROS45743.2020.9341035.
[8]
Kong L K, Sun J M, Zhang C. SDE-net: equipping deep neural networks with uncertainty estimates[EB/OL]. 2020: arXiv: 2008.10546. (2020-08-24)[2023-01-11]. https://arxiv.org/abs/2008.10546.
[9]
Yildiz C, Heinonen M, Intosalmi J, et al. Learning stochastic differential equations with Gaussian processes without gradient matching[C]//2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP). September 17-20, 2018, Aalborg, Denmark. IEEE, 2018: 1-6. DOI: 10.1109/MLSP.2018.8516991.
[10]
Bento J, Ibrahimi M, Montanari A. Learning networks of stochastic differential equations[C]//Proceedings of the 23rd International Conference on Neural Information Processing Systems-Volume 1. December 6-9, 2010, Vancouver, British Columbia, Canada. New York: ACM, 2010: 172-180. DOI: 10.48550/arXiv.1011.0415.
[11]
Tzen B, Raginsky M. Neural stochastic differential equations: deep latent Gaussian models in the diffusion limit[EB/OL]. 2019: arXiv: 1905.09883. (2019-10-28)[2023-01-11]. https://arxiv.org/abs/1905.09883.
[12]
Kloppers P H, Greeff J C. Lotka-Volterra model parameter estimation using experiential data[J]. Applied Mathematics and Computation, 2013, 224: 817-825. DOI:10.1016/j.amc.2013.08.093
[13]
Wu L F, Wang Y N. Estimation the parameters of Lotka-Volterra model based on grey direct modelling method and its application[J]. Expert Systems With Applications, 2011, 38(6): 6412-6416. DOI:10.1016/j.eswa.2010.09.013
[14]
Wu L F, Liu S F, Wang Y N. Grey Lotka-Volterra model and its application[J]. Technological Forecasting and Social Change, 2012, 79(9): 1720-1730. DOI:10.1016/j.techfore.2012.04.020
[15]
Xu L, Liu J Y, Zhang G. Pattern formation and parameter inversion for a discrete Lotka-Volterra cooperative system[J]. Chaos, Solitons & Fractals, 2018, 110: 226-231. DOI:10.1016/j.chaos.2018.03.035
[16]
Xu L, Lou S S, Xu P Q, et al. Feedback control and parameter invasion for a discrete competitive Lotka-Volterra system[J]. Discrete Dynamics in Nature and Society, 2018, 2018: 1-8. DOI:10.1155/2018/7473208
[17]
Shakurov I R, Asadullin R M. Parameter identification for systems of nonlinear differential equations by the example of Lotka-Volterra model[J]. Biophysics, 2014, 59(2): 339-340. DOI:10.1134/S0006350914020225
[18]
Waniewski J, Jedruch W. Individual based modeling and parameter estimation for a Lotka-Volterra system[J]. Mathematical Biosciences, 1999, 157(1/2): 23-36. DOI:10.1016/S0025-5564(98)10075-5
[19]
Luus R. Parameter estimation of Lotka-Volterra problem by direct search optimization[J]. Hungarian Journal of Industrial Chemistry, 1998, 26(4): 287-292.
[20]
Higham D J. An algorithmic introduction to numerical simulation of stochastic differential equations[J]. SIAM Review, 2001, 43(3): 525-546. DOI:10.1137/S0036144500378302
[21]
Feller W. An introduction to probability theory and its applications[M]. New York: Wiley, 1968.
[22]
Paszke A, Gross S, Massa F, et al. PyTorch: an imperative style, high-performance deep learning library[EB/OL]. 2019: arXiv: 1912.01703. (2019-12-14)[2023-01-11]. https://arxiv.org/abs/1912.01703.