Since the beginning of the 21st century, advances in control, guidance, and unmanned-systems technology have made unmanned warfare a dominant mode of combat. In the Russia-Ukraine conflict, large numbers of new platforms such as unmanned aerial vehicles and unmanned surface vehicles (USVs) are overturning traditional tactics; in particular, swarms of small suicide USVs have achieved surprising success in raids against large surface combatants. How to counter USV swarms has therefore become a hot topic in ship guidance and control.
Automatic control [1], game theory [2-3], and reinforcement learning [4-6] are important branches of ship control engineering and key methods for studying intelligent control, decision-making, and adversarial games of USV swarms. The strength of automatic control lies in its real-time nature: it can generate motion strategies on line in response to the environment and external feedback. However, the plant in automatic control is normally a machine, whereas the opponent in a USV adversarial game is itself intelligent, which is precisely the problem game theory addresses [7]. Reinforcement learning, in turn, aims to train a generalizable model for USV-swarm adversarial games, with the environment rewarding or penalizing the USVs so that appropriate strategies are learned. Compared with automatic control, reinforcement learning can cope with an intelligent opponent, but its real-time performance is poor, because training a generalizable model requires a large amount of time. It is therefore natural to combine the advantages of these methods to better solve the USV-swarm adversarial game problem. Vamvoudakis et al. [8] first introduced actor-critic reinforcement learning into optimal control, using it to approximate the value function and decouple the Hamilton-Jacobi-Bellman (HJB) equation, and proved the convergence and stability of the actor-critic method in control systems. Since then, actor-critic or critic-only reinforcement learning has been widely used for function approximation and decoupling in optimal control and differential games [9-11]: the critic network approximates the value function by gradient descent, while the actor network is designed, via convergence analysis, to preserve system stability.
For the pursuit-evasion problem in USV-swarm adversarial games, this paper proposes a guidance strategy with a game-theoretic formulation that provides motion strategies for the swarm. A distributed pursuit-evasion game model of the USV swarm is established, converting the pursuit-evasion problem into a differential-game problem. The minimax strategy yields the optimal motion strategy of each USV, and, to address the existence of the differential-game value function, reinforcement learning is used to minimize the HJI function, giving an approximate solution of the pursuit-evasion game.
1 Problem Description
1.1 Graph Theory
In the pursuit-evasion game of a USV swarm, three graph topologies are generally involved: the internal topology of the pursuing USV swarm, the internal topology of the evading USV swarm, and the topology between the evading and pursuing USVs.
Consider
Similarly, connected by a directed network, the
Since both the pursuers and the evaders in the pursuit-evasion game need to observe each other's states in order to generate their motion strategies, an undirected graph is used to represent the mutual observation between the two sides. Taking
The simplified kinematic model of a USV is defined as [12]:
| $ \begin{gathered} \left\{ {\begin{array}{*{20}{c}} {\dot x_i^p = U_i^p\cos \left( {\psi _i^p + \beta _i^p} \right)},\\ {\dot y_i^p = U_i^p\sin \left( {\psi _i^p + \beta _i^p} \right)} ,\end{array}} \right.{\text{ }} \\ \left\{ {\begin{array}{*{20}{c}} {\dot x_j^e = U_j^e\cos \left( {\psi _j^e + \beta _j^e} \right)},\\ {\dot y_j^e = U_j^e\sin \left( {\psi _j^e + \beta _j^e} \right)} 。\end{array}} \right. \\ \end{gathered} $ | (1) |
where:
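As an illustration, a minimal numerical sketch of the kinematics in Eq. (1) is given below; the explicit-Euler discretisation, the function name and the example values are assumptions of this sketch and not part of the original design.

```python
import numpy as np

def usv_kinematics_step(pos, U, psi, beta, dt=0.1):
    """One Euler step of Eq. (1): pos = (x, y), resultant speed U,
    heading angle psi and sideslip angle beta (angles in radians)."""
    vel = U * np.array([np.cos(psi + beta), np.sin(psi + beta)])
    return pos + dt * vel

# Example: a pursuer at the origin, 2 m/s, heading 30 deg, zero sideslip.
p_next = usv_kinematics_step(np.array([0.0, 0.0]), U=2.0,
                             psi=np.deg2rad(30.0), beta=0.0)
```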
The kinematic errors of the pursuit-evasion game are defined as follows:
| $ \left\{\begin{aligned} &{\boldsymbol{p}}_{ei}^p = \sum\limits_{k = 1}^N {{a_{ik}}\left( {{\boldsymbol{p}}_i^p - {\boldsymbol{p}}_k^p} \right)} + \sum\limits_{j = 1}^M {{c_{ij}}\left( {{\boldsymbol{p}}_i^p - {\boldsymbol{p}}_j^e} \right)},\\ &{\boldsymbol{p}}_{ej}^e = \gamma \sum\limits_{l = 1}^M {{b_{jl}}\left( {{\boldsymbol{p}}_j^e - {\boldsymbol{p}}_l^e} \right)} - \sum\limits_{i = 1}^N {{e_{ji}}\left( {{\boldsymbol{p}}_j^e - {\boldsymbol{p}}_i^p} \right)}。\\ \end{aligned}\right. $ | (2) |
where:
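A minimal sketch of how the distributed errors in Eq. (2) could be evaluated is shown below; the matrices A, B, C, E stand for the graph weights a_ik, b_jl, c_ij, e_ji and gamma for the weighting factor, and all names are illustrative assumptions of this sketch.

```python
import numpy as np

def pe_errors(Pp, Pe, A, B, C, E, gamma):
    """Eq. (2): Pp is an (N, 2) array of pursuer positions, Pe an (M, 2) array
    of evader positions; A, B are intra-group adjacency matrices, C, E the
    inter-group observation weights, gamma a positive weighting factor."""
    N, M = Pp.shape[0], Pe.shape[0]
    err_p = np.zeros_like(Pp)
    err_e = np.zeros_like(Pe)
    for i in range(N):   # pursuit error p_ei^p
        err_p[i] = sum(A[i, k] * (Pp[i] - Pp[k]) for k in range(N)) \
                 + sum(C[i, j] * (Pp[i] - Pe[j]) for j in range(M))
    for j in range(M):   # evasion error p_ej^e
        err_e[j] = gamma * sum(B[j, l] * (Pe[j] - Pe[l]) for l in range(M)) \
                 - sum(E[j, i] * (Pe[j] - Pp[i]) for i in range(N))
    return err_p, err_e
```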
To facilitate the subsequent algorithm design, the centers of gravity of the USV swarm are defined as:
| $ \left\{\begin{aligned} &{\boldsymbol{\bar p}}_i^{pp} = \frac{1}{{d_i^p + d_i^{pe}}}\sum\limits_{k = 1}^N {{a_{ik}}{\boldsymbol{p}}_k^p},\\ &{\boldsymbol{\bar p}}_i^{pe} = \frac{1}{{d_i^p + d_i^{pe}}}\sum\limits_{j = 1}^M {{c_{ij}}{\boldsymbol{p}}_j^e}。\\ \end{aligned} \right.$ | (3) |
where:
From Eq. (2) and Eq. (3), the derivatives of the pursuit-evasion kinematic errors are obtained as:
| $ \left\{\begin{aligned} &{\boldsymbol{\dot p}}_{ei}^p = \left( {d_i^p + d_i^{pe}} \right)\left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot {\bar p}}}_i^{pp} - {\boldsymbol{\dot {\bar p}}}_i^{pe}} \right),\\ &{\boldsymbol{\dot p}}_{ej}^e = \left( {\gamma d_j^e - d_j^{ep}} \right)\left( {{\boldsymbol{\dot p}}_j^e - {\boldsymbol{\dot{ \bar p}}}_j^{ee} + {\boldsymbol{\dot {\bar p}}}_j^{ep}} \right)。\\ \end{aligned}\right. $ | (4) |
The performance indices of pursuer $i$ and evader $j$ in the pursuit-evasion game are defined as:
| $ \left\{\begin{aligned} &{J_{pi}} = \int_0^\infty {\left( {{\boldsymbol{p}}{{_{ei}^p}^{\text{T}}}{\boldsymbol{p}}_{ei}^p + {\boldsymbol{\dot p}}{{_i^p}^{\text{T}}}{\boldsymbol{\dot p}}_i^p + {\boldsymbol{\dot {\bar p}}}{{_i^{pp}}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_i^{pp} - {\boldsymbol{\dot {\bar p}}}{{_i^{pe}}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right){\rm{d}}t},\\ &{J_{ej}} = \int_0^\infty {\left( {{\boldsymbol{p}}{{_{ej}^e}^{\text{T}}}{\boldsymbol{p}}_{ej}^e + {\boldsymbol{\dot p}}{{_j^e}^{\text{T}}}{\boldsymbol{\dot p}}_j^e - {\boldsymbol{\dot {\bar p}}}{{_j^{ep}}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_j^{ep} + {\boldsymbol{\dot{\bar p}}}{{_j^{ee}}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_j^{ee}} \right){\rm{d}}t}。\\ \end{aligned}\right. $ | (5) |
where:
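For clarity, the running cost inside the pursuer index $J_{pi}$ of Eq. (5) can be sketched as follows; the rectangle-rule accumulation of the integral is an assumption of this sketch, not part of the original design.

```python
import numpy as np

def pursuer_running_cost(err_p_i, v_i, v_bar_pp, v_bar_pe):
    """Integrand of J_pi in Eq. (5); all arguments are 2-D vectors:
    the error p_ei^p and the three velocity terms appearing in Eq. (5)."""
    return (err_p_i @ err_p_i + v_i @ v_i
            + v_bar_pp @ v_bar_pp - v_bar_pe @ v_bar_pe)

# J_pi is then accumulated along the trajectory, e.g.
# J_pi ~= sum(pursuer_running_cost(...) * dt over the sampling instants).
```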
Based on the above analysis, the optimal motion strategies of pursuer $i$ and evader $j$ can be expressed as:
| $ \left\{\begin{aligned} &{{\boldsymbol{\dot p}}_{i^{p^*}}} = \arg \mathop {\min }\limits_{{{\boldsymbol{\dot p}}}_i^p,{\boldsymbol{\dot {\bar p}}_i^{pp}}} \mathop {\max }\limits_{{\boldsymbol{\dot {\bar p}}}_i^{pe}} {J_{pi}},\\ &{{\boldsymbol{\dot p}}_{j^{e^*}}} = \arg \mathop {\min }\limits_{{{\boldsymbol{\dot p}}}_j^e,{\boldsymbol{\dot{\bar p}}_j^{ee}}} \mathop {\max }\limits_{{\boldsymbol{\dot {\bar p}}}_j^{ep}} {J_{ej}}。\\ \end{aligned}\right. $ | (6) |
where:
According to Eq. (4) and Eq. (5), the Hamilton-Jacobi-Isaacs (HJI) functions of pursuer $i$ and evader $j$ are constructed as:
| $ {\left\{\begin{aligned} {H_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) =\ & {{\boldsymbol{p}}_{ei}^{p^{\text{T}}}}{\boldsymbol{p}}_{ei}^p + {{\boldsymbol{\dot p}}_i^{p^{\text{T}}}}{\boldsymbol{\dot p}}_i^p + {{\boldsymbol{\dot {\bar p}}}_i^{{pp}^{\text{T}}}}{\boldsymbol{\dot {\bar p}}}_i^{pp} - {\boldsymbol{\dot {\bar p}}}_i^{{pe}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_i^{pe} +\\ &\nabla {\boldsymbol{V}}_{pi}^{\text{T}}\left( {d_i^p + d_i^{pe}} \right)\left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot {\bar p}}}_i^{pp} - {\boldsymbol{\dot {\bar p}}}_i^{pe}} \right),\\ {H_{ej}}\left( {{\boldsymbol{\dot p}}_j^e,{\boldsymbol{\dot {\bar p}}}_j^{ep}} \right) =\ &{\boldsymbol{p}}_{ej}^{e^{\text{T}}}{\boldsymbol{p}}_{ej}^e + {\boldsymbol{\dot p}}_j^{e^{\text{T}}}{\boldsymbol{\dot p}}_j^e - {\boldsymbol{\dot {\bar p}}}_j^{{ep}^{\text{T}}}{\boldsymbol{\dot{ \bar p}}}_j^{ep} + {\boldsymbol{\dot {\bar p}}}_j^{{ee}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_j^{ee} +\\ &\nabla {\boldsymbol{V}}_{ej}^{\text{T}}\left( {\gamma d_j^e - d_j^{ep}} \right)\left( {{\boldsymbol{\dot p}}_j^e - {\boldsymbol{\dot{ \bar p}}}_j^{ee} + {\boldsymbol{\dot{\bar p}}}_j^{ep}} \right) 。\end{aligned}\right. }$ | (7) |
where:
To minimize the HJI functions, setting their partial derivatives with respect to the motion strategies to zero yields:
| $ \left\{\begin{aligned} &\frac{{\partial {H_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right)}}{{\partial {\boldsymbol{\dot p}}_i^p}} = 0 \Rightarrow {\boldsymbol{\dot p}}_i^{p^*} = - \frac{1}{2}\left( {d_i^p + d_i^{pe}} \right)\nabla {{\boldsymbol{V}}_{pi}},\\ &\frac{{\partial {H_{ej}}\left( {{\boldsymbol{\dot p}}_j^e,{\boldsymbol{\dot {\bar p}}}_j^{ep}} \right)}}{{\partial {\boldsymbol{\dot p}}_j^e}} = 0 \Rightarrow {\boldsymbol{\dot p}}_j^{e^*} = - \frac{1}{2}\left( {\gamma d_j^e - d_j^{ep}} \right)\nabla {{\boldsymbol{V}}_{ej}}。\\ \end{aligned}\right. $ | (8) |
where:
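The step from Eq. (7) to Eq. (8) can be made explicit as follows for the pursuer (the evader case is analogous); only the terms of $H_{pi}$ that depend on $\dot{\boldsymbol p}_i^p$ are retained.

```latex
% Worked step from Eq. (7) to Eq. (8), pursuer side.
\begin{aligned}
\frac{\partial H_{pi}}{\partial \dot{\boldsymbol p}_i^{\,p}}
  &= \frac{\partial}{\partial \dot{\boldsymbol p}_i^{\,p}}
     \Bigl[\, \dot{\boldsymbol p}_i^{\,p\mathrm T}\dot{\boldsymbol p}_i^{\,p}
          + \nabla \boldsymbol V_{pi}^{\mathrm T}\bigl(d_i^{p}+d_i^{pe}\bigr)\dot{\boldsymbol p}_i^{\,p} \,\Bigr]
   = 2\,\dot{\boldsymbol p}_i^{\,p} + \bigl(d_i^{p}+d_i^{pe}\bigr)\nabla \boldsymbol V_{pi} = 0 \\
  &\Longrightarrow\;
   \dot{\boldsymbol p}_i^{\,p*} = -\tfrac{1}{2}\bigl(d_i^{p}+d_i^{pe}\bigr)\nabla \boldsymbol V_{pi}.
\end{aligned}
```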
If the value function were available, the optimal motion strategy of each USV could be solved directly. However, the USV pursuit-evasion game is a dynamic process, and a closed-form value function is not easy to find. Therefore, actor-critic neural-network reinforcement learning is used to approximate the value function on line, and its gradient is parameterized as:
| $ \left\{\begin{aligned} &\nabla {{\boldsymbol{V}}_{pi}} = \frac{2}{{{{\left( {d_i^p + d_i^{pe}} \right)}^2}}}\left( {{{\boldsymbol{K}}_{pi}}{\boldsymbol{p}}_{ei}^p + {\boldsymbol{W}}_{pi}^{\text{T}}{{\boldsymbol{S}}_{pi}} + {{\boldsymbol{\varepsilon }}_{pi}}} \right) ,\\ &\nabla {{\boldsymbol{V}}_{ej}} = \frac{2}{{{{\left( {\gamma d_j^e - d_j^{ep}} \right)}^2}}}\left( {{{\boldsymbol{K}}_{ej}}{\boldsymbol{p}}_{ej}^e + {\boldsymbol{W}}_{ej}^{\text{T}}{{\boldsymbol{S}}_{ej}} + {{\boldsymbol{\varepsilon }}_{ej}}} \right) 。\\ \end{aligned}\right. $ | (9) |
where:
| $\left\{ \begin{aligned} &\nabla {{{\boldsymbol{\hat V}}}_{pi}} = \frac{2}{{{{\left( {d_i^p + d_i^{pe}} \right)}^2}}}\left( {{{\boldsymbol{K}}_{pi}}{\boldsymbol{p}}_{ei}^p + {\boldsymbol{\hat W}}_{pci}^{\text{T}}{{\boldsymbol{S}}_{pi}}} \right),\\ &\nabla {{{\boldsymbol{\hat V}}}_{ej}} = \frac{2}{{{{\left( {\gamma d_j^e - d_j^{ep}} \right)}^2}}}\left( {{{\boldsymbol{K}}_{ej}}{\boldsymbol{p}}_{ej}^e + {\boldsymbol{\hat W}}_{ecj}^{\text{T}}{{\boldsymbol{S}}_{ej}}} \right)。\\ \end{aligned} \right.$ | (10) |
| $ \left\{\begin{aligned} &{\boldsymbol{\dot {\hat p}}}{_i^{p^*}} = - \frac{1}{{d_i^p + d_i^{pe}}}\left( {{{\boldsymbol{K}}_{pi}}{\boldsymbol{p}}_{ei}^p + {\boldsymbol{\hat W}}_{pai}^{\text{T}}{{\boldsymbol{S}}_{pi}}} \right),\\ &{\boldsymbol{\dot{ \hat p}}}{_j^{e^*}} = - \frac{1}{{\gamma d_j^e - d_j^{ep}}}\left( {{{\boldsymbol{K}}_{ej}}{\boldsymbol{p}}_{ej}^e + {\boldsymbol{\hat W}}_{eaj}^{\text{T}}{{\boldsymbol{S}}_{ej}}} \right)。\\ \end{aligned}\right. $ | (11) |
where:
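A minimal sketch of evaluating the estimated pursuer strategy in Eq. (11) is shown below; the gain $K_{pi}$, the actor weight matrix and the basis vector $S_{pi}$ are assumed to be supplied by the designer and the actor network, and the function name is an assumption of this sketch.

```python
import numpy as np

def pursuer_strategy(err_p_i, K_pi, W_hat_pai, S_pi, d_p, d_pe):
    """Eq. (11): err_p_i is the 2-D error p_ei^p, K_pi a (2, 2) gain,
    W_hat_pai an (n, 2) actor weight matrix, S_pi an n-dimensional basis
    vector, and d_p, d_pe the degree terms d_i^p and d_i^pe."""
    return -(K_pi @ err_p_i + W_hat_pai.T @ S_pi) / (d_p + d_pe)
```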
To minimize the HJI functions, the estimated neural-network weights are updated on line according to the following learning law:
| $ {\left\{\begin{aligned} &{{{\boldsymbol{\dot {\hat W}}}}_{hci}} = - {k_{ci}}\left( {{{\boldsymbol{S}}_{hi}}{\boldsymbol{S}}_{hi}^{\text{T}} + {\sigma _{hi}}{\boldsymbol{I}}} \right){{{\boldsymbol{\hat W}}}_{hci}},\\ &{{{\boldsymbol{\dot {\hat W}}}}_{hai}} = - \left( {{{\boldsymbol{S}}_{hi}}{\boldsymbol{S}}_{hi}^{\text{T}} + {\sigma _{hi}}{\boldsymbol{I}}} \right) \times \left[ {{k_{ai}}\left( {{{{\boldsymbol{\hat W}}}_{hai}} - {{{\boldsymbol{\hat W}}}_{hci}}} \right) + {k_{ci}}{{{\boldsymbol{\hat W}}}_{hci}}} \right]。\end{aligned}\right. }$ | (12) |
where:
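The learning law of Eq. (12) can be sketched in discrete time as follows; the forward-Euler step dt is an assumption of this sketch, since the paper states the law in continuous time.

```python
import numpy as np

def update_weights(W_c, W_a, S, k_c, k_a, sigma, dt=0.01):
    """Eq. (12): S is the n-dimensional basis vector, W_c and W_a the
    (n, m) critic and actor weight estimates, k_c, k_a, sigma positive gains."""
    G = np.outer(S, S) + sigma * np.eye(S.size)           # S S^T + sigma I
    W_c_dot = -k_c * G @ W_c                              # critic update
    W_a_dot = -G @ (k_a * (W_a - W_c) + k_c * W_c)        # actor update
    return W_c + dt * W_c_dot, W_a + dt * W_a_dot
```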
This completes the design of the pursuit-evasion guidance algorithm for the USV swarm. To clarify its execution flow and basic principle, Fig. 1 shows the corresponding signal-flow diagram. The USVs in (1) are connected through the communication and observation topologies of Section 1.1, from which the pursuit-evasion kinematic errors (2) and (4) are obtained. On this basis, the performance indices (5) are designed to capture the physical meaning of the USVs in the pursuit-evasion game. The HJI equations (7) and the associated value functions derived from (5) yield the optimal motion strategies (8). To evaluate (8), an actor-critic neural network approximates the value-function gradient (9) on line, and the resulting estimate of the optimal motion strategy (11) is fed back to the USVs (1) as their motion input, closing the signal loop.
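To mirror the signal flow of Fig. 1, a minimal single-step closed-loop sketch for one pursuer is given below; it reuses the illustrative helpers sketched above (pe_errors, pursuer_strategy, update_weights), and the tanh basis, the degree computation and all gains are assumptions of this example rather than the paper's prescription.

```python
import numpy as np

def closed_loop_step(Pp, Pe, i, W_c, W_a, params, dt=0.1):
    """One guidance cycle of Fig. 1 for pursuer i (illustrative only)."""
    A, B, C, E, gamma, K_pi, k_c, k_a, sigma = params
    err_p, _ = pe_errors(Pp, Pe, A, B, C, E, gamma)             # errors, Eq. (2)
    d_p, d_pe = A[i].sum(), C[i].sum()                          # assumed degrees d_i^p, d_i^pe
    S = np.tanh(err_p[i])                                       # assumed NN basis S_pi
    v = pursuer_strategy(err_p[i], K_pi, W_a, S, d_p, d_pe)     # strategy estimate, Eq. (11)
    Pp = Pp.copy()
    Pp[i] = Pp[i] + dt * v                                      # kinematics, Eq. (1)
    W_c, W_a = update_weights(W_c, W_a, S, k_c, k_a, sigma, dt) # learning law, Eq. (12)
    return Pp, W_c, W_a
```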
Fig. 1 Signal-flow diagram of the USV-swarm pursuit-evasion game algorithm
Based on the above design, the USV swarm (1) carries out the pursuit-evasion game using the motion strategy (11) and the online learning law (12). To verify the theoretical feasibility of the proposed algorithm, the convergence of the closed-loop control system is analyzed.
Theorem 1 For the closed-loop control system composed of the USV swarm (1), the distributed motion strategy (11), and the online learning law (12), all error signals converge to a compact set and are semi-globally uniformly ultimately bounded.
Proof Choose the following Lyapunov function:
| $ \begin{split} L =& \frac{1}{2}{\boldsymbol{p}}_{ei}^{{p^{\text{T}}}}{\boldsymbol{p}}_{ei}^p + \frac{1}{2}{\boldsymbol{p}}_{ej}^{{e^{\text{T}}}}{\boldsymbol{p}}_{ej}^e + \frac{1}{2}{\text{tr}}\left( {{\boldsymbol{\tilde W}}_{pci}^{\text{T}}{\boldsymbol{\tilde W}}_{pci}^{}} \right) +\\ &\frac{1}{2}{\text{tr}}\left( {{\boldsymbol{\tilde W}}_{pai}^{\text{T}}{\boldsymbol{\tilde W}}_{pai}^{}} \right) + \frac{1}{2}{\text{tr}}\left( {{\boldsymbol{\tilde W}}_{ecj}^{\text{T}}{\boldsymbol{\tilde W}}_{ecj}^{}} \right) +\\ &\frac{1}{2}{\text{tr}}\left( {{\boldsymbol{\tilde W}}_{eaj}^{\text{T}}{\boldsymbol{\tilde W}}_{eaj}^{}} \right) ,\end{split} $ | (13) |
Its time derivative is:
| $ \begin{split} \dot L =& {\boldsymbol{p}}_{ei}^{{p^{\text{T}}}}{\boldsymbol{\dot p}}_{ei}^p + {\boldsymbol{p}}_{ej}^{{e^{\text{T}}}}{\boldsymbol{\dot p}}_{ej}^e + {\text{tr}}\left( {{\boldsymbol{\tilde W}}_{pci}^{\text{T}}{\boldsymbol{\dot{ \hat W}}}_{pci}^{}} \right) +{\text{tr}}\left( {{\boldsymbol{\tilde W}}_{pai}^{\text{T}}{\boldsymbol{\dot{ \hat W}}}_{pai}^{}} \right) +\\ & {\text{tr}}\left( {{\boldsymbol{\tilde W}}_{ecj}^{\text{T}}{\boldsymbol{\dot {\hat W}}}_{ecj}^{}} \right) + {\text{tr}}\left( {{\boldsymbol{\tilde W}}_{eaj}^{\text{T}}{\boldsymbol{\dot {\hat W}}}_{eaj}^{}} \right) 。\end{split} $ | (14) |
Substituting Eq. (11) into Eq. (4) yields:
| $ \begin{split} {\boldsymbol{p}}_{ei}^{{p^{\text{T}}}}{\boldsymbol{\dot p}}_{ei}^p \leqslant & - \left[ {\lambda _{\min }}\left( {{{\boldsymbol{K}}_{pi}}} \right) - 3 - \sum\limits_{k = 1}^N {\frac{{{a_{ik}}{\lambda _{\max }}\left( {{{\boldsymbol{K}}_{pk}}} \right)}}{{d_k^p + d_k^{pe}}}} -\right. \\ & \left. \sum\limits_{j = 1}^M {\frac{{{c_{ij}}{\lambda _{\max }}\left( {{{\boldsymbol{K}}_{pj}}} \right)}}{{\gamma d_j^e + d_j^{ep}}}} \right]{\boldsymbol{p}}_{ei}^{{p^{\text{T}}}}{\boldsymbol{p}}_{ei}^p + \sum\limits_{j = 1}^M {\frac{{{c_{ij}}{\lambda _{\max }}\left( {{{\boldsymbol{K}}_{pj}}} \right)}}{{\gamma d_j^e + d_j^{ep}}}} \times\\ &{\boldsymbol{p}}_{ej}^{{e^{\text{T}}}}{\boldsymbol{p}}_{ej}^e + \frac{1}{4}{\boldsymbol{\hat W}}_{pai}^{\text{T}}{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}}{\boldsymbol{\hat W}}_{pai}^{} + \sum\limits_{k = 1}^N {\frac{{a_{ik}^{}{\lambda _{\max }}\left( {{{\boldsymbol{K}}_{pk}}} \right)}}{{4\left( {d_k^p + d_k^{pe}} \right)}}} \times\\ &{\boldsymbol{p}}_{ek}^{{p^{\text{T}}}}{\boldsymbol{p}}_{ek}^p + \sum\limits_{j = 1}^M {\frac{{c_{ij}^2{\boldsymbol{\hat W}}_{paj}^{\text{T}}{{\boldsymbol{S}}_{pj}}{\boldsymbol{S}}_{pj}^{\text{T}}{\boldsymbol{\hat W}}_{paj}^{}}}{{4{{\left( {\gamma d_j^e + d_j^{ep}} \right)}^2}}}} + \\ &\sum\limits_{k = 1}^N {\frac{{a_{ik}^2{\boldsymbol{\hat W}}_{pak}^{\text{T}}{{\boldsymbol{S}}_{pk}}{\boldsymbol{S}}_{pk}^{\text{T}}{\boldsymbol{\hat W}}_{pak}^{}}}{{4{{\left( {d_k^p + d_k^{pe}} \right)}^2}}}}。\\[-5pt] \end{split} $ | (15) |
From Eq. (12), the following equalities can be obtained:
| $ \begin{split} {\text{tr}}\left( {{\boldsymbol{\tilde W}}_{pci}^{\text{T}}{{{\boldsymbol{\dot {\hat W}}}}_{pci}}} \right) =& \sum\limits_{\iota = x,y} {\left\{ { - \frac{{{k_{ci}}}}{2}{\boldsymbol{\tilde W}}_{\iota ci}^{\text{T}}\left( {{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right){{{\boldsymbol{\tilde W}}}_{\iota ci}}} \right.} -\\ &\frac{{{k_{ci}}}}{2}{\boldsymbol{\hat W}}_{\iota ci}^{\text{T}}\left( {{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right){{{\boldsymbol{\hat W}}}_{\iota ci}} -\\ &\left. { \frac{{{k_{ci}}}}{2}{\boldsymbol{W}}_{\iota i}^{\text{T}}\left( {{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right){{\boldsymbol{W}}_{\iota i}}} \right\} ,\end{split} $ | (16) |
| $ \begin{split} {\text{tr}}\left( {{\boldsymbol{\tilde W}}_{{pai}^{\text{T}}}{{{\boldsymbol{\dot {\hat W}}}}_{pai}}} \right) = & \sum\limits_{\iota = x,y} {\left\{ { - \frac{{{k_{ci}}}}{2}{\boldsymbol{\tilde W}}_{\iota ai}^{\text{T}}\left( {{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right){{{\boldsymbol{\tilde W}}}_{\iota ai}}} \right.} -\\ & \frac{{{k_{ai}}}}{2}{\boldsymbol{\hat W}}_{\iota ai}^{\text{T}}\left( {{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right){{{\boldsymbol{\hat W}}}_{\iota ai}} -\\ & \frac{{{k_{ai}}}}{2}{\boldsymbol{W}}_{\iota i}^{\text{T}}\left( {{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right){{{\boldsymbol{\hat W}}}_{\iota i}}- \\ & \left. { \frac{{{k_{ai}} - {k_{ci}}}}{2}{\boldsymbol{\hat W}}_{\iota ci}^{\text{T}}\left( {{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right){{{\boldsymbol{\hat W}}}_{\iota ci}}} \right\} 。\end{split} $ | (17) |
Similarly, the corresponding results can be obtained for
Theorem 2 The online learning law (12) always drives the estimated value of the HJI function
Proof Substituting the value-function estimate (10) and the motion-strategy estimate (11) into the HJI function (7) yields
| $ \frac{{\partial {H_{pi}}\left( {{\boldsymbol{\dot{ \hat p}}}_i^p,{\boldsymbol{p}}_{ei}^p} \right)}}{{\partial {{{\boldsymbol{\hat W}}}_{pai}}}} = \frac{{2{\boldsymbol{S}}_{pi}^{}{\boldsymbol{S}}_{pi}^{\text{T}}\left( {{{{\boldsymbol{\hat W}}}_{pai}} - {{{\boldsymbol{\hat W}}}_{pci}}} \right)}}{{{{\left( {d_k^p + d_k^{pe}} \right)}^2}}}。$ | (18) |
Define
| $ {\dot Q_{pi}} = - {k_{ai}}\frac{{\partial {Q_{pi}}}}{{\partial {\boldsymbol{\hat W}}_{pai}^{\text{T}}}}\left( {{{\boldsymbol{S}}_i}{\boldsymbol{S}}_i^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right)\frac{{\partial {Q_{pi}}}}{{\partial {{{\boldsymbol{\hat W}}}_{pai}}}} \leqslant 0。$ | (19) |
where: as time tends to infinity,
Theorem 3 On the basis of Theorem 1 and Theorem 2, for the pursuit-evasion performance indices (5), the distributed motion strategy (11) always guarantees that a Nash equilibrium is satisfied, i.e., it always holds that
Proof Consider
| $ {J}_{pi}\left({\dot{p}}_{i}^{p},{\dot{\overline{p}}}_{i}^{pe}\right) ={\displaystyle {\int }_{0}^{\infty }{H}_{pi}\left({\dot{p}}_{i}^{p},{\dot{\overline{p}}}_{i}^{pe}\right)}{\rm{d}}t+{V}_{pi}\left({p}_{ei}^{p}\left(0\right)\right)。$ | (20) |
where
| $ \begin{split}{H}_{pi}\left({\dot{p}}_{i}^{p},{\dot{\overline{p}}}_{i}^{pe}\right)=&{H}_{pi}\left({\dot{p}}_{i}^{p*},{\dot{\overline{p}}}_{i}^{pe*}\right)+2{\dot{p}}_{i}^{p*}\left({\dot{p}}_{i}^{p}-{\dot{p}}_{i}^{p*}\right) +\\ &{\left({\dot{p}}_{i}^{p} - {\dot{p}}_{i}^{p*}\right)}^{\text{T}} \left({\dot{p}}_{i}^{p} - {\dot{p}}_{i}^{p*}\right) - {\left({\dot{\overline{p}}}_{i}^{pe} - {\dot{\overline{p}}}_{i}^{pe*}\right)}^{\text{T}} \times \\ &\left({\dot{\overline{p}}}_{i}^{pe}-{\dot{\overline{p}}}_{i}^{pe*}\right)-2{\dot{\overline{p}}}_{i}^{pe*}\left({\dot{\overline{p}}}_{i}^{pe}-{\dot{\overline{p}}}_{i}^{pe*}\right)-\\ &\nabla {V}_{pi}^{\text{T}}\left({d}_{i}+{d}_{i}^{pe}\right) \times\left({\dot{p}}_{i}^{p*}-{\dot{p}}_{i}^{p}\right)+\\ &\nabla {V}_{pi}^{\text{T}}\left({d}_{i}+{d}_{i}^{pe}\right)\left({\dot{\overline{p}}}_{i}^{pe*}-{\dot{\overline{p}}}_{i}^{pe}\right),\end{split} $ | (21) |
Further, substituting Eq. (21) into Eq. (20) gives:
| ${ \begin{split} {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) = & \int_0^\infty \Big[{{{\left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot p}}_i^{p*}} \right)}^{\text{T}}}\left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot p}}_i^{p*}} \right)} +2{\boldsymbol{\dot p}}_i^{p*}\left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot p}}_i^{p*}} \right) -\\ & 2{\boldsymbol{\dot {\bar p}}}_i^{pe*} \times \left( {{\boldsymbol{\dot {\bar p}}}_i^{pe} - {\boldsymbol{\dot{ \bar p}}}_i^{pe*}} \right) - {\left( {{\boldsymbol{\dot{ \bar p}}}_i^{pe} - {\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right)^{\text{T}}} \times\\ &\left( {{\boldsymbol{\dot {\bar p}}}_i^{pe} - {\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right) - \nabla {\boldsymbol{V}}_{pi}^{\text{T}}\left( {{d_i} + d_i^{pe}} \right) \times \left( {{\boldsymbol{\dot p}}_i^{p*} - {\boldsymbol{\dot p}}_i^p} \right) +\\ & \nabla {\boldsymbol{V}}_{pi}^{\text{T}}\left( {{d_i} + d_i^{pe}} \right) \times \left( {{\boldsymbol{\dot{ \bar p}}}_i^{pe*} - {\boldsymbol{\dot {\bar p}}}_i^{pe}} \right)\Big]{\rm{d}}t + {{\boldsymbol{V}}_{pi}}\left( {{\boldsymbol{p}}_{ei}^p\left( 0 \right)} \right) 。\\[-12pt] \end{split} }$ | (22) |
When $\dot{\boldsymbol p}_i^p = \dot{\boldsymbol p}_i^{p*}$ and $\dot{\bar{\boldsymbol p}}_i^{pe} = \dot{\bar{\boldsymbol p}}_i^{pe*}$,
| $ {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) = {{\boldsymbol{V}}_{pi}}\left( {{\boldsymbol{p}}_{ei}^p\left( 0 \right)} \right),$ | (23) |
When $\dot{\boldsymbol p}_i^p = \dot{\boldsymbol p}_i^{p*}$,
| $ \begin{split} {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) = &- \int_0^\infty {{{\left( {{\boldsymbol{\dot {\bar p}}}_i^{pe} - {\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right)}^{\text{T}}}\left( {{\boldsymbol{\dot {\bar p}}}_i^{pe} - {\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right)} {\rm{d}}t +\\ &{{\boldsymbol{V}}_{pi}}\left( {{\boldsymbol{p}}_{ei}^p\left( 0 \right)} \right) 。\end{split} $ | (24) |
When $\dot{\bar{\boldsymbol p}}_i^{pe} = \dot{\bar{\boldsymbol p}}_i^{pe*}$,
| $ {{J_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) = \int_0^\infty {{{\left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot p}}_i^{p*}} \right)}^{\text{T}}} \left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot p}}_i^{p*}} \right)}{\rm{d}}t + {{\boldsymbol{V}}_{pi}}\left( {{\boldsymbol{p}}_{ei}^p\left( 0 \right)} \right) 。} $ | (25) |
From Eq. (23) to Eq. (25), it can be proved that $ {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^{p*},{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) \leqslant {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^{p*},{\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right) \leqslant {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^{p},{\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right) $, i.e., the Nash-equilibrium condition stated in Theorem 3 holds, which completes the proof.
The simulation considers three pursuer USVs and three evader USVs, whose graph topology is shown in Fig. 2. The initial positions of the USVs are
Fig. 2 Topology graph of the USV swarm
Fig. 3
Fig. 4
Fig. 3(a) and Fig. 4(a) show the pursuit-evasion trajectories of the USV swarm. It can be seen that the smaller the
This paper has studied the pursuit-evasion game of USV swarms. Using graph theory, evasion and pursuit kinematic models were established for the evader and pursuer swarms, and the pursuit-evasion problem was converted into a differential-game problem based on these models. The differential game was solved with a minimax strategy combined with reinforcement learning, so that the optimal USV motion strategies are obtained in real time. Simulation results showed that the pursuer USVs can effectively intercept the evader USVs, demonstrating the effectiveness of the proposed pursuit-evasion guidance algorithm.
In practical applications, the basic technology underlying this guidance algorithm is cooperative sharing of guidance signals among multiple vessels, and the key technology is the detection and identification of hostile USVs. Hostile USVs are first detected and locked by sensors such as lidar and visual recognition; their position and velocity information is passed to the shipboard industrial computer, which computes the motion strategies of the friendly USVs in real time, and the low-level control algorithms then provide the corresponding propulsion. Since the onboard computer of every USV can compute the motion strategy of the entire friendly swarm in real time, the swarm can remain coordinated without a conventional communication link, avoiding communication dropouts caused by excessive inter-vessel distance.
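The processing chain described above (sensor detection, enemy state estimation, onboard strategy computation, low-level control) could be organized as in the following sketch; all class and function names are hypothetical and serve only to illustrate the data flow.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class EnemyTrack:
    position: Tuple[float, float]   # (x, y) from lidar / visual detection
    velocity: Tuple[float, float]   # estimated by the onboard computer

def guidance_cycle(tracks: List[EnemyTrack],
                   own_state: Tuple[float, float],
                   strategy_fn: Callable,        # e.g. the strategy of Eq. (11)
                   send_to_autopilot: Callable) -> None:
    """One real-time cycle on the shipboard industrial computer."""
    command = strategy_fn(own_state, tracks)      # pursuit-evasion guidance law
    send_to_autopilot(command)                    # low-level controller supplies thrust
```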
[1] NING X, ZHANG H T, ZHU L J. Prescribed-time collective evader-capturing for autonomous surface vehicles[J]. Automatica, 2024, 167: 111761. DOI: 10.1016/j.automatica.2024.111761.
[2] LOPEZ V, LEWIS F, WAN Y, et al. Solutions for multiagent pursuit-evasion games on communication graphs: finite-time capture and asymptotic behaviors[J]. IEEE Transactions on Automatic Control, 2020, 65(5): 1911-1923. DOI: 10.1109/TAC.2019.2926554.
[3] 庞樨, 杨神化, 陈国权, 等. 基于扩展式博弈的多船协商避碰研究[J]. 舰船科学技术, 2025, 47(1): 76-82. PANG X, YANG S H, CHEN G Q, et al. Research on multi-ship negotiation collision avoidance based on extensive game model[J]. Ship Science and Technology, 2025, 47(1): 76-82.
[4] 于长东, 刘新阳, 陈聪, 等. 基于多智能体深度强化学习的无人艇集群博弈对抗研究[J]. 水下无人系统学报, 2024, 32(1): 79-86. DOI: 10.11993/j.issn.2096-3920.2023-0159.
[5] 刘鹏, 赵建新, 张宏映, 等. 基于改进型MADDPG的多智能体对抗策略算法[J]. 火力与指挥控制, 2023, 48(3): 132-138,145. DOI: 10.3969/j.issn.1002-0640.2023.03.020.
[6] HUA X, LIU J X, ZHANG J J, et al. An apollonius circle based game theory and Q-learning for cooperative hunting in unmanned aerial vehicle cluster[J]. Computers and Electrical Engineering, 2023, 110: 108876. DOI: 10.1016/j.compeleceng.2023.108876.
[7] 程代展, 付世华. 博弈控制论简述[J]. 控制理论与应用, 2018, 35(5): 588-592. DOI: 10.7641/CTA.2017.60952.
[8] VAMVOUDAKIS K, LEWIS F. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem[J]. Automatica, 2010, 64: 878-888.
[9] LONG J, YU D X, WEN G X, et al. Game-based backstepping design for strict-feedback nonlinear multi-agent systems based on reinforcement learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(1): 817-830. DOI: 10.1109/TNNLS.2022.3177461.
[10] VAMVOUDAKIS K, LEWIS F. Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations[J]. Automatica, 2011, 47: 1556-1569. DOI: 10.1016/j.automatica.2011.03.005.
[11] MAZOUCHI M, NAGHIBI-SISTANI M B, SANI S K H. Novel distributed optimal adaptive control algorithm for nonlinear multi-agent differential graphical games[J]. IEEE/CAA Journal of Automatica Sinica, 2018, 5(1): 331-341. DOI: 10.1109/JAS.2017.7510784.
[12] 初庆栋, 尹羿博, 龚小旋, 等. 基于双偶极向量场的欠驱动无人船目标跟踪制导方法[J]. 中国舰船研究, 2022, 17(4): 32-37.
[13] WEN G X, CHEN C L P, GE S S. Simplified optimized backstepping control for a class of nonlinear strict-feedback systems with unknown dynamic functions[J]. IEEE Transactions on Cybernetics, 2021, 51(9): 4567-4580.