舰船科学技术  2025, Vol. 47 Issue (18): 54-59    DOI: 10.3404/j.issn.1672-7649.2025.18.010
基于微分博弈的无人艇作战与反无人艇对抗研究
尹世麟, 张国庆, 李纪强, 黄晨峰     
大连海事大学 航海学院,辽宁 大连 116026
摘要: 无人作战系统已成为当前乃至未来战场中不可或缺的一部分,如何反制无人作战系统成为热点研究问题。基于未来现代化海战中对无人作战艇的反制这一背景,提出基于分布式微分博弈方法的无人艇追击-逃避对策的制导原理。通过对无人艇追击-逃避运动关系建模,构建合适的博弈指标,将追击-逃避博弈问题转化为微分博弈求解问题。利用3艘追击无人艇和3艘逃避无人艇对所提制导原理进行仿真实验验证,实验结果表明追击无人艇能够对逃避无人艇进行协同堵截。该研究可为未来海战中无人艇集群自主决策提供理论参考。
关键词: 微分博弈     追逃博弈     无人艇集群     强化学习    
Research on Anti-USVs' combat threat based on differential game
YIN Shilin, ZHANG Guoqing, LI Jiqiang, HUANG Chenfeng     
Navigation College, Dalian Maritime University, Dalian 116026, China
Abstract: Unmanned combat systems have become an indispensable part of present and future battlefields, and how to counter them has become a hot research topic. Against the background of countering unmanned combat vessels in future modern naval warfare, a distributed differential game-based guidance principle for the pursuit-evasion game of unmanned surface vehicles (USVs) is proposed. By modeling the kinematic relations of USV pursuit-evasion and constructing suitable game performance indices, the pursuit-evasion game problem is transformed into a differential game solving problem. A simulation case with three pursuing USVs and three evading USVs is carried out to verify the proposed guidance principle. The simulation results show that the pursuing USVs can achieve cooperative interception of the evading USVs. This study may provide a theoretical reference for the autonomous decision-making of USV swarms in future sea battles.
Key words: differential game     pursuit-evasion game     multiple unmanned surface vehicles     reinforcement learning
0 引 言

21世纪以来,随着控制、制导与无人技术的发展,无人化作战方式正在成为战场上的主流。俄乌冲突中涌现出的大量无人机、无人艇等新型装备正在颠覆传统作战方式,特别是小型无人艇集群自杀式偷袭大型水面舰艇取得了出奇制胜的效果。因此,研究如何反制无人艇集群是当前船舶制导与控制领域的热点问题。

自动控制[1]、博弈[2-3]、强化学习[4-6]作为船舶控制工程领域的几个重要技术分支,是研究无人艇集群智能控制、决策与对抗博弈问题的重要方法。对于自动控制而言,其优点在于实时性,能够针对环境和外界反馈实时做出运动策略。但自动控制的控制对象一般是机器,而无人艇对抗博弈的对象则是智能化的对手,这恰恰是博弈所要解决的问题[7]。强化学习则旨在训练出具有泛化能力的无人艇集群对抗博弈的模型,通过外界环境对无人艇做出奖励或惩罚,以做出相应的策略。与自动控制相比,强化学习的方法能够应对智能化的对手,但缺点是实时性差,训练出具有泛化能力的模型需要大量的时间。通过上述分析不难看出,结合几种方法的优势能够更好地解决无人艇集群对抗博弈的问题。Vamvoudakis等[8]首次将actor-critic框架的强化学习方法引入优化控制问题中,将强化学习方法用于逼近价值函数与求解哈密顿-雅可比-贝尔曼(Hamilton-Jacobi-Bellman,HJB)方程,并证明了actor-critic强化学习方法在控制系统中的收敛性和稳定性。此后,actor-critic强化学习方法或critic-only强化学习方法常用于优化控制与微分博弈中的函数逼近与解耦问题[9-11],通过梯度下降方法使critic网络逼近价值函数,通过收敛性分析使actor网络满足系统稳定性。

本文针对无人艇集群对抗博弈中的追逃博弈问题,提出一种具有博弈思想的制导策略,旨在为无人艇集群提供运动策略。通过建立无人艇集群分布式追逃博弈模型,将追逃博弈问题转化为微分博弈求解问题。利用极小值-极大值策略得到无人艇最佳运动策略;针对微分博弈价值函数的存在性问题,利用强化学习方法得到HJI函数最小值,从而得到逼近的追逃博弈解。

1 问题描述
1.1 图 论

在无人艇集群追逃博弈中,通常存在3种拓扑图结构,分别是追击无人艇集群内部拓扑图,逃避无人艇集群内部拓扑图和逃避无人艇与追击无人艇集群间拓扑图。

考虑$N$个由有向无线网络相连的追击无人艇,它们之间的通信可以用$ {a_{ik}} $表示,其中${a_{ik}} = 1(k \ne i)$代表追击无人艇$i$与追击无人艇$k$之间存在通信,$ {a_{ik}} = 0 $则意味着2艘追击无人艇之间不存在通信。值得注意的是,在有向网络中信息只能单向传递,即$ {a_{ik}} = 1 \Rightarrow {a_{ki}} = 0 $,且$ {a_{ii}} = 0 $。为方便后续公式推导,定义$ d_i^p = \displaystyle\sum\limits_{k = 1}^N {{a_{ik}}} $。

相似地,由有向网络连接的$M$个逃避无人艇可用$ {b_{jl}} $表示其通信状态。${b_{jl}} = 1(j \ne l)$表示逃避无人艇$j$与逃避无人艇$l$之间存在通信,$ {b_{jl}} = 0 $则相反。定义$ d_j^e = \displaystyle\sum\limits_{l = 1}^M {{b_{jl}}} $。

由于追逃博弈中,追击无人艇和逃避无人艇都需要观测对方的状态以做出相应的运动策略,因此用无向图表示追击和逃避双方的观测状态。以${c_{ij}}$与${e_{ji}}$表示追击无人艇$i$与逃避无人艇$j$之间的观测状态:若${c_{ij}} = 1$,说明追击无人艇$i$能观测到逃避无人艇$j$,${c_{ij}} = 0$则相反;${e_{ji}} = 1$表示逃避无人艇$j$能观测到追击无人艇$i$,${e_{ji}} = 0$则相反。定义$ d_i^{pe} = \displaystyle\sum\limits_{j = 1}^M {{c_{ij}}} $,$ d_j^{ep} = \displaystyle\sum\limits_{i = 1}^N {{e_{ji}}} $。
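上述拓扑关系可以用邻接矩阵直观表示。下面给出一个最小的 Python 草图(矩阵取值为假设的连接关系,仅作说明),演示如何由 $a_{ik}$、$b_{jl}$、$c_{ij}$、$e_{ji}$ 计算度 $d_i^p$、$d_j^e$、$d_i^{pe}$、$d_j^{ep}$:

```python
import numpy as np

# 假设的拓扑示例: 3 艘追击无人艇、3 艘逃避无人艇
A = np.array([[0, 1, 0],    # a_ik: 追击艇间有向通信, 满足 a_ii = 0 且 a_ik=1 时 a_ki=0
              [0, 0, 1],
              [1, 0, 0]])
B = np.array([[0, 1, 0],    # b_jl: 逃避艇间有向通信
              [0, 0, 1],
              [1, 0, 0]])
C = np.array([[1, 1, 0],    # c_ij: 追击艇 i 可观测逃避艇 j
              [0, 1, 1],
              [1, 0, 1]])
E = C.T.copy()              # e_ji: 此处假设观测是相互的, 即 e_ji = c_ij

d_p  = A.sum(axis=1)        # d_i^p  = Σ_k a_ik
d_e  = B.sum(axis=1)        # d_j^e  = Σ_l b_jl
d_pe = C.sum(axis=1)        # d_i^pe = Σ_j c_ij
d_ep = E.sum(axis=1)        # d_j^ep = Σ_i e_ji

print(d_p.tolist(), d_pe.tolist())  # [1, 1, 1] [2, 2, 2]
```

其中有向图的单向传递性质体现为 $A$ 的非对称性;观测图用无向图建模,故此处令 $E=C^{\text{T}}$,这只是示例的一种取法。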

1.2 追逃博弈运动学建模

定义无人艇的简化运动学模型[12]

$ \begin{gathered} \left\{ {\begin{array}{*{20}{c}} {\dot x_i^p = U_i^p\cos \left( {\psi _i^p + \beta _i^p} \right)},\\ {\dot y_i^p = U_i^p\sin \left( {\psi _i^p + \beta _i^p} \right)} ,\end{array}} \right.{\text{ }} \\ \left\{ {\begin{array}{*{20}{c}} {\dot x_j^e = U_j^e\cos \left( {\psi _j^e + \beta _j^e} \right)},\\ {\dot y_j^e = U_j^e\sin \left( {\psi _j^e + \beta _j^e} \right)} 。\end{array}} \right. \\ \end{gathered} $ (1)

式中:$ {\boldsymbol{p}}_i^p = \left[ {x_i^p;y_i^p} \right] $、$ {\boldsymbol{p}}_j^e = \left[ {x_j^e;y_j^e} \right] $分别为追击无人艇$i$与逃避无人艇$j$的位置坐标信息;$ U_i^p $、$ U_j^e $分别为追击无人艇$i$与逃避无人艇$j$的实际速度;$ \psi _i^p $、$ \psi _j^e $分别为追击无人艇$i$与逃避无人艇$j$的艏向角;$ \beta _i^p $、$ \beta _j^e $分别为追击无人艇$i$与逃避无人艇$j$的横漂角。进而,追击无人艇$i$与逃避无人艇$j$的实际航向可分别表达为$ {\psi _i} = \psi _i^p + \beta _i^p $与$ {\psi _j} = \psi _j^e + \beta _j^e $,同时它们可由$ {\psi _i} = {\text{atan2}}\left( {\dot y_i^p,\dot x_i^p} \right) $与$ {\psi _j} = {\text{atan2}}\left( {\dot y_j^e,\dot x_j^e} \right) $计算得到。考虑到无人艇物理约束限制,其坐标最大变化率为$5\;{{\text{m}}/ {\text{s}}}$,即$\dot x_i^p \leqslant 5\;{{\text{m}} / {\text{s}}},\dot y_i^p \leqslant 5\;{{\text{m}}/ {\text{s}}},\dot x_j^e \leqslant 5\; {{\text{m}}/ {\text{s}}},\dot y_j^e \leqslant 5\;{{\text{m}}/{\text{s}}}$。
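式(1)的离散仿真可按欧拉法实现。下面是一个最小的 Python 草图(步长取值与对坐标变化率的限幅处理方式均为假设),对单艘无人艇做一步积分:

```python
import numpy as np

def usv_step(p, U, psi, beta, dt=0.1, rate_max=5.0):
    """按式(1)做一步欧拉积分: p 为位置 [x, y], U 为航速, psi 为艏向角, beta 为横漂角."""
    course = psi + beta                                  # 实际航向 psi + beta
    v = U * np.array([np.cos(course), np.sin(course)])   # [x_dot, y_dot]
    v = np.clip(v, -rate_max, rate_max)                  # 坐标变化率不超过 5 m/s
    return p + dt * v

p_next = usv_step(np.zeros(2), U=2.0, psi=0.0, beta=0.0, dt=1.0)
print(p_next)  # [2. 0.]
```

追击与逃避双方均可复用该积分步,只需代入各自的 $U$、$\psi$、$\beta$。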

定义追逃博弈的运动学误差如下:

$ \left\{\begin{aligned} &{\boldsymbol{p}}_{ei}^p = \sum\limits_{k = 1}^N {{a_{ik}}\left( {{\boldsymbol{p}}_i^p - {\boldsymbol{p}}_k^p} \right)} + \sum\limits_{j = 1}^M {{c_{ij}}\left( {{\boldsymbol{p}}_i^p - {\boldsymbol{p}}_j^e} \right)},\\ &{\boldsymbol{p}}_{ej}^e = \gamma \sum\limits_{l = 1}^M {{b_{jl}}\left( {{\boldsymbol{p}}_j^e - {\boldsymbol{p}}_l^e} \right)} - \sum\limits_{i = 1}^N {{e_{ji}}\left( {{\boldsymbol{p}}_j^e - {\boldsymbol{p}}_i^p} \right)}。\\ \end{aligned}\right. $ (2)

式中:$ {\boldsymbol{p}}_{ei}^p $$ {\boldsymbol{p}}_{ej}^e $分别为追击无人艇$i$和逃避无人艇$j$的运动学误差;$ {\boldsymbol{p}}_k^p $$ {\boldsymbol{p}}_l^e $分别为追击无人艇$k$和逃避无人艇$l$的位置坐标信息;$ \gamma > 0 $为逃避无人艇集群的协同参数,大$ \gamma $值表示逃避无人艇集群的最高优先级为保持队形紧密,小$ \gamma $值表示逃避无人艇摆脱追击无人艇堵截的优先级高于保持队形。对于追击无人艇$i$和逃避无人艇$j$而言,其控制目标均是使运动学误差收敛至$0$,因此运动学误差的物理意义是追击无人艇$i$既要与其拓扑相邻的无人艇$k$保持紧密队形,又要追击上逃避无人艇,逃避无人艇$j$则要在与其拓扑相邻的无人艇$l$保持队形的基础上,尽量与追击无人艇$i$保持较大的距离,摆脱其追击。
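式(2)的误差计算可以向量化实现。下面给出一个按上式直接翻译的 Python 草图(位置矩阵与拓扑矩阵均为假设的外部输入):

```python
import numpy as np

def pursuit_error(i, Pp, Pe, A, C):
    """式(2): 追击无人艇 i 的运动学误差 p_ei^p. Pp 为 N×2, Pe 为 M×2 位置矩阵."""
    e_intra = (A[i, :, None] * (Pp[i] - Pp)).sum(axis=0)  # Σ_k a_ik (p_i^p − p_k^p)
    e_chase = (C[i, :, None] * (Pp[i] - Pe)).sum(axis=0)  # Σ_j c_ij (p_i^p − p_j^e)
    return e_intra + e_chase

def evasion_error(j, Pp, Pe, B, E, gamma):
    """式(2): 逃避无人艇 j 的运动学误差 p_ej^e, gamma 为协同参数."""
    e_intra = gamma * (B[j, :, None] * (Pe[j] - Pe)).sum(axis=0)  # γ Σ_l b_jl (p_j^e − p_l^e)
    e_flee  = (E[j, :, None] * (Pe[j] - Pp)).sum(axis=0)          # Σ_i e_ji (p_j^e − p_i^p)
    return e_intra - e_flee
```

两个函数的符号与式(2)逐项对应:追击误差中两项同号(既聚拢又追击),逃避误差中观测项取负号(拉开与追击艇的距离)。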

为方便后续算法设计,定义无人艇集群重心为:

$ \left\{\begin{aligned} &{\boldsymbol{\bar p}}_i^{pp} = \frac{1}{{d_i^p + d_i^{pe}}}\sum\limits_{k = 1}^N {{a_{ik}}{\boldsymbol{p}}_k^p},\\ &{\boldsymbol{\bar p}}_i^{pe} = \frac{1}{{d_i^p + d_i^{pe}}}\sum\limits_{j = 1}^M {{c_{ij}}{\boldsymbol{p}}_j^e}。\\ \end{aligned} \right.$ (3)

式中:$ {\boldsymbol{\bar p}}_i^{pp} $、$ {\boldsymbol{\bar p}}_i^{pe} $分别为追击无人艇$i$的追击重心和逃避重心。同理,可以定义逃避无人艇$j$的逃避重心$ {\boldsymbol{\bar p}}_j^{ee} = {{\displaystyle\sum\limits_{l = 1}^M {{b_{jl}}{\boldsymbol{p}}_l^e} }/ {\left( {\gamma d_j^e - d_j^{ep}} \right)}} $和追击重心$ {\boldsymbol{\bar p}}_j^{ep} = {{\displaystyle\sum\limits_{i = 1}^N {{e_{ji}}{\boldsymbol{p}}_i^p} }/ {\left( {\gamma d_j^e - d_j^{ep}} \right)}} $。
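式(3)中的重心同样可以向量化计算。以下 Python 草图(输入为假设值)计算追击艇 $i$ 的追击重心与逃避重心:

```python
import numpy as np

def pursuit_centroids(i, Pp, Pe, A, C):
    """式(3): 追击无人艇 i 的追击重心 bar_p_i^pp 与逃避重心 bar_p_i^pe."""
    denom = A[i].sum() + C[i].sum()          # d_i^p + d_i^pe
    p_pp = (A[i] @ Pp) / denom               # Σ_k a_ik p_k^p / (d_i^p + d_i^pe)
    p_pe = (C[i] @ Pe) / denom               # Σ_j c_ij p_j^e / (d_i^p + d_i^pe)
    return p_pp, p_pe
```

逃避艇 $j$ 的两个重心只需把分母换成 $\gamma d_j^e - d_j^{ep}$、把权重换成 $b_{jl}$ 与 $e_{ji}$,实现方式完全相同。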

由式(2)和式(3)可以得到追逃博弈运动学误差的导数如下:

$ \left\{\begin{aligned} &{\boldsymbol{\dot p}}_{ei}^p = \left( {d_i^p + d_i^{pe}} \right)\left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot {\bar p}}}_i^{pp} - {\boldsymbol{\dot {\bar p}}}_i^{pe}} \right),\\ &{\boldsymbol{\dot p}}_{ej}^e = \left( {\gamma d_j^e - d_j^{ep}} \right)\left( {{\boldsymbol{\dot p}}_j^e - {\boldsymbol{\dot{ \bar p}}}_j^{ee} + {\boldsymbol{\dot {\bar p}}}_j^{ep}} \right)。\\ \end{aligned}\right. $ (4)
2 微分博弈求解
2.1 极小值-极大值策略

定义追逃博弈中追击无人艇$i$与逃避无人艇$j$的性能指标如下:

$ \left\{\begin{aligned} &{J_{pi}} = \int_0^\infty {\left( {{\boldsymbol{p}}{{_{ei}^p}^{\text{T}}}{\boldsymbol{p}}_{ei}^p + {\boldsymbol{\dot p}}{{_i^p}^{\text{T}}}{\boldsymbol{\dot p}}_i^p + {\boldsymbol{\dot {\bar p}}}{{_i^{pp}}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_i^{pp} - {\boldsymbol{\dot {\bar p}}}{{_i^{pe}}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right){\rm{d}}t},\\ &{J_{ej}} = \int_0^\infty {\left( {{\boldsymbol{p}}{{_{ej}^e}^{\text{T}}}{\boldsymbol{p}}_{ej}^e + {\boldsymbol{\dot p}}{{_j^e}^{\text{T}}}{\boldsymbol{\dot p}}_j^e - {\boldsymbol{\dot {\bar p}}}{{_j^{ep}}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_j^{ep} + {\boldsymbol{\dot{\bar p}}}{{_j^{ee}}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_j^{ee}} \right){\rm{d}}t}。\\ \end{aligned}\right. $ (5)

式中:$ {J_{pi}} $、$ {J_{ej}} $分别为追击无人艇$i$与逃避无人艇$j$的性能指标,且二者均希望取得最小的性能指标。这对于追击无人艇$i$意味着3个优化目标:1)最小化与逃避无人艇$ j $和其拓扑相邻的追击无人艇之间的距离;2)最小化自己和拓扑相邻的追击无人艇的运动策略,降低追击难度;3)最大化逃避无人艇$j$的运动策略,使其需要最快的速度摆脱追击。对于逃避无人艇$j$则意味着3个优化目标:1)最小化与其拓扑相邻的逃避无人艇之间的距离并最大化与追击无人艇$i$之间的距离;2)最小化自己和拓扑相邻的逃避无人艇的运动策略,降低逃逸难度;3)最大化追击无人艇$i$的运动策略,使其需要最快的追击速度。

基于上述分析,追击无人艇$i$与逃避无人艇$j$的最优运动策略可表达为:

$ \left\{\begin{aligned} &{\boldsymbol{\dot p}}_i^{p^*} = \arg \mathop {\min }\limits_{{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pp}} \mathop {\max }\limits_{{\boldsymbol{\dot {\bar p}}}_i^{pe}} {J_{pi}},\\ &{\boldsymbol{\dot p}}_j^{e^*} = \arg \mathop {\min }\limits_{{\boldsymbol{\dot p}}_j^e,{\boldsymbol{\dot{\bar p}}}_j^{ee}} \mathop {\max }\limits_{{\boldsymbol{\dot {\bar p}}}_j^{ep}} {J_{ej}}。\\ \end{aligned}\right. $ (6)

式中:$ {\boldsymbol{\dot p}}{_i^{p^*}} $$ {\boldsymbol{\dot p}}{_j^{e^*}} $为追击无人艇$i$与逃避无人艇$j$的最优运动策略。

根据式(4)和式(5),追击无人艇$i$与逃避无人艇$j$的哈密顿-雅可比-艾萨克斯(Hamilton-Jacobi-Isaacs,HJI)函数可表达为:

$ {\left\{\begin{aligned} {H_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) =\ & {{\boldsymbol{p}}_{ei}^{p^{\text{T}}}}{\boldsymbol{p}}_{ei}^p + {{\boldsymbol{\dot p}}_i^{p^{\text{T}}}}{\boldsymbol{\dot p}}_i^p + {{\boldsymbol{\dot {\bar p}}}_i^{{pp}^{\text{T}}}}{\boldsymbol{\dot {\bar p}}}_i^{pp} - {\boldsymbol{\dot {\bar p}}}_i^{{pe}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_i^{pe} +\\ &\nabla {\boldsymbol{V}}_{pi}^{\text{T}}\left( {d_i^p + d_i^{pe}} \right)\left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot {\bar p}}}_i^{pp} - {\boldsymbol{\dot {\bar p}}}_i^{pe}} \right),\\ {H_{ej}}\left( {{\boldsymbol{\dot p}}_j^e,{\boldsymbol{\dot {\bar p}}}_j^{ep}} \right) =\ &{\boldsymbol{p}}_{ej}^{e^{\text{T}}}{\boldsymbol{p}}_{ej}^e + {\boldsymbol{\dot p}}_j^{e^{\text{T}}}{\boldsymbol{\dot p}}_j^e - {\boldsymbol{\dot {\bar p}}}_j^{{ep}^{\text{T}}}{\boldsymbol{\dot{ \bar p}}}_j^{ep} + {\boldsymbol{\dot {\bar p}}}_j^{{ee}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_j^{ee} +\\ &\nabla {\boldsymbol{V}}_{ej}^{\text{T}}\left( {\gamma d_j^e - d_j^{ep}} \right)\left( {{\boldsymbol{\dot p}}_j^e - {\boldsymbol{\dot{ \bar p}}}_j^{ee} + {\boldsymbol{\dot{\bar p}}}_j^{ep}} \right) 。\end{aligned}\right. }$ (7)

式中:$ {H_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) $、$ {H_{ej}}\left( {{\boldsymbol{\dot p}}_j^e,{\boldsymbol{\dot {\bar p}}}_j^{ep}} \right) $分别为追击无人艇$i$与逃避无人艇$j$的HJI函数。$ {{\boldsymbol{V}}_{pi}}\left( {{\boldsymbol{p}}_{ei}^p} \right) $、$ {{\boldsymbol{V}}_{ej}}\left( {{\boldsymbol{p}}_{ej}^e} \right) $分别为性能指标(5)的价值函数,具体可以表示为${{\boldsymbol{V}}_{pi}} = \displaystyle\int_t^\infty {\left( {{\boldsymbol{p}}{{_{ei}^p}^{\text{T}}}{\boldsymbol{p}}_{ei}^p + {\boldsymbol{\dot p}}{{_i^p}^{\text{T}}}{\boldsymbol{\dot p}}_i^p + {\boldsymbol{\dot {\bar p}}}{{_i^{pp}}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_i^{pp} - {\boldsymbol{\dot {\bar p}}}{{_i^{pe}}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right){\rm{d}}t}$,${{\boldsymbol{V}}_{ej}} = \displaystyle\int_t^\infty {\left( {{\boldsymbol{p}}{{_{ej}^e}^{\text{T}}}{\boldsymbol{p}}_{ej}^e + {\boldsymbol{\dot p}}{{_j^e}^{\text{T}}}{\boldsymbol{\dot p}}_j^e - {\boldsymbol{\dot {\bar p}}}{{_j^{ep}}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_j^{ep} + {\boldsymbol{\dot {\bar p}}}{{_j^{ee}}^{\text{T}}}{\boldsymbol{\dot {\bar p}}}_j^{ee}} \right){\rm{d}}t}$。$ \nabla {{\boldsymbol{V}}_{pi}} = {{\partial {{\boldsymbol{V}}_{pi}}} /{\partial {\boldsymbol{p}}_{ei}^p}} $、$ \nabla {{\boldsymbol{V}}_{ej}} = {{\partial {{\boldsymbol{V}}_{ej}}} / {\partial {\boldsymbol{p}}_{ej}^e}} $则分别为$ {{\boldsymbol{V}}_{pi}} $、$ {{\boldsymbol{V}}_{ej}} $的梯度形式。

为了最小化HJI函数$ {H_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) $$ {H_{ej}}\left( {{\boldsymbol{\dot p}}_j^e,{\boldsymbol{\dot {\bar p}}}_j^{ep}} \right) $,追击无人艇$i$与逃避无人艇$j$的最优运动策略应为:

$ \left\{\begin{aligned} &\frac{{\partial {H_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right)}}{{\partial {\boldsymbol{\dot p}}_i^p}} = 0 \Rightarrow {\boldsymbol{\dot p}}_i^{p^*} = - \frac{1}{2}\left( {d_i^p + d_i^{pe}} \right)\nabla {{\boldsymbol{V}}_{pi}},\\ &\frac{{\partial {H_{ej}}\left( {{\boldsymbol{\dot p}}_j^e,{\boldsymbol{\dot {\bar p}}}_j^{ep}} \right)}}{{\partial {\boldsymbol{\dot p}}_j^e}} = 0 \Rightarrow {\boldsymbol{\dot p}}_j^{e^*} = - \frac{1}{2}\left( {\gamma d_j^e - d_j^{ep}} \right)\nabla {{\boldsymbol{V}}_{ej}}。\\ \end{aligned}\right. $ (8)

式中:$ {\boldsymbol{\dot p}}_i^{p^*} $、$ {\boldsymbol{\dot p}}_j^{e^*} $分别为追击无人艇$i$和逃避无人艇$j$的最优运动策略。同理,无人艇集群的最优运动策略分别为$ {\boldsymbol{\dot {\bar p}}}_i^{{pe}^*} = - {{\left( {d_i^p + d_i^{pe}} \right)\nabla {{\boldsymbol{V}}_{pi}}} / 2} $与$ {\boldsymbol{\dot {\bar p}}}_j^{{ep}^*} = {{\left( {\gamma d_j^e - d_j^{ep}} \right)\nabla {{\boldsymbol{V}}_{ej}}}/ 2} $。
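式(8)表明,一旦已知价值函数梯度,最优策略就是其线性函数。下述 Python 片段(梯度作为外部给定的假设输入)直接实现该映射:

```python
import numpy as np

def optimal_strategies(grad_Vp, grad_Ve, d_p_i, d_pe_i, d_e_j, d_ep_j, gamma):
    """式(8): 由价值函数梯度 ∇V_pi, ∇V_ej 得到追击艇与逃避艇的最优运动策略."""
    p_dot_star = -0.5 * (d_p_i + d_pe_i) * np.asarray(grad_Vp)          # 追击艇
    e_dot_star = -0.5 * (gamma * d_e_j - d_ep_j) * np.asarray(grad_Ve)  # 逃避艇
    return p_dot_star, e_dot_star
```

该函数之所以简单,正是因为求解难点被转移到了价值函数本身,这也是下一节引入神经网络在线逼近的原因。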

2.2 神经网络在线学习

通过给出价值函数,即可直接求解无人艇的最优运动策略。但无人艇追逃博弈是一个动态过程,不易直接找到对应的价值函数形式,因此利用actor-critic神经网络强化学习的方式对价值函数进行在线逼近,得到如下所示价值函数梯度:

$ \left\{\begin{aligned} &\nabla {{\boldsymbol{V}}_{pi}} = \frac{2}{{{{\left( {d_i^p + d_i^{pe}} \right)}^2}}}\left( {{{\boldsymbol{K}}_{pi}}{\boldsymbol{p}}_{ei}^p + {\boldsymbol{W}}_{pi}^{\text{T}}{{\boldsymbol{S}}_{pi}} + {{\boldsymbol{\varepsilon }}_{pi}}} \right) ,\\ &\nabla {{\boldsymbol{V}}_{ej}} = \frac{2}{{{{\left( {\gamma d_j^e - d_j^{ep}} \right)}^2}}}\left( {{{\boldsymbol{K}}_{ej}}{\boldsymbol{p}}_{ej}^e + {\boldsymbol{W}}_{ej}^{\text{T}}{{\boldsymbol{S}}_{ej}} + {{\boldsymbol{\varepsilon }}_{ej}}} \right) 。\\ \end{aligned}\right. $ (9)

式中:$ {{\boldsymbol{K}}_{pi}} = {\text{diag}}\left( {{k_{xi}},{k_{yi}}} \right) $、$ {{\boldsymbol{K}}_{ej}} = {\text{diag}}\left( {{k_{xj}},{k_{yj}}} \right) $分别为追击无人艇$i$与逃避无人艇$j$的控制参数;$ {{\boldsymbol{W}}_{pi}} = \left[ {{{\boldsymbol{W}}_{xi}},{{\boldsymbol{W}}_{yi}}} \right] $、$ {{\boldsymbol{W}}_{ej}} = \left[ {{{\boldsymbol{W}}_{xj}},{{\boldsymbol{W}}_{yj}}} \right] $为神经网络权重;$ {{\boldsymbol{S}}_{pi}} $、$ {{\boldsymbol{S}}_{ej}} $为神经网络激活函数;$ {{\boldsymbol{\varepsilon }}_{pi}} $、$ {{\boldsymbol{\varepsilon }}_{ej}} $为神经网络逼近误差,通常认为是接近于0的有界小量。进一步地,可以得到价值函数梯度的估计值和无人艇最优运动策略的估计值:

$\left\{ \begin{aligned} &\nabla {{{\boldsymbol{\hat V}}}_{pi}} = \frac{2}{{{{\left( {d_i^p + d_i^{pe}} \right)}^2}}}\left( {{{\boldsymbol{K}}_{pi}}{\boldsymbol{p}}_{ei}^p + {\boldsymbol{\hat W}}_{pci}^{\text{T}}{{\boldsymbol{S}}_{pi}}} \right),\\ &\nabla {{{\boldsymbol{\hat V}}}_{ej}} = \frac{2}{{{{\left( {\gamma d_j^e - d_j^{ep}} \right)}^2}}}\left( {{{\boldsymbol{K}}_{ej}}{\boldsymbol{p}}_{ej}^e + {\boldsymbol{\hat W}}_{ecj}^{\text{T}}{{\boldsymbol{S}}_{ej}}} \right)。\\ \end{aligned} \right.$ (10)
$ \left\{\begin{aligned} &{\boldsymbol{\dot {\hat p}}}{_i^{p^*}} = - \frac{1}{{d_i^p + d_i^{pe}}}\left( {{{\boldsymbol{K}}_{pi}}{\boldsymbol{p}}_{ei}^p + {\boldsymbol{\hat W}}_{pai}^{\text{T}}{{\boldsymbol{S}}_{pi}}} \right),\\ &{\boldsymbol{\dot{ \hat p}}}{_j^{e^*}} = - \frac{1}{{\gamma d_j^e - d_j^{ep}}}\left( {{{\boldsymbol{K}}_{ej}}{\boldsymbol{p}}_{ej}^e + {\boldsymbol{\hat W}}_{eaj}^{\text{T}}{{\boldsymbol{S}}_{ej}}} \right)。\\ \end{aligned}\right. $ (11)

式中:$ \nabla {{\boldsymbol{\hat V}}_{pi}} $$ \nabla {{\boldsymbol{\hat V}}_{ej}} $分别为$ \nabla {{\boldsymbol{V}}_{pi}} $$ \nabla {{\boldsymbol{V}}_{ej}} $的估计值;$ {{\boldsymbol{\hat W}}_{pci}} = \left[ {{{{\boldsymbol{\hat W}}}_{xci}},\;{{{\boldsymbol{\hat W}}}_{yci}}} \right] $$ {{\boldsymbol{\hat W}}_{ecj}} =\; \left[ {{{{\boldsymbol{\hat W}}}_{xcj}},\;{{{\boldsymbol{\hat W}}}_{ycj}}} \right] $分别为$ {{\boldsymbol{W}}_{pi}} $$ {{\boldsymbol{W}}_{ej}} $的critic估计值;$ {{\boldsymbol{\hat W}}_{pai}} = \left[ {{{{\boldsymbol{\hat W}}}_{xai}},{{{\boldsymbol{\hat W}}}_{yai}}} \right] $$ {{\boldsymbol{\hat W}}_{eaj}} = \left[ {{{\boldsymbol{\hat W}}}_{xaj}},{{{\boldsymbol{\hat W}}}_{yaj}} \right] $分别为$ {{\boldsymbol{W}}_{pi}} $$ {{\boldsymbol{W}}_{ej}} $的actor估计值;同理,$ {{\boldsymbol{W}}_{pi}} $$ {{\boldsymbol{W}}_{ej}} $的critic估计误差分别为$ {{\boldsymbol{\tilde W}}_{pci}} = \left[ {{{{\boldsymbol{\tilde W}}}_{xci}},{{{\boldsymbol{\tilde W}}}_{yci}}} \right] $$ {{\boldsymbol{\tilde W}}_{ecj}} = \left[ {{{\boldsymbol{\tilde W}}}_{xcj}},\right. \left.{{{\boldsymbol{\tilde W}}}_{ycj}} \right] $,actor估计误差分别为$ {{\boldsymbol{\tilde W}}_{pai}} = \left[ {{{{\boldsymbol{\tilde W}}}_{xai}},{{{\boldsymbol{\tilde W}}}_{yai}}} \right],{{\boldsymbol{\tilde W}}_{eaj}} = \left[ {{{{\boldsymbol{\tilde W}}}_{xaj}},{{{\boldsymbol{\tilde W}}}_{yaj}}} \right] $
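式(11)中,策略估计值只依赖误差项、actor 权重与激活向量。下面的 Python 草图(激活函数形式与权重维度均为假设)给出式(11)的直接实现:

```python
import numpy as np

def strategy_estimate(K, p_err, W_a, S, denom):
    """式(11): 最优运动策略估计值.
    K: 2×2 控制参数; p_err: 运动学误差; W_a: n×2 actor 权重; S: n 维激活向量;
    denom 对追击艇取 d_i^p + d_i^pe, 对逃避艇取 gamma*d_j^e − d_j^ep."""
    return -(K @ p_err + W_a.T @ S) / denom

K = np.diag([1.0, 1.0])
out = strategy_estimate(K, np.array([2.0, -2.0]), np.zeros((4, 2)), np.ones(4), 2.0)
print(out)  # [-1.  1.]
```

当 actor 权重为零时,策略退化为对运动学误差的比例反馈,神经网络项可视为对该比例项的在线修正。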

为最小化HJI函数,神经网络权重估计值将按照如下学习律在线更新:

$ {\left\{\begin{aligned} &{{{\boldsymbol{\dot {\hat W}}}}_{hci}} = - {k_{ci}}\left( {{{\boldsymbol{S}}_{hi}}{\boldsymbol{S}}_{hi}^{\text{T}} + {\sigma _{hi}}{\boldsymbol{I}}} \right){{{\boldsymbol{\hat W}}}_{hci}},\\ &{{{\boldsymbol{\dot {\hat W}}}}_{hai}} = - \left( {{{\boldsymbol{S}}_{hi}}{\boldsymbol{S}}_{hi}^{\text{T}} + {\sigma _{hi}}{\boldsymbol{I}}} \right) \times \left[ {{k_{ai}}\left( {{{{\boldsymbol{\hat W}}}_{hai}} - {{{\boldsymbol{\hat W}}}_{hci}}} \right) + {k_{ci}}{{{\boldsymbol{\hat W}}}_{hci}}} \right]。\end{aligned}\right. }$ (12)

式中:$h = p,e$$ {k_{ci}} $$ {k_{ai}} $为强化学习率;$ {\sigma _{hi}} $为大于0的常数;$ {\boldsymbol{I}} $为单位矩阵。
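式(12)的在线更新可按欧拉法离散执行。以下 Python 草图(步长取值为假设)对单个坐标通道的 critic/actor 权重做一步更新:

```python
import numpy as np

def weight_update(W_c, W_a, S, sigma, k_c, k_a, dt=0.01):
    """式(12)的一步离散化: W_c, W_a 为 n 维权重向量, S 为激活向量."""
    G = np.outer(S, S) + sigma * np.eye(S.size)          # S S^T + σ I
    W_c_dot = -k_c * G @ W_c                             # critic 权重学习律
    W_a_dot = -G @ (k_a * (W_a - W_c) + k_c * W_c)       # actor 权重学习律
    return W_c + dt * W_c_dot, W_a + dt * W_a_dot

W_c, W_a = weight_update(np.array([1.0]), np.array([1.0]),
                         S=np.array([1.0]), sigma=1.0, k_c=1.0, k_a=1.0, dt=0.1)
print(W_c, W_a)  # [0.8] [0.8]
```

其中 $\sigma I$ 项保证了矩阵 $SS^{\text{T}}+\sigma I$ 正定,使权重更新方向始终有界。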

至此,无人艇集群追逃博弈制导算法设计结束。为方便理解所提算法的执行流程和基本原理,图1给出了相关的信号流程框图。可知,无人艇集群(1)通过1.1节中的通信或观测拓扑连接起来,由此得到追逃博弈运动学误差式(2)和式(4);基于此,设计充分体现追逃博弈中无人艇物理意义的性能指标式(5);根据式(5)引出的HJI函数(7)和价值函数可以获得最优运动策略式(8);为确定式(8)的取值,利用actor-critic神经网络对价值函数的梯度式(9)进行在线逼近,得到最优运动策略的估计值式(11),并将其作为无人艇集群(1)的运动输入信号,算法的信号流程由此实现闭环。

图 1 无人艇集群追逃博弈算法信号流程框图 Fig. 1 Signals flowchart of the USVs pursuit-evasion game
2.3 理论可行性分析

基于上述设计,无人艇集群(1)将通过运动策略式(11)和在线学习律式(12)完成追逃博弈过程。为验证所提算法的理论可行性,将对控制系统的收敛性进行分析。

定理1 针对无人艇集群(1),分布式运动策略式(11),在线学习律式(12)组成的闭环控制系统,所有误差信号均收敛到一个紧集,并保证半全局一致最终有界。

选取如下李雅普诺夫函数:

$ \begin{split} L =& \frac{1}{2}{\boldsymbol{p}}_{ei}^{{p^{\text{T}}}}{\boldsymbol{p}}_{ei}^p + \frac{1}{2}{\boldsymbol{p}}_{ej}^{{e^{\text{T}}}}{\boldsymbol{p}}_{ej}^e + \frac{1}{2}{\text{tr}}\left( {{\boldsymbol{\tilde W}}_{pci}^{\text{T}}{\boldsymbol{\tilde W}}_{pci}^{}} \right) +\\ &\frac{1}{2}{\text{tr}}\left( {{\boldsymbol{\tilde W}}_{pai}^{\text{T}}{\boldsymbol{\tilde W}}_{pai}^{}} \right) + \frac{1}{2}{\text{tr}}\left( {{\boldsymbol{\tilde W}}_{ecj}^{\text{T}}{\boldsymbol{\tilde W}}_{ecj}^{}} \right) +\\ &\frac{1}{2}{\text{tr}}\left( {{\boldsymbol{\tilde W}}_{eaj}^{\text{T}}{\boldsymbol{\tilde W}}_{eaj}^{}} \right) ,\end{split} $ (13)

其导数为:

$ \begin{split} \dot L =& {\boldsymbol{p}}_{ei}^{{p^{\text{T}}}}{\boldsymbol{\dot p}}_{ei}^p + {\boldsymbol{p}}_{ej}^{{e^{\text{T}}}}{\boldsymbol{\dot p}}_{ej}^e + {\text{tr}}\left( {{\boldsymbol{\tilde W}}_{pci}^{\text{T}}{\boldsymbol{\dot{ \hat W}}}_{pci}^{}} \right) +{\text{tr}}\left( {{\boldsymbol{\tilde W}}_{pai}^{\text{T}}{\boldsymbol{\dot{ \hat W}}}_{pai}^{}} \right) +\\ & {\text{tr}}\left( {{\boldsymbol{\tilde W}}_{ecj}^{\text{T}}{\boldsymbol{\dot {\hat W}}}_{ecj}^{}} \right) + {\text{tr}}\left( {{\boldsymbol{\tilde W}}_{eaj}^{\text{T}}{\boldsymbol{\dot {\hat W}}}_{eaj}^{}} \right) 。\end{split} $ (14)

将式(11)代入式(4)中,可得:

$ \begin{split} {\boldsymbol{p}}_{ei}^{{p^{\text{T}}}}{\boldsymbol{\dot p}}_{ei}^p \leqslant & - \left[ {\lambda _{\min }}\left( {{{\boldsymbol{K}}_{pi}}} \right) - 3 - \sum\limits_{k = 1}^N {\frac{{{a_{ik}}{\lambda _{\max }}\left( {{{\boldsymbol{K}}_{pk}}} \right)}}{{d_k^p + d_k^{pe}}}} -\right. \\ & \left. \sum\limits_{j = 1}^M {\frac{{{c_{ij}}{\lambda _{\max }}\left( {{{\boldsymbol{K}}_{pj}}} \right)}}{{\gamma d_j^e + d_j^{ep}}}} \right]{\boldsymbol{p}}_{ei}^{{p^{\text{T}}}}{\boldsymbol{p}}_{ei}^p + \sum\limits_{j = 1}^M {\frac{{{c_{ij}}{\lambda _{\max }}\left( {{{\boldsymbol{K}}_{pj}}} \right)}}{{\gamma d_j^e + d_j^{ep}}}} \times\\ &{\boldsymbol{p}}_{ej}^{{e^{\text{T}}}}{\boldsymbol{p}}_{ej}^e + \frac{1}{4}{\boldsymbol{\hat W}}_{pai}^{\text{T}}{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}}{\boldsymbol{\hat W}}_{pai}^{} + \sum\limits_{k = 1}^N {\frac{{a_{ik}^{}{\lambda _{\max }}\left( {{{\boldsymbol{K}}_{pk}}} \right)}}{{4\left( {d_k^p + d_k^{pe}} \right)}}} \times\\ &{\boldsymbol{p}}_{ek}^{{p^{\text{T}}}}{\boldsymbol{p}}_{ek}^p + \sum\limits_{j = 1}^M {\frac{{c_{ij}^2{\boldsymbol{\hat W}}_{paj}^{\text{T}}{{\boldsymbol{S}}_{pj}}{\boldsymbol{S}}_{pj}^{\text{T}}{\boldsymbol{\hat W}}_{paj}^{}}}{{4{{\left( {\gamma d_j^e + d_j^{ep}} \right)}^2}}}} + \\ &\sum\limits_{k = 1}^N {\frac{{a_{ik}^2{\boldsymbol{\hat W}}_{pak}^{\text{T}}{{\boldsymbol{S}}_{pk}}{\boldsymbol{S}}_{pk}^{\text{T}}{\boldsymbol{\hat W}}_{pak}^{}}}{{4{{\left( {d_k^p + d_k^{pe}} \right)}^2}}}}。\\[-5pt] \end{split} $ (15)

根据式(12),可以得到如下等式:

$ \begin{split} {\text{tr}}\left( {{\boldsymbol{\tilde W}}_{pci}^{\text{T}}{{{\boldsymbol{\dot {\hat W}}}}_{pci}}} \right) =& \sum\limits_{\iota = x,y} {\left\{ { - \frac{{{k_{ci}}}}{2}{\boldsymbol{\tilde W}}_{\iota ci}^{\text{T}}\left( {{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right){{{\boldsymbol{\tilde W}}}_{\iota ci}}} \right.} -\\ &\frac{{{k_{ci}}}}{2}{\boldsymbol{\hat W}}_{\iota ci}^{\text{T}}\left( {{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right){{{\boldsymbol{\hat W}}}_{\iota ci}} -\\ &\left. { \frac{{{k_{ci}}}}{2}{\boldsymbol{W}}_{\iota i}^{\text{T}}\left( {{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right){{\boldsymbol{W}}_{\iota i}}} \right\} ,\end{split} $ (16)
$ \begin{split} {\text{tr}}\left( {{\boldsymbol{\tilde W}}_{{pai}^{\text{T}}}{{{\boldsymbol{\dot {\hat W}}}}_{pai}}} \right) = & \sum\limits_{\iota = x,y} {\left\{ { - \frac{{{k_{ci}}}}{2}{\boldsymbol{\tilde W}}_{\iota ai}^{\text{T}}\left( {{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right){{{\boldsymbol{\tilde W}}}_{\iota ai}}} \right.} -\\ & \frac{{{k_{ai}}}}{2}{\boldsymbol{\hat W}}_{\iota ai}^{\text{T}}\left( {{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right){{{\boldsymbol{\hat W}}}_{\iota ai}} -\\ & \frac{{{k_{ai}}}}{2}{\boldsymbol{W}}_{\iota i}^{\text{T}}\left( {{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right){{{\boldsymbol{\hat W}}}_{\iota i}}- \\ & \left. { \frac{{{k_{ai}} - {k_{ci}}}}{2}{\boldsymbol{\hat W}}_{\iota ci}^{\text{T}}\left( {{{\boldsymbol{S}}_{pi}}{\boldsymbol{S}}_{pi}^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right){{{\boldsymbol{\hat W}}}_{\iota ci}}} \right\} 。\end{split} $ (17)

同理可以为$ {\boldsymbol{p}}_{ej}^{{e^{\text{T}}}}{\boldsymbol{\dot p}}_{ej}^e,{\text{tr}}\left( {{\boldsymbol{\tilde W}}_{ecj}^{\text{T}}{\boldsymbol{\dot {\hat W}}}_{ecj}} \right),{\text{tr}}\left( {{\boldsymbol{\tilde W}}_{eaj}^{\text{T}}{\boldsymbol{\dot {\hat W}}}_{eaj}} \right) $获得类似的结果。进而可以得到$\dot L \leqslant - 2\kappa L + \varrho $,其中$\kappa $、$\varrho $为正常数,即控制系统满足半全局一致最终有界稳定。由此,定理1得证。

定理2 针对在线学习律式(12),其总能使HJI函数的估计值$ {H_{pi}}\left( {{\boldsymbol{\dot {\hat p}}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) $、$ {H_{ej}}\left( {{\boldsymbol{\dot {\hat p}}}_j^e,{\boldsymbol{\dot {\bar p}}}_j^{ep}} \right) $趋近于0。

将价值函数估计值式(10)与运动策略估计值式(11)代入HJI函数式(7),得到$ {H_{pi}}\left( {{\boldsymbol{\dot {\hat p}}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) $$ {H_{ej}}\left( {{\boldsymbol{\dot {\hat p}}}_j^e,{\boldsymbol{\dot {\bar p}}}_j^{ep}} \right) $。对$ {H_{pi}}\left( {{\boldsymbol{\dot {\hat p}}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) $求关于$ {{\boldsymbol{\hat W}}_{pai}} $的偏导数,可得:

$ \frac{{\partial {H_{pi}}\left( {{\boldsymbol{\dot{ \hat p}}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right)}}{{\partial {{{\boldsymbol{\hat W}}}_{pai}}}} = \frac{{2{\boldsymbol{S}}_{pi}^{}{\boldsymbol{S}}_{pi}^{\text{T}}\left( {{{{\boldsymbol{\hat W}}}_{pai}} - {{{\boldsymbol{\hat W}}}_{pci}}} \right)}}{{{{\left( {d_i^p + d_i^{pe}} \right)}^2}}}。$ (18)

定义${Q_{pi}} = {{{{\left( {{{{\boldsymbol{\hat W}}}_{pai}} - {{{\boldsymbol{\hat W}}}_{pci}}} \right)}^{\text{T}}}\left( {{{{\boldsymbol{\hat W}}}_{pai}} - {{{\boldsymbol{\hat W}}}_{pci}}} \right)}/ 2}$,根据文献[13]中的理论,当${Q_{pi}} = 0$时,将存在${{\partial {H_{pi}}\left( {{\boldsymbol{\dot {\hat p}}}_i^p,{\boldsymbol{\dot{ \bar p}}}_i^{pe}} \right)} / {\partial {{{\boldsymbol{\hat W}}}_{pai}}}} = 0$。对${Q_{pi}}$求导,可得:

$ {\dot Q_{pi}} = - {k_{ai}}\frac{{\partial {Q_{pi}}}}{{\partial {\boldsymbol{\hat W}}_{pai}^{\text{T}}}}\left( {{{\boldsymbol{S}}_i}{\boldsymbol{S}}_i^{\text{T}} + {\sigma _{pi}}{\boldsymbol{I}}} \right)\frac{{\partial {Q_{pi}}}}{{\partial {{{\boldsymbol{\hat W}}}_{pai}}}} \leqslant 0。$ (19)

由式(19)可知,当时间趋于无穷时,${Q_{pi}}$将趋于0,进而${{\partial {H_{pi}}\left( {{\boldsymbol{\dot {\hat p}}}_i^p,{\boldsymbol{\dot{ \bar p}}}_i^{pe}} \right)} / {\partial {{{\boldsymbol{\hat W}}}_{pai}}}}$与$ {H_{pi}}\left( {{\boldsymbol{\dot {\hat p}}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) $将收敛于0。同样易得$ {H_{ej}}\left( {{\boldsymbol{\dot {\hat p}}}_j^e,{\boldsymbol{\dot {\bar p}}}_j^{ep}} \right) \to 0 $。由此,定理2得证。

定理3 在定理1和定理2的基础上,针对追逃博弈性能指标式(5),分布式运动策略式(11)总能保证其满足纳什均衡,即总是满足$ {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^{p*},{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) \leqslant {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^{p*},{\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right) \leqslant {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right) $。

考虑$ {{\boldsymbol{V}}_{pi}}\left( {{\boldsymbol{p}}_{ei}^p\left( \infty \right)} \right) = 0 $,性能指标$ {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) $可以表达为:

$ {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) = \displaystyle\int_0^\infty {{H_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right)} {\rm{d}}t + {{\boldsymbol{V}}_{pi}}\left( {{\boldsymbol{p}}_{ei}^p\left( 0 \right)} \right)。$ (20)

其中,$ {H_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) $可以被改写为:

$ \begin{split}{H_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right)=&{H_{pi}}\left( {{\boldsymbol{\dot p}}_i^{p*},{\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right)+2{\boldsymbol{\dot p}}{{_i^{p*}}^{\text{T}}}\left( {{\boldsymbol{\dot p}}_i^p-{\boldsymbol{\dot p}}_i^{p*}} \right) +\\ &{\left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot p}}_i^{p*}} \right)^{\text{T}}} \left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot p}}_i^{p*}} \right) - {\left( {{\boldsymbol{\dot {\bar p}}}_i^{pe} - {\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right)^{\text{T}}} \times \\ &\left( {{\boldsymbol{\dot {\bar p}}}_i^{pe}-{\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right)-2{\boldsymbol{\dot {\bar p}}}{{_i^{pe*}}^{\text{T}}}\left( {{\boldsymbol{\dot {\bar p}}}_i^{pe}-{\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right)-\\ &\nabla {\boldsymbol{V}}_{pi}^{\text{T}}\left( {d_i^p+d_i^{pe}} \right) \times\left( {{\boldsymbol{\dot p}}_i^{p*}-{\boldsymbol{\dot p}}_i^p} \right)+\\ &\nabla {\boldsymbol{V}}_{pi}^{\text{T}}\left( {d_i^p+d_i^{pe}} \right)\left( {{\boldsymbol{\dot {\bar p}}}_i^{pe*}-{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right),\end{split} $ (21)

进一步,将式(21)代入式(20),可得:

${ \begin{split} {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) = & \int_0^\infty \Big[{{{\left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot p}}_i^{p*}} \right)}^{\text{T}}}\left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot p}}_i^{p*}} \right)} +2{\boldsymbol{\dot p}}_i^{p*}\left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot p}}_i^{p*}} \right) -\\ & 2{\boldsymbol{\dot {\bar p}}}_i^{pe*} \times \left( {{\boldsymbol{\dot {\bar p}}}_i^{pe} - {\boldsymbol{\dot{ \bar p}}}_i^{pe*}} \right) - {\left( {{\boldsymbol{\dot{ \bar p}}}_i^{pe} - {\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right)^{\text{T}}} \times\\ &\left( {{\boldsymbol{\dot {\bar p}}}_i^{pe} - {\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right) - \nabla {\boldsymbol{V}}_{pi}^{\text{T}}\left( {d_i^p + d_i^{pe}} \right) \times \left( {{\boldsymbol{\dot p}}_i^{p*} - {\boldsymbol{\dot p}}_i^p} \right) +\\ & \nabla {\boldsymbol{V}}_{pi}^{\text{T}}\left( {d_i^p + d_i^{pe}} \right) \times \left( {{\boldsymbol{\dot{ \bar p}}}_i^{pe*} - {\boldsymbol{\dot {\bar p}}}_i^{pe}} \right)\Big]{\rm{d}}t + {{\boldsymbol{V}}_{pi}}\left( {{\boldsymbol{p}}_{ei}^p\left( 0 \right)} \right) 。\\[-12pt] \end{split} }$ (22)

$ {\boldsymbol{\dot p}}_i^p = {\boldsymbol{\dot p}}_i^{p*} $$ {\boldsymbol{\dot {\bar p}}}_i^{pe} = {\boldsymbol{\dot {\bar p}}}_i^{pe*} $时,易得:

$ {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) = {{\boldsymbol{V}}_{pi}}\left( {{\boldsymbol{p}}_{ei}^p\left( 0 \right)} \right),$ (23)

$ {\boldsymbol{\dot p}}_i^p = {\boldsymbol{\dot p}}_i^{p*} $$ {\boldsymbol{\dot{ \bar p}}}_i^{pe} \ne {\boldsymbol{\dot {\bar p}}}_i^{pe*} $时,同时考虑到$ \nabla {\boldsymbol{V}}_{pi}^{\text{T}}( {d_i} + d_i^{pe} ) = - 2{\boldsymbol{\dot {\bar p}}}_i^{pe*} $,易得:

$ \begin{split} {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) = &- \int_0^\infty {{{\left( {{\boldsymbol{\dot {\bar p}}}_i^{pe} - {\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right)}^{\text{T}}}\left( {{\boldsymbol{\dot {\bar p}}}_i^{pe} - {\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right)} {\rm{d}}t +\\ &{{\boldsymbol{V}}_{pi}}\left( {{\boldsymbol{p}}_{ei}^p\left( 0 \right)} \right) 。\end{split} $ (24)

$ {\boldsymbol{\dot p}}_i^p \ne {\boldsymbol{\dot p}}_i^{p*} $$ {\boldsymbol{\dot {\bar p}}}_i^{pe} = {\boldsymbol{\dot{ \bar p}}}_i^{pe*} $时,同时考虑到$ \nabla {\boldsymbol{V}}_{pi}^{\text{T}}( {d_i} + d_i^{pe} ) = - 2{\boldsymbol{\dot p}}_i^{p*} $,易得:

$ {{J_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) = \int_0^\infty {{{\left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot p}}_i^{p*}} \right)}^{\text{T}}} \left( {{\boldsymbol{\dot p}}_i^p - {\boldsymbol{\dot p}}_i^{p*}} \right)}{\rm{d}}t + {{\boldsymbol{V}}_{pi}}\left( {{\boldsymbol{p}}_{ei}^p\left( 0 \right)} \right) 。} $ (25)

根据式(23)~式(25)可以证得$ {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^{p*},{\boldsymbol{\dot {\bar p}}}_i^{pe}} \right) \leqslant {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^{p*},{\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right) \leqslant {J_{pi}}\left( {{\boldsymbol{\dot p}}_i^p,{\boldsymbol{\dot {\bar p}}}_i^{pe*}} \right) $。由此,定理3得证。

3 仿真实验

本仿真验证实验中考虑3艘追击无人艇和3艘逃避无人艇,其拓扑图结构如图2所示。无人艇的初始位置分别为$p_1^p = \left[ {10; - 2} \right]$,$p_2^p = \left[ {0;0} \right]$,$p_3^p = \left[ {20; - 3} \right]$,$p_1^e = \left[ {3;5} \right]$,$p_2^e = \left[ {2;10} \right]$,$p_3^e = \left[ {20;5} \right]$。控制参数设为${{\boldsymbol{K}}_{p1}} = {{\boldsymbol{K}}_{p2}} = {{\boldsymbol{K}}_{p3}} = {\text{diag}}\left( {1,1} \right)$,${{\boldsymbol{K}}_{e1}} = {{\boldsymbol{K}}_{e2}} = {{\boldsymbol{K}}_{e3}} = {\text{diag}}\left( {0.5,0.5} \right)$。学习率设为$ {k_{ci}} = 1 $、$ {k_{ai}} = 0.6 $。为了展示逃避无人艇的摆脱堵截和保持队形的优先级选择,选择了$\gamma = 0.2$与$\gamma = 2$两种情况进行实验,具体实验结果如图3和图4所示。
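上述仿真设置可整理为如下 Python 配置片段(变量命名为假设,数值取自文中):

```python
import numpy as np

# 初始位置(文中给定)
Pp0 = np.array([[10.0, -2.0], [0.0, 0.0], [20.0, -3.0]])  # 追击无人艇 p_1^p ~ p_3^p
Pe0 = np.array([[3.0, 5.0], [2.0, 10.0], [20.0, 5.0]])     # 逃避无人艇 p_1^e ~ p_3^e

# 控制参数与学习率(文中给定)
K_p = [np.diag([1.0, 1.0])] * 3      # K_p1 = K_p2 = K_p3
K_e = [np.diag([0.5, 0.5])] * 3      # K_e1 = K_e2 = K_e3
k_c, k_a = 1.0, 0.6                   # 强化学习率 k_ci, k_ai

gammas = (0.2, 2.0)                   # 两组协同参数, 分别对应图3与图4
```

仿真主循环即对每艘无人艇依次执行误差计算、策略估计、权重更新与运动学积分。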

图 2 无人艇集群拓扑图结构 Fig. 2 The topological graph of multiple USVs

图 3 $\gamma = 0.2$时追逃博弈结果 Fig. 3 The results of pursuit-evasion game when $\gamma = 0.2$

图 4 $\gamma = 2$时追逃博弈结果 Fig. 4 The results of pursuit-evasion game when $\gamma = 2$

图3(a)和图4(a)为无人艇集群追逃博弈轨迹结果。可以看出,越小的$\gamma$意味着逃避无人艇集群聚在一起的优先级越低,逃避无人艇集群聚拢所需的时间也越长;但无论逃避无人艇集群的优先级如何,追击无人艇集群都能实现对其的追击和堵截。图3(b)和图4(b)为无人艇集群的速度。可以发现,无人艇集群的速度总是趋于0,这表明在追逃博弈中没有任一逃避无人艇完成逃逸。

4 结 语

本文研究了无人艇集群的追逃博弈问题。通过图论分别对逃避无人艇集群和追击无人艇集群建立了逃逸运动学模型和追击运动学模型,基于追逃运动模型将追逃博弈问题转化为微分博弈求解问题。求解微分博弈问题中运用了极小值-极大值策略和强化学习技术,实时得到无人艇最佳运动策略。通过仿真实验验证了追击无人艇能够有效对逃避无人艇进行堵截,证明了本文所提追逃博弈制导算法的有效性。

在实际应用中,该制导算法的基础技术是多船制导信号协同,关键技术是对敌对无人艇的探测识别。首先通过激光雷达、视觉识别等传感器件探测并锁定敌对无人艇,将敌对无人艇的位置、速度信息传递给船载工控主机,由船载工控主机实时计算己方无人艇的运动策略,再由无人艇底层控制算法为己方无人艇提供相应的动力。考虑到每条无人艇的船载工控主机都能实时计算整个己方无人艇集群的运动策略,因此己方无人艇集群不再需要传统意义上的通信连接也能够保持协同,避免了由于船间距过大而引起的通信连接中断问题。

参考文献
[1]
NING X, ZHANG H T, ZHU L J. Prescribed-time collective evader-capturing for autonomous surface vehicles[J]. Automatica, 2024, 167: 111761. DOI:10.1016/j.automatica.2024.111761
[2]
LOPEZ V, LEWIS F, WAN Y, et al. Solutions for multiagent pursuit-evasion games on communication graphs: finite-time capture and asymptotic behaviors[J]. IEEE Transactions on Automatic Control, 2020, 65(5): 1911-1923. DOI:10.1109/TAC.2019.2926554
[3]
庞樨, 杨神化, 陈国权, 等. 基于扩展式博弈的多船协商避碰研究[J]. 舰船科学技术, 2025, 47(1): 76-82.
PANG X, YANG S H, CHEN G Q, et al. Research on multi-ship negotiation collision avoidance based on extensive game model[J]. Ship Science and Technology, 2025, 47(1): 76-82.
[4]
于长东, 刘新阳, 陈聪, 等. 基于多智能体深度强化学习的无人艇集群博弈对抗研究[J]. 水下无人系统学报, 2024, 32(1): 79-86. DOI:10.11993/j.issn.2096-3920.2023-0159
[5]
刘鹏, 赵建新, 张宏映, 等. 基于改进型MADDPG的多智能体对抗策略算法[J]. 火力与指挥控制, 2023, 48(3): 132-138,145. DOI:10.3969/j.issn.1002-0640.2023.03.020
[6]
HUA X, LIU J X, ZHANG J J, et al. An apollonius circle based game theory and Q-learning for cooperative hunting in unmanned aerial vehicle cluster[J]. Computers and Electrical Engineering, 2023, 110: 108876. DOI:10.1016/j.compeleceng.2023.108876
[7]
程代展, 付世华. 博弈控制论简述[J]. 控制理论与应用, 2018, 35(5): 588-592. DOI:10.7641/CTA.2017.60952
[8]
VAMVOUDAKIS K, LEWIS F. Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem[J]. Automatica, 2010, 46(5): 878-888.
[9]
LONG J, YU D X, WEN G X, et al. Game-based backstepping design for strict-feedback nonlinear multi-agent systems based on reinforcement learning[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(1): 817-830. DOI:10.1109/TNNLS.2022.3177461
[10]
VAMVOUDAKIS K, LEWIS F. Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations[J]. Automatica, 2011, 47: 1556-1569. DOI:10.1016/j.automatica.2011.03.005
[11]
MAZOUCHI M, NAGHIBI-SISTANI M B, SANI S K H. Novel distributed optimal adaptive control algorithm for nonlinear multi-agent differential graphical games[J]. IEEE/CAA Journal of Automatica Sinica, 2018, 5(1): 331-341. DOI:10.1109/JAS.2017.7510784
[12]
初庆栋, 尹羿博, 龚小旋, 等. 基于双偶极向量场的欠驱动无人船目标跟踪制导方法[J]. 中国舰船研究, 2022, 17(4): 32-37.
[13]
WEN G X, CHEN C L P, GE S S. Simplified optimized backstepping control for a class of nonlinear strict-feedback systems with unknown dynamic functions[J]. IEEE Transactions on Cybernetics, 2021, 51(9): 4567-4580.