An Iterative Method for Optimal Feedback Control and Generalized HJB Equation
  IEEE/CAA Journal of Automatica Sinica  2018, Vol. 5 Issue(5): 999-1006   PDF    
An Iterative Method for Optimal Feedback Control and Generalized HJB Equation
Xuesong Chen1, Xin Chen2     
1. School of Applied Mathematics, Guangdong University of Technology, Guangzhou 510006, China;
2. School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou 510006, China
Abstract: In this paper, a new iterative method is proposed to solve the generalized Hamilton-Jacobi-Bellman (GHJB) equation through successively approximate it. Firstly, the GHJB equation is converted to an algebraic equation with the vector norm, which is essentially a set of simultaneous nonlinear equations in the case of dynamic systems. Then, the proposed algorithm solves GHJB equation numerically for points near the origin by considering the linearization of the non-linear equations under a good initial control guess. Finally, the procedure is proved to converge to the optimal stabilizing solution with respect to the iteration variable. In addition, it is shown that the result is a closed-loop control based on this iterative approach. Illustrative examples show that the update control laws will converge to optimal control for nonlinear systems.
Key words: Generalized Hamilton-Jacobi-Bellman (HJB) equation     iterative method     nonlinear dynamic system     optimal control    

The optimal control of nonlinear systems is one of the most challenging and difficult subjects in control theory. A large number of theoretical results about the nonlinear optimal control problems have been reported in the past few decades [1]-[9]. The dynamic programming algorithm is widely regarded as the most comprehensive method in finding optimal feedback controllers for generic nonlinear systems. However, the main drawback of dynamic programming methods today is the computational complexity required to describe the value function which grows exponentially with the dimension of its domain. It is well known that continuous-time nonlinear optimal control problem depends on the solution of Hamilton-Jacobi-Bellman (HJB) equation, which is a nonlinear partial differential equation (PDE). Even in simple cases, the HJB equation may not have global analytic solutions. Various methods have been proposed in the literature for computing numerical solutions to the HJB equation, see for example Aguilar et al. [10], Cacace et al. [11], Govindarajan et al. [12], Markman et al. [13], Sakamoto et al. [8], Smears et al. [14], and the references therein.

It is known that the generalized Hamilton-Jacobi-Bellman (GHJB) equation is linear and easier to solve than HJB equation, but no general solution for GHJB equation is demonstrated as to the best knowledge of authors. Beard et al.[15] presented a series of polynomial functions as basic functions to solve the approximate GHJB equation, however, this method requires the computation of a larger number of integrals. Galerkin's spectral approximation approach is proposed in [16] to find approximate but close solutions to the GHJB at each iteration step. The reader is also referred to Markman et al. [13], Smears et al. [14], Saridis et al. [17], Aguilar et al. [10], Gong et al. [18] for more details and different perspectives. Although many articles have discussed the solution to the HJB equation for the continuous-time systems, currently there is very minimal work available on iterative solution approach for GHJB equation.

In this paper, we propose a new iterative method to find the approximate solution to GHJB equation which is associated with the optimal feedback control for nonlinear systems. The idea of this iterative algorithm for GHJB equation is based on Beard's work [16]. Our approach is designed to obtain the general computational solution for the GHJB equation. We firstly convert the GHJB equation to a simple algebraic equation with vector norm, which is essentially a set of nonlinear equations. Then we propose the procedure of this method to compute the solution to the GHJB by considering the linearization of the nonlinear equations under a good initial control guess. The stability and convergence results of the proposed scheme are proved.

The paper is organized as follows. The problem description is presented in Section Ⅱ. The main result of this paper is derived in Section Ⅲ, i.e., the iterative algorithm for the GHJB equation and the detailed mathematical proofs and justifications of the proposed approach. The numerical example is provided in Section Ⅳ. Finally, a brief conclusion is given in Section Ⅴ.


Consider the following continuous-time affine nonlinear system:

$ \begin{equation}\label{eq2-1} \left\{\begin{array}{ll} \dot{x}=f(x)+g(x)u\\ x(0)=x_{0} \end{array}\right. \end{equation} $ (1)

where $x\in\Omega\subset \mathbb{R}^{n}$ is the system state, $f(0)=0$, $x_{0}\in \Omega\subset \mathbb{R}^{n}$ is the initial state and $u\in \mathbb{R}^{m}$ is the control input, and $f:\Omega\rightarrow \mathbb{R}^{n}, g:\Omega\rightarrow \mathbb{R}^{n\times m}, u:\Omega\rightarrow \mathbb{R}^{m}$. Assume that the system is Lipschitz continuous on a set $\Omega$ that contains the origin.

The optimal control problem under consideration is to find a state feedback control law $u(x)=u^{*}(x)$ such that the system (1) is closed-loop asymptotically stable, and the following infinite horizon cost function is minimized:

$ \begin{equation} \label{eq2-2} J(x_{0}, u)=\int^{\infty}_{0}l(x(t, x_{0}))+\| u(x(t, x_{0}))\|^{2}dt \end{equation} $ (2)

where $l:\Omega\rightarrow \mathbb{R}$ is called the state penalty function, and it is a positive-definite function on $\Omega$. Typically, $l$ is a quadratic weighting of the state, i.e. $l=x^{T}Qx$ where $Q$ is a positive-definite matrix.

It is well known that the optimal control can be directly found to be $u^{*}=-\frac{1}{2}g^{T}\frac{\partial V}{\partial x}$, where $V$ satisfies the HJB equation

$ \begin{equation} \label{eq2-3} \frac{\partial V^{T}}{\partial x}f(x)+l(x)-\frac{1}{4}\frac{\partial V^{T}}{\partial x}g(x)(\frac{\partial V^{T}}{\partial x}g(x))^{T}=0. \end{equation} $ (3)

Although the solution to the nonlinear optimal problem has been well-known since the early 1960s in [3], relatively few control designs explicitly use a feedback function of the form given in (3). The primary difficulty lies in solving the HJB equation, for which general closed-form solutions do not exist. It is noted that the HJB equation is a nonlinear PDE that is impossible to be solved analytically. To obtain its approximate solution, in Saridis [17], the HJB equation was successively approximated by a GHJB equation, written as GHJB$(V, u)=0$, if

$ \begin{equation} \label{eq2-4} \frac{\partial V^{T}_{i}}{\partial x}(f+gu^{(i)})+l+\| u^{(i)}\|^{2}=0 \end{equation} $ (4)


$ \begin{equation} \label{eq2-5} u^{(i+1)}(x)=-\frac{1}{2}g^{T}\frac{\partial V_{i}}{\partial x}. \end{equation} $ (5)

The cost of $u^{(i+1)}$ is given by the solution of equation GHJB$(V_{i}, u^{(i)})=0$. Since the GHJB equation is linear, it is theoretically easier to solve than the nonlinear HJB equation.


In many numerical methods for HJB equations, which are typically solved backward in time, the discretization is based on spatial causality and the computation is explicit in time. The value of the solution function $V$ at a grid point is computed at an earlier time using the known value of the function at neighboring grid points at a later time. This coupling usually comes from the discretization of the spatial derivatives. In order to mitigate the curse of dimensionality, a new iterative method using a simple algebraic equation with vector norm is proposed.

A. Iterative Method for the GHJB Equation

It is obvious that the GHJB equation (4) can be rewritten as

$ \begin{equation} \label{eq3-1} \frac{\partial V^{T}_{i}}{\partial x}\| f+gu^{(i)}\|^{2}=-(l+\| u^{(i)}\|^{2})(f+gu^{(i)})^{T}. \end{equation} $ (6)

Let $p^{(i)}=\frac{\partial V^{T}_{i}}{\partial x}=(p^{(i)}_1, p^{(i)}_2, \ldots, p^{(i)}_n)$, if $\| f+gu^{(i)}\| \neq 0$, then (6) and (5) can be represented as a simple algebraic equation

$ \begin{equation} \label{eq3-2} p^{(i)}=-\frac{l+\| u^{(i)}\|^{2}}{\| f+gu^{(i)}\|^{2}}(f+gu^{(i)})^{T} \end{equation} $ (7)


$ \begin{equation} \label{eq3-3} u^{(i+1)}=-\frac{1}{2}g^{T}(p^{(i)})^{T} \end{equation} $ (8)

where $f=(f_1, f_2, \ldots, f_n)^{T}$, $u^{(i)}=(u^{(i)}_1, u^{(i)}_2, \ldots, u^{(i)}_m)^{T}$, and

$ g=\left( \begin{matrix} {{g}_{11}} & {{g}_{12}} & \cdots & {{g}_{1m}} \\ {{g}_{21}} & {{g}_{22}} & \cdots & {{g}_{2m}} \\ \vdots & \vdots & \ddots & \vdots \\ {{g}_{n1}} & {{g}_{n2}} & \cdots & {{g}_{nm}} \\ \end{matrix} \right). $ (9)

Let $F^{(i)}=(f+gu^{(i)})^{T}$, define the norm of $x=$ $(x_1, x_2, \ldots, x_n)^{T}$ as follows

$ \|x{{\|}^{2}}=\sum\limits_{i=1}^{n}{x_{i}^{2}} $

then (7) can be rewritten as

$ \begin{eqnarray} p^{(i)}&{}={}&-\frac{l+\sum\limits_{k=1}^{m}(u^{(i)}_{k})^{2}}{\| F^{(i)}\|^{2}}(F^{(i)}) \nonumber\\ &{}={}&-\frac{l+\sum\limits_{k=1}^{m}(u^{(i)}_{k})^{2}}{\sum\limits^{n}_{j=1}(F^{(i)}_j)^{2}}(F^{(i)}) \end{eqnarray} $ (10)


$ \left\{ \begin{array}{l} \dot{x}^{(i)}_{1}=F^{(i)}_1=f_1+g_{11}u^{(i)}_1+g_{12}u^{(i)}_2+\cdots+g_{1m}u^{(i)}_m\\ \dot{x}^{(i)}_{2}=F^{(i)}_2=f_2+g_{21}u^{(i)}_1+g_{22}u^{(i)}_2+\cdots+g_{2m}u^{(i)}_m\\ \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\vdots \\ \dot{x}^{(i)}_{n}=F^{(i)}_n=f_n+g_{n1}u^{(i)}_1+g_{n2}u^{(i)}_2+\cdots+g_{nm}u^{(i)}_m. \end{array} \right. $ (11)

Then, we obtain the feedback control from (8)

$ \left\{ \begin{array}{l} u^{(i+1)}_1=-\frac{1}{2}(g_{11}p^{(i)}_1+g_{21}p^{(i)}_2+\cdots+g_{n1}p^{(i)}_n)\\ u^{(i+1)}_2=-\frac{1}{2}(g_{12}p^{(i)}_1+g_{22}p^{(i)}_2+\cdots+g_{n2}p^{(i)}_n)\\ \;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\;\vdots \\ u^{(i+1)}_m=-\frac{1}{2}(g_{1m}p^{(i)}_1+g_{2m}p^{(i)}_2+\cdots+g_{nm}p^{(i)}_n). \end{array} \right. $ (12)

The method proposed in this paper may be implemented by applying the following procedure on the system (1).

In Algorithm 1, $p$ is a function of the state $x$, that is to say, $p(x)$ belongs to an infinite-dimensional functional space. The notation of this norm is defined as follows:

$ \| p\|_{p}=\sup\limits_{x\in\Omega}|p(x)|. $
Algorithm 1: Iterative Algorithm Based on GHJB Equation
$u^{{(i)}} :=\mathit{\boldsymbol{{\rm{GHJB}}}}(f, g, l, u^{(1)})$
IF ($\| p^{(i)}-p^{(i-1)}\|_{p}<\epsilon$
        Solve numerically: $ u^{*}=u^{(i)} $
     Restriction: $u^{(i+1)}=-\frac{1}{2}g^{T}(p^{(i)})^{T}$
     Computation: $p^{(i+1)}=-\frac{l+\| u^{(i+1)}\|^{2}}{\| f+gu^{(i+1)}\|^{2}}(f+gu^{(i+1)})$
      Recursion : $i \leftarrow i+1$
RETURN $u^{(i)} $.
B. Initial Control Guess

To apply the method introduced in this paper we must choose an initial stabilizing control $u^{(1)}$ for the system (1) and an estimate of the stability region of $u^{(1)}$. A good estimate for $u^{(1)}$ can be found by considering the linearization of (11).

If $q\in \mathbb{R}^{n}$ is an equilibrium point for system (1), and $F^{(i)}$ is differentiable at $q$, then its derivative is given by $J_{F^{(i)}}(q)$. In this case, the linear map described by $J_{F^{(i)}}(q)$ is the best linear approximation of $F^{(i)}$ near the point $q$, in the sense that

$ \begin{equation}\label{eq3-9} F^{(i)}(x)=F^{(i)}(q)+J_{F^{(i)}}(q)(x-q)+o(\| x-q\|) \end{equation} $ (13)

for $x$ close to $q$ and where $o$ is the little $o$-notation (for $x\rightarrow q$) and $\| x-q\|$ is the distance between $x$ and $q$.

C. Stability Analysis

In this subsection, we conduct the stability and convergence analysis of the proposed scheme. First, we define the pre-Hamiltonian function for some control $u(x)$ and an associate function $V(x)$

$ \begin{equation} \label{eq3-11} H(x, \frac{\partial V}{\partial x}, u)=l(x)+\| u\|^{2}+\frac{\partial V^{T}}{\partial x}(f(x)+g(x)u). \end{equation} $ (14)

Lemma 1: The optimal control law $u^{*}$ and the optimal value function $V^{*}(x)$, if they exist, satisfy (4) and

$ 0<V^{*}(x)\leq V(x), ~~ u\neq u^{*}. $

This lemma is proved in [17] which is a sufficient condition for the optimal control solution to the nonlinear systems.

There is a bulk of literature devoted to the problem of designing stable regulators for nonlinear system. The most important and popular tool is Lyapunov's method. To use Lyapunov's method, the designer first proposes a control and then tries to find a Lyapunov function for the closed-loop system. A Lyapunov function is a generalized energy function of the states, and is usually suggested by the physics of the problem. It is often possible to find a stabilizing control for a particular system. Now we show that value function $V_{1}$ is a Lyapunov function for the system $(f, g, u^{(2)})$.

Theorem 1: Assume $\Omega$ is compact set, $u^{(1)}\in A_{u}(\Omega)$ is an arbitrary admissible control for the system (1) on $\Omega$, if there exists a positive definite continuously differentiable function $V_{1}(x, t)$ on $[t_{0}, \infty)\times \mathbb{R}^{n}$ satisfying the GHJB equation

$ \begin{equation}\label{eq3-12} \frac{\partial V_{1}^{T}}{\partial x}(f+gu^{(1)})+l+\| u^{(1)}\|^{2}=0 \end{equation} $ (15)

associated with

$ u^{(2)}=-\frac{1}{2}g^{T}\frac{\partial V_{1}}{\partial x} $

then for all $t\in [t_{0}, \infty)$, system (1) has bounded trajectories over $[t_{0}, \infty)$, and the origin is a stable equilibrium point for system (1).

Proof: Since $V_{1}(x, t)$ is a continuously differentiable function, we take the derivative of $V_{1}(x, t)$ along the system $(f, g, u^{(2)})$ and the fact that $\frac{\partial V^{T}_{1}}{\partial x}g=-2(u^{(2)})^{T}$ follows

$ \begin{eqnarray} \label{eq3-13} \dot{V}_{1}(x, t)&{}={}&\frac{\partial V_{1}}{\partial t}+\frac{\partial V^{T}_{1}}{\partial x}\dot{x} \nonumber\\ &{}={}&\frac{\partial V_{1}}{\partial t}+\frac{\partial V^{T}_{1}}{\partial x}(f+gu^{(2)}) \nonumber\\ &{}={}&\frac{\partial V_{1}}{\partial t}+\frac{\partial V^{T}_{1}}{\partial x}f+\frac{\partial V^{T}_{1}}{\partial x}gu^{(2)} \nonumber\\ &{}={}&\frac{\partial V_{1}}{\partial t}+\frac{\partial V^{T}_{1}}{\partial x}f-2(u^{(2)})^{T}u^{(2)} \nonumber \\ &{}={}&\frac{\partial V_{1}}{\partial t}+\frac{\partial V^{T}_{1}}{\partial x}f-2\| u^{(2)}\|^{2}. \end{eqnarray} $ (16)

When $t\rightarrow\infty, \frac{\partial V_{1}}{\partial t}=0$, we obtain the following equation from GHJB equation

$ \begin{equation}\label{eq3-14} \frac{\partial V_{1}^{T}}{\partial x}f=-l-\| u^{(1)}\|^{2}-\frac{\partial V_{1}^{T}}{\partial x}gu^{(1)}. \end{equation} $ (17)

Substituting (17) into (16), we can get

$ \begin{eqnarray} \dot{V}_{1}(x, t)&{}={}&-l-\| u^{(1)}\|^{2}-\frac{\partial V^{T}_{1}}{\partial x}gu^{(1)}-2\| u^{(2)}\|^{2} \nonumber \\ &{}={}&-l-\| u^{(1)}\|^{2}+2(u^{(2)})^{T}gu^{(1)}-2\| u^{(2)}\|^{2} \nonumber\\ &{}={}&-l-(u^{(2)})^{T}u^{(2)}-(u^{(1)}-u^{(2)})^{T}(u^{(1)}-u^{(2)}) \nonumber\\ &{}={}&-l-\| u^{(2)}\|^{2}-\| u^{(1)}-u^{(2)}\|^{2}\nonumber\\ &{}\leq{}&0. \end{eqnarray} $ (18)

This establishes the boundaries of the trajectories of system (1) over $[t_{0}, \infty)$. Under the assumptions of the theorem, $\dot{V}_{1}(x, t)<0$ is a Lyapunov function, and the origin is a stable equilibrium point for system (1).

D. Convergence Results

In this subsection, the Algorithm 1 converges to the optimal control will be shown when it exists. It is clear that the following equation is easily obtained from (7) and (8)

$ \begin{equation} \label{eq3-15} u^{(i+1)}=\frac{1}{2}g^{T}\frac{l+\| u^{(i)}\|^{2}}{\| f+gu^{(i)}\|^{2}}gu^{(i)}+\frac{1}{2}g^{T}\frac{l+\| u^{(i)}\|^{2}}{\| f+gu^{(i)}\|^{2}}f. \end{equation} $ (19)

Linearizing the equation (19) according to (13), we obtain

$ \begin{equation}\label{eq3-16} u^{(i+1)}=T(u^{(i)})+o(\| u^{(i)}\|) \end{equation} $ (20)

where $T$ is a linear operator. In its general form the classical method of successive approximation applies to equations of the form $u=T(u)$. A solution $u$ to such an equation is said to be a fixed point of the transformation $T$ since $T$ leaves $u$ invariant. To find a fixed point by successive approximation, we begin with an initial trial vector $u^{(1)}$ and compute $u^{(2)}=T(u^{(1)})$. Continuing in this manner iteratively, we compute successive vectors $u^{(i+1)}=T(u^{(i)})$. Under appropriate conditions the sequence $\{u^{(i)}\}$ converges to a solution of the original equation.

Definition 1: Let $S$ be a subset of a normal space $X$ and let $T$ be a transformation mapping $S$ into $S$. Then, $T$ is said to be a contraction mapping if there is $\alpha, 0\leq \alpha<1$ such that $\| T(u^{(1)})-T(u^{(2)})\|\leq \alpha\| u^{(1)}-u^{(2)}\|.$

Note for example that a transformation having $\| T'(u)\|$ $\leq\alpha<1$ on a convex set $S$ is a contraction mapping since, by the mean value inequality, $\| T(u^{(1)})-T(u^{(2)})\| \leq\sup$ $\| u^{(1)}-u^{(2)}\|\leq\alpha \| u^{(1)}-u^{(2)}\|.$

Lemma 2: If $T$ is a contraction mapping on a closed subset $S$ of a Banach space, there is a unique vector $u^{*}\in S$ satisfying $u^{*}=T(u^{*})$. Furthermore, $u^{*}$ can be obtained by the method of successive approximation starting from an arbitrary initial vector in $S$.

Proof: Select an arbitrary element $u^{(1)}\in S$. Define the sequence $\{u^{(i)}\}$ by the formula $u^{(i+1)}=T(u^{(i)})$. Then $\| u^{(i+1)}-u^{(i)}\|~\!\!\!\!=~\!\!\!\!\| T(u^{(i)}) -T(u^{(i-1)})\| \leq \alpha\| u^{(i)}-u^{(i-1)}\|$. Therefore,

$ \| u^{(i)}-u^{(i-1)}\|\leq \alpha^{n-1}\| u^{(2)}-u^{(1)}\|. $

It follows that

$ \begin{eqnarray} &{}{}&\| u^{(i+m)}-u^{(i)}\| \nonumber \\ &{}{}&\leq\, \, \| u^{(i+m)}-u^{(i+m-1)}\|+\| u^{(i+m-1)}-u^{(i+m-2)}\| \nonumber \\ &{}{}&\quad+\cdots+\| u^{(i+1)}-u^{(i)}\| \nonumber \\ &{}{}&\leq (\alpha^{i+m-2}+\alpha^{i+m-3}+\cdots+\alpha^{i-1})\| u^{(2)}-u^{(1)}\| \nonumber\\ &{}{}&\leq (\alpha^{i-1}\sum^{\infty}_{k=0}\alpha^{k})\| u^{(2)}-u^{(1)}\| \nonumber\\ &{}{}&=\frac{\alpha^{i-1}}{1-\alpha}\| u^{(2)}-u^{(1)}\| \end{eqnarray} $ (21)

and hence we conclude that $\{u^{(i)}\}$ is a Cauchy sequence, since $S$ is a closed subset of a complete space, there is an element $u^{*}\in S$ such that $u^{(i)}\rightarrow u^{*}$.

We now show that $u^{*}=T(u^{*})$. We have

$ \begin{align} \| u^{*}-T(u^{*})\| \!\, &=\, \| u^{*}-u^{(i)}+u^{(i)}-T(u^{*})\| \nonumber \\ &\, \leq \, \| u^{*}-u^{(i)} \|+\| u^{(i)}-T(u^{*})\| \nonumber\\ &\, \leq\, \| u^{*}-u^{(i)} \|+\alpha\| u^{(i-1)}-u^{*}\|. \nonumber\\ \end{align} $ (22)

By appropriate choice of $n$ the right-hand side of the above inequality can be made arbitrarily small. Thus $\| u^{*}-T(u^{*})\|\, =0$.

It remains only to show that $u^{*}$ is unique. Assume that $u^{*}$ and $\tilde{u}^{*}$ are fixed points. Then

$ \| u^{*}-\tilde{u}^{*}\|\, =\, \| T(u^{*})- T(\tilde{u}^{*})\|\leq\| u^{*}-\tilde{u}^{*} \|. $

Thus $u^{*}=\tilde{u}^{*}$.

Defining $T(u)=Au+B$, the problem is equivalent to that of finding a fixed point of $T$. Furthermore, the method of successive approximation proposed above for this problem is equivalent to ordinary successive approximation applied to $T$. Thus it is sufficient to show that $T$ is a contraction mapping with respect to some norm on $n$-dimensional space. For the mapping $T$ defined above we have

$ \| T(u)-T(v)\|\, =\, \| (Au+B)-(Av+B)\|\leq\| A \|\cdot\| u-v\|. $

The basic idea of successive approximation and contraction mappings can be modified in several ways to produce convergence theorems for a number of different situations. We consider one such modification below.

Theorem 2: Let $T$ be a continuous mapping from a closed subset $S$ of a Banach space into $S$, and suppose that $T^{n}$ is a contraction mapping for some positive integer $n$. Then, $T$ has a unique fixed point in $S$ which can be found by successive approximation.

Proof: Let $u^{(1)}$ be arbitrarily selected from $S$. Define the sequence $\{u^{(i)}\}$ by

$ u^{(i+1)}=T(u^{(i)}). $

Now since $T^{i}$ is a contraction, it follows by Lemma 1 that the subsequence $\{u^{(ik)}\}$ converges to an element $u^{*}\in S$ which is a fixed point of $T^{i}$. We show that $u^{*}$ is a unique fixed point of $T$.

By the continuity of $T$, the element $T(u^{*})$ can be obtained by applying $T^{i}$ successively to $T(u^{(1)})$. Therefore, we have $u^{*}=\lim_{k\rightarrow\infty} T^{ik}(u^{(1)})$ and

$ T(u^{*})=T[\lim\limits_{k\rightarrow\infty} T^{ik}(u^{(1)})]=\lim\limits_{k\rightarrow\infty} T^{ik}[T(u^{(1)})]. $

Hence, again using the continuity of $T$,

$ \begin{eqnarray} \| u^{*}&{}-{}&T(u^{*})\| =\lim\limits_{k\rightarrow\infty} \| T^{nk}(u^{(1)})-T^{ik}[T(u^{(1)})]\| \nonumber \\ &{}\leq{}& \lim\limits_{k\rightarrow\infty} \| T^{i}{T^{i(k-1)}(u^{(1)}-T^{i(k-1)}[T(u^{(1)})])}\| \nonumber\\ &{}\leq{}&\alpha \lim\limits_{k\rightarrow\infty}\| T^{i(k-1)}(u^{(1)}-T^{i(k-1)}[T(u^{(1)})])\| \nonumber\\ &{}={}&\alpha \| u^{*}-T(u^{*})\| \end{eqnarray} $ (23)

where $\alpha<1$. Thus $u^{*}=T(u^{*})$. If $u^{*}, v^{*}$ are fixed points, then

$ \| u^{*}-v^{*}\|\, =\, \| T^{i}(u^{*})-T^{i}(v^{*})\|\leq\alpha\| u^{*}-v^{*}\| $

and hence $u^{*}\!=\!v^{*}$. Thus the $u^{*}$ found by successive approximation is a unique fixed point of $T$.


In this section, we will show how this solution using the proposed method is worked to obtained a control law which improves the closed-loop performance of the original control.

A. One Dimensional Nonlinear System

A first-order nonlinear system is described by the state equation

$ \begin{equation}\label{eq4-1} \dot{x}(t)=-x^{3}(t)+u(t) \end{equation} $ (24)

with initial condition $x(0)=x_0$. It is desired to find $u(t), $ $t\in(0, \infty)$, that minimizes the performance measure

$ J=\frac{1}{2}\int^{\infty}_{0}(x^{2}+u^{2})dt. $

Assume a linear control to start with

$ u^{(1)}=ax, \quad a>0 $

it is clear that the system is stable when $a>0$. Now select a feedback initial control

$ u^{(1)}=2x $

then the system (24) under controller $u^{(1)}$ is stable with different initial value as shown in Fig. 1. Substitute $u^{(1)}$ into (7), we have

$ p^{(1)}=\frac{5x^{2}}{4x-2x^{3}}. $
Fig. 1 A continuous stabilizing control $u^{(1)}$ with different initial state.

The above yields

$ \begin{equation} u^{(2)}=-\frac{1}{2}g^{T}p^{(1)}=\frac{5x^{2}}{4x^{3}-8x} \end{equation} $ (25)
$ \begin{eqnarray} p^{(2)} \!&\!=\!&\!-\frac{l+\| u^{(2)}\|^{2}}{\| f+gu^{(2)}\|^{2}}(f+gu^{(2)})^{T} \nonumber \\ \!&\!=\!&\! \frac{x^2(4x^{3}-8x)^2-25x^2}{10x^2(4x^3-8x)-2x^3(4x^3-8x)^2}. \nonumber \end{eqnarray} $

Then the system (24) under controller $u^{(2)}$ is stable with different time domain as shown in Fig. 2.

Fig. 2 A continuous stabilizing control $u^{(2)}$ with different time.

To continue the iterative algorithm, we would repeat the preceding steps, using this revised value. If we select $x=2$, then we can obtain the value $p^{(i)}~(i=1, 2, \ldots)$, for example, $p^{(1)}(2)=-8.000, p^{(2)}(2)=-2.500, p^{(3)}(2)=-0.4120, p^{(4)}(2)=-0.2594$, $p^{(5)}(2)=-0.2552$, $p^{(6)}(2)=-0.2550$. It is obvious that

$ ||p^{(1)}||\geq ||p^{(2)}||\geq ||p^{(3)}||\geq ||p^{(4)}||\geq ||p^{(5)}||\geq ||p^{(6)}||. $

Eventually the iterative procedure should converge to the optimal control history, $u^{*}$. The preceding example indicates the steps involved in carrying out one iteration of Algorithm 1. Let us use this algorithm to determine the optimal trajectory and control for a continuous stirred-tank chemical reactor.

B. Two Dimensional Nonlinear System

The state equations for a continuous stirred-tank chemical reactor are given below from [1]. The flow of a coolant through a coil inserted in the reactor is to control the first-order, irreversible exothermic reaction taking place in the reactor. $x_{1}=T(t)$ is the deviation from the steady-state temperature and $x_{2}(t)=C(t)$ is the deviation from the steady-state concentration. $u(t)$ is the normalized control variable, representing the effect of coolant flow on the chemical reaction. The state equations are

$ \left\{ \begin{align} & {{{\dot{x}}}_{1}}(t)=-2[{{x}_{1}}(t)+0.25]+[{{x}_{2}}(t)+0.5]{{e}^{\frac{25{{x}_{1}}(t)}{{{x}_{1}}(t)+2}}} \\ & \ \ \ \ \ \ \ \ \ \ \ \ -[{{x}_{1}}(t)+0.25]u(t) \\ & {{{\dot{x}}}_{2}}(t)=0.5-{{x}_{2}}(t)-[{{x}_{2}}(t)+0.5]{{e}^{\frac{25{{x}_{1}}(t)}{{{x}_{1}}(t)+2}}} \\ \end{align} \right. $ (26)

with the boundary conditions $x_{0}=[0.05, 0]^{T}$. The performance measure to be minimized is

$ J=\int^{0.78}_{0}[x^{2}_{1}(t)+x^{2}_{2}(t)+Ru^{2}(t)]dt $

indicating that the desired objective is to maintain the temperature and concentration close to their steady-state value without expending large amounts of control effort. $R$ is a weighting factor that we shall select arbitrarily as 0.1. To ensure that a monotonically decreasing sequence of performance indices was generated, each trial control was required to provide a smaller performance measure than the preceding control. This was accomplished by re-generating any trial control that increased the performance measure.

With $t\in [0, 0.78]$, and an initial value equal to 1.0, the value of the performance measure as a function of the number of iterations is as shown in Fig. 3, (b). The results compared with Beard's method are shown in Fig. 3, (c). In this figure, $J$ is regarded as the performance obtained by Beard's method in [16]. It is seen from Fig. 3, (b) and Fig. 3, (c) that the optimal feedback control obtained by our method is better than Beard's method. However, the approximate optimal control under our approach can be decreased by strengthening the stopping criteria even if it costs further computations.

Fig. 3 (a) Optimal control and trajectory; (b) Performance measure reduction by Algorithm 1; (c) Performance measure reduction by Beard in [16].

To illustrate the effects of different initial parameters for the control history, different additional solutions were obtained. The results of these computer runs are summarized in Table Ⅰ.

Table Ⅰ
Summary of Algorithm 1 With Different Parameters

The value of the performance measure $J$ as a function of the number of iterations with different initial control ($u^{(1)}=1.0, 0$) and different stopping criterion ($\epsilon = 10^{-6}, 10^{-7}$) is as shown in Table Ⅰ. To ensure that a monotonically decreasing sequence of performance indices was generated, each initial control was required to provide a smaller stopping criterion than the preceding initial control. This was accomplished by having $\epsilon$ and re-generating any trial control that increased the performance measure. It could be seen from Table Ⅰ when the initial control is equal to zero and the stopping criterion is $10^{-6}$, the steps of iteration reduced significantly, however, the performance measure yielded only very slight reduction. This type of progress is typical of the iterative method.

Remark 1: To conclude our discussion of Algorithm 1, let us summarize the important features of the algorithm. First of all, from the nominal control history, $u^{{(1)}}$, must be selected to begin the numerical procedure. In selecting the nominal control we use whatever physical insight we have about the problem. Secondly, the current control $u^{{(i)}}$ and the corresponding state trajectory $x^{{(i)}}$ are stored. If storage must be conserved, the state value needed to determine $p^{{(i)}}$ can be obtained by reintegrating the state equations with the costate equations. If this is done $x^{{(i)}}$ does not need to be stored; however, the computation time will increase. Generating the required state values in this manner may take the results of the backward integration more accurate, because the piecewise-constant approximation for $x^{{(i)}}$ need not be used. Finally, the proposed method is generally characterized by ease of starting the initial guess for the control is not usually crucial. On the other hand, the method has a tendency to converge slowly as a minimum is approach.

C. Robot Arm Problem

The robot arm problem is taken from [1]. In the formulation the arm of the robot is a rigid bar of length $L$ that protrudes a distance $\rho$ from the origin to the gripping end and sticks out a distance $L-\rho$ in the opposite direction. If the pivot point of the arm is the origin of a spherical coordinate system, then the problem can be phrased in terms of the length $\rho$ of the arm from the pivot, the horizontal and vertical angles $(\theta, \phi)$ from the horizontal plane, the controls $(u_{\rho}, u_{\theta}, u_{\phi})$, and the final time $t_f$. Bounds on the length and angles are

$ 0\leq\rho(t)\leq L, \ \ |\theta(t)|\leq\pi, \ \ 0\leq\phi(t)\leq\pi $

and for the controls

$ |u_{\rho}|\leq 1, \ \ |u_{\theta}|\leq1, \ \ |u_{\phi}|\leq1. $

The equations of motion for the robot arm are

$ \begin{equation} \label{eq4-3} L \ddot{\rho}=u_{\rho}, \ \ I_{\theta}\ddot{\theta}=u_{\theta}, \ \ I_{\phi}\ddot{\phi}=u_{\phi} \end{equation} $ (27)

where $I$ is the moment of inertia, defined by

$ I_{\theta}=\frac{(L-\rho)^{3}+\rho^3}{3}sin^2\phi, \ \ I_{\phi}=\frac{(L-\rho)^{3}+\rho^3}{3}. $

The boundary conditions are

$ L=5, \ \ \rho(0)=\rho(t_f)=\frac{9}{2}, \ \ \theta(0)=0 $
$ \theta(t_f)=\frac{2\pi}{3}, \ \ \rho(0)=\rho(t_f)=\frac{\pi}{4} $
$ \dot{\rho}(0)=\dot{\theta}(0)=\dot{\phi}(0)=\dot{t_f}=\dot{\theta}(t_f)=\dot{\phi}(t_f)=0. $

This model ignores the fact that the spherical coordinate reference frame is a non-inertial frame and should have terms for Coriolis and centrifugal forces. Let $ x_1=\rho, x_2=\dot{\rho}, $ $x_3=\theta, x_4=\dot{\theta}, x_5=\phi, x_6=\dot{\phi}, $ then the optimal control of this robot arm problem is stated as follows. Find the controls $(u_{\rho}, u_{\theta}, u_{\phi})$ such that they minimize the cost functional

$ J=t_f $

subject to the dynamic constraints

$ \begin{align*} &\dot{x_1}=x_2, \ \ \dot{x_2}=\frac{u_{\rho}}{L} \\ &\dot{x_3}=x_4, \ \ \dot{x_4}=\frac{u_{\theta}}{I_{\theta}} \\ &\dot{x_5}=x_6, \ \ \dot{x_6}=\frac{u_{\phi}}{I_{\phi}}. \end{align*} $

The control inequality constraints and the boundary conditions are shown in the above statement.

Fig. 4 shows the variables $x_1, \ldots, x_6$ for the robot arm as a function of time. The control variables $u_{\rho}, u_{\theta}, u_{\phi}$ for the robot arm as a function of time are also shown in Fig. 5. Note that the functions $x_1, x_3, x_5$ for the robot arm are continuously differentiable, but since the second derivatives are directly proportional to the controls, the second derivatives are piecewise continuous.

Fig. 4 Variables $x_1, \ldots, x_6$ for the robot arm problem.
Fig. 5 Control variables $u_{\rho}, u_{\theta}, u_{\phi}$ for the robot arm problem.

In this paper, we proposed a new iterative numerical technique to solve the GHJB equation effectively. There is a need for numerical methods which approximate solutions to the special types of equations which arise in nonlinear optimal control. We also showed that the resulting controls are in feedback form and stabilize the closed-loop system. The procedure is proved to converge to the optimal solution to GHJB equation with respect to the iteration variable.

[1] D. E. Kirk, Optimal Control Theory: An Introduction. Mineola, USA: Courier Corporation, 2012.
[2] F. L. Lewis and V. L. Syrmos, Optimal Control. New York, USA: Wiley, 1995.
[3] R. Bellman, Dynamic Programming. New Jersey, USA: Princeton University Press, 1957.
[4] D. Kleinman, "On an iterative technique for riccati equation computations, " IEEE Trans. Autom. Contr., vol. 13, no. 1, pp. 114-115, Feb. 1968.
[5] F. Y. Wang, H. G. Zhang, and D. R. Liu, "Adaptive dynamic programming: An introduction, " IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39-47, May 2009.
[6] F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, "Reinforcement learning and feedback control: using natural decision methods to design optimal adaptive controllers, " IEEE Control Syst., vol. 32, no. 6, pp. 76-105, Dec. 2012.
[7] W. Zhang, W. X. Liu, X. Wang, L. M. Liu, and F. Ferrese, "Online optimal generation control based on constrained distributed gradient algorithm, " IEEE Trans. Power Syst., vol. 30, no. 1, pp. 35-45, Jan. 2015.
[8] N. Sakamoto and A. J. van der Schaft, "Analytical approximation methods for the stabilizing solution of the Hamilton-Jacobi equation, " IEEE Trans. Autom. Control, vol. 53, no. 10, pp. 2335-2350, Nov. 2008.
[9] T. Bian, Y. Jiang, and Z. P. Jiang, "Adaptive dynamic programming and optimal control of nonlinear nonaffine systems, " Automatica, vol. 50, no. 10, pp. 2624-2632, Oct. 2014.
[10] C. O. Aguilar and A. J. Krener, "Numerical solutions to the Bellman equation of optimal control, " J. Optim. Theory Appl., vol. 160, no. 2, pp. 527-552, Feb. 2014.
[11] S. Cacace, E. Cristiani, M. Falcone, and A. Picarelli, "A patchy dynamic programming scheme for a class of Hamilton-Jacobi-Bellman equations, " SIAM J. Sci. Comput., vol. 34, no. 5, pp. A2625-A2649, Oct. 2012.
[12] N. Govindarajan, C. C. de Visser, and K. Krishnakumar, "A sparse collocation method for solving time-dependent HJB equations using multivariate B-splines, " Automatica, vol. 50, no. 9, pp. 2234-2244, Sep. 2014.
[13] J. Markman and I. N. Katz, "An iterative algorithm for solving Hamilton-Jacobi type equations, " SIAM J. Sci. Comput., vol. 22, no. 1, pp. 312-329, Jul. 2000.
[14] I. Smears and E. Süli, "Discontinuous Galerkin finite element approximation of Hamilton-Jacobi-Bellman equations with Cordes coefficients, "SIAM J. Numer. Anal., vol. 52, no. 2, pp. 993-1016, Apr. 2014.
[15] R. W. Beard, G. N. Saridis, and J. T. Wen, "Approximate solutions to the time-invariant Hamilton-Jacobi-Bellman equation, " J. Optim. Theory Appl., vol. 96, no. 3, pp. 589-626, Mar. 1998.
[16] R. W. Beard, G. N. Saridis, and J. T. Wen, "Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation, " Automatica, vol. 33, no. 12, pp. 2159-2177, Dec. 1997.
[17] G. N. Saridis and C. S. G. Lee, "An approximation theory of optimal control for trainable manipulators, " IEEE Trans. Syst. Man Cybern. vol. 9, no. 3, pp. 152-159, Mar. 1979.
[18] Q. Gong, W. Kang, and I. M. Ross, "A pseudospectral method for the optimal control of constrained feedback linearizable systems, " IEEE Trans. Autom. Control, vol. 51, no. 7, pp. 1115-1129, Jul. 2006.