Online Learning Algorithms For Differential Dynamic Games And Optimal Control

ResearchCommons/Manakin Repository

Title: Online Learning Algorithms For Differential Dynamic Games And Optimal Control
Author: Vamvoudakis, Kyriakos G.
Abstract: Optimal control deals with the problem of finding a control law for a given system such that a certain optimality criterion is achieved. The control law can be derived using Pontryagin's maximum principle (a necessary condition) or by solving the Hamilton-Jacobi-Bellman equation (a sufficient condition). A major drawback of optimal control is that the solution is computed offline. Adaptive control involves modifying the control law used by a controller to cope with a system that is unknown or uncertain. Adaptive controllers are not optimal. Adaptive optimal controllers have been proposed by adding optimality criteria to an adaptive controller, or by adding adaptive characteristics to an optimal controller. In this work, online adaptive learning algorithms are developed for optimal control and differential dynamic games by using measurements along the trajectory or input/output data. These algorithms are based on actor/critic schemes, involve simultaneous tuning of the actor and critic neural networks, and provide online solutions to complex Hamilton-Jacobi equations, along with convergence and Lyapunov stability proofs. The research begins with the development of an online algorithm based on policy iteration for learning the continuous-time (CT) optimal control solution with infinite horizon cost for nonlinear systems with known dynamics. That is, the algorithm learns online, in real time, the solution to the optimal control design Hamilton-Jacobi (HJ) equation. This is called 'synchronous' policy iteration. Next, an online learning algorithm is developed to solve the continuous-time two-player zero-sum game with infinite horizon cost for nonlinear systems. The algorithm learns online, in real time, the solution to the game design Hamilton-Jacobi-Isaacs equation. This online gaming algorithm is called 'synchronous' zero-sum game policy iteration.
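To make the policy iteration idea above concrete, the following is a minimal sketch of the classical offline policy iteration (Kleinman's algorithm) for the linear-quadratic special case, where the HJ equation reduces to an algebraic Riccati equation. It is not the online 'synchronous' algorithm of the thesis: it alternates policy evaluation and policy improvement sequentially and assumes known dynamics. The function names, the double-integrator system, the weights, and the initial stabilizing gain are all illustrative assumptions.

```python
import numpy as np


def lyapunov_solve(a_closed, q_bar):
    """Solve a_closed.T @ P + P @ a_closed + q_bar = 0 for symmetric P.

    Uses the Kronecker-product (vectorized) form of the Lyapunov equation,
    which is adequate for small illustrative systems.
    """
    n = a_closed.shape[0]
    eye = np.eye(n)
    lhs = np.kron(eye, a_closed.T) + np.kron(a_closed.T, eye)
    p = np.linalg.solve(lhs, -q_bar.reshape(-1)).reshape(n, n)
    return 0.5 * (p + p.T)  # remove round-off asymmetry


def policy_iteration_lqr(a, b, q, r, k0, iterations=20):
    """Offline policy iteration for dx/dt = a x + b u, u = -k x.

    k0 must be stabilizing (assumption); otherwise the policy
    evaluation step has no meaningful solution.
    """
    k = k0
    for _ in range(iterations):
        a_closed = a - b @ k
        # Policy evaluation: cost of the current gain k.
        p = lyapunov_solve(a_closed, q + k.T @ r @ k)
        # Policy improvement: greedy gain for the evaluated cost.
        k = np.linalg.solve(r, b.T @ p)
    return p, k


# Hypothetical example: double integrator with unit weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])  # stabilizing initial gain (assumed)

P, K = policy_iteration_lqr(A, B, Q, R, K0)
print(P)
print(K)
```

For this example the Riccati equation can be solved by hand, giving P = [[sqrt(3), 1], [1, sqrt(3)]] and K = [1, sqrt(3)], which the iteration should reproduce.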
One of the major outcomes of this work is an online learning algorithm that solves continuous-time multi-player non-zero-sum games with infinite horizon for linear and nonlinear systems. The adaptive algorithm learns online the solution of the coupled Riccati equations for linear systems and of the coupled Hamilton-Jacobi equations for nonlinear systems. The optimal-adaptive algorithm is implemented as a separate actor/critic parametric network approximator structure for every player, and involves simultaneous continuous-time adaptation of the actor/critic networks. The next result shows how to implement Approximate Dynamic Programming methods using only measured input/output data from the system. Policy and value iteration algorithms are developed that converge to an optimal controller requiring only output feedback. The notion of graphical games is developed for dynamical systems, where the dynamics and performance indices for each node depend only on local neighbor information. A cooperative policy iteration algorithm is given for graphical games; it converges to the best response when the neighbors of each agent do not update their policies, and to the cooperative Nash equilibrium when all agents update their policies simultaneously. Finally, a synchronous policy iteration algorithm based on integral reinforcement learning is given. This algorithm does not require knowledge of the drift dynamics.
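As a rough illustration of the coupled Riccati equations mentioned above, the sketch below considers a scalar two-player non-zero-sum linear-quadratic game, dx/dt = a x + b1 u1 + b2 u2, with costs J_i equal to the integral of q_i x^2 + r_i u_i^2. Both players evaluate and improve their feedback gains simultaneously, and the result is checked against the coupled scalar Riccati equations. This is an offline toy iteration, not the actor/critic algorithm of the thesis; the function names and all numerical values are assumptions, and convergence of this simple iteration is not guaranteed for arbitrary parameters.

```python
import math


def simultaneous_policy_iteration(a, b, q, r, k_init, iterations=60):
    """Scalar two-player game with policies u_i = -k[i] * x.

    Each pass evaluates both players' value coefficients p[i] under the
    current joint policy, then updates both gains at the same time.
    """
    k = list(k_init)
    p = [0.0, 0.0]
    for _ in range(iterations):
        a_closed = a - b[0] * k[0] - b[1] * k[1]
        if a_closed >= 0.0:
            raise ValueError("joint policy is not stabilizing")
        # Policy evaluation: V_i(x) = p[i] * x**2 under the joint policy.
        p = [(q[i] + r[i] * k[i] ** 2) / (-2.0 * a_closed) for i in range(2)]
        # Policy improvement for both players simultaneously.
        k = [b[i] * p[i] / r[i] for i in range(2)]
    return p, k


def coupled_riccati_residuals(a, b, q, r, p):
    """Residuals of the coupled scalar Riccati equations at p."""
    s = [b[i] ** 2 / r[i] for i in range(2)]
    a_closed = a - s[0] * p[0] - s[1] * p[1]
    return [2.0 * p[i] * a_closed + q[i] + s[i] * p[i] ** 2 for i in range(2)]


# Hypothetical symmetric example: integrator plant, unit weights.
a_sys = 0.0
b_sys = (1.0, 1.0)
q_w = (1.0, 1.0)
r_w = (1.0, 1.0)

p_star, k_star = simultaneous_policy_iteration(a_sys, b_sys, q_w, r_w, (1.0, 1.0))
residuals = coupled_riccati_residuals(a_sys, b_sys, q_w, r_w, p_star)
print(p_star, k_star, residuals)
```

In this symmetric example the coupled equations reduce to 1 - 3 p^2 = 0, so both value coefficients should approach 1/sqrt(3) and the residuals should be close to zero.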
Date: 2011-07-14

Files in this item

File: Vamvoudakis_uta_2502D_11063.pdf
Size: 3.693 MB
Format: PDF
