Hierarchical Reinforcement Learning Using Automatic Task Decomposition And Exploration Shaping



dc.contributor.author Djurdjevic, Predrag en_US
dc.date.accessioned 2008-08-08T02:31:15Z
dc.date.issued April 2008 en_US
dc.identifier.other DISS-2071 en_US
dc.description.abstract Reinforcement learning agents situated in real-world environments must address a number of challenges in order to accomplish a wide range of tasks over their lifetime. Among these, such systems must be able to extract control knowledge from already learned tasks and apply it to subsequent ones, allowing the agent to accomplish a new task faster and to accelerate the learning of an optimal policy. To address skill reuse and skill transfer, a number of approaches using hierarchical state and action spaces have recently been introduced that build on the idea of transferring previously learned policies and representations to model and control the new task. However, while such transfer of skills can significantly improve learning times, it also poses the risk of "behavior proliferation," where the growing set of available reusable actions makes it incrementally more difficult to determine a strategy for a new task. To address this issue, it is important for the agent to be able to analyze new tasks and to predict the utility of an action or skill in a new context prior to learning a policy for the task. The former implies an ability to decompose the new task into known subtasks, while the latter implies the availability of an informed exploration policy used to find the new goal and to learn a corresponding policy more efficiently. This thesis presents a novel approach for learning task decomposition by learning to predict the utility of subgoals and subgoal types in the context of the new task, as well as for exploration shaping by predicting the likelihood with which each available action is useful in the given task context.
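As a purely illustrative sketch (not code from the thesis), the subgoal-utility idea described above could be realized by scoring each known subgoal against features of the new task's context and ranking the candidates. The function name `rank_subgoals`, the feature names, and the linear scoring form are all assumptions made for illustration:

```python
# Hypothetical sketch: rank previously learned subgoals by a utility
# score computed from the new task's context features. The linear
# scoring and all names here are illustrative assumptions.

def rank_subgoals(context_features, subgoal_weights):
    """Score each known subgoal by a linear utility over the new task's
    context features and return the subgoals ranked best-first."""
    scores = {}
    for subgoal, weights in subgoal_weights.items():
        scores[subgoal] = sum(weights.get(f, 0.0) * v
                              for f, v in context_features.items())
    return sorted(scores, key=scores.get, reverse=True)
```

For example, in a context where the agent is near a door but holds no key, a subgoal whose learned weights favor the `near_door` feature would be ranked ahead of one tied to `holding_key`.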
To achieve this, the approach presented here uses past learning experiences to acquire a set of utility functions that encode relevant knowledge about useful subgoals and skills, and applies them to shape the search for the optimal policy for the new task. Acceleration is achieved by focusing the search on contextually identifiable subgoals and actions/skills that have been learned to be valuable in the context of optimal policies in previously encountered worlds. Performance improvements are achieved both in the time required to reach the task's goal for the first time and in the time required to learn an optimal policy, as demonstrated on navigation and manipulation tasks in a grid-world domain. en_US
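The exploration-shaping idea in the paragraph above can likewise be sketched, under the assumption (mine, not the thesis's) that shaping takes the form of biasing the random branch of an epsilon-greedy policy by the learned action utilities:

```python
import random

# Hypothetical sketch of exploration shaping: utilities learned from
# earlier tasks weight which actions the agent tries while the policy
# for a new task is still unknown. Names and structure are illustrative.

def shaped_exploration_policy(q_values, action_utilities, epsilon=0.2):
    """Epsilon-greedy selection whose exploratory branch samples actions
    in proportion to previously learned utilities instead of uniformly."""
    actions = list(q_values.keys())
    if random.random() < epsilon:
        # Explore: favor actions that were useful in earlier tasks.
        weights = [action_utilities.get(a, 1e-3) for a in actions]
        return random.choices(actions, weights=weights, k=1)[0]
    # Exploit: pick the action with the highest current Q-value.
    return max(actions, key=lambda a: q_values[a])
```

With uniform utilities this reduces to ordinary epsilon-greedy; informative utilities steer early exploration toward actions that were part of optimal policies in earlier worlds.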
dc.description.sponsorship Huber, Manfred en_US
dc.language.iso en en_US
dc.publisher Computer Science & Engineering en_US
dc.title Hierarchical Reinforcement Learning Using Automatic Task Decomposition And Exploration Shaping en_US
dc.type M.S. en_US
dc.contributor.committeeChair Huber, Manfred en_US
thesis.degree.department Computer Science & Engineering en_US
thesis.degree.grantor University of Texas at Arlington en_US
thesis.degree.level masters en_US
thesis.degree.name M.S. en_US
dc.identifier.externalLinkDescription Link to Research Profiles

Files in this item

Files Size Format
umi-uta-2071.pdf 1.123Mb PDF
