dc.contributor.author: Daga, Somesh
dc.date.accessioned: 2021-09-09 13:37:27 (GMT)
dc.date.available: 2021-09-09 13:37:27 (GMT)
dc.date.issued: 2021-09-09
dc.date.submitted: 2021-08-10
dc.identifier.uri: http://hdl.handle.net/10012/17361
dc.description.abstract: Reinforcement Learning for Robotics is a trending area of research with tremendous potential for wide-scale industry adoption. To its detriment, robotic agents typically require large amounts of environmental interaction to discover good behaviours. In response, Hierarchical Reinforcement Learning methods are gaining traction and have demonstrated improved learning efficiency by employing abstractions in the learning process. Additionally, the safety implications of black-box agents operating in physical environments have generated interest in explainable forms of learning. In this thesis, we leverage a popular form of Hierarchical Reinforcement Learning, known as the Options Framework, to address learning for tasks that may be expressed as a sequential composition of subtasks. This form of task decomposition is prevalent in classical approaches to many robotic planning and control applications, and offers an avenue to segment tasks into sets of distinct and interpretable behaviours. As our primary contribution, we propose a novel potential-based reward formulation and decomposition that is conducive to subtask behaviour specialization and incentivizes a learning agent to solve the composite task under the Options Framework. As a result, we offer increased visibility into the actions of the agent at the subtask level. An off-policy Maximum Entropy Deep Reinforcement Learning algorithm is developed to simultaneously discover relevant policies across subtasks and determine when to transition between subtasks in an end-to-end learning scheme. Furthermore, we propose a chained option execution model that leverages expert knowledge of the task and promotes stability in the learning of subtask transitions. Finally, segmenting agent behaviours at the subtask level allows expert knowledge to be injected into the action spaces of individual subtasks, which we exploit through the use of default actions. We demonstrate the results of our work on high-dimensional, simulated 2D and 3D manipulator environments, for the tasks of pick-and-place and opening a door.
dc.language.iso: en
dc.publisher: University of Waterloo
dc.subject: reinforcement learning
dc.subject: robotics
dc.subject: manipulation
dc.title: A New Approach to Reinforcement Learning for Sequential Robotic Tasks using a Chained Options Model and Subtask-Focused Rewards
dc.type: Master Thesis
dc.pending: false
uws-etd.degree.department: Mechanical and Mechatronics Engineering
uws-etd.degree.discipline: Mechanical Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Master of Applied Science
uws-etd.embargo.terms: 0
uws.contributor.advisor: Jeon, Soo
uws.contributor.advisor: Melek, William
uws.contributor.affiliation1: Faculty of Engineering
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.typeOfResource: Text
uws.peerReviewStatus: Unreviewed
uws.scholarLevel: Graduate
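
As a toy illustration of the potential-based, subtask-decomposed reward shaping and chained option execution described in the abstract, consider the Python sketch below. It is illustrative only and not the thesis implementation: the 1-D gripper state, the potential functions phi_reach and phi_place, the termination threshold, and the proportional action rule are all assumptions made for this example. It applies the standard potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s) (Ng et al., 1999) using the potential of the currently active subtask, and hands control to the next option in a fixed chain when the current subtask's termination condition is met.

    # Minimal sketch, not the thesis implementation. The task, state,
    # potentials, and thresholds below are illustrative assumptions.
    GAMMA = 0.99

    # Hypothetical per-subtask potentials over a 1-D gripper position:
    # higher potential means closer to completing that subtask.
    def phi_reach(s):
        return -abs(1.0 - s)   # progress toward an object at x = 1.0

    def phi_place(s):
        return -abs(2.0 - s)   # progress toward a goal at x = 2.0

    SUBTASK_POTENTIALS = [phi_reach, phi_place]  # fixed chain order
    SUBTASK_TARGETS = [1.0, 2.0]

    def shaped_reward(option, s, s_next):
        # Potential-based shaping term F = gamma * Phi(s') - Phi(s),
        # computed with the potential of the active subtask only.
        phi = SUBTASK_POTENTIALS[option]
        return GAMMA * phi(s_next) - phi(s)

    def terminated(option, s):
        # Toy termination condition: within 0.05 of the subtask target.
        return abs(SUBTASK_TARGETS[option] - s) < 0.05

    # Chained execution: options run in a fixed sequence, and control
    # hands off to the next option when the current one terminates.
    s, option = 0.0, 0
    for step in range(200):
        # Naive proportional action, clipped to an actuator limit of 0.1.
        a = max(-0.1, min(0.1, SUBTASK_TARGETS[option] - s))
        s_next = s + a
        r = shaped_reward(option, s, s_next)  # reward seen by the learner
        s = s_next
        if terminated(option, s):
            if option == len(SUBTASK_POTENTIALS) - 1:
                break
            option += 1  # hand off to the next subtask in the chain
    print(f"done at x={s:.2f} after {step + 1} steps")

Within a single subtask, shaping of this form is policy-invariant in the standard sense; how the decomposition behaves across option boundaries is the subject of the thesis's formulation.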

