Asking for Help with a Cost in Reinforcement Learning
Abstract
Reinforcement learning (RL) is a powerful tool for developing
intelligent agents, and the use of neural networks makes RL techniques more
scalable to challenging real-world applications, from task-oriented dialogue
systems to autonomous driving. However, one of the major bottlenecks to the
adoption of RL is efficiency, as it often takes many time steps to learn an
acceptable policy. To address this problem, we investigate the idea of
allowing the agent to ask for advice from a teacher. We formalize this
concept in a framework called ask-for-help RL, which entails augmenting a
Markov decision process with a teacher-query action that can be taken at a
fixed cost in any state. In this setting, the agent must choose among
exploration, exploitation, and teacher-querying. To make this trade-off, we
propose an action selection strategy that is rooted in the classical notion
of value-of-information, and suggest a practical implementation that is based
on deep Q-learning. This algorithm, called VOE/Q, can jointly decide between
taking a particular environment action or querying the teacher, and is
sensitive to the query cost. We then perform experiments in two domains: a
maze navigation task and the Atari game Freeway. When the teacher is
excluded, the algorithm shows substantial gains over many other exploration
strategies from the literature. With the teacher included, we again find that
the algorithm outperforms baselines. By taking advantage of the teacher,
the agent achieves higher cumulative reward than with standard RL alone.
Together, our results point to a promising approach to both RL and
ask-for-help RL.
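The augmentation described in the abstract, a Markov decision process extended with a teacher-query action available at a fixed cost in every state, can be illustrated with a minimal sketch. This is not the thesis implementation and does not include the VOE/Q action selection strategy. The names `ChainMDP`, `AskForHelpEnv`, `teacher`, and `query_cost` are hypothetical, and the choice that a query leaves the state unchanged and returns the teacher's suggested action is an assumption made for illustration only.

```python
from typing import Callable, Dict, Tuple


class ChainMDP:
    """Hypothetical toy MDP: states 0..n-1, action 0 = left, 1 = right."""

    n_actions = 2

    def __init__(self, n_states: int = 5) -> None:
        self.n_states = n_states
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Move one cell and clip to the ends of the chain.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        # Reward 1.0 only on reaching the last state.
        return self.state, (1.0 if done else 0.0), done


class AskForHelpEnv:
    """Adds one extra action (query the teacher) with a fixed cost."""

    def __init__(self, env: ChainMDP, teacher: Callable[[int], int],
                 query_cost: float) -> None:
        self.env = env
        self.teacher = teacher
        self.query_cost = query_cost
        # The query action takes the next free action index.
        self.query_action = env.n_actions
        self.n_actions = env.n_actions + 1
        self.state = 0

    def reset(self) -> int:
        self.state = self.env.reset()
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, int]]:
        if action == self.query_action:
            # Assumption: querying does not change the state; the agent
            # pays the fixed cost and receives the teacher's advice.
            advice = self.teacher(self.state)
            return self.state, -self.query_cost, False, {"advice": advice}
        self.state, reward, done = self.env.step(action)
        return self.state, reward, done, {}


# Usage: a teacher that always advises "right", queried at a cost of 0.1.
env = AskForHelpEnv(ChainMDP(n_states=3), teacher=lambda s: 1, query_cost=0.1)
env.reset()
_, query_reward, _, info = env.step(env.query_action)
```

In this sketch the query cost appears as a negative reward, so an agent that maximizes cumulative reward has a reason to query only when the advice is worth more than the cost.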
Cite this version of the work
Colin Vandenhof (2020). Asking for Help with a Cost in Reinforcement Learning. UWSpace. http://hdl.handle.net/10012/15872