Show simple item record

dc.contributor.author: Meng, Lingheng
dc.date.accessioned: 2023-05-30 15:22:39 (GMT)
dc.date.available: 2023-05-30 15:22:39 (GMT)
dc.date.issued: 2023-05-30
dc.date.submitted: 2023-05-29
dc.identifier.uri: http://hdl.handle.net/10012/19511
dc.description.abstract: Physical agents that can autonomously generate engaging, life-like behavior will lead to more responsive and interesting robots and other autonomous systems. Although many advances have been made for one-to-one interactions in well-controlled settings, future physical agents should be capable of interacting with humans in natural settings, including group interaction. In order to generate engaging behaviors, the autonomous system must first be able to estimate its human partners' engagement level, and then take actions that maximize the estimated engagement. In this thesis, we take Living Architecture Systems (LAS), architecture-scale interactive systems capable of group interaction through distributed embedded sensors and actuators, as a testbed and apply Deep Reinforcement Learning (DRL), treating the estimate of engagement as a reward signal, in order to automatically generate engaging behavior. However, applying DRL to LAS is difficult because of DRL's low data efficiency, its overestimation problem, and issues with state observability, especially given the large observation and action spaces of LAS. We first propose an approach for estimating engagement during group interaction that simultaneously takes into account active and passive interaction, and use this measure as the reward signal within a reinforcement learning framework to learn engaging interactive behaviors. The proposed approach is implemented in a LAS in a museum setting. We compare the performance of the learning system to that of a baseline design using pre-scripted interactive behavior. Analysis based on sensory data and survey data shows that adaptable behaviors within an expert-designed action space can achieve higher engagement and likeability. However, this initial approach relies on a manually defined reward and assumes a known, concise definition of the state and action space in order to sidestep DRL's issues of slow learning, sample inefficiency, and state/action specification. To relax these restrictive assumptions, we first analyze the effect of multi-step methods on alleviating the overestimation problem in DRL and, building on Deep Deterministic Policy Gradient (DDPG), propose Multi-step DDPG (MDDPG) and Mixed Multi-step DDPG (MMDDPG). Empirically, we show that both MDDPG and MMDDPG are significantly less affected by the overestimation problem than vanilla DDPG, which results in better final performance and faster learning. Then, to handle Partially Observable Markov Decision Processes (POMDPs), we propose Long Short-Term Memory-based Twin Delayed Deep Deterministic Policy Gradient (LSTM-TD3) by introducing a memory component into TD3, and compare its performance with other DRL algorithms in both MDPs and POMDPs. Our results demonstrate the significant advantages of the memory component in addressing POMDPs, including the ability to handle missing and noisy observation data. We then investigate partial observability as a potential source of failure when applying DRL to robot control tasks, which can arise when researchers are not certain whether the observation space fully represents the underlying state. We compare the performance of TD3, Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) under various partial observability conditions, and find that TD3 and SAC easily become stuck in local optima and underperform PPO. We propose multi-step versions of vanilla TD3 and SAC to improve their robustness to partial observability.
Building on our study with the manually designed reward function (the estimate of engagement) and on this fundamental DRL research, we further reduce the reliance on designers' domain knowledge and propose to learn a reward function from human preferences over engaging behavior by taking advantage of preference learning algorithms. Our simulation results show that the reward function induced from human preferences leads to a policy that generates engaging behavior.
dc.language.iso: en
dc.publisher: University of Waterloo
dc.subject: human robot interaction
dc.subject: deep reinforcement learning
dc.subject: preference learning
dc.subject: living architecture systems
dc.subject: interactive systems
dc.subject: machine learning
dc.title: Learning to Engage: An Application of Deep Reinforcement Learning in Living Architecture Systems
dc.type: Doctoral Thesis
dc.pending: false
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Doctor of Philosophy
uws-etd.embargo.terms: 0
uws.contributor.advisor: Kulić, Dana
uws.contributor.advisor: Gorbet, Rob
uws.contributor.affiliation1: Faculty of Engineering
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.typeOfResource: Text
uws.peerReviewStatus: Unreviewed
uws.scholarLevel: Graduate
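The abstract names several techniques only at a high level. The sketches below are not taken from the thesis; they are minimal illustrations, under stated assumptions, of three of the ideas it describes. First, a sketch of the n-step bootstrapped target that a multi-step DDPG variant such as MDDPG can substitute for the standard 1-step DDPG target; the function, tensor names, and shapes are illustrative assumptions rather than the thesis code.

import torch

def n_step_td_target(rewards, next_obs_n, done_n, target_actor, target_critic, gamma=0.99):
    # rewards:    (batch, n) rewards collected over the n steps after time t
    # next_obs_n: (batch, obs_dim) observation n steps ahead
    # done_n:     (batch, 1) equal to 1.0 if the episode terminated within those n steps
    batch_size, n = rewards.shape
    discounts = gamma ** torch.arange(n, dtype=rewards.dtype)       # [1, gamma, ..., gamma^(n-1)]
    n_step_return = (rewards * discounts).sum(dim=1, keepdim=True)  # (batch, 1)
    with torch.no_grad():
        # Bootstrap with the target critic at the target actor's action, as in DDPG,
        # but n steps ahead and discounted by gamma^n.
        bootstrap = target_critic(next_obs_n, target_actor(next_obs_n))
    return n_step_return + (gamma ** n) * (1.0 - done_n) * bootstrap

MMDDPG, as described in the abstract, mixes targets computed with several different step sizes; one simple realization would average n_step_td_target over a set of values of n, though the exact combination used in the thesis is not specified here.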
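Second, a minimal sketch of the memory-component idea behind LSTM-TD3: the actor conditions on a short history of past observation-action pairs encoded by an LSTM, alongside the current observation. The layer sizes and the way the two branches are combined are assumptions made for illustration, not the thesis architecture.

import torch
import torch.nn as nn

class HistoryActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        # LSTM over the history of concatenated (observation, action) pairs.
        self.memory = nn.LSTM(obs_dim + act_dim, hidden, batch_first=True)
        # Feed-forward branch for the current observation.
        self.current = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # Combine the memory summary and current features into an action in [-1, 1].
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs, history):
        # obs: (batch, obs_dim); history: (batch, hist_len, obs_dim + act_dim)
        _, (h_n, _) = self.memory(history)   # final hidden state summarizes the history
        features = torch.cat([h_n.squeeze(0), self.current(obs)], dim=-1)
        return self.head(features)

Missing or noisy observations, which the abstract says the memory component helps to handle, would show up here simply as corrupted entries in the history tensor.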
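Third, a minimal sketch of learning a reward function from pairwise human preferences, assuming a Bradley-Terry style preference model as is standard in preference learning; the network size, segment representation, and variable names are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

obs_act_dim = 32  # illustrative dimensionality of one (observation, action) step
reward_net = nn.Sequential(nn.Linear(obs_act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def preference_loss(seg_a, seg_b, prefer_a):
    # seg_a, seg_b: (batch, seg_len, obs_act_dim) behaviour segments shown to the human
    # prefer_a:     (batch,) equal to 1.0 where the human preferred segment A
    r_a = reward_net(seg_a).sum(dim=(1, 2))   # total predicted reward of each A segment
    r_b = reward_net(seg_b).sum(dim=(1, 2))
    # Bradley-Terry model: P(A preferred over B) = sigmoid(R(A) - R(B)).
    p_a = torch.sigmoid(r_a - r_b)
    return F.binary_cross_entropy(p_a, prefer_a)

Minimizing this loss over collected comparisons fits a reward function that can then serve as the reward signal for a DRL agent, which is the role the learned reward plays in the final step described in the abstract.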

