Show simple item record

dc.contributor.author: Meng, Lingheng
dc.date.accessioned: 2023-05-30 15:22:39 (GMT)
dc.date.available: 2023-05-30 15:22:39 (GMT)
dc.date.issued: 2023-05-30
dc.date.submitted: 2023-05-29
dc.identifier.uri: http://hdl.handle.net/10012/19511
dc.description.abstract: Physical agents that can autonomously generate engaging, life-like behavior will lead to more responsive and interesting robots and other autonomous systems. Although many advances have been made for one-to-one interactions in well-controlled settings, future physical agents should be capable of interacting with humans in natural settings, including group interaction. In order to generate engaging behaviors, the autonomous system must first be able to estimate its human partners' engagement level, and then take actions that maximize the estimated engagement. In this thesis, we take Living Architecture Systems (LAS), architecture-scale interactive systems capable of group interaction through distributed embedded sensors and actuators, as a testbed and apply Deep Reinforcement Learning (DRL), treating the estimate of engagement as a reward signal, in order to automatically generate engaging behavior. However, applying DRL to LAS is difficult because of DRL's low data efficiency, its overestimation problem, and issues with state observability, especially given the large observation and action spaces of LAS. We first propose an approach for estimating engagement during group interaction that simultaneously takes into account active and passive interaction, and use this measure as the reward signal within a reinforcement learning framework to learn engaging interactive behaviors. The proposed approach is implemented in a LAS in a museum setting. We compare the performance of the learning system to that of a baseline design using pre-scripted interactive behavior. Analysis based on sensory data and survey data shows that adaptable behaviors within an expert-designed action space can achieve higher engagement and likeability. However, this initial approach relies on a manually defined reward and assumes a known, concise definition of the state and action space in order to sidestep DRL's issues of slow learning, sample inefficiency, and state/action specification. To relax these restrictive assumptions, we first analyze the effect of multi-step methods on alleviating the overestimation problem in DRL and, building on Deep Deterministic Policy Gradient (DDPG), propose Multi-step DDPG (MDDPG) and Mixed Multi-step DDPG (MMDDPG). Empirically, we show that both MDDPG and MMDDPG are significantly less affected by the overestimation problem than vanilla DDPG, which results in better final performance and faster learning. Then, to handle Partially Observable Markov Decision Processes (POMDPs), we propose Long Short-Term Memory-based Twin Delayed Deep Deterministic Policy Gradient (LSTM-TD3) by introducing a memory component into TD3, and compare its performance with other DRL algorithms in both MDPs and POMDPs. Our results demonstrate the significant advantages of the memory component in addressing POMDPs, including the ability to handle missing and noisy observation data. We then investigate partial observability as a potential source of failure when applying DRL to robot control tasks, which can arise when researchers are not certain whether the observation space fully represents the underlying state. We compare the performance of TD3, Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) under various partial observability conditions, and find that TD3 and SAC easily become stuck in local optima and underperform PPO. We propose multi-step versions of vanilla TD3 and SAC to improve their robustness to partial observability.
Building on our study with the manually designed reward function (the estimate of engagement) and on this fundamental DRL research, we further reduce the reliance on designers' domain knowledge and propose to learn a reward function from human preferences over engaging behavior by taking advantage of preference learning algorithms. Our simulation results show that the reward function induced from human preferences leads to a policy that generates engaging behavior.
dc.language.iso: en
dc.publisher: University of Waterloo
dc.subject: human robot interaction
dc.subject: deep reinforcement learning
dc.subject: preference learning
dc.subject: living architecture systems
dc.subject: interactive systems
dc.subject: machine learning
dc.title: Learning to Engage: An Application of Deep Reinforcement Learning in Living Architecture Systems
dc.type: Doctoral Thesis
dc.pending: false
uws-etd.degree.department: Electrical and Computer Engineering
uws-etd.degree.discipline: Electrical and Computer Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Doctor of Philosophy
uws-etd.embargo.terms: 0
uws.contributor.advisor: Kulić, Dana
uws.contributor.advisor: Gorbet, Rob
uws.contributor.affiliation1: Faculty of Engineering
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.typeOfResource: Text
uws.peerReviewStatus: Unreviewed
uws.scholarLevel: Graduate
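The abstract names several techniques only at a high level. The sketches below are not taken from the thesis; they are minimal illustrations, under stated assumptions, of three of the ideas it describes. First, a sketch of the n-step bootstrapped target that a multi-step DDPG variant such as MDDPG can substitute for the standard 1-step DDPG target; the function, tensor names, and shapes are illustrative assumptions rather than the thesis code.

import torch

def n_step_td_target(rewards, next_obs_n, done_n, target_actor, target_critic, gamma=0.99):
    # rewards:    (batch, n) rewards collected over the n steps after time t
    # next_obs_n: (batch, obs_dim) observation n steps ahead
    # done_n:     (batch, 1) equal to 1.0 if the episode terminated within those n steps
    batch_size, n = rewards.shape
    discounts = gamma ** torch.arange(n, dtype=rewards.dtype)       # [1, gamma, ..., gamma^(n-1)]
    n_step_return = (rewards * discounts).sum(dim=1, keepdim=True)  # (batch, 1)
    with torch.no_grad():
        # Bootstrap with the target critic at the target actor's action, as in DDPG,
        # but n steps ahead and discounted by gamma^n.
        bootstrap = target_critic(next_obs_n, target_actor(next_obs_n))
    return n_step_return + (gamma ** n) * (1.0 - done_n) * bootstrap

MMDDPG, as described in the abstract, mixes targets computed with several different step sizes; one simple realization would average n_step_td_target over a set of values of n, though the exact combination used in the thesis is not specified here.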
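Second, a minimal sketch of the memory-component idea behind LSTM-TD3: the actor conditions on a short history of past observation-action pairs encoded by an LSTM, alongside the current observation. The layer sizes and the way the two branches are combined are assumptions made for illustration, not the thesis architecture.

import torch
import torch.nn as nn

class HistoryActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        # LSTM over the history of concatenated (observation, action) pairs.
        self.memory = nn.LSTM(obs_dim + act_dim, hidden, batch_first=True)
        # Feed-forward branch for the current observation.
        self.current = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # Combine the memory summary and current features into an action in [-1, 1].
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs, history):
        # obs: (batch, obs_dim); history: (batch, hist_len, obs_dim + act_dim)
        _, (h_n, _) = self.memory(history)   # final hidden state summarizes the history
        features = torch.cat([h_n.squeeze(0), self.current(obs)], dim=-1)
        return self.head(features)

Missing or noisy observations, which the abstract says the memory component helps to handle, would show up here simply as corrupted entries in the history tensor.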
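Third, a minimal sketch of learning a reward function from pairwise human preferences, assuming a Bradley-Terry style preference model as is standard in preference learning; the network size, segment representation, and variable names are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

obs_act_dim = 32  # illustrative dimensionality of one (observation, action) step
reward_net = nn.Sequential(nn.Linear(obs_act_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def preference_loss(seg_a, seg_b, prefer_a):
    # seg_a, seg_b: (batch, seg_len, obs_act_dim) behaviour segments shown to the human
    # prefer_a:     (batch,) equal to 1.0 where the human preferred segment A
    r_a = reward_net(seg_a).sum(dim=(1, 2))   # total predicted reward of each A segment
    r_b = reward_net(seg_b).sum(dim=(1, 2))
    # Bradley-Terry model: P(A preferred over B) = sigmoid(R(A) - R(B)).
    p_a = torch.sigmoid(r_a - r_b)
    return F.binary_cross_entropy(p_a, prefer_a)

Minimizing this loss over collected comparisons fits a reward function that can then serve as the reward signal for a DRL agent, which is the role the learned reward plays in the final step described in the abstract.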

