Show simple item record

dc.contributor.author: Stracovsky, Lukas
dc.date.accessioned: 2023-07-04 13:04:22 (GMT)
dc.date.available: 2023-07-04 13:04:22 (GMT)
dc.date.issued: 2023-07-04
dc.date.submitted: 2023-06-25
dc.identifier.uri: http://hdl.handle.net/10012/19597
dc.description.abstract: The purpose of prosthetic hands is to restore a portion of the dexterity lost through upper-limb amputation. However, a key capability of human grasping missing from most currently available prosthetic hands is the ability to adapt grasp forces in response to slip or disturbances without visual information. Current prosthetic hands lack the integrated tactile sensors and control policies needed to support adaptive grasp stabilization or manipulation. Research on slip detection and classification has provided a pathway toward integrating tactile sensors on robotic and prosthetic hands; however, the current literature focuses on specific sensors and simple graspers, and policies that use slip prediction to adapt grasp forces remain largely unexplored.

Rigid-body simulations have recently emerged as a useful tool for training control policies, owing to improvements in machine-learning techniques: simulation allows large amounts of interactive data to be generated for training. However, since simulations only approximate reality, policies trained in simulation may not transfer to physical systems. Several grasp policies with impressive dexterity have been trained in simulation and transferred successfully to physical systems, but these policies used visual rather than tactile data as inputs. This research investigates whether rigid-body simulations can use slip prediction as the primary input for training grasp-stabilization policies.

Since the current slip-detection and slip-prediction literature is based on specific tactile sensors and grasper setups, testing slip-reactive grasp policies is difficult, especially with an anthropomorphic hand. As an alternative to implementing a system-specific policy, real human grasp poses and motion trajectories were used to test whether the trained policy could replicate known human grasp stability. To acquire the human grasp data, grasp and motion trajectories from a human motion-capture dataset were adapted into a simulation. Since motion capture provides only grasp and object-pose data, grasp forces had to be inferred through a combination of analytical and iterative methods. Simulated contacts are also only approximate models; therefore, slip in the simulation was characterized for detection and prediction.

The stability of the converted grasps was tested by simulating the grasp-manipulation episodes with no control policy. Viable grasps were expected to maintain stability until the manipulation trajectory caused grasp degradation or loss. The initial grasps maintained stability for an average of 27.7% of the episode durations, though with a wide standard deviation of 35%, attributable to episodes with high hand-acceleration trajectories and to grasp objects of varying difficulty.

Policy training using the imported grasps and trajectories was performed with reinforcement learning, specifically proximal policy optimization (PPO), with a multi-layer perceptron as the policy function approximator. Policies were trained with and without slip-prediction inputs, using two reward functions: a reward proportional to the duration of grasp stability, and a variant that added a grasp-force-magnitude penalty. The policies without slip-prediction inputs did not converge, while the policy with slip inputs and the grasp-force-penalty reward converged on a poorly performing policy.
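The two reward shapes described above can be illustrated with a minimal sketch. The function names, the per-timestep reward scale, and the penalty coefficient below are assumptions for illustration and are not taken from the thesis:

```python
import numpy as np

def stability_reward(grasp_stable: bool) -> float:
    """Per-timestep reward proportional to grasp-stability duration:
    the agent accrues reward for every step the grasp remains stable."""
    return 1.0 if grasp_stable else 0.0

def stability_reward_with_force_penalty(grasp_stable: bool,
                                        fingertip_forces: np.ndarray,
                                        penalty_weight: float = 0.01) -> float:
    """Variant that subtracts a grasp-force-magnitude penalty, discouraging
    the policy from stabilizing by simply squeezing harder.
    `penalty_weight` is a hypothetical coefficient."""
    base = 1.0 if grasp_stable else 0.0
    return base - penalty_weight * float(np.linalg.norm(fingertip_forces))
```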
On average, episodes tested with the grasp-force-penalty policy showed a 0.11 s reduction in grasp-stability duration compared with the initial no-policy results. However, episodes whose stability did improve under the learned policy improved by 0.38 s on average, substantially more than the average loss. Moreover, the change in stability duration under the trained policy was negatively correlated with the initial stability duration (Pearson r = -0.69, p = 9.79e-11). These results suggest that slip predictions contribute to learned grasp policies and that reward shaping is critical to the grasp-stability task. Ultimately, however, the trained policies did not outperform the no-policy baseline, suggesting that the slip predictions alone were not sufficient to train effective grasp policies in simulation.
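The reported negative correlation corresponds to a standard Pearson test between each episode's initial stability duration and its change in duration under the learned policy. A minimal sketch follows, using synthetic placeholder data rather than the thesis's simulation logs:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_episodes = 80  # hypothetical episode count

# Synthetic stand-ins for per-episode stability durations (seconds);
# real values would come from the simulation logs.
initial_duration = rng.uniform(0.0, 5.0, n_episodes)
# Construct policy durations so that episodes with long initial stability
# tend to lose duration, mimicking the reported negative correlation.
policy_duration = 0.7 * initial_duration + rng.normal(0.0, 0.4, n_episodes)

delta = policy_duration - initial_duration
r, p_value = pearsonr(initial_duration, delta)
print(f"Pearson r = {r:.2f}, p = {p_value:.2e}")
# The thesis reports r = -0.69, p = 9.79e-11 for this comparison.
```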
dc.language.iso: en
dc.publisher: University of Waterloo
dc.subject: Mujoco
dc.subject: Reinforcement Learning
dc.subject: Grasp stabilization
dc.title: Use of Slip Prediction for Learning Grasp-Stability Policies in Robotic-Grasp Simulation
dc.type: Master Thesis
dc.pending: false
uws-etd.degree.department: Mechanical and Mechatronics Engineering
uws-etd.degree.discipline: Mechanical Engineering
uws-etd.degree.grantor: University of Waterloo
uws-etd.degree: Master of Applied Science
uws-etd.embargo.terms: 0
uws.contributor.advisor: Kofman, Jonathan
uws.contributor.advisor: Jeon, Soo
uws.contributor.affiliation1: Faculty of Engineering
uws.published.city: Waterloo
uws.published.country: Canada
uws.published.province: Ontario
uws.typeOfResource: Text
uws.peerReviewStatus: Unreviewed
uws.scholarLevel: Graduate

