Journal Article · DOI: 10.48550/arXiv.2302.00671
Efficient Multi-Task Reinforcement Learning via Selective Behavior Sharing
TL;DR: The authors propose a simple MTRL framework that identifies shareable behaviors across tasks and uses them to guide exploration. They empirically demonstrate that behavior sharing improves sample efficiency and final performance on manipulation and navigation tasks, and that it is complementary to parameter sharing.
Abstract: The ability to leverage shared behaviors between tasks is critical for sample-efficient multi-task reinforcement learning (MTRL). While prior methods have primarily explored parameter and data sharing, direct behavior-sharing has been limited to task families requiring similar behaviors. Our goal is to extend the efficacy of behavior-sharing to more general task families that could require a mix of shareable and conflicting behaviors. Our key insight is that an agent's behavior across tasks can be used for mutually beneficial exploration. To this end, we propose a simple MTRL framework for identifying shareable behaviors over tasks and incorporating them to guide exploration. We empirically demonstrate how behavior sharing improves sample efficiency and final performance on manipulation and navigation MTRL tasks and is even complementary to parameter sharing. Result videos are available at https://sites.google.com/view/qmp-mtrl.
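One way to picture "identifying shareable behaviors ... to guide exploration" is a selection step in which every task policy proposes an action and the current task's learned Q-function picks among the proposals. The Python sketch below illustrates that reading only; it is not the authors' implementation. The policy and Q-function interfaces, the names `q_guided_action`, `make_direction_policy`, and `toy_q`, and the toy 2D point-reaching setup are all assumptions made for illustration.

```python
import numpy as np


def q_guided_action(state, task_id, policies, q_functions, rng):
    """Return the proposed action that the current task's Q-function rates highest.

    policies[j](state, rng) -> action         (hypothetical interface)
    q_functions[i](state, action) -> float    (hypothetical interface)
    """
    # Every task policy proposes one candidate action for the current state.
    candidates = [policy(state, rng) for policy in policies]
    # The current task's Q-function scores each candidate.
    scores = [q_functions[task_id](state, action) for action in candidates]
    best = int(np.argmax(scores))
    return candidates[best], best


def make_direction_policy(direction, noise=0.05):
    """Toy stochastic policy: a fixed unit direction plus Gaussian noise."""
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)

    def policy(state, rng):
        return direction + noise * rng.normal(size=2)

    return policy


GOAL = np.array([1.0, 1.0])  # assumed goal location for the toy task


def toy_q(state, action):
    """Stand-in for a learned Q-function: negative distance to the goal after one small step."""
    return -float(np.linalg.norm(state + 0.1 * action - GOAL))


rng = np.random.default_rng(0)
# Four toy policies: up, right, down-left, up-right.
policies = [make_direction_policy(d) for d in [(0, 1), (1, 0), (-1, -1), (1, 1)]]
q_functions = [toy_q] * len(policies)

action, chosen = q_guided_action(np.zeros(2), 0, policies, q_functions, rng)
print("behavior proposed by policy", chosen, "->", action)
```

In an actual training loop the selected action would be executed in the environment and the transition stored for the current task, so that other tasks' policies contribute data only when the task's own Q-function prefers their proposals.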
Figures

Figure 4: 2D Point Reaching. We visualize the training trajectories of π with different sets of task policies (fixed but stochastic) and color each step with the policy that proposed it. (a) The single-task SAC policy cannot reach the goal. (b) With 3 diverse policies (↑ → ↙), QMP often selects other policies, showing the suboptimality gap from Q in Eq. 1. (c) When a highly relevant ↗ policy is added, QMP often selects ↗ as it is likely to best optimize the learned Q-function. 
Figure 5: QMP improves performance using other policies, increasingly so when they are task-relevant (5 seeds). 
Figure 15: Average contribution of each Policy j (col j) in each Task i’s (row i) data collection on Reacher Multistage (diagonal zeroed for contrast). 
Figure 1: We propose a sample-efficient MTRL framework that selectively shares behaviors by acting with other task policies for data collection. For example, Drawer Open and Drawer Close can share behaviors for grasping the drawer handle, while Drawer Open and Door Close share behaviors for approaching the tabletop. 
Figure 14: Mixture probabilities per task of other policies over the course of training for Multistage Reacher. The conflicting task Policy 4, which requires staying stationary, is highlighted in red. 
Table 2: Temporally Extended Behavior Sharing
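The per-task contribution view described in the Figure 15 caption can be approximated from a log of which policy proposed each executed step. The sketch below assumes such a log is available as `(task_i, policy_j)` pairs; the function name `contribution_matrix` and the choice to normalize each row before zeroing the diagonal are assumptions, since the page does not specify how the figure was computed.

```python
import numpy as np


def contribution_matrix(selections, num_tasks):
    """Fraction of Task i's collected steps proposed by Policy j, with the diagonal zeroed.

    selections: iterable of (task_i, policy_j) pairs, one per environment step
                (hypothetical log format).
    """
    counts = np.zeros((num_tasks, num_tasks))
    for task_i, policy_j in selections:
        counts[task_i, policy_j] += 1

    # Normalize each row by that task's total step count; rows with no data stay zero.
    row_totals = counts.sum(axis=1, keepdims=True)
    fractions = np.divide(
        counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0
    )

    # Zero the diagonal so a task's own policy does not dominate the picture.
    np.fill_diagonal(fractions, 0.0)
    return fractions


log = [(0, 0), (0, 1), (0, 1), (0, 2), (1, 1)]
print(contribution_matrix(log, num_tasks=3))
```

Row i then shows how much of Task i's data came from each other policy, which is the quantity a heatmap like the one described would display.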
Citations
On the Value of Myopic Behavior in Policy Reuse
TL;DR: The authors propose a selective myopic behavior control (SMEC) framework that adaptively aggregates the sharable short-term behaviors of prior policies and the long-term behaviors of the task policy, leading to coordinated decisions.
Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency
Francisco Roldán Sánchez, Qiang Wang, David Cordova Bulens, Kevin McGuinness, Stephen J. Redmond, Noel E. O'Connor
TL;DR: This paper proposes a method that uses primitive behaviours that have been previously learned to solve simple tasks in order to guide the agent toward more rewarding actions during exploration while learning other more complex tasks, and demonstrates the agents can learn a successful policy faster when using the proposed method.
EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data
Jesse Zhang, Min-Suk Heo, Zuxin Liu, Erdem Bıyık, Joseph J. Lim, Yao Liu, Rasool Fakoor
- 25 Jun 2024
TL;DR: EXTRACT efficiently extracts transferrable robot skills from offline data, enabling faster learning of new tasks with major gains in sample efficiency and performance over prior skill-based RL.
References
•Book
Reinforcement Learning: An Introduction
Richard S. Sutton, Andrew G. Barto
- 01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
•Posted Content
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Z. Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala
TL;DR: PyTorch is a machine learning library with an imperative, Pythonic programming style that makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.
Reinforcement learning: a survey
TL;DR: Central issues of reinforcement learning are discussed, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state.
•Posted Content
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
TL;DR: In this article, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework is proposed, where the actor aims to maximize expected reward while also maximizing entropy.
MuJoCo: A physics engine for model-based control
Emanuel Todorov, Tom Erez, Yuval Tassa
- 24 Dec 2012
TL;DR: A new physics engine tailored to model-based control. It is based on the modern velocity-stepping approach, which avoids the difficulties with spring-dampers, and it can compute both forward and inverse dynamics.