Efficient Multi-Task Reinforcement Learning via Selective Behavior Sharing

doi:10.48550/arXiv.2302.00671

Journal Article

Grace Zhang, +4 more

- 01 Feb 2023

- arXiv.org

- Vol. abs/2302.00671

3

TL;DR: The authors propose a simple MTRL framework that identifies behaviors shareable across tasks and uses them to guide exploration. They show empirically that behavior sharing improves sample efficiency and final performance on manipulation and navigation tasks, and that it is complementary to parameter sharing.

Figures

Figure 4: 2D Point Reaching. We visualize the training trajectories of π with different sets of task policies (fixed but stochastic) and color each step with the policy that proposed it. (a) The single-task SAC policy cannot reach the goal. (b) With 3 diverse policies (↑ → ↙), QMP often selects other policies, showing the suboptimality gap from Q in Eq. 1. (c) When a highly relevant ↗ policy is added, QMP often selects ↗ as it is likely to best optimize the learned Q-function.

Figure 5: QMP improves performance using other policies, increasingly so when they are task-relevant (5 seeds).
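
The selection step described in the Figure 4 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes each policy proposes one candidate action per step and that the current task's Q-function scores the candidates. The names `qmp_select`, `make_policy`, and `toy_q`, the goal location, and the noise level are all hypothetical, and `toy_q` is a hand-written stand-in for a learned Q-function.

```python
import random


def qmp_select(state, policies, q_fn, rng):
    """Sample one candidate action per policy and return the candidate that
    q_fn scores highest, together with the index of the proposing policy."""
    candidates = [policy(state, rng) for policy in policies]
    scores = [q_fn(state, action) for action in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], best


def make_policy(direction, noise=0.05):
    # Hypothetical fixed-but-stochastic policy: a preferred 2D step plus Gaussian noise.
    def policy(state, rng):
        return (direction[0] + rng.gauss(0.0, noise),
                direction[1] + rng.gauss(0.0, noise))
    return policy


GOAL = (5.0, 5.0)  # assumed goal for this toy example


def toy_q(state, action):
    # Stand-in for a learned Q-function: negative squared distance
    # to the goal after taking the step.
    nx, ny = state[0] + action[0], state[1] + action[1]
    return -((GOAL[0] - nx) ** 2 + (GOAL[1] - ny) ** 2)


rng = random.Random(0)
# Up, right, down-left, and up-right policies, mirroring the arrows in the caption.
policies = [make_policy((0.0, 1.0)), make_policy((1.0, 0.0)),
            make_policy((-0.7, -0.7)), make_policy((0.7, 0.7))]

state = (0.0, 0.0)
proposer_counts = [0] * len(policies)
for _ in range(5):
    action, who = qmp_select(state, policies, toy_q, rng)
    proposer_counts[who] += 1  # which policy proposed the executed step
    state = (state[0] + action[0], state[1] + action[1])
```

Recording the index of the proposing policy at each step is what would allow a trajectory to be colored by proposer, as in the figure.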

Figure 15: Average contribution of each Policy j (col j) in each Task i’s (row i) data collection on Reacher Multistage (diagonal zeroed for contrast).
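
A contribution matrix like the one in Figure 15 could be tallied from a log of which policy proposed each step. The sketch below is an assumption about the bookkeeping, since the page does not say how the average contribution is computed: it normalizes each task's row by that task's total step count and then zeroes the diagonal. The function name `contribution_matrix` and the log format are hypothetical.

```python
def contribution_matrix(selections, num_tasks, zero_diagonal=True):
    """selections: iterable of (task_index, policy_index) pairs, one per step.

    Returns a list of rows where entry [i][j] is the fraction of task i's
    steps that were proposed by policy j.
    """
    counts = [[0] * num_tasks for _ in range(num_tasks)]
    for task, policy in selections:
        counts[task][policy] += 1

    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        fractions = [c / total if total else 0.0 for c in row]
        if zero_diagonal:
            # Hide the task's own policy so the shared behaviors stand out.
            fractions[i] = 0.0
        matrix.append(fractions)
    return matrix


# Hypothetical log: task 0 took 4 steps, task 1 took 4 steps, task 2 none.
log = [(0, 0), (0, 0), (0, 1), (0, 2), (1, 1), (1, 1), (1, 0), (1, 1)]
matrix = contribution_matrix(log, num_tasks=3)
```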

Figure 1: We propose a sample-efficient MTRL framework that selectively shares behaviors by acting with other task policies for data collection. For example, Drawer Open and Drawer Close can share behaviors for grasping the drawer handle, while Drawer Open and Door Close share behaviors for approaching the tabletop.

Figure 14: Mixture probabilities per task of other policies over the course of training for Multistage Reacher. The conflicting task Policy 4, which requires staying stationary, is highlighted in red.

Table 2: Temporally Extended Behavior Sharing

Citations

Journal Article•10.48550/arXiv.2305.17623

On the Value of Myopic Behavior in Policy Reuse

Kang Xu, +7 more

- 28 May 2023

- arXiv.org

TL;DR: The authors propose a selective myopic behavior control (SMEC) framework, which adaptively aggregates the sharable short-term behaviors of prior policies and the long-term behaviors of the task policy, leading to coordinated decisions.

Journal Article•10.48550/arxiv.2310.01827

Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency

Francisco Roldán Sánchez, +5 more

- 03 Oct 2023

- arXiv.org

TL;DR: This paper proposes a method that reuses primitive behaviours, previously learned on simple tasks, to guide the agent toward more rewarding actions during exploration while it learns more complex tasks. It demonstrates that agents learn a successful policy faster with the proposed method.

Preprint•10.48550/arxiv.2406.17768

EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data

Jesse Zhang, +6 more

- 25 Jun 2024

TL;DR: EXTRACT efficiently extracts transferrable robot skills from offline data, enabling faster learning of new tasks with major gains in sample efficiency and performance over prior skill-based RL.

References

•Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

- 01 Jan 1998

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

39.7K

•Posted Content

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, +20 more

- 03 Dec 2019

- arXiv: Learning

TL;DR: PyTorch is a machine learning library with an imperative, Pythonic programming style that makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.

25.9K

•Journal Article•10.1613/JAIR.301

Reinforcement learning: a survey

Leslie Pack Kaelbling, +2 more

- 01 Jan 1996

- Journal of Artificial Intelligence Resea...

TL;DR: Central issues of reinforcement learning are discussed, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state.

9K

•Posted Content

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Tuomas Haarnoja, +3 more

- 04 Jan 2018

- arXiv: Learning

TL;DR: In this article, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework is proposed, where the actor aims to maximize expected reward while also maximizing entropy.

6.7K

Proceedings Article•10.1109/IROS.2012.6386109

MuJoCo: A physics engine for model-based control

Emanuel Todorov, +2 more

- 24 Dec 2012

TL;DR: This paper presents a new physics engine tailored to model-based control. It is based on the modern velocity-stepping approach, which avoids the difficulties with spring-dampers, and it can compute both forward and inverse dynamics.

6.4K

...
