Kickstarting Deep Reinforcement Learning

Open Access•Posted Content

- 10 Mar 2018

120

TL;DR: It is shown that, on a challenging and computationally-intensive multi-task benchmark (DMLab-30), kickstarted training improves the data efficiency of new agents, making it significantly easier to iterate on their design.

Citations

•Posted Content

Improved Knowledge Distillation via Teacher Assistant

Seyed Iman Mirzadeh, +5 more

- 09 Feb 2019

- arXiv: Learning

TL;DR: Multi-step knowledge distillation is introduced, which employs an intermediate-sized network (teacher assistant) to bridge the gap between the student and the teacher. The authors study the effect of teacher assistant size and extend the framework to multi-step distillation.

724

•Posted Content

Transfer Learning in Deep Reinforcement Learning: A Survey

Zhuangdi Zhu, +2 more

- 16 Sep 2020

- arXiv: Learning

TL;DR: This survey reviews transfer learning in the reinforcement learning setting, providing a systematic categorization of its state-of-the-art techniques.

407

•Proceedings Article

Red Teaming Language Models with Language Models

Ethan Perez, +8 more

- 07 Feb 2022

TL;DR: This work automatically finds cases where a target LM behaves in a harmful way, by generating test cases (“red teaming”) using another LM, and evaluates the target LM’s replies to generated test questions using a classifier trained to detect offensive content.

379

•Journal Article•10.1609/AAAI.V33I01.33013796

Multi-task Deep Reinforcement Learning with PopArt

Matteo Hessel, +5 more

- 17 Jul 2019

TL;DR: This work proposes to automatically adapt the contribution of each task to the agent’s updates, so that all tasks have a similar impact on the learning dynamics, and learns a single trained policy that exceeds median human performance on this multi-task domain.

366

•Journal Article•10.48550/arXiv.2203.11147

Teaching language models to support answers with verified quotes

Jacob Menick, +10 more

- 21 Mar 2022

- arXiv.org

TL;DR: This work uses reinforcement learning from human preferences to train “open-book” QA models that generate answers whilst also citing specific evidence for their claims, which aids in the appraisal of correctness.

172

References

•Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

- 01 Jan 1988

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.

39.7K

•Proceedings Article•10.1109/CVPR.2014.81

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Ross Girshick, +3 more

- 23 Jun 2014

TL;DR: R-CNN combines CNNs with bottom-up region proposals to localize and segment objects. When labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.

33.7K

•Posted Content

Distilling the Knowledge in a Neural Network

Geoffrey E. Hinton, +2 more

- 09 Mar 2015

- arXiv: Machine Learning

TL;DR: This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.

21.2K

•Posted Content

Rich feature hierarchies for accurate object detection and semantic segmentation

Ross Girshick, +3 more

- 11 Nov 2013

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.

13.1K

•Proceedings Article

Asynchronous methods for deep reinforcement learning

Volodymyr Mnih, +7 more

- 19 Jun 2016

TL;DR: This work proposes a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent to optimize deep neural network controllers, and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes from visual input.

9.2K