Open Access · Posted Content
Kickstarting Deep Reinforcement Learning
Simon Schmitt,J. J. Hudson,Augustin Zidek,Simon Osindero,Carl Doersch,Wojciech Marian Czarnecki,Joel Z. Leibo,Heinrich Küttler,Andrew Zisserman,Karen Simonyan,S. M. Ali Eslami +10 more
Abstract: We present a method for using previously-trained 'teacher' agents to
kickstart the training of a new 'student' agent. To this end, we leverage ideas
from policy distillation and population based training. Our method places no
constraints on the architecture of the teacher or student agents, and it
regulates itself to allow the students to surpass their teachers in
performance. We show that, on a challenging and computationally-intensive
multi-task benchmark (DMLab-30), kickstarted training improves the data
efficiency of new agents, making it significantly easier to iterate on their
design. We also show that the same kickstarting pipeline can allow a single
student agent to leverage multiple 'expert' teachers which specialize on
individual tasks. In this setting kickstarting yields surprisingly large gains,
with the kickstarted agent matching the performance of an agent trained from
scratch in almost 10x fewer steps, and surpassing its final performance by 42
percent. Kickstarting is conceptually simple and can easily be incorporated
into reinforcement learning experiments.
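
The abstract does not spell out the training objective, so the following is only a minimal sketch of one way the idea could look, assuming the student's usual RL loss is augmented with a weighted policy-distillation term (cross-entropy from the teacher's action distribution to the student's). The function names, the `distill_weight` parameter, and the exact form of the loss are illustrative assumptions, not taken from the text above.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits):
    """Cross-entropy H(teacher || student) over the action distribution."""
    p_teacher = softmax(teacher_logits)
    p_student = softmax(student_logits)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


def kickstarting_loss(rl_loss, teacher_logits, student_logits, distill_weight):
    """Assumed form: the student's RL loss plus a weighted distillation term.

    distill_weight is a hypothetical scalar; setting it to 0 recovers
    plain RL training with no teacher influence.
    """
    aux = distillation_loss(teacher_logits, student_logits)
    return rl_loss + distill_weight * aux
```

In this sketch, lowering `distill_weight` over the course of training (for example, by letting a population-based search adjust it) would reduce the teacher's influence, which is one way a student could eventually surpass its teacher.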
Citations
•Posted Content
Improved Knowledge Distillation via Teacher Assistant
Seyed Iman Mirzadeh,Mehrdad Farajtabar,Ang Li,Nir Levine,Akihiro Matsukawa,Hassan Ghasemzadeh +5 more
TL;DR: Multi-step knowledge distillation is introduced, which employs an intermediate-sized network (a teacher assistant) to bridge the gap between the student and the teacher; the work also studies the effect of teacher assistant size and extends the framework to multi-step distillation.
•Posted Content
Transfer Learning in Deep Reinforcement Learning: A Survey
TL;DR: This survey reviews transfer learning in the problem setting of reinforcement learning, providing a systematic categorization of state-of-the-art techniques.
Proceedings Article
Red Teaming Language Models with Language Models
Ethan Perez,Saffron Huang,Francis Song,Trevor Cai,Roman Ring,John Aslanides,A. Glaese,Nathan McAleese,Geoffrey Irving +8 more
- 07 Feb 2022
TL;DR: This work automatically finds cases where a target LM behaves in a harmful way, by generating test cases (“red teaming”) using another LM, and evaluates the target LM’s replies to generated test questions using a classifier trained to detect offensive content.
Multi-task Deep Reinforcement Learning with PopArt
Matteo Hessel,Hubert Soyer,Lasse Espeholt,Wojciech Marian Czarnecki,Simon Schmitt,Hado van Hasselt +5 more
- 17 Jul 2019
TL;DR: This work proposes to automatically adapt the contribution of each task to the agent’s updates, so that all tasks have a similar impact on the learning dynamics, and learns a single trained policy that exceeds median human performance on this multi-task domain.
Teaching language models to support answers with verified quotes
Jacob Menick,Maja Trebacz,Vladimir Mikulik,John Aslanides,Francis Song,Martin Chadwick,Mia Glaese,Susannah Young,Lucy Campbell-Gillingham,Geoffrey Irving,Nathan McAleese +10 more
TL;DR: This work uses reinforcement learning from human preferences to train “open-book” QA models that generate answers whilst also citing specific evidence for their claims, which aids in the appraisal of correctness.