On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning
07 Jun 2022
TL;DR: The authors show that, when evaluation is on different tasks rather than variations of the same task, multi-task pretraining followed by fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation.
Abstract: Intelligent agents should have the ability to leverage knowledge from previously learned tasks in order to learn new ones quickly and efficiently. Meta-learning approaches have emerged as a popular solution to achieve this. However, meta-reinforcement learning (meta-RL) algorithms have thus far been restricted to simple environments with narrow task distributions. Moreover, the paradigm of pretraining followed by fine-tuning to adapt to new tasks has emerged as a simple yet effective solution in supervised and self-supervised learning. This calls into question the benefits of meta-learning approaches in reinforcement learning as well, since they typically come at the cost of high complexity. We hence investigate meta-RL approaches in a variety of vision-based benchmarks, including Procgen, RLBench, and Atari, where evaluations are made on completely novel tasks. Our findings show that when meta-learning approaches are evaluated on different tasks (rather than different variations of the same task), multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation. This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL. From these findings, we advocate for evaluating future meta-RL methods on more challenging tasks and including multi-task pretraining with fine-tuning as a simple yet strong baseline.
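The pretrain-then-fine-tune baseline described in the abstract can be illustrated with a toy sketch. This is not the paper's setup: it swaps the vision-based RL benchmarks for a family of linear-regression tasks, and every name here (`make_task`, `gradient_steps`, the task centre, the step counts and learning rate) is a hypothetical choice made for illustration. It only shows the general idea of training one model on pooled data from several tasks and then taking a few gradient steps on a new task.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 5
# Assumed shared structure: every task's weights lie near this centre.
CENTER = np.array([2.0, -1.0, 0.5, 3.0, -2.0])


def make_task(spread=0.1, n=50):
    """Sample a linear-regression task whose true weights lie near CENTER."""
    w_true = CENTER + spread * rng.normal(size=DIM)
    x = rng.normal(size=(n, DIM))
    return x, x @ w_true


def mse(w, x, y):
    """Mean squared error of the linear model w on (x, y)."""
    return float(np.mean((x @ w - y) ** 2))


def gradient_steps(w, x, y, steps, lr=0.05):
    """Run plain gradient descent on the mean squared error."""
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)
        w = w - lr * grad
    return w


# Multi-task pretraining: fit a single model on data pooled from all tasks.
train_tasks = [make_task() for _ in range(8)]
x_pool = np.concatenate([x for x, _ in train_tasks])
y_pool = np.concatenate([y for _, y in train_tasks])
w_pretrained = gradient_steps(np.zeros(DIM), x_pool, y_pool, steps=200)

# Adaptation to a new task with a small budget of gradient steps,
# starting either from the pretrained weights or from scratch.
x_new, y_new = make_task()
w_finetuned = gradient_steps(w_pretrained, x_new, y_new, steps=5)
w_scratch = gradient_steps(np.zeros(DIM), x_new, y_new, steps=5)

loss_finetuned = mse(w_finetuned, x_new, y_new)
loss_scratch = mse(w_scratch, x_new, y_new)
print(f"fine-tuned from pretrained: {loss_finetuned:.4f}")
print(f"trained from scratch:       {loss_scratch:.4f}")
```

In this toy the new task is close to the training tasks by construction, so the pretrained initialisation helps. The paper's question is whether that advantage holds against meta-RL when the new tasks are genuinely different, which a sketch like this cannot answer.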
Citations
Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation
TL;DR: HDP is a hierarchical agent for multi-task robotic manipulation that factorises a manipulation policy into a high-level task-planning agent and a low-level goal-conditioned diffusion policy. It generates context-aware motion trajectories while satisfying robot kinematics constraints.
SIRL: Similarity-based Implicit Representation Learning
Andreea Bobu, Yi Liu, Rohin Shah, Daniel S. Brown, Anca D. Dragan
- 02 Jan 2023
TL;DR: The goal is to identify and isolate the causal features that people actually care about and use when they represent states and behaviors, and to learn representations that generalize better than self-supervised and task-input alternatives.
A fast interpretable adaptive meta-learning enhanced deep learning framework for diagnosis of diabetic retinopathy
Maofa Wang, Qizhou Gong, Quan Wan, Zhixiong Leng, Yanlin Xu, Bingchen Yan, He Zhang, Hongliang Huang, Shaohua Sun
TL;DR: This study introduces FIAML-LR, a novel meta-learning framework that balances interpretability and accuracy for few-shot classification tasks, particularly in diabetic retinopathy diagnosis, achieving a 14.28% accuracy boost with limited data.
Integrating Drone Imagery and AI for Improved Construction Site Management through Building Information Modeling
Wonjun Choi, Seung-Cheul Na, Seokjae Heo
TL;DR: This study explores the integration of drone imagery into the digital construction site management process, aiming to create BIM models with enhanced object recognition capabilities, underscoring the complexity of the task and laying the groundwork for future innovations in this area.
Deep reinforcement learning for real-world quadrupedal locomotion: a comprehensive review
TL;DR: This review article systematically organizes and summarizes the relevant literature, covering DRL algorithms from problem setting to advanced learning methods, as well as core components of algorithm design such as state and action spaces, reward functions, and solutions to the reality gap problem.
References
•Posted Content
Proximal Policy Optimization Algorithms
TL;DR: Proposes a new family of policy gradient methods for reinforcement learning that alternate between sampling data through interaction with the environment and optimizing a "surrogate" objective function using stochastic gradient ascent.
•Proceedings Article
Model-agnostic meta-learning for fast adaptation of deep networks
Chelsea Finn, Pieter Abbeel, Sergey Levine
- 06 Aug 2017
TL;DR: Proposes a meta-learning algorithm that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of learning problems, including classification, regression, and reinforcement learning.
•Proceedings Article
Optimization as a Model for Few-Shot Learning
Sachin Ravi, Hugo Larochelle
- 24 Apr 2017
TL;DR: In this paper, an LSTM-based meta-learner model is proposed to learn the exact optimization algorithm used to train another learner neural network in the few-shot regime.
The arcade learning environment: an evaluation platform for general agents
TL;DR: The Arcade Learning Environment (ALE) is a platform for evaluating the development of general, domain-independent AI technology. It provides an interface to hundreds of Atari 2600 game environments, each one different, interesting, and designed to be a challenge for human players.
•Posted Content
Rainbow: Combining Improvements in Deep Reinforcement Learning
Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Gheshlaghi Azar, David Silver
TL;DR: This paper examines six extensions to the DQN algorithm and empirically studies their combination, showing that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance.