Open AccessPosted Content
Efficient State Representation Learning for Dynamic Robotic Scenarios.
Zhaorun Chen,Liang Gong,Te Sun,Binhao Chen,Shenghan Xie,David Filliat,Natalia Díaz Rodríguez +6 more
TL;DR: In this article, a policy optimization via abstract representation (POA-SRL) is proposed to learn to encode task-relevant features from complex sensory data into low-dimensional states.
read more
Abstract: While the rapid progress of deep learning fuels end-to-end reinforcement learning (RL), direct application, especially in high-dimensional space like robotic scenarios still suffers from high sample efficiency. Therefore State Representation Learning (SRL) is proposed to specifically learn to encode task-relevant features from complex sensory data into low-dimensional states. However, the pervasive implementation of SRL is usually conducted by a decoupling strategy in which the observation-state mapping is learned separately, which is prone to over-fit. To handle such problem, we present a new algorithm called Policy Optimization via Abstract Representation which integrates SRL into the original RL scale. Firstly, We engage RL loss to assist in updating SRL model so that the states can evolve to meet the demand of reinforcement learning and maintain a good physical interpretation. Secondly, we introduce a dynamic parameter adjustment mechanism so that both models can efficiently adapt to each other. Thirdly, we introduce a new prior called domain resemblance to leverage expert demonstration to train the SRL model. Finally, we provide a real-time access by state graph to monitor the course of learning. Results show that our algorithm outperforms the PPO baselines and decoupling strategies in terms of sample efficiency and final rewards. Thus our model can efficiently deal with tasks in high dimensions and facilitate training real-life robots directly from scratch.
read more
Chat with Paper
AI Agents for this Paper
Find similar papers on Google Scholar, PubMed and Arxiv
Write a critical review of this paper
Analyze citations of this paper to find unaddressed research gaps
References
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
99K
•Posted Content
Adam: A Method for Stochastic Optimization
Diederik P. Kingma,Jimmy Ba +1 more
TL;DR: In this article, the adaptive estimates of lower-order moments are used for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimate of lowerorder moments.
82.5K
•Book
Reinforcement Learning: An Introduction
Richard S. Sutton,Andrew G. Barto +1 more
- 01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Human-level control through deep reinforcement learning
Volodymyr Mnih,Koray Kavukcuoglu,David Silver,Andrei Rusu,Joel Veness,Marc G. Bellemare,Alex Graves,Martin Riedmiller,Andreas K. Fidjeland,Georg Ostrovski,Stig Petersen,Charles Beattie,Amir Sadik,Ioannis Antonoglou,Helen King,Dharshan Kumaran,Daan Wierstra,Shane Legg,Demis Hassabis +18 more
TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
•Posted Content
Proximal Policy Optimization Algorithms
TL;DR: A new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective function using stochastic gradient ascent, are proposed.
18K