Open Access • Proceedings Article
Near-optimal Regret Bounds for Reinforcement Learning
Peter Auer, Thomas Jaksch, Ronald Ortner
- 08 Dec 2008
- Vol. 21, pp 89-96
TL;DR: This work presents a reinforcement learning algorithm with total regret Õ(DS√(AT)) after T steps for any unknown MDP with S states, A actions per state, and diameter D. It also proposes the diameter as a new parameter: an MDP has diameter D if for any pair of states s, s' there is a policy which moves from s to s' in at most D steps (on average).
Abstract: For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter D if for any pair of states s, s' there is a policy which moves from s to s' in at most D steps (on average). We present a reinforcement learning algorithm with total regret Õ(DS√(AT)) after T steps for any unknown MDP with S states, A actions per state, and diameter D. This bound holds with high probability. We also present a corresponding lower bound of Ω(√(DSAT)) on the total regret of any learning algorithm.
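The diameter defined in the abstract can be made concrete with a small numerical sketch. The Python below is an illustration only, not the paper's algorithm: it assumes the transition model is known and stored as a nested list `P[s][a][t]` (a hypothetical layout), computes the minimal expected hitting time for every ordered pair of states by value iteration, and takes the maximum. The helper `bound_orders` just evaluates the orders DS√(AT) and √(DSAT) with constants and logarithmic factors ignored.

```python
import math


def min_hitting_times(P, target, max_iters=100_000, tol=1e-12):
    """Minimal expected number of steps to reach `target` from every state.

    P[s][a][t] is the probability of moving from state s to state t under
    action a (an assumed layout for this sketch). Value iteration starts at
    zero and increases towards the optimal expected hitting times.
    """
    n = len(P)
    h = [0.0] * n
    for _ in range(max_iters):
        new = [
            0.0 if s == target else
            1.0 + min(sum(p * h[t] for t, p in enumerate(row)) for row in P[s])
            for s in range(n)
        ]
        if max(abs(x - y) for x, y in zip(new, h)) < tol:
            return new
        h = new
    return h  # not converged: only an approximation from below


def diameter(P):
    """Largest minimal expected hitting time over all ordered pairs s != s'."""
    n = len(P)
    return max(
        min_hitting_times(P, goal)[start]
        for goal in range(n)
        for start in range(n)
        if start != goal
    )


def bound_orders(D, S, A, T):
    """Orders of the upper and lower regret bounds, without constants or logs."""
    return D * S * math.sqrt(A * T), math.sqrt(D * S * A * T)


# Toy MDP: in each state, action 0 stays put and action 1 reaches the other
# state with probability 0.5 (and stays otherwise).
P = [
    [[1.0, 0.0], [0.5, 0.5]],
    [[0.0, 1.0], [0.5, 0.5]],
]
D = diameter(P)
upper, lower = bound_orders(D, S=2, A=2, T=10_000)
print(f"diameter ~ {D:.3f}, upper order ~ {upper:.1f}, lower order ~ {lower:.1f}")
```

In this toy MDP the best policy for moving between the two states always picks action 1, so the expected travel time is geometric with success probability 0.5 and the diameter is 2. The gap between the two orders is a factor of √(DS), which is why the bounds are described as near-optimal rather than tight.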
Citations
•Posted Content
Parameter Space Noise for Exploration
Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz
TL;DR: In this article, the authors combine parameter noise with traditional RL methods to get the best of both worlds, and demonstrate that both off- and on-policy methods benefit from this approach through an experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks.
504
•Proceedings Article
Learning to Explore using Active Neural SLAM
Devendra Singh Chaplot, Dhiraj Gandhi, Saurabh Gupta, Abhinav Gupta, Ruslan Salakhutdinov
- 10 Apr 2020
TL;DR: This work presents a modular and hierarchical approach to learning policies for exploring 3D environments, called 'Active Neural SLAM', which leverages the strengths of both classical and learning-based methods by using analytical path planners with a learned SLAM module and global and local policies.
Reinforcement learning for control: Performance, stability, and deep approximators
TL;DR: This review mainly covers artificial-intelligence approaches to RL, from the viewpoint of the control engineer, and explains how approximate representations of the solution make RL feasible for problems with continuous states and control actions.
452
•Proceedings Article
Parameter Space Noise for Exploration
Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz
- 15 Feb 2018
TL;DR: In this paper, the authors combine parameter noise with traditional RL methods to get the best of both worlds, and demonstrate that both off- and on-policy methods benefit from this approach through an experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks.
•Posted Content
Provably Efficient Reinforcement Learning with Linear Function Approximation
TL;DR: This paper proves that an optimistic modification of Least-Squares Value Iteration (LSVI) achieves Õ(√(d³H³T)) regret, where d is the ambient dimension of the feature space, H is the length of each episode, and T is the total number of steps. This regret is independent of the number of states and actions.
360
References
•Book
Reinforcement Learning: An Introduction
Richard S. Sutton, Andrew G. Barto
- 01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
•Book
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Martin L. Puterman
- 15 Apr 1994
TL;DR: This book provides an up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models, focusing primarily on infinite-horizon discrete-time models and models with discrete state spaces while also examining models with arbitrary state spaces, finite-horizon models, and continuous-time discrete-state models.
12.3K
Probability Inequalities for Sums of Bounded Random Variables
TL;DR: In this article, upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt, and analogous inequalities are obtained for certain sums of dependent random variables such as U statistics.
Finite-time Analysis of the Multiarmed Bandit Problem
TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
•Book
Introduction to Reinforcement Learning
Richard S. Sutton, Andrew G. Barto
- 01 Mar 1998
TL;DR: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.
7.7K