Near-optimal Regret Bounds for Reinforcement Learning

Open Access · Proceedings Article

- 08 Dec 2008

- Vol. 21, pp 89-96

Cited by 858

TL;DR: This work presents a reinforcement learning algorithm with total regret Õ(DS√(AT)) after T steps for any unknown MDP with S states, A actions per state, and diameter D. It introduces the diameter as a new parameter: an MDP has diameter D if, for any pair of states s, s', there is a policy that moves from s to s' in at most D steps on average.
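For a small MDP whose transition probabilities are known, the diameter can be computed by solving one stochastic shortest-path problem per target state. The Python sketch below is only an illustration of the definition (the learning algorithm itself does not take D as input). It assumes a tabular model stored as a nested list `P[s][a][t]`, and the helper names `expected_hitting_times` and `diameter` are hypothetical. Value iteration finds, for each target s', the minimal expected number of steps to reach s' from every state; D is the largest of these values over all ordered pairs s ≠ s'.

```python
from typing import List

# Assumed tabular model: P[s][a][t] = probability of moving from s to t under action a.
Transitions = List[List[List[float]]]


def expected_hitting_times(P: Transitions, target: int,
                           tol: float = 1e-10, max_iter: int = 100_000) -> List[float]:
    """Minimal expected number of steps to reach `target` from each state,
    via value iteration on the stochastic-shortest-path Bellman equation:
        T(target) = 0
        T(s) = 1 + min_a sum_t P[s][a][t] * T(t)   for s != target
    """
    n = len(P)
    T = [0.0] * n
    for _ in range(max_iter):
        new_T = [0.0] * n
        for s in range(n):
            if s == target:
                continue  # the target is reached in zero steps
            new_T[s] = 1.0 + min(
                sum(p * T[t] for t, p in enumerate(P[s][a]))
                for a in range(len(P[s]))
            )
        if max(abs(x - y) for x, y in zip(new_T, T)) < tol:
            return new_T
        T = new_T
    # Values keep growing when some state cannot reach the target (infinite diameter).
    raise RuntimeError("value iteration did not converge; target may be unreachable")


def diameter(P: Transitions) -> float:
    """D = max over ordered pairs s != s' of the minimal expected time from s to s'."""
    n = len(P)
    return max(
        expected_hitting_times(P, target)[s]
        for target in range(n)
        for s in range(n)
        if s != target
    )


# Usage: a deterministic 3-state chain with actions "left" (index 0) and "right" (index 1).
chain = [
    [[1, 0, 0], [0, 1, 0]],  # state 0: left stays, right -> 1
    [[1, 0, 0], [0, 0, 1]],  # state 1: left -> 0, right -> 2
    [[0, 1, 0], [0, 0, 1]],  # state 2: left -> 1, right stays
]
print(diameter(chain))  # 2.0
```

In the chain, travelling between the two end states takes two steps under the best policy, so D = 2. If the MDP is not communicating, some hitting time is infinite and the sketch raises `RuntimeError` after `max_iter` sweeps.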

Citations

Posted Content

Parameter Space Noise for Exploration

Matthias Plappert, +8 more

- 06 Jun 2017

- arXiv: Learning

TL;DR: In this article, the authors combine parameter-space noise with traditional RL methods to get the best of both worlds. Through an experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete-action environments as well as continuous control tasks, they demonstrate that both off- and on-policy methods benefit from this approach.

Cited by 504
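Parameter-space noise perturbs the policy's weights rather than its actions. The sketch below assumes a hypothetical linear deterministic policy (action = w · state) and shows two ingredients: adding Gaussian noise N(0, σ²) to every parameter, and adapting σ so that the perturbed policy stays near a target distance δ from the unperturbed one in action space (grow σ by a factor α when the distance is at most δ, shrink it otherwise). The root-mean-squared action difference used as the distance, δ = 0.1 and α = 1.01 are illustrative assumptions for this sketch.

```python
import math
import random
from typing import List, Sequence


def linear_action(weights: Sequence[float], state: Sequence[float]) -> float:
    """Hypothetical linear deterministic policy: action = w . state."""
    return sum(w * x for w, x in zip(weights, state))


def perturb(weights: Sequence[float], sigma: float, rng: random.Random) -> List[float]:
    """Copy of the parameters with i.i.d. Gaussian noise N(0, sigma^2) added to each one."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def action_distance(w: Sequence[float], w_tilde: Sequence[float],
                    states: Sequence[Sequence[float]]) -> float:
    """Root-mean-squared difference between the clean and perturbed policies' actions."""
    sq = [(linear_action(w, s) - linear_action(w_tilde, s)) ** 2 for s in states]
    return math.sqrt(sum(sq) / len(sq))


def adapt_sigma(sigma: float, distance: float, threshold: float,
                alpha: float = 1.01) -> float:
    """Grow sigma while the perturbed policy stays within `threshold`; shrink it otherwise."""
    return sigma * alpha if distance <= threshold else sigma / alpha


# Usage: tune sigma so the perturbation changes actions by roughly 0.1 (RMS).
rng = random.Random(0)
weights = [0.5, -1.0, 2.0]
states = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(64)]
sigma = 0.05
for _ in range(200):
    noisy = perturb(weights, sigma, rng)  # the policy that would act during this rollout
    sigma = adapt_sigma(sigma, action_distance(weights, noisy, states), threshold=0.1)
```

Measuring the perturbation in action space rather than parameter space keeps the noise scale meaningful even when parameters have very different magnitudes.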

Proceedings Article

Learning to Explore using Active Neural SLAM

Devendra Singh Chaplot, +4 more

- 10 Apr 2020

TL;DR: This work presents a modular and hierarchical approach, called Active Neural SLAM, for learning policies that explore 3D environments. It leverages the strengths of both classical and learning-based methods by combining analytical path planners with a learned SLAM module and global and local policies.

Cited by 488

Journal Article · 10.1016/J.ARCONTROL.2018.09.005

Reinforcement learning for control: Performance, stability, and deep approximators

Lucian Busoniu, +4 more

- 01 Jan 2018

- Annual Reviews in Control

TL;DR: This review mainly covers artificial-intelligence approaches to RL, from the viewpoint of the control engineer, and explains how approximate representations of the solution make RL feasible for problems with continuous states and control actions.

Cited by 452

Proceedings Article

Parameter Space Noise for Exploration

Matthias Plappert, +8 more

- 15 Feb 2018

TL;DR: In this paper, the authors combine parameter-space noise with traditional RL methods to get the best of both worlds. Through an experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete-action environments as well as continuous control tasks, they demonstrate that both off- and on-policy methods benefit from this approach.

Cited by 430

Posted Content

Provably Efficient Reinforcement Learning with Linear Function Approximation

Chi Jin, +3 more

- 11 Jul 2019

- arXiv: Learning

TL;DR: This paper proves that an optimistic modification of Least-Squares Value Iteration (LSVI) achieves Õ(√(d³H³T)) regret, where d is the ambient dimension of the feature space, H is the length of each episode, and T is the total number of steps. The bound is independent of the number of states and actions.

Cited by 360

References

Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

- 01 Jan 1998

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

Cited by 39.7K

Book

Markov Decision Processes: Discrete Stochastic Dynamic Programming

Martin L. Puterman

- 15 Apr 1994

TL;DR: This book provides an up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models. It focuses primarily on infinite-horizon discrete-time models and models with discrete state spaces, while also examining models with arbitrary state spaces, finite-horizon models, and continuous-time discrete-state models.

Cited by 12.3K

Book Chapter · 10.1007/978-1-4612-0865-5_26

Probability Inequalities for sums of Bounded Random Variables

Wassily Hoeffding

- 01 Mar 1963

- Journal of the American Statistical Asso...

TL;DR: Upper bounds are derived for the probability that the sum S of n independent, bounded random variables exceeds its mean ES by a positive number nt. Analogous inequalities are then obtained for certain sums of dependent random variables, such as U-statistics.

Cited by 9K
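Concentration bounds of this kind underlie the confidence intervals that optimistic algorithms place around empirical estimates. For a sum S of n independent variables, each taking values in [a, b], Hoeffding's inequality gives P(S − ES ≥ nt) ≤ exp(−2nt²/(b − a)²). The sketch below evaluates this bound and inverts it into a confidence radius for an empirical mean at failure probability δ. The function names are hypothetical, and the specific confidence widths used in the paper (which include additional logarithmic terms) are not reproduced here.

```python
import math


def hoeffding_tail_bound(n: int, t: float, a: float = 0.0, b: float = 1.0) -> float:
    """Upper bound on P(S - ES >= n*t) for a sum S of n independent variables in [a, b]."""
    return math.exp(-2.0 * n * t * t / (b - a) ** 2)


def hoeffding_radius(n: int, delta: float, a: float = 0.0, b: float = 1.0) -> float:
    """Smallest t such that hoeffding_tail_bound(n, t, a, b) <= delta,
    i.e. a one-sided confidence radius for the empirical mean after n samples."""
    return (b - a) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


# Usage: 100 samples of a reward in [0, 1], failure probability 5%.
print(round(hoeffding_radius(100, 0.05), 4))  # 0.1224
```

The radius shrinks like 1/√n, which is why estimates of rarely visited state-action pairs carry wide, optimistic intervals.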

Journal Article · 10.1023/A:1013689704352

Finite-time Analysis of the Multiarmed Bandit Problem

Peter Auer, +2 more

- 01 May 2002

- Machine Learning

TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.

Cited by 8K
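UCB1, one of the policies analysed in this work, plays the arm that maximises the empirical mean reward plus √(2 ln n / n_j), where n is the total number of plays so far and n_j the number of plays of arm j. The same principle of optimism in the face of uncertainty underlies the MDP algorithm on this page. Below is a minimal sketch, assuming rewards in [0, 1] and a caller-supplied `pull` function (hypothetical) that returns one reward sample for a given arm; the Bernoulli arms in the usage are made up for illustration.

```python
import math
import random
from typing import Callable, List


def ucb1(pull: Callable[[int], float], n_arms: int, horizon: int) -> List[int]:
    """Run UCB1 for `horizon` rounds (horizon >= n_arms; rewards in [0, 1]).
    Returns how many times each arm was played."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    # Initialisation: play every arm once.
    for j in range(n_arms):
        sums[j] += pull(j)
        counts[j] = 1
    # n is the total number of plays made so far.
    for n in range(n_arms, horizon):
        j = max(
            range(n_arms),
            key=lambda k: sums[k] / counts[k] + math.sqrt(2.0 * math.log(n) / counts[k]),
        )
        sums[j] += pull(j)
        counts[j] += 1
    return counts


# Usage: three hypothetical Bernoulli arms; the 0.8 arm should receive most plays.
rng = random.Random(0)
probs = [0.2, 0.5, 0.8]
counts = ucb1(lambda j: 1.0 if rng.random() < probs[j] else 0.0, len(probs), 10_000)
print(counts)
```

Suboptimal arms are still sampled occasionally because their exploration bonus grows with ln n, but only logarithmically often, which is the behaviour the regret bound captures.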

Book

Introduction to Reinforcement Learning

Richard S. Sutton, +1 more

- 01 Mar 1998

TL;DR: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.

Cited by 7.7K

Related Papers (5)

Markov Decision Processes: Discrete Stochastic Dynamic Programming

R-max - a general polynomial time algorithm for near-optimal reinforcement learning

Reinforcement Learning: An Introduction

Near-Optimal Reinforcement Learning in Polynomial Time

Finite-time Analysis of the Multiarmed Bandit Problem