Near-optimal Regret Bounds for Reinforcement Learning

Open Access • Proceedings Article

- 08 Dec 2008

- Vol. 21, pp 89-96

858

TL;DR: This work presents a reinforcement learning algorithm whose total regret after T steps is O(DS√(AT)), up to logarithmic factors, for any unknown MDP with S states, A actions per state, and diameter D. It also introduces the diameter as a new parameter: an MDP has diameter D if, for any pair of states s, s', there is a policy that moves from s to s' in at most D steps on average.
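The diameter can be made concrete for a small MDP whose transition probabilities are known. The following is a minimal sketch, not code from the paper: it assumes a transition array `P` of shape (S, A, S), and the helper names `min_hitting_times` and `diameter` are hypothetical. For each target state s' it runs value iteration on h(s) = 1 + min_a Σ_x P(x | s, a) h(x) with h(s') = 0, which yields the minimal expected number of steps needed to reach s', and it then takes the maximum over all ordered pairs of distinct states.

```python
import numpy as np


def min_hitting_times(P, target, tol=1e-10, max_iter=100_000):
    """Minimal expected number of steps to reach `target` from each state.

    P[s, a, x] is the probability of moving from state s to state x under action a
    (shape (S, A, S)). Value iteration on the stochastic shortest path equations
        h(target) = 0,   h(s) = 1 + min_a sum_x P[s, a, x] * h(x)   for s != target,
    started from h = 0, increases monotonically towards the minimal hitting times.
    """
    h = np.zeros(P.shape[0])
    for _ in range(max_iter):
        h_new = 1.0 + np.min(P @ h, axis=1)  # P @ h has shape (S, A)
        h_new[target] = 0.0                  # the target is reached in zero steps
        if np.max(np.abs(h_new - h)) < tol:
            return h_new
        h = h_new
    # In a non-communicating MDP some target is unreachable and the diameter is infinite.
    raise RuntimeError("value iteration did not converge; is the MDP communicating?")


def diameter(P):
    """Maximum over ordered pairs s != s' of the minimal expected time to go from s to s'."""
    S = P.shape[0]
    worst = 0.0
    for target in range(S):
        h = min_hitting_times(P, target)
        if S > 1:
            worst = max(worst, float(np.delete(h, target).max()))
    return worst


# Hypothetical two-state MDP: action 0 ("switch") moves to the other state with
# probability 0.5 and otherwise stays; action 1 ("stay") never moves.
P_two_state = np.array([
    [[0.5, 0.5], [1.0, 0.0]],  # transitions from state 0 under actions 0 and 1
    [[0.5, 0.5], [0.0, 1.0]],  # transitions from state 1 under actions 0 and 1
])
print(round(diameter(P_two_state), 6))  # 2.0
```

In the two-state example the "switch" action succeeds only half the time, so reaching the other state takes 2 steps in expectation and the diameter is 2. Value iteration is used here only for brevity; the same stochastic shortest path problem could also be solved by policy iteration or linear programming.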

Citations

•Journal Article•10.5555/1577069.1755867

Reinforcement Learning in Finite MDPs: PAC Analysis

Alexander Strehl, +2 more

- 01 Dec 2009

- Journal of Machine Learning Research

TL;DR: The current state-of-the-art for near-optimal behavior in finite Markov Decision Processes with a polynomial number of samples is summarized by presenting bounds for the problem in a unified theoretical framework.

337

•Posted Content

Model-based Reinforcement Learning: A Survey

Thomas M. Moerland, +2 more

- 30 Jun 2020

- arXiv: Learning

TL;DR: A survey of the integration of reinforcement learning and planning, better known as model-based reinforcement learning, is presented, together with a broad conceptual overview of planning-learning combinations for MDP optimization.

314

•Posted Content

A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity

Pablo Hernandez-Leal, +3 more

- 28 Jul 2017

- arXiv: Multiagent Systems

TL;DR: This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind.

309

•Posted Content

An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective

Yaodong Yang, +1 more

- 01 Nov 2020

- arXiv: Multiagent Systems

TL;DR: This work provides a self-contained assessment of current state-of-the-art MARL techniques from a game-theoretical perspective. It is intended as a stepping stone both for new researchers entering this fast-growing domain and for existing experts who want a panoramic view and to identify new directions based on recent advances.

233

•Proceedings Article

On the Sample Complexity of Reinforcement Learning with a Generative Model

Mohammad Gheshlaghi Azar, +2 more

- 26 Jun 2012

TL;DR: This work considers the problems of learning the optimal action-value function and the optimal policy in discounted-reward Markov decision processes (MDPs) and proves new PAC bounds on the sample-complexity of two well-known model-based reinforcement learning (RL) algorithms in the presence of a generative model of the MDP.

227

References

•Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

- 01 Jan 1988

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

39.7K

•Book

Markov Decision Processes: Discrete Stochastic Dynamic Programming

Martin L. Puterman

- 15 Apr 1994

TL;DR: This book provides a uniquely up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models. It focuses primarily on infinite-horizon discrete-time models and models with discrete state spaces, while also examining models with arbitrary state spaces, finite-horizon models, and continuous-time discrete-state models.

12.3K

•Book Chapter•10.1007/978-1-4612-0865-5_26

Probability Inequalities for Sums of Bounded Random Variables

Wassily Hoeffding

- 01 Mar 1963

- Journal of the American Statistical Asso...

TL;DR: In this article, upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt; analogous inequalities are then obtained for certain sums of dependent random variables, such as U statistics.
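For independent summands with X_i in [a_i, b_i], the best-known of these bounds reads P(S − ES ≥ nt) ≤ exp(−2n²t² / Σ_i (b_i − a_i)²). The following is a minimal sketch, assuming all summands share one range [a, b] (default [0, 1]); the function names are hypothetical. It evaluates the tail bound and inverts it to the deviation t of the sample mean that is exceeded with probability at most δ, which is the kind of confidence width that UCB-style algorithms add to empirical means.

```python
import math


def hoeffding_tail(n, t, a=0.0, b=1.0):
    """Upper bound on P(S - ES >= n*t) for a sum S of n independent variables in [a, b]."""
    # With identical ranges, sum_i (b_i - a_i)^2 = n * (b - a)^2, so the exponent
    # -2 n^2 t^2 / (n (b - a)^2) simplifies to -2 n t^2 / (b - a)^2.
    return math.exp(-2.0 * n * t * t / (b - a) ** 2)


def hoeffding_width(n, delta, a=0.0, b=1.0):
    """Deviation t of the sample mean S/n that is exceeded with probability at most delta."""
    # Solve exp(-2 n t^2 / (b - a)^2) = delta for t.
    return (b - a) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


print(hoeffding_tail(100, 0.1))    # exp(-2), about 0.135
print(hoeffding_width(100, 0.05))  # about 0.122
```

The width shrinks like 1/√n and grows only logarithmically in 1/δ, which is why tightening the confidence level is cheap compared with collecting more samples.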

9K

•Journal Article•10.1023/A:1013689704352

Finite-time Analysis of the Multiarmed Bandit Problem

Peter Auer, +2 more

- 01 May 2002

- Machine Learning

TL;DR: This work shows that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
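One of the simple policies analyzed in this work, UCB1, first plays each arm once and then always plays the arm j that maximizes x̄_j + √(2 ln n / n_j), where x̄_j is the empirical mean reward of arm j, n_j is the number of times arm j has been played, and n is the total number of plays so far. The sketch below is an illustration rather than the paper's code: it assumes Bernoulli rewards with hypothetical success probabilities, and the function name `ucb1` is likewise hypothetical.

```python
import math
import random


def ucb1(probs, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with success probabilities `probs`; return pull counts."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k    # n_j: number of times arm j has been played
    sums = [0.0] * k    # total reward collected from arm j
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: play each arm once
        else:
            n = t - 1    # total number of plays so far
            arm = max(
                range(k),
                key=lambda j: sums[j] / counts[j] + math.sqrt(2.0 * math.log(n) / counts[j]),
            )
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


pulls = ucb1([0.2, 0.5, 0.8], horizon=5000, seed=1)
print(pulls)  # most pulls should go to the arm with success probability 0.8
```

The exploration bonus shrinks as an arm is played more often, so suboptimal arms are sampled only about logarithmically often in the horizon, which is the behavior the logarithmic regret bound describes.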

8K

•Book

Introduction to Reinforcement Learning

Richard S. Sutton, +1 more

- 01 Mar 1998

TL;DR: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.

7.7K

Related Papers (5)

Markov Decision Processes: Discrete Stochastic Dynamic Programming

R-max - a general polynomial time algorithm for near-optimal reinforcement learning

Reinforcement Learning: An Introduction

Near-Optimal Reinforcement Learning in Polynomial Time

Finite-time Analysis of the Multiarmed Bandit Problem