Open Access
Proceedings Article
Near-optimal Regret Bounds for Reinforcement Learning
Peter Auer, Thomas Jaksch, Ronald Ortner
- 08 Dec 2008
- Vol. 21, pp. 89-96
TL;DR: This work presents a reinforcement learning algorithm with total regret $O(DS\sqrt{AT})$ after $T$ steps for any unknown MDP with $S$ states, $A$ actions per state, and diameter $D$, a new parameter: an MDP has diameter $D$ if for any pair of states $s, s'$ there is a policy that moves from $s$ to $s'$ in at most $D$ steps (on average).
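The diameter is a max-min over expected hitting times, and for a small, fully known MDP it can be computed directly. A minimal sketch, assuming the MDP is given as a nested list `P[s][a][x]` of transition probabilities (a hypothetical representation chosen here, not taken from the paper): for each target state it computes the minimal expected number of steps to reach that target by value iteration on a stochastic shortest-path problem with unit step cost, then takes the maximum over pairs of distinct states. The names `min_hitting_times` and `diameter`, the tolerance, and the iteration cap are illustrative assumptions; an iteration that fails to converge is treated as an unreachable state and reported as an infinite diameter.

```python
import math


def min_hitting_times(P, target, tol=1e-10, max_iter=10_000):
    """Minimal expected number of steps to reach `target` from each state.

    P[s][a][x] is the (hypothetical) probability of moving from s to x under action a.
    Value iteration on: h(target) = 0, h(s) = 1 + min_a sum_x P[s][a][x] * h(x).
    Returns None if the iteration does not converge, which happens when some
    state cannot reach `target` under any policy (its value keeps growing).
    """
    n = len(P)
    h = [0.0] * n
    for _ in range(max_iter):
        new_h = [0.0] * n
        for s in range(n):
            if s == target:
                continue  # already at the target: zero steps needed
            new_h[s] = 1.0 + min(
                sum(p * h[x] for x, p in enumerate(P[s][a]))
                for a in range(len(P[s]))
            )
        delta = max(abs(u - v) for u, v in zip(new_h, h))
        h = new_h
        if delta < tol:
            return h
    return None


def diameter(P):
    """D = max over s != s' of min over policies of E[steps from s to s']."""
    n = len(P)
    D = 0.0
    for target in range(n):
        h = min_hitting_times(P, target)
        if h is None:
            return math.inf  # some state can never reach `target`
        D = max(D, max((h[s] for s in range(n) if s != target), default=0.0))
    return D


# Deterministic 3-state chain: action 0 moves left, action 1 moves right.
chain = [
    [[1, 0, 0], [0, 1, 0]],
    [[1, 0, 0], [0, 0, 1]],
    [[0, 1, 0], [0, 0, 1]],
]
print(diameter(chain))  # 2.0: the two end states are two steps apart

# A single action that switches state with probability 1/2: expected hitting time 2.
lazy = [[[0.5, 0.5]], [[0.5, 0.5]]]
print(round(diameter(lazy), 6))  # 2.0
```

Value iteration is used here only because it needs nothing beyond the Bellman equation; policy iteration or a linear program per target state are common alternatives when exact values are needed.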
Abstract: For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy. In order to describe the transition structure of an MDP we propose a new parameter: An MDP has diameter $D$ if for any pair of states $s, s'$ there is a policy which moves from $s$ to $s'$ in at most $D$ steps (on average). We present a reinforcement learning algorithm with total regret $O(DS\sqrt{AT})$ after $T$ steps for any unknown MDP with $S$ states, $A$ actions per state, and diameter $D$. This bound holds with high probability. We also present a corresponding lower bound of $\Omega(\sqrt{DSAT})$ on the total regret of any learning algorithm.
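To make the two bounds concrete, a short worked comparison in one common way of writing the total regret, where $\rho^*$ denotes the optimal average reward per step and $r_t$ the reward collected at step $t$ (notation assumed for this sketch). Dividing the upper bound by the lower bound, ignoring constants and any logarithmic factors, shows that they differ by a factor of $\sqrt{DS}$, independent of $T$ and $A$; both grow as $\sqrt{T}$.

```latex
% Total regret after T steps (assumed notation: \rho^* optimal average reward, r_t reward at step t)
\Delta(T) = T\rho^* - \sum_{t=1}^{T} r_t
% Ratio of upper to lower bound, constants and log factors ignored
\frac{DS\sqrt{AT}}{\sqrt{DSAT}}
  = \frac{DS\,\sqrt{A}\,\sqrt{T}}{\sqrt{D}\,\sqrt{S}\,\sqrt{A}\,\sqrt{T}}
  = \sqrt{DS}
```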
Citations
Reinforcement Learning in Finite MDPs: PAC Analysis
TL;DR: The current state of the art for near-optimal behavior in finite Markov decision processes with a polynomial number of samples is summarized by presenting bounds for the problem in a unified theoretical framework.
•Posted Content
Model-based Reinforcement Learning: A Survey
TL;DR: A survey of the integration of reinforcement learning and planning, better known as model-based reinforcement learning, is presented, together with a broad conceptual overview of planning-learning combinations for MDP optimization.
•Posted Content
A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity
TL;DR: This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind.
•Posted Content
An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective
Yaodong Yang, Jun Wang
TL;DR: This work provides a self-contained assessment of current state-of-the-art MARL techniques from a game-theoretical perspective and is intended as a stepping stone both for new researchers entering this fast-growing domain and for existing domain experts who want a panoramic view and to identify new directions based on recent advances.
•Proceedings Article
On the Sample Complexity of Reinforcement Learning with a Generative Model
Mohammad Gheshlaghi Azar, Bert Kappen, Rémi Munos
- 26 Jun 2012
TL;DR: This work considers the problems of learning the optimal action-value function and the optimal policy in discounted-reward Markov decision processes (MDPs) and proves new PAC bounds on the sample complexity of two well-known model-based reinforcement learning (RL) algorithms in the presence of a generative model of the MDP.