Finite time bounds for sampling based fitted value iteration
Csaba Szepesvári,Rémi Munos +1 more
- 07 Aug 2005
- pp 880-887
Abstract: In this paper we consider sampling based fitted value iteration for discounted, large (possibly infinite) state space, finite action Markovian Decision Problems where only a generative model of the transition probabilities and rewards is available. At each step the image of the current estimate of the optimal value function under a Monte-Carlo approximation to the Bellman-operator is projected onto some function space. PAC-style bounds on the weighted Lp-norm approximation error are obtained as a function of the covering number and the approximation power of the function space, the iteration number and the sample size.
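The iteration described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy MDP inside `generative_model` (state space [0, 1], two actions, the drift, noise, and reward) is entirely hypothetical, states are sampled uniformly, and the projection step is an ordinary least-squares polynomial fit (the p = 2 case only). The default sizes N = 100, M = 10, L = 4, and K = 20 are borrowed from the Figure 1 caption; the discount factor 0.9 is an assumption.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def generative_model(x, a, rng, m):
    """Hypothetical toy MDP on [0, 1]: draw m next states and rewards for (x, a)."""
    drift = 0.1 if a == 1 else -0.1
    y = np.clip(x + drift + 0.05 * rng.standard_normal(m), 0.0, 1.0)
    r = 1.0 - np.abs(y - 0.5)  # arbitrary bounded reward, assumed for illustration
    return y, r


def sampled_fvi(n_states=100, n_next=10, degree=4, n_iters=20, gamma=0.9, seed=0):
    """Sampling based fitted value iteration with a polynomial function space."""
    rng = np.random.default_rng(seed)
    coeffs = np.zeros(degree + 1)  # V_0 = 0, coefficients ordered low to high degree
    for _ in range(n_iters):
        # Sample N base states (uniform sampling is an assumption of this sketch).
        xs = rng.uniform(0.0, 1.0, size=n_states)
        targets = np.empty(n_states)
        for i, x in enumerate(xs):
            backups = []
            for a in (0, 1):
                # Monte-Carlo approximation of the Bellman operator for action a,
                # averaged over M draws from the generative model.
                y, r = generative_model(x, a, rng, n_next)
                backups.append(np.mean(r + gamma * P.polyval(y, coeffs)))
            targets[i] = max(backups)
        # Projection onto the function space: least-squares fit of degree L.
        coeffs = P.polyfit(xs, targets, degree)
    return coeffs


if __name__ == "__main__":
    coeffs = sampled_fvi()
    print("fitted value at x = 0.5:", P.polyval(0.5, coeffs))
```

In this sketch, increasing `n_states` or `n_next` reduces the sampling noise in the regression targets, while `degree` controls the approximation power of the function space; these are the quantities the bounds in the abstract depend on, alongside the iteration number.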
Figures

Figure 1. Illustration of sampling based FVI at two iteration steps (top: k = 2, bottom: k = K = 20). The dots represent the N = 100 sampled points {Xn}, 1 ≤ n ≤ N, and their values (averaged over M = 10 samples), the grey curve is the best fit (among polynomials of degree L = 4), and the thin black curve is the optimal value function.
Table 1. Approximation error of the optimal value function as a function of the number of states N, the number of samples M, and the degree L of the fitting polynomials
Citations
A Comprehensive Survey of Multiagent Reinforcement Learning
Lucian Busoniu,Robert Babuska,B. De Schutter +2 more
- 01 Mar 2008
TL;DR: The benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied, and an outlook for the field is provided.
Reinforcement Learning and Dynamic Programming Using Function Approximators
Lucian Busoniu,Robert Babuska,Bart De Schutter,Damien Ernst +3 more
- 29 Apr 2010
TL;DR: Reinforcement Learning and Dynamic Programming Using Function Approximators provides a comprehensive and unparalleled exploration of the field of RL and DP, with a focus on continuous-variable problems.
•Journal Article
Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
TL;DR: A framework is described that is based on learning the confidence interval around the value function or the Q-function and eliminating actions that are not optimal (with high probability); both model-based and model-free variants of the elimination method are provided.
•Posted Content
Neural Approaches to Conversational AI
TL;DR: In this article, the authors present a survey of state-of-the-art neural approaches to conversational AI, and discuss the progress that has been made and challenges still being faced, using specific systems and models as case studies.
583
•Journal Article
Finite-Time Bounds for Fitted Value Iteration
Rémi Munos,Csaba Szepesvári +1 more
TL;DR: A theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decision processes (MDPs) under the assumption that a generative model of the environment is available.
References
•Book
An Introduction to Support Vector Machines and Other Kernel-based Learning Methods
Nello Cristianini,John Shawe-Taylor +1 more
- 01 Jan 2000
TL;DR: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory, and will guide practitioners to updated literature, new applications, and on-line software.
15K
•Book
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Martin L. Puterman
- 15 Apr 1994
TL;DR: Puterman provides an up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models, focusing primarily on infinite horizon discrete time models and models with discrete state spaces while also examining models with arbitrary state spaces, finite horizon models, and continuous time discrete state models.
12.3K
Probability Inequalities for sums of Bounded Random Variables
TL;DR: In this article, upper bounds for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt are derived for certain sums of dependent random variables such as U statistics.
Adapting to Unknown Smoothness via Wavelet Shrinkage
TL;DR: In this article, the authors proposed a smoothness adaptive thresholding procedure, called SureShrink, which is adaptive to the Stein unbiased estimate of risk (sure) for threshold estimates and is near minimax simultaneously over a whole interval of the Besov scale; the size of this interval depends on the choice of mother wavelet.
5K
Neuro-Dynamic Programming.
Dimitri P. Bertsekas
- 01 Jan 2009
TL;DR: In this article, the authors present the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.
4.7K