Finite time bounds for sampling based fitted value iteration

doi:10.1145/1102351.1102462

Open Access · Proceedings Article

Csaba Szepesvári, +1 more

- 07 Aug 2005

- pp 880-887

141

TL;DR: This paper considers sampling based fitted value iteration for discounted, large (possibly infinite) state space, finite action Markovian Decision Problems where only a generative model of the transition probabilities and rewards is available.

Figures

Figure 1. Illustration of sampling-based FVI at two iteration steps (top: k = 2, bottom: k = K = 20). The dots represent the N = 100 sampled points {X_n}, 1 ≤ n ≤ N, and their values (each averaged over M = 10 samples); the grey curve is the best fit among polynomials of degree L = 4, and the thin black curve is the optimal value function.
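
The procedure the caption describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-dimensional state space [0, 1], the generative model `toy_model`, the action set, and the discount factor are all hypothetical stand-ins, and only the parameters N, M, L and K follow the caption.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def toy_model(x, a, rng):
    """Hypothetical generative model (an assumption, not from the paper).

    Returns one sampled next state and one reward per entry of x.
    """
    noise = rng.normal(0.0, 0.05, size=x.shape)
    next_x = np.clip(x + a + noise, 0.0, 1.0)
    rewards = -(x - 0.5) ** 2 - 0.1 * abs(a)  # best to sit near x = 0.5
    return next_x, rewards


def sampled_fvi(sample_next, actions, gamma=0.9, N=100, M=10, L=4, K=20, seed=0):
    """Sampling-based fitted value iteration with a degree-L polynomial fit."""
    rng = np.random.default_rng(seed)
    states = rng.uniform(0.0, 1.0, size=N)  # base points X_1..X_N
    coeffs = np.zeros(L + 1)                # V_0 = 0 (coefficients, low to high)
    for _ in range(K):
        targets = np.full(N, -np.inf)
        for a in actions:
            # Draw M next-state/reward samples per base point for action a.
            x_rep = np.repeat(states, M)
            next_x, rewards = sample_next(x_rep, a, rng)
            v_next = P.polyval(next_x, coeffs)
            # Monte Carlo estimate of the Bellman backup, averaged over M samples.
            q = (rewards + gamma * v_next).reshape(N, M).mean(axis=1)
            targets = np.maximum(targets, q)  # maximise over the finite action set
        # Least-squares regression of the backed-up values on the base points.
        coeffs = P.polyfit(states, targets, L)
    return states, coeffs


states, coeffs = sampled_fvi(toy_model, actions=(-0.1, 0.0, 0.1))
print(P.polyval(np.array([0.0, 0.5, 1.0]), coeffs))
```

Each iteration uses only calls to the generative model, which matches the setting in the TL;DR: no explicit transition probabilities are needed, only the ability to sample next states and rewards.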

Table 1. Approximation error of the optimal value function as a function of the number of states N, the number of samples M, and the degree L of the fitting polynomials

Citations

•Journal Article•10.1109/TSMCC.2007.913919

A Comprehensive Survey of Multiagent Reinforcement Learning

Lucian Busoniu, +2 more

- 01 Mar 2008

TL;DR: The benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied, and an outlook for the field is provided.

2.3K

•Reference Book•10.1201/9781439821091

Reinforcement Learning and Dynamic Programming Using Function Approximators

Lucian Busoniu, +3 more

- 29 Apr 2010

TL;DR: Reinforcement Learning and Dynamic Programming Using Function Approximators provides a comprehensive and unparalleled exploration of the field of RL and DP, with a focus on continuous-variable problems.

1K

•Journal Article

Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems

Eyal Even-Dar, +2 more

- 01 Dec 2006

- Journal of Machine Learning Research

TL;DR: A framework is described that learns a confidence interval around the value function or the Q-function and eliminates actions that are not optimal (with high probability); model-based and model-free variants of the elimination method are provided.

679

•Posted Content

Neural Approaches to Conversational AI

Jianfeng Gao, +2 more

- 21 Sep 2018

- arXiv: Computation and Language

TL;DR: In this article, the authors present a survey of state-of-the-art neural approaches to conversational AI, and discuss the progress that has been made and challenges still being faced, using specific systems and models as case studies.

583

•Journal Article

Finite-Time Bounds for Fitted Value Iteration

Rémi Munos, +1 more

- 01 Jun 2008

- Journal of Machine Learning Research

TL;DR: A theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decision processes (MDPs) under the assumption that a generative model of the environment is available.

507

References

•Book

An Introduction to Support Vector Machines and Other Kernel-based Learning Methods

Nello Cristianini, +1 more

- 01 Jan 2000

TL;DR: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory, and will guide practitioners to updated literature, new applications, and on-line software.

15K

•Book

Markov Decision Processes: Discrete Stochastic Dynamic Programming

Martin L. Puterman

- 15 Apr 1994

TL;DR: Puterman provides a unified and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models, focusing primarily on infinite horizon discrete time models and models with discrete state spaces while also examining models with arbitrary state spaces, finite horizon models, and continuous time discrete state models.

12.3K

•Book Chapter•10.1007/978-1-4612-0865-5_26

Probability Inequalities for sums of Bounded Random Variables

Wassily Hoeffding

- 01 Mar 1963

- Journal of the American Statistical Asso...

TL;DR: In this article, upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt; the bounds are also extended to certain sums of dependent random variables such as U statistics.

9K

•Journal Article•10.1080/01621459.1995.10476626

Adapting to Unknown Smoothness via Wavelet Shrinkage

David L. Donoho, +1 more

- 01 Dec 1995

- Journal of the American Statistical Asso...

TL;DR: In this article, the authors propose SureShrink, a smoothness-adaptive thresholding procedure that chooses thresholds by minimizing the Stein unbiased estimate of risk (SURE) for threshold estimates; it is near minimax simultaneously over a whole interval of the Besov scale, and the size of this interval depends on the choice of mother wavelet.

5K

Neuro-Dynamic Programming.

Dimitri P. Bertsekas

- 01 Jan 2009

TL;DR: In this article, the authors present the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.

4.7K