Open Access • Proceedings Article
Dynamic Policy Programming with Function Approximation
Mohammad Gheshlaghi Azar, Vicenç Gómez, Bert Kappen
- 14 Jun 2011
- pp 119-127
TL;DR: This paper proposes a novel iterative method, dynamic policy programming (DPP), which updates the parametrized policy by a Bellman-like iteration. It establishes L∞-norm loss bounds for the performance of the policy induced by DPP and proves that DPP asymptotically converges to the optimal policy.
Abstract: In this paper, we consider the problem of planning in infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative method, called dynamic policy programming (DPP), which updates the parametrized policy by a Bellman-like iteration. For the discrete state-action case, we establish L∞-norm loss bounds for the performance of the policy induced by DPP and prove that it asymptotically converges to the optimal policy. We then generalize our approach to large-scale (continuous) state-action problems using function approximation. We provide L∞-norm performance-loss bounds for approximate DPP and compare these bounds with the standard results from approximate dynamic programming (ADP), showing that approximate DPP yields a tighter asymptotic bound than standard ADP methods. We also numerically compare the performance of DPP to other ADP and RL methods, and observe that approximate DPP asymptotically outperforms the other methods on the mountain-car problem.
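The abstract does not spell out the DPP recursion, so the following is only a minimal sketch of what a "Bellman-like iteration" on action preferences could look like in the discrete state-action case. It assumes the commonly stated form of the update, Ψ(x,a) ← Ψ(x,a) − MΨ(x) + r(x,a) + γ Σ P(x'|x,a) MΨ(x'), where M is a Boltzmann-weighted average over actions with inverse temperature η. The function names, the array layout of `P` and `r`, and the default parameters are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def boltzmann_softmax(psi, eta):
    """Return the softmax-weighted average of psi over actions, and the weights."""
    z = eta * (psi - psi.max(axis=-1, keepdims=True))  # shift for numerical stability
    w = np.exp(z)
    w /= w.sum(axis=-1, keepdims=True)
    return (w * psi).sum(axis=-1), w

def dpp_iterate(P, r, gamma=0.9, eta=1.0, n_iters=500):
    """Assumed tabular DPP-style recursion on action preferences psi.

    P: transition probabilities, shape (S, A, S).
    r: rewards, shape (S, A).
    Returns the final preferences and the induced softmax policy.
    """
    psi = np.zeros_like(r, dtype=float)
    for _ in range(n_iters):
        m, _ = boltzmann_softmax(psi, eta)            # shape (S,)
        # Bellman-like update: subtract the state's own soft average,
        # add reward plus the discounted expected soft average at the next state.
        psi = psi - m[:, None] + r + gamma * (P @ m)
    _, policy = boltzmann_softmax(psi, eta)
    return psi, policy

# Usage on a one-state, two-action toy problem (hypothetical data).
P = np.ones((1, 2, 1))
r = np.array([[1.0, 0.0]])
psi, policy = dpp_iterate(P, r)
```

In this toy problem the preference of the better action settles near its optimal value while the preference of the worse action keeps decreasing, so the induced policy concentrates on the better action. That is consistent with the asymptotic convergence to the optimal policy claimed in the abstract, though it is not a substitute for the paper's analysis.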
Citations
Walking and Running on Yielding and Fluidizing Ground
Feifei Qian, Tingnan Zhang, Chen Li, Aaron M. Hoover, Pierangelo Masarati, Paul M. Birkmeyer, Andrew Pullin, Ronald S. Fearing, Daniel I. Goldman
- 09 Jul 2012
TL;DR: Presented at Robotics: Science and Systems VIII, July 09-July 13, 2012, University of Sydney, Sydney, NSW, Australia.
Deep reinforcement learning with smooth policy update: Application to robotic cloth manipulation
TL;DR: Two sample-efficient DRL algorithms are proposed that combine smooth policy updates with the automatic feature extraction of deep neural networks to improve sample efficiency and learning stability.
•Posted Content
Bridging the Gap Between Value and Policy Based Reinforcement Learning
TL;DR: Path Consistency Learning (PCL) minimizes a notion of soft consistency error along multi-step action sequences extracted from both on- and off-policy traces, and can be interpreted as generalizing both actor-critic and Q-learning algorithms.
•Dissertation
On the theory of reinforcement learning : methods, convergence analysis and sample complexity
M.G. Azar
- 01 Jan 2012
•Proceedings Article
On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
Bruno Scherrer, Boris Lesner
- 03 Dec 2012
TL;DR: In this paper, the authors consider infinite-horizon stationary γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy, and develop variations of value and policy iteration for computing non-stationary policies that can be up to 2γ/(1−γ) ε-optimal.
References
•Book
Reinforcement Learning: An Introduction
Richard S. Sutton, Andrew G. Barto
- 01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
•Book
Dynamic Programming and Optimal Control
Dimitri P. Bertsekas
- 01 May 1995
TL;DR: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.
•Proceedings Article
Policy Gradient Methods for Reinforcement Learning with Function Approximation
Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour
- 29 Nov 1999
TL;DR: This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
•Proceedings Article
Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding
Richard S. Sutton
- 27 Nov 1995
TL;DR: It is concluded that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.
•Proceedings Article
A Natural Policy Gradient
Sham M. Kakade
- 03 Jan 2001
TL;DR: This work provides a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space and shows drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.