Dynamic Policy Programming with Function Approximation

Open Access • Proceedings Article

- 14 Jun 2011

- pp 119-127

37

TL;DR: This paper proposes dynamic policy programming (DPP), a novel iterative method that updates a parametrized policy through a Bellman-like iteration. It establishes L∞-norm loss bounds for the performance of the policy induced by DPP and proves that the induced policy converges asymptotically to the optimal policy.
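
The Bellman-like iteration on a parametrized policy can be illustrated in a small tabular setting. The following is only a sketch, assuming the update form usually given for DPP: each action preference is reduced by the Boltzmann-weighted average of the current state's preferences and increased by the reward plus the discounted expected soft value of the next state. The two-state MDP, the function names, and the values of `eta` and `gamma` are hypothetical choices for illustration, and function approximation is not covered.

```python
import math

def boltzmann_policy(prefs, eta):
    """Soft-max distribution over actions for one state's preferences."""
    top = max(prefs)  # subtract the max for numerical stability
    weights = [math.exp(eta * (p - top)) for p in prefs]
    total = sum(weights)
    return [w / total for w in weights]

def softmax_value(prefs, eta):
    """Boltzmann-weighted average of the preferences for one state."""
    pi = boltzmann_policy(prefs, eta)
    return sum(p * q for p, q in zip(pi, prefs))

def dpp_step(psi, rewards, transitions, gamma, eta):
    """One assumed tabular update:
    psi[x][a] <- psi[x][a] - M(psi[x]) + r[x][a] + gamma * E_y[M(psi[y])].
    """
    soft = [softmax_value(row, eta) for row in psi]
    new_psi = []
    for x, row in enumerate(psi):
        new_row = []
        for a, pref in enumerate(row):
            expected_next = sum(p * soft[y] for y, p in enumerate(transitions[x][a]))
            new_row.append(pref - soft[x] + rewards[x][a] + gamma * expected_next)
        new_psi.append(new_row)
    return new_psi

# Hypothetical MDP: action 0 always leads to state 0, action 1 to state 1.
# Only moving from state 0 to state 1 pays a reward.
rewards = [[0.0, 1.0], [0.0, 0.0]]
transitions = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0
    [[1.0, 0.0], [0.0, 1.0]],  # from state 1
]

psi = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(200):
    psi = dpp_step(psi, rewards, transitions, gamma=0.9, eta=1.0)

policy = [boltzmann_policy(row, 1.0) for row in psi]
print(policy)
```

In this toy problem the best behavior is to alternate between the two states, and the induced soft-max policy concentrates on those actions as the iterations proceed, because the preferences of suboptimal actions keep decreasing.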

Citations

Book•10.7551/MITPRESS/9816.001.0001

Walking and Running on Yielding and Fluidizing Ground

Feifei Qian, +8 more

- 09 Jul 2012

TL;DR: Presented at Robotics: Science and Systems VIII, July 09-July 13, 2012, University of Sydney, Sydney, NSW, Australia.

1K

•Journal Article•10.1016/J.ROBOT.2018.11.004

Deep reinforcement learning with smooth policy update: Application to robotic cloth manipulation

Yoshihisa Tsurumine, +3 more

- 01 Feb 2019

- Robotics and Autonomous Systems

TL;DR: Two sample efficient DRL algorithms are proposed that combine the nature of smooth policy update with the capability of automatic feature extraction in deep neural networks to enhance the sample efficiency and learning stability with fewer samples.

200

•Posted Content

Bridging the Gap Between Value and Policy Based Reinforcement Learning

Ofir Nachum, +3 more

- 28 Feb 2017

- arXiv: Artificial Intelligence

TL;DR: This paper introduces Path Consistency Learning (PCL), which minimizes a notion of soft consistency error along multi-step action sequences extracted from both on- and off-policy traces, and which can be interpreted as generalizing both actor-critic and Q-learning algorithms.

113

•Dissertation

On the theory of reinforcement learning : methods, convergence analysis and sample complexity

M.G. Azar

- 01 Jan 2012

70

•Proceedings Article

On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes

Bruno Scherrer, +1 more

- 03 Dec 2012

TL;DR: The authors consider infinite-horizon stationary γ-discounted Markov Decision Processes, for which a stationary optimal policy is known to exist, and develop variations of value and policy iteration for computing non-stationary policies that can be up to 2γ/(1-γ) ε-optimal.

48

References

•Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

- 01 Jan 1998

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

39.7K

•Book

Dynamic Programming and Optimal Control

Dimitri P. Bertsekas

- 01 May 1995

TL;DR: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.

12.9K

•Proceedings Article

Policy Gradient Methods for Reinforcement Learning with Function Approximation

Richard S. Sutton, +3 more

- 29 Nov 1999

TL;DR: This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.

7.1K

•Proceedings Article

Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding

Richard S. Sutton

- 27 Nov 1995

TL;DR: It is concluded that reinforcement learning can work robustly in conjunction with function approximators, and that there is little justification at present for avoiding the case of general λ.

1.3K

•Proceedings Article

A Natural Policy Gradient

Sham M. Kakade

- 03 Jan 2001

TL;DR: This work provides a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space and shows drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.

1.2K