Gradient theorem

Papers

Proceedings Article

Policy Gradient Methods for Reinforcement Learning with Function Approximation

Richard S. Sutton, David McAllester, Satinder Singh, Yishay Mansour

29 Nov 1999

TL;DR: This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.

Abstract: Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.

7,133 citations
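
The abstract above says the gradient of expected reward can be written in a form that uses an action-value or advantage function. A minimal sketch of that idea follows, assuming the simplest possible setting: a single state (a bandit), a softmax policy, and known rewards standing in for the action values. The function names and the numbers are illustrative assumptions, not taken from the paper. The sketch compares the closed-form expression sum_a dpi(a)/dtheta * (Q(a) - b) against a finite-difference gradient of the expected reward.

```python
import math


def softmax(theta):
    """Softmax policy over actions, one preference value per action."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def expected_reward(theta, rewards):
    """J(theta) = sum_a pi(a) * r(a) for a single-state problem."""
    pi = softmax(theta)
    return sum(p * r for p, r in zip(pi, rewards))


def policy_gradient(theta, rewards, baseline=0.0):
    """Closed form: dJ/dtheta_k = sum_a dpi(a)/dtheta_k * (r(a) - baseline).

    For a softmax policy, dpi(a)/dtheta_k = pi(a) * (1[a == k] - pi(k)).
    The baseline is action-independent, so it cancels in the sum.
    """
    pi = softmax(theta)
    n = len(theta)
    grad = []
    for k in range(n):
        g = 0.0
        for a in range(n):
            dpi = pi[a] * ((1.0 if a == k else 0.0) - pi[k])
            g += dpi * (rewards[a] - baseline)
        grad.append(g)
    return grad


def finite_difference_gradient(theta, rewards, eps=1e-6):
    """Central-difference estimate of dJ/dtheta, used only as a check."""
    grad = []
    for k in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[k] += eps
        down[k] -= eps
        diff = expected_reward(up, rewards) - expected_reward(down, rewards)
        grad.append(diff / (2 * eps))
    return grad


if __name__ == "__main__":
    theta = [0.1, -0.3, 0.5]    # hypothetical policy parameters
    rewards = [1.0, 0.0, 2.0]   # hypothetical action values
    print("closed form      :", policy_gradient(theta, rewards))
    print("with baseline 0.7:", policy_gradient(theta, rewards, baseline=0.7))
    print("finite difference:", finite_difference_gradient(theta, rewards))
```

In this single-state case the three gradients should agree up to numerical error. The multi-state result in the paper additionally weights each state by how often the policy visits it, which this sketch leaves out.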

Book

Differential forms and applications

Manfredo P. do Carmo

1 Jan 1994

TL;DR: This book covers differential forms in R^n, line integrals, differentiable manifolds, integration on manifolds (Stokes' theorem and Poincaré's lemma), the differential geometry of surfaces, and the theorems of Gauss-Bonnet and Morse.

Abstract (table of contents):
1. Differential Forms in R^n
2. Line Integrals
3. Differentiable Manifolds
4. Integration on Manifolds; Stokes' Theorem and Poincaré's Lemma
   1. Integration of Differential Forms
   2. Stokes' Theorem
   3. Poincaré's Lemma
5. Differential Geometry of Surfaces
   1. The Structure Equations of R^n
   2. Surfaces in R^3
   3. Intrinsic Geometry of Surfaces
6. The Theorem of Gauss-Bonnet and the Theorem of Morse
   1. The Theorem of Gauss-Bonnet
   2. The Theorem of Morse
References

145 citations

Posted Content

Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines

Philip S. Thomas, Emma Brunskill

20 Jun 2017, arXiv: Artificial Intelligence

TL;DR: It is shown how an action-dependent baseline can be used by the policy gradient theorem using function approximation, originally presented with action-independent baselines by Sutton et al. 2000.

Abstract: We show how an action-dependent baseline can be used by the policy gradient theorem using function approximation, originally presented with action-independent baselines by (Sutton et al. 2000).

66 citations

Book

Differential Forms: A Complement to Vector Calculus

Steven H. Weintraub

20 Aug 1997

TL;DR: This book presents differential forms as a complement to vector calculus, covering the algebra of differential forms, exterior differentiation, oriented manifolds, push-forwards and pull-backs, integration of differential forms over oriented manifolds, and the generalized Stokes' theorem, including the fundamental theorem of calculus and its analog for line integrals.

Abstract (table of contents): Differential Forms; The Algebra of Differential Forms; Exterior Differentiation; The Fundamental Correspondence; Oriented Manifolds; The Notion of a Manifold (With Boundary); Orientation; Differential Forms Revisited; 1-Forms; k-Forms; Push-Forwards and Pull-Backs; Integration of Differential Forms Over Oriented Manifolds; The Integral of a 0-Form Over a Point (Evaluation); The Integral of a 1-Form Over a Curve (Line Integrals); The Integral of a 2-Form Over a Surface (Flux Integrals); The Integral of a 3-Form Over a Solid Body (Volume Integrals); Integration via Pull-Backs; The Generalized Stokes' Theorem; Statement of the Theorem; The Fundamental Theorem of Calculus and Its Analog for Line Integrals; Green's and Stokes' Theorems; Gauss's Theorem; Proof of the GST; For the Advanced Reader; Differential Forms in R^n and Poincaré's Lemma; Manifolds, Tangent Vectors, and Orientations; The Basics of De Rham Cohomology; Appendix: Answers to Exercises; Subject Index

52 citations
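
The "analog for line integrals" of the fundamental theorem of calculus listed in the contents above is the gradient theorem: the line integral of a gradient field along a curve equals the difference of the potential at the endpoints, whichever path is taken. A minimal numerical sketch follows, assuming a hypothetical potential f(x, y) = x^2 y + sin(y) and two hypothetical paths between (1, 0) and (0, 1). None of these choices come from the book.

```python
import math


def f(x, y):
    """Hypothetical scalar potential chosen for illustration."""
    return x * x * y + math.sin(y)


def grad_f(x, y):
    """Analytic gradient of f."""
    return (2.0 * x * y, x * x + math.cos(y))


def arc(t):
    """Quarter circle from (1, 0) to (0, 1), for t in [0, pi/2]."""
    return (math.cos(t), math.sin(t))


def straight(t):
    """Straight segment from (1, 0) to (0, 1), for t in [0, 1]."""
    return (1.0 - t, t)


def line_integral(grad, path, t0, t1, steps=20000):
    """Approximate the integral of grad . dr along path.

    Each small segment contributes the gradient at its parameter
    midpoint dotted with the displacement across the segment.
    """
    total = 0.0
    h = (t1 - t0) / steps
    for i in range(steps):
        ta = t0 + i * h
        tb = ta + h
        xa, ya = path(ta)
        xb, yb = path(tb)
        xm, ym = path(0.5 * (ta + tb))
        gx, gy = grad(xm, ym)
        total += gx * (xb - xa) + gy * (yb - ya)
    return total


if __name__ == "__main__":
    endpoint_difference = f(0.0, 1.0) - f(1.0, 0.0)
    print("f(end) - f(start):", endpoint_difference)
    print("along the arc    :", line_integral(grad_f, arc, 0.0, math.pi / 2))
    print("along the segment:", line_integral(grad_f, straight, 0.0, 1.0))
```

Both integrals should match the endpoint difference up to discretization error, which illustrates path independence for gradient fields.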

Proceedings Article

An Off-policy Policy Gradient Theorem Using Emphatic Weightings

Ehsan Imani, Eric Graves, Martha White (University of Alberta)

1 Jan 2018

TL;DR: This paper derives the first off-policy policy gradient theorem, using emphatic weightings, and proposes the Actor Critic with Emphatic weightings (ACE) algorithm, which approximates the simplified gradient the theorem provides.

Abstract: Policy gradient methods are widely used for control in reinforcement learning, particularly for the continuous action setting. There have been a host of theoretically sound algorithms proposed for the on-policy setting, due to the existence of the policy gradient theorem which provides a simplified form for the gradient. In off-policy learning, however, where the behaviour policy is not necessarily attempting to learn and follow the optimal policy for the given task, the existence of such a theorem has been elusive. In this work, we solve this open problem by providing the first off-policy policy gradient theorem. The key to the derivation is the use of emphatic weightings. We develop a new actor-critic algorithm—called Actor Critic with Emphatic weightings (ACE)—that approximates the simplified gradients provided by the theorem. We demonstrate in a simple counterexample that previous off-policy policy gradient methods—particularly OffPAC and DPG—converge to the wrong solution whereas ACE finds the optimal solution.

43 citations

Year	Papers
2021	3
2020	5
2019	4
2018	4
2017	1
2016	1
