Differentiable function

Papers

Proceedings Article

Policy Gradient Methods for Reinforcement Learning with Function Approximation

Richard S. Sutton¹, David McAllester¹, Satinder Singh¹, Yishay Mansour¹

AT&T Labs¹

29 Nov 1999

TL;DR: This paper proves for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.

Abstract: Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.

7,133 citations
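
The gradient form described in the abstract above can be illustrated on a toy case. The following is a minimal sketch, not the paper's algorithm: it assumes a single-state problem (so the state distribution drops out), a softmax policy, and exact, hypothetical action values `q_values` in place of a learned approximation. All function names are illustrative. It computes the gradient of expected reward as a sum over actions of pi(a) * grad log pi(a) * q(a), checks it against finite differences, and then runs plain gradient ascent on the policy parameters.

```python
import math

def softmax(theta):
    """Softmax policy over actions, with one preference parameter per action."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(theta, q):
    """J(theta) = sum_a pi(a) * q(a) for a one-state problem."""
    return sum(p * r for p, r in zip(softmax(theta), q))

def policy_gradient(theta, q):
    """Gradient of J written as sum_a pi(a) * grad log pi(a) * q(a)."""
    pi = softmax(theta)
    grad = [0.0] * len(theta)
    for a, q_a in enumerate(q):
        for k in range(len(theta)):
            # For a softmax policy: d log pi(a) / d theta_k = 1[a == k] - pi(k)
            dlog = (1.0 if a == k else 0.0) - pi[k]
            grad[k] += pi[a] * dlog * q_a
    return grad

def finite_difference(theta, q, eps=1e-6):
    """Central-difference estimate of the same gradient, used only as a check."""
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += eps
        down[k] -= eps
        grad.append((expected_reward(up, q) - expected_reward(down, q)) / (2 * eps))
    return grad

def ascend(theta, q, step=0.5, iterations=200):
    """Plain gradient ascent on the policy parameters."""
    theta = list(theta)
    for _ in range(iterations):
        g = policy_gradient(theta, q)
        theta = [t + step * g_k for t, g_k in zip(theta, g)]
    return theta

q_values = [1.0, 0.0, 0.5]  # hypothetical action values (assumption)
theta0 = [0.0, 0.0, 0.0]    # uniform initial policy
theta_final = ascend(theta0, q_values)
```

Under these assumptions, `expected_reward(theta_final, q_values)` should exceed the value at `theta0`, and the policy should shift its probability toward the first action.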

Journal Article • DOI: 10.1063/1.456010

An improved algorithm for reaction path following

Carlos Gonzalez¹, H. Bernhard Schlegel

Wayne State University¹

15 Feb 1989-Journal of Chemical Physics

TL;DR: A second-order algorithm is presented for finding points on a steepest descent path from the transition state of the reactants and products. The points are optimized so that the segment of the reaction path between any two adjacent points is an arc of a circle and the gradient at each point is tangent to the path.

Abstract: A new algorithm is presented for obtaining points on a steepest descent path from the transition state of the reactants and products. In mass‐weighted coordinates, this path corresponds to the intrinsic reaction coordinate. Points on the reaction path are found by constrained optimizations involving all internal degrees of freedom of the molecule. The points are optimized so that the segment of the reaction path between any two adjacent points is given by an arc of a circle, and so that the gradient at each point is tangent to the path. Only the transition vector and the energy gradients are needed to construct the path. The resulting path is continuous, differentiable and piecewise quadratic. In the limit of small step size, the present algorithm is shown to take a step with the correct tangent vector and curvature vector; hence, it is a second order algorithm. The method has been tested on the following reactions: HCN→CNH, SiH2+H2→SiH4, CH4+H→CH3+H2, F−+CH3F→FCH3+F−, and C2H5F→C2H4+HF. Reaction paths calculated with a step size of 0.4 a.u. are almost identical to those computed with a step size of 0.1 a.u. or smaller.

5,898 citations

Journal Article • DOI: 10.1021/J100377A021

Reaction Path Following in Mass-Weighted Internal Coordinates

Carlos Gonzalez¹, H. Bernhard Schlegel

Wayne State University¹

01 Jul 1990-The Journal of Physical Chemistry

TL;DR: The authors extend their previous algorithm for following reaction paths downhill to use mass-weighted internal coordinates. The algorithm has the correct tangent vector and curvature vectors in the limit of small step size but requires only the transition vector and the energy gradients.

Abstract: Our previous algorithm for following reaction paths downhill (J. Chem. Phys. 1989, 90, 2154) has been extended to use mass-weighted internal coordinates. Points on the reaction path are found by constrained optimizations involving the internal degrees of freedom of the molecule. The points are optimized so that the segment of the reaction path between any two adjacent points is described by an arc of a circle in mass-weighted internal coordinates, and so that the gradients (in mass-weighted internals) at the end points of the arc are tangent to the path. The algorithm has the correct tangent vector and curvature vectors in the limit of small step size but requires only the transition vector and the energy gradients; the resulting path is continuous, differentiable, and piecewise quadratic.

5,759 citations

Book

Approximation Theorems of Mathematical Statistics

Robert Serfling

8 Dec 1980

TL;DR: This book covers the basic sample statistics, transformations of given statistics, asymptotic theory in parametric inference, U-statistics, von Mises differentiable statistical functions, M-, L-, and R-estimates, and asymptotic relative efficiency.

Abstract: Preliminary Tools and Foundations. The Basic Sample Statistics. Transformations of Given Statistics. Asymptotic Theory in Parametric Inference. U-Statistics. Von Mises Differentiable Statistical Functions. M-Estimates. L-Estimates. R-Estimates. Asymptotic Relative Efficiency. Appendix. References. Author Index. Subject Index.

5,732 citations

Journal Article • DOI: 10.1007/BF01020332

Quantitative universality for a class of nonlinear transformations

Mitchell J. Feigenbaum¹

Los Alamos National Laboratory¹

01 Jul 1978-Journal of Statistical Physics

TL;DR: A large class of recursion relations $x_{n+1} = \lambda f(x_n)$ exhibiting infinite bifurcation is shown to possess a rich quantitative structure essentially independent of the recursion function.

Abstract: A large class of recursion relations $x_{n+1} = \lambda f(x_n)$ exhibiting infinite bifurcation is shown to possess a rich quantitative structure essentially independent of the recursion function. The functions considered all have a unique differentiable maximum $\bar{x}$. With $f(\bar{x}) - f(x) \sim |x - \bar{x}|^z$ (for $|x - \bar{x}|$ sufficiently small), $z > 1$, the universal details depend only upon $z$. In particular, the local structure of high-order stability sets is shown to approach universality, rescaling in successive bifurcations, asymptotically by the ratio $\alpha$ ($\alpha = 2.5029078750957...$ for $z = 2$). This structure is determined by a universal function $g^*(x)$, where the $2^n$th iterate of $f$, $f^{(n)}$, converges locally to $\alpha^{-n} g^*(\alpha^n x)$ for large $n$. For the class of $f$'s considered, there exists a $\lambda_n$ such that a $2^n$-point stable limit cycle including $\bar{x}$ exists; $\lambda_\infty - \lambda_n \sim \delta^{-n}$ ($\delta = 4.669201609103...$ for $z = 2$). The numbers $\alpha$ and $\delta$ have been computationally determined for a range of $z$ through their definitions, for a variety of $f$'s for each $z$. We present a recursive mechanism that explains these results by determining $g^*$ as the fixed-point (function) of a transformation on the class of $f$'s. At present our treatment is heuristic. In a sequel, an exact theory is formulated and specific problems of rigor isolated.

3,584 citations
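
The period doubling that the abstract above refers to can be observed numerically. The following is a minimal sketch under stated assumptions: it uses the logistic map x -> r x (1 - x) as one example of a recursion with a quadratic maximum (the z = 2 case), with `r` playing the role of the parameter in the recursion. The function names, parameter values, and tolerances are illustrative choices, not taken from the paper.

```python
def logistic(r, x):
    """One step of the logistic map, an assumed example with a quadratic maximum."""
    return r * x * (1.0 - x)

def attractor_period(r, transient=10000, max_period=64, tol=1e-9):
    """Return the smallest period p <= max_period of the attracting cycle, or None."""
    x = 0.5  # start at the map's maximum
    for _ in range(transient):
        x = logistic(r, x)  # discard the transient so x settles onto the cycle
    orbit = [x]
    for _ in range(max_period):
        orbit.append(logistic(r, orbit[-1]))
    for p in range(1, max_period + 1):
        # The first p at which the orbit returns to its starting value is the period
        if abs(orbit[p] - orbit[0]) < tol:
            return p
    return None

# Parameter values chosen on either side of the first two period doublings
periods = {r: attractor_period(r) for r in (2.8, 3.2, 3.5)}
```

For these three parameter values the sketch should report periods 1, 2, and 4, since the logistic map's first two period doublings occur at r = 3 and r = 1 + sqrt(6) (about 3.449).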

Papers published on a yearly basis

Year	Papers
2026	9
2025	467
2024	848
2023	1,356
2022	2,082
2021	745
