Regularized Q-learning

Journal Article

Han-Dong Lim, +2 more

- 11 Feb 2022

- arXiv.org

- Vol. abs/2202.05404

7

TL;DR: This paper develops a new Q-learning algorithm that converges under linear function approximation, and proves that simply adding an appropriate regularization term ensures convergence.
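
To make the idea in the summary concrete, here is a minimal sketch of one Q-learning step with linear function approximation and an added regularization term. It assumes the regularizer is a simple shrinkage term -ηθ on the weight vector, with η the regularization coefficient named in the figure and table captions; this page does not give the paper's exact update, so the form below is an assumption. The function name, argument names, and default values are hypothetical.

```python
import numpy as np


def regularized_q_step(theta, phi_sa, reward, phi_next_all,
                       alpha=0.1, gamma=0.9, eta=1.0):
    """One Q-learning update with linear features plus a shrinkage term.

    theta        : (d,) current weight vector
    phi_sa       : (d,) features of the visited state-action pair
    reward       : scalar reward observed after the transition
    phi_next_all : (num_actions, d) features of every action in the next state
    alpha, gamma : step size and discount factor
    eta          : regularization coefficient (assumed l2 shrinkage form)
    """
    # Greedy bootstrap target: max over next-state action values.
    q_next = phi_next_all @ theta
    td_error = reward + gamma * np.max(q_next) - phi_sa @ theta
    # Standard semi-gradient step, with -eta * theta pulling weights to zero.
    return theta + alpha * (td_error * phi_sa - eta * theta)


theta = np.zeros(2)
phi_sa = np.array([1.0, 0.0])
phi_next_all = np.array([[0.0, 1.0], [1.0, 1.0]])
theta = regularized_q_step(theta, phi_sa, 1.0, phi_next_all)
```

Setting `eta=0.0` recovers the ordinary Q-learning update with linear function approximation, which is the case the counter-examples in Figure 4 concern.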

Figures

Figure 4: Counter-examples where Q-learning with linear function approximation diverges

Figure 5: Learning curve under different learning rate and regularization coefficient

Table 1: Result of episode reward, step size = 0.1. The columns correspond to η, and rows correspond to number of tiles.

Citations

Journal Article•10.48550/arXiv.2307.02632

Stability of Q-Learning Through Design and Optimism

Sean P. Meyn

- 05 Jul 2023

- arXiv.org

TL;DR: In this article, the authors present new approaches to ensure stability and potentially accelerated convergence for reinforcement learning algorithms, and stochastic approximation in other settings, and provide details regarding the INFORMS APS inaugural Applied Probability Trust Plenary Lecture, presented in France, June 2023.

6

Finite-Time Analysis of Temporal Difference Learning: Discrete-Time Linear System Perspective

Donghwan Lee, +1 more

- 22 Apr 2022

TL;DR: In this article, a finite-time analysis of tabular temporal difference (TD) learning is presented, which makes direct and effective use of discrete-time stochastic linear system models and leverages Schur matrix properties.

Preprint•10.48550/arxiv.2405.02201

Regularized Q-learning through Robust Averaging

Peter Schmitt-Förster, +1 more

- 03 May 2024

TL;DR: 2RA Q-learning is a new Q-learning variant that addresses estimation bias and achieves better performance than existing methods.

Journal Article•10.48550/arXiv.2306.17750

TD Convergence: An Optimization Perspective

Kavosh Asadi, +4 more

- 30 Jun 2023

- arXiv.org

TL;DR: In this paper, the authors study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm and show that TD can be viewed as an iterative optimization algorithm where the function to be minimized changes per iteration.

Journal Article•10.48550/arxiv.2309.16819

Multi-Bellman operator for convergence of Q-learning with linear function approximation

Diogo S. Carvalho, +2 more

- 28 Sep 2023

- arXiv.org

TL;DR: This work introduces a novel multi-Bellman operator that extends the traditional Bellman operator, and proposes the multi Q-learning algorithm with linear function approximation, demonstrating that this algorithm converges to the fixed point of the projected multi-Bellman operator, yielding solutions of arbitrary accuracy.

References

•Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

- 01 Jan 1998

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

39.7K

•Journal Article•10.1007/BF00992698

Technical Note: Q-Learning

Chris Watkins, +1 more

- 01 May 1992

- Machine Learning

TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.

12K

•Journal Article•10.1214/AOMS/1177729586

A Stochastic Approximation Method

Herbert Robbins, +1 more

- 01 Sep 1951

- Annals of Mathematical Statistics

TL;DR: In this article, a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability is presented.

11.3K

•Book

Switching in Systems and Control

Daniel Liberzon

- 24 Jun 2003

TL;DR: I. Stability under Arbitrary Switching, Systems not Stabilizable by Continuous Feedback, and Systems with Sensor or Actuator Constraints with Large Modeling Uncertainty.

7.2K

Human-level control through deep reinforcement learning