Journal Article
Regularized Q-learning
TL;DR: A new Q-learning algorithm is developed that converges when linear function approximation is used, and it is proved that simply adding an appropriate regularization term ensures this convergence.
Abstract: Q-learning is a widely used algorithm in the reinforcement learning community. In the lookup-table setting, its convergence is well established. However, it is known to be unstable when combined with linear function approximation. This paper develops a new Q-learning algorithm that converges when linear function approximation is used. We prove that simply adding an appropriate regularization term ensures convergence of the algorithm. We prove its stability using a recent analysis tool based on switching system models. Moreover, we show experimentally that it converges in environments where Q-learning with linear function approximation is known to diverge. We also provide an error bound on the solution to which the algorithm converges.
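The abstract does not spell out the update rule, so the following is only a minimal sketch of what "adding a regularization term" to Q-learning with linear function approximation could look like. It assumes an ℓ2-style shrinkage term `-eta * theta` added to the usual semi-gradient update; the function name `regularized_q_update`, its parameters, and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import numpy as np


def regularized_q_update(theta, phi_sa, reward, phi_next, gamma, alpha, eta):
    """One Q-learning step with linear features plus an assumed -eta*theta term.

    theta    : (d,) weight vector, Q(s, a) is approximated by phi(s, a) @ theta
    phi_sa   : (d,) features of the visited state-action pair
    phi_next : (num_actions, d) features of every action in the next state
    gamma    : discount factor
    alpha    : step size
    eta      : regularization coefficient (eta = 0 gives plain Q-learning)
    """
    # Greedy bootstrap target over the next state's actions.
    q_next = phi_next @ theta
    td_error = reward + gamma * np.max(q_next) - phi_sa @ theta
    # Semi-gradient step, shrunk toward zero by the regularization term.
    return theta + alpha * (td_error * phi_sa - eta * theta)


# Toy usage with made-up two-dimensional features.
phi_sa = np.array([1.0, 0.0])
phi_next = np.array([[0.0, 1.0], [1.0, 1.0]])

theta0 = np.zeros(2)
theta1 = regularized_q_update(theta0, phi_sa, 1.0, phi_next, gamma=0.9, alpha=0.1, eta=0.5)
theta2 = regularized_q_update(theta1, phi_sa, 1.0, phi_next, gamma=0.9, alpha=0.1, eta=0.5)
```

Setting `eta = 0` recovers the unregularized update, which makes it easy to compare the two variants on the same transitions. Whether a given `eta` is "appropriate" in the sense of the paper's convergence result is not something this sketch checks.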
Figures

Figure 4: Counter-examples where Q-learning with linear function approximation diverges 
Figure 5: Learning curve under different learning rate and regularization coefficient 
Table 1: Result of episode reward, step size = 0.1. The columns correspond to η, and rows correspond to number of tiles. 
Figure 3: State transition diagram 
Figure 6: O.D.E. results 
Figure 2: Experiment results
Citations
Stability of Q-Learning Through Design and Optimism
TL;DR: In this article, the authors present new approaches to ensure stability and potentially accelerated convergence for reinforcement learning algorithms, and stochastic approximation in other settings, and provide details regarding the INFORMS APS inaugural Applied Probability Trust Plenary Lecture, presented in France, June 2023.
Finite-Time Analysis of Temporal Difference Learning: Discrete-Time Linear System Perspective
Donghwan Lee, Do Wan Kim
- 22 Apr 2022
TL;DR: In this article, a finite-time analysis of tabular temporal difference (TD) learning is presented, which makes direct and effective use of discrete-time stochastic linear system models and leverages Schur matrix properties.
Regularized Q-learning through Robust Averaging
Peter Schmitt-Förster, Tobias Sutter
- 03 May 2024
TL;DR: 2RA Q-learning is a new Q-learning variant that addresses estimation bias and achieves better performance than existing methods.
TD Convergence: An Optimization Perspective
TL;DR: In this paper, the authors study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm and show that TD can be viewed as an iterative optimization algorithm in which the function to be minimized changes at each iteration.
Multi-Bellman operator for convergence of Q-learning with linear function approximation
Diogo S. Carvalho, Pedro A. Santos, Francisco Melo
TL;DR: This work introduces a novel multi-Bellman operator that extends the traditional Bellman operator and proposes the multi Q-learning algorithm with linear function approximation, demonstrating that this algorithm converges to the fixed point of the projected multi-Bellman operator, yielding solutions of arbitrary accuracy.
References
•Book
Reinforcement Learning: An Introduction
Richard S. Sutton, Andrew G. Barto
- 01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis
TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Technical Note: Q-Learning
Chris Watkins, Peter Dayan
TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
A Stochastic Approximation Method
Herbert Robbins, Sutton Monro
TL;DR: In this article, a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability is presented.
•Book
Switching in Systems and Control
Daniel Liberzon
- 24 Jun 2003
TL;DR: I. Stability under Arbitrary Switching, Systems not Stabilizable by Continuous Feedback, and Systems with Sensor or Actuator Constraints with Large Modeling Uncertainty.