Regularized Q-Learning with Linear Function Approximation

doi:10.48550/arxiv.2401.15196

Journal Article

Jiachen Xi, +2 more

- 26 Jan 2024

- arXiv.org

- Vol. abs/2401.15196

1

TL;DR: This paper considers a single-loop algorithm for minimizing the projected Bellman error, with finite-time convergence guarantees in the case of linear function approximation, and provides a performance guarantee for the policies derived from the proposed algorithm.

Abstract: Several successful reinforcement learning algorithms make use of regularization to promote multi-modal policies that exhibit enhanced exploration and robustness. With function approximation, the convergence properties of some of these algorithms (e.g. soft Q-learning) are not well understood. In this paper, we consider a single-loop algorithm for minimizing the projected Bellman error with finite-time convergence guarantees in the case of linear function approximation. The algorithm operates on two scales: a slower scale for updating the target network of the state-action values, and a faster scale for approximating the Bellman backups in the subspace spanned by the basis vectors. We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise. In addition, we provide a performance guarantee for the policies derived from the proposed algorithm.
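The two-scale structure described in the abstract can be illustrated with a rough sketch. This is not the paper's algorithm: the function names, the uniform behavior policy, the log-sum-exp (entropy) regularizer, the step sizes `alpha_fast` and `beta_slow`, and the array layouts for `P`, `R`, and `Phi` are all assumptions made for illustration. The fast iterate takes stochastic least-squares steps toward the regularized Bellman backup in the span of the features, while the slow iterate (the target parameters) drifts toward the fast one, all along a single sampled trajectory.

```python
import numpy as np

def soft_value(q_row, tau):
    """Entropy-regularized state value: tau * log(sum_a exp(q_a / tau))."""
    m = q_row.max()  # subtract the max for numerical stability
    return m + tau * np.log(np.exp((q_row - m) / tau).sum())

def two_timescale_soft_q(P, R, Phi, gamma=0.9, tau=0.5,
                         alpha_fast=0.1, beta_slow=0.01,
                         steps=5000, seed=0):
    """Illustrative two-timescale soft Q-learning with linear features.

    P:   (S, A, S) transition probabilities (hypothetical layout)
    R:   (S, A)    rewards
    Phi: (S, A, d) feature vectors, so Q(s, a) = Phi[s, a] @ theta
    """
    rng = np.random.default_rng(seed)
    S, A, d = Phi.shape
    theta = np.zeros(d)          # fast iterate: fits the Bellman backup
    theta_target = np.zeros(d)   # slow iterate: target parameters
    s = 0
    for _ in range(steps):
        a = rng.integers(A)                  # uniform behavior policy (assumption)
        s_next = rng.choice(S, p=P[s, a])    # next state on the same trajectory
        q_next = Phi[s_next] @ theta_target  # target Q-values at s_next, shape (A,)
        backup = R[s, a] + gamma * soft_value(q_next, tau)
        phi = Phi[s, a]
        # Fast scale: gradient step on 0.5 * (phi @ theta - backup) ** 2.
        theta += alpha_fast * (backup - phi @ theta) * phi
        # Slow scale: move the target parameters toward the fast iterate.
        theta_target += beta_slow * (theta - theta_target)
        s = s_next
    return theta_target
```

Keeping `beta_slow` much smaller than `alpha_fast` is what separates the two scales in this sketch; the paper's actual step-size conditions and assumptions are not reproduced here.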

Figures

Figure 2: MSPBE of the estimated state-action value functions. The graph shows the average MSPBE (± standard deviation) over 100 runs.

Table 1: Performance of Benchmark Algorithms over 20 Runs.

Citations

Journal Article

Regularized Q-learning

Han-Dong Lim, +2 more

- 11 Feb 2022

- arXiv.org

TL;DR: A new Q-learning algorithm that converges when linear function approximation is used is developed and it is proved that simply adding an appropriate regularization term ensures convergence of the algorithm.

References

•Journal Article•10.1007/BF00992698

Technical Note: Q-Learning

Chris Watkins, +1 more

- 01 May 1992

- Machine Learning

TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.

12K

•Posted Content

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Tuomas Haarnoja, +3 more

- 04 Jan 2018

- arXiv: Learning

TL;DR: In this article, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework is proposed, where the actor aims to maximize expected reward while also maximizing entropy.

6.7K

•Book Chapter•10.1016/B978-1-55860-377-6.50013-X

Residual algorithms: reinforcement learning with function approximation

Leemon C. Baird

- 09 Jul 1995

TL;DR: Both direct and residual gradient algorithms are shown to be special cases of residual algorithms, and it is shown that residual algorithms can combine the advantages of each approach.

1.2K

•Journal Article

Tree-Based Batch Mode Reinforcement Learning

Damien Ernst, +2 more

- 01 Dec 2005

- Journal of Machine Learning Research

TL;DR: Within this framework, several classical tree-based supervised learning methods and two newly proposed ensemble algorithms, namely extremely and totally randomized trees, are described, and the ensemble methods based on regression trees are found to perform well in extracting relevant information about the optimal control policy from sets of four-tuples.

1.2K

•Journal Article

Double Q-Learning

Hado van Hasselt

- 01 Jan 2010

- IEEE Intelligent Systems

TL;DR: An alternative way to approximate the maximum expected value for any set of random variables is introduced, and the resulting double estimator method is shown to sometimes underestimate rather than overestimate the maximum expected value.

1K
