Journal Article, DOI: 10.48550/arxiv.2401.15196
Regularized Q-Learning with Linear Function Approximation
Jiachen Xi, Alfredo Garcia, Petar Momčilović +2 more
TL;DR: This paper considers a single-loop algorithm for minimizing the projected Bellman error, with finite-time convergence guarantees under linear function approximation, and provides a performance guarantee for the policies derived from the proposed algorithm.
Abstract: Several successful reinforcement learning algorithms use regularization to promote multi-modal policies that exhibit enhanced exploration and robustness. With function approximation, the convergence properties of some of these algorithms (e.g., soft Q-learning) are not well understood. In this paper, we consider a single-loop algorithm for minimizing the projected Bellman error with finite-time convergence guarantees in the case of linear function approximation. The algorithm operates on two time scales: a slower scale for updating the target network of the state-action values, and a faster scale for approximating the Bellman backups in the subspace spanned by the basis vectors. We show that, under certain assumptions, the proposed algorithm converges to a stationary point in the presence of Markovian noise. In addition, we provide a performance guarantee for the policies derived from the proposed algorithm.
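The two-scale structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the entropy (log-sum-exp) regularizer, the uniform behavior policy, the step sizes alpha and beta, and the names soft_value and two_timescale_soft_q are all assumptions made for illustration. A fast weight vector omega regresses the regularized Bellman backup onto the feature span, and a slow target vector theta tracks omega, all along a single Markovian trajectory.

```python
import numpy as np


def soft_value(q_values, tau):
    """Entropy-regularized state value: tau * log(sum(exp(q / tau))), computed stably."""
    m = np.max(q_values)
    return m + tau * np.log(np.sum(np.exp((q_values - m) / tau)))


def two_timescale_soft_q(P, R, Phi, gamma=0.9, tau=0.5,
                         alpha=0.01, beta=0.1, steps=5000, seed=0):
    """Illustrative two-time-scale update with linear features (assumed form).

    P:   transition probabilities, shape (n_states, n_actions, n_states)
    R:   rewards, shape (n_states, n_actions)
    Phi: features phi(s, a), shape (n_states, n_actions, d)
    """
    rng = np.random.default_rng(seed)
    n_states, n_actions, d = Phi.shape
    theta = np.zeros(d)  # slow scale: target parameters
    omega = np.zeros(d)  # fast scale: fit of the Bellman backup in the feature span
    s = 0
    for _ in range(steps):
        a = rng.integers(n_actions)               # uniform behavior policy (assumption)
        s_next = rng.choice(n_states, p=P[s, a])  # single trajectory, Markovian samples
        # Regularized backup evaluated with the slow target parameters.
        backup = R[s, a] + gamma * soft_value(Phi[s_next] @ theta, tau)
        phi = Phi[s, a]
        # Fast scale: stochastic least-squares step toward the backup.
        omega += beta * (backup - phi @ omega) * phi
        # Slow scale: move the target parameters toward the fast iterate.
        theta += alpha * (omega - theta)
        s = s_next
    return theta, omega
```

Choosing alpha much smaller than beta is what separates the two scales in this sketch: omega sees an almost fixed target while it fits the backup, and theta drifts slowly toward the result. Both updates happen in every iteration of one loop, so there is no inner solve.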
Citations
Journal Article
Regularized Q-learning
TL;DR: A new Q-learning algorithm that converges when linear function approximation is used is developed and it is proved that simply adding an appropriate regularization term ensures convergence of the algorithm.
References
Technical Note: Q-Learning
Chris Watkins, Peter Dayan +1 more
TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
Posted Content
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
TL;DR: In this article, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework is proposed, where the actor aims to maximize expected reward while also maximizing entropy.
6.7K
Residual algorithms: reinforcement learning with function approximation
Leemon C. Baird
- 09 Jul 1995
TL;DR: Both direct and residual gradient algorithms are shown to be special cases of residual algorithms, and it is shown that residual algorithms can combine the advantages of each approach.
Journal Article
Tree-Based Batch Mode Reinforcement Learning
TL;DR: Within this framework, several classical tree-based supervised learning methods and two newly proposed ensemble algorithms, namely extremely and totally randomized trees, are described, and the ensemble methods based on regression trees are found to perform well in extracting relevant information about the optimal control policy from sets of four-tuples.
Journal Article
Double Q-Learning
TL;DR: An alternative way to approximate the maximum expected value for any set of random variables is introduced, and the resulting double estimator method is shown to sometimes underestimate rather than overestimate the maximum expected value.
1K


