Q-learning with linear function approximation

•Book Chapter•10.1007/978-3-540-72927-3_23

Francisco S. Melo, +1 more

- 13 Jun 2007

- pp 308-322

117

TL;DR: This paper identifies a set of conditions that imply the convergence of Q-learning with linear function approximation with probability 1 when a fixed learning policy is used.
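
For readers unfamiliar with the setting, the following is a minimal sketch of the standard Q-learning update when the action-value function is approximated linearly, Q(s, a) ≈ θ · φ(s, a), and actions are chosen by a fixed learning policy. The toy two-state MDP, the one-hot features (which reduce to the tabular case), the step-size schedule, and all function names are assumptions made for illustration. They are not taken from the paper, and the sketch does not attempt to reproduce the paper's convergence conditions.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def q_update(theta, phi_sa, reward, next_phis, alpha, gamma):
    """One Q-learning step for the linear model Q(s, a) = theta . phi(s, a).

    next_phis holds phi(s', b) for every action b available in s'.
    """
    best_next = max(dot(theta, phi) for phi in next_phis)
    td_error = reward + gamma * best_next - dot(theta, phi_sa)
    return [t + alpha * td_error * f for t, f in zip(theta, phi_sa)]


def features(state, action):
    """Hypothetical one-hot features over (state, action) pairs."""
    phi = [0.0] * 4
    phi[2 * state + action] = 1.0
    return phi


def run(num_steps=5000, gamma=0.9, seed=0):
    """Run Q-learning on a toy MDP (an assumption, not from the paper).

    Two states, two actions; action a moves to state a, and arriving
    in state 1 yields reward 1.
    """
    rng = random.Random(seed)
    theta = [0.0] * 4
    state = 0
    for t in range(num_steps):
        action = rng.randrange(2)  # fixed learning policy: uniform random
        next_state = action
        reward = 1.0 if next_state == 1 else 0.0
        alpha = 1.0 / (1.0 + 0.01 * t)  # decaying step size (assumed schedule)
        next_phis = [features(next_state, b) for b in range(2)]
        theta = q_update(theta, features(state, action), reward,
                         next_phis, alpha, gamma)
        state = next_state
    return theta


if __name__ == "__main__":
    print(run())
```

The learning policy here never changes with θ, which is the "fixed learning policy" setting the summary refers to; only the max inside `q_update` is greedy with respect to the current estimate.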

Citations

•Posted Content

Provably Efficient Reinforcement Learning with Linear Function Approximation

Chi Jin, +3 more

- 11 Jul 2019

- arXiv: Learning

TL;DR: This paper proves that an optimistic modification of Least-Squares Value Iteration (LSVI) achieves Õ(√(d³H³T)) regret, where d is the ambient dimension of the feature space, H is the length of each episode, and T is the total number of steps; the bound is independent of the number of states and actions.

360

•Journal Article•10.1109/TMC.2019.2896950

A Machine Learning Approach to 5G Infrastructure Market Optimization

Dario Bega, +4 more

- 01 Mar 2020

- IEEE Transactions on Mobile Computing

TL;DR: This paper designs a network slice admission control algorithm that ensures the service guarantees provided to tenants are always satisfied, together with a machine learning algorithm that can be deployed in practical settings and achieves close to optimal performance.

134

•Proceedings Article

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

Qingfeng Lan, +3 more

- 30 Apr 2020

TL;DR: This paper proposes a generalization of Q-learning, called Maxmin Q-learning, which provides a parameter to flexibly control bias, and empirically verifies that the algorithm better controls estimation bias in toy environments and achieves superior performance on several benchmark problems.

117

•Journal Article•10.1109/TCIAIG.2009.2035923

Adaptive Experience Engine for Serious Games

Francesco Bellotti, +3 more

- 03 Nov 2009

- IEEE Transactions on Computational Intel...

TL;DR: This work proposes a new design methodology for the sand box serious games (SBSGs) class that decouples content from the delivery strategy during gameplay, and implements an EE module based on genetic computation and reinforcement learning on top of a state-of-the-art game engine.

112

•Journal Article•10.1109/JSAC.2019.2904366

Reinforcement Learning for Real-Time Optimization in NB-IoT Networks

Nan Jiang, +3 more

- 11 Mar 2019

- IEEE Journal on Selected Areas in Commun...

TL;DR: In this paper, the authors proposed reinforcement learning-based approaches to maximize the long-term average number of served IoT devices at each transmission time interval (TTI) in an online fashion.

109

References

•Journal Article•10.1007/BF00992698

Technical Note: Q-Learning

Chris Watkins, +1 more

- 01 May 1992

- Machine Learning

TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.

12K

•Book

Markov Chains and Stochastic Stability

Sean P. Meyn, +1 more

- 01 Jan 1993

TL;DR: This second edition reflects the same discipline and style that marked out the original and helped it to become a classic: proofs are rigorous and concise, the range of applications is broad and knowledgeable, and key ideas are accessible to practitioners with limited mathematical background.

6.7K

Learning from delayed rewards

Chris Watkins

- 01 Jan 1989

5.9K

•Journal Article•10.1023/A:1022633531479

Learning to Predict by the Methods of Temporal Differences

Richard S. Sutton

- 01 Aug 1988

- Machine Learning

TL;DR: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – proves their convergence and optimality for special cases, and relates them to supervised-learning methods.

5.2K

Neuro-Dynamic Programming.

Dimitri P. Bertsekas

- 01 Jan 2009

TL;DR: This is the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.

4.7K