Book Chapter · DOI: 10.1007/978-3-540-72927-3_23
Q-learning with linear function approximation
Francisco S. Melo, M. Isabel Ribeiro
- 13 Jun 2007
- pp 308-322
TL;DR: The paper identifies a set of conditions under which Q-learning with linear function approximation converges with probability 1 when a fixed learning policy is used.
Abstract: In this paper, we analyze the convergence of Q-learning with linear function approximation. We identify a set of conditions that implies the convergence of this method with probability 1, when a fixed learning policy is used. We discuss the differences and similarities between our results and those obtained in several related works. We also discuss the applicability of this method when a changing policy is used. Finally, we describe the applicability of this approximate method in partially observable scenarios.
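The update rule analyzed in the abstract can be illustrated with a short sketch. This is not the authors' code: it assumes the standard form of the update, theta <- theta + alpha * phi(x, a) * (r + gamma * max_b phi(y, b)^T theta - phi(x, a)^T theta), a fixed learning policy that picks actions uniformly at random, and an illustrative step-size schedule alpha_t = t^(-0.6). The interfaces `phi` and `step` and the toy two-state MDP are hypothetical, and the sketch does not check the convergence conditions identified in the paper.

```python
import numpy as np


def q_learning_linear(phi, step, n_actions, gamma=0.9, n_steps=20_000, x0=0, seed=0):
    """Q-learning with linear function approximation, Q(x, a) = phi(x, a) @ theta.

    Hypothetical interfaces assumed for this sketch:
      phi(x, a)       -> 1-D feature vector of length p
      step(x, a, rng) -> (reward, next_state)
    The learning policy is fixed: actions are drawn uniformly at random.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(phi(x0, 0).shape[0])
    x = x0
    for t in range(1, n_steps + 1):
        a = int(rng.integers(n_actions))  # fixed, stationary learning policy
        r, y = step(x, a, rng)
        # Bootstrapped target uses the greedy value of the next state.
        q_next = max(phi(y, b) @ theta for b in range(n_actions))
        features = phi(x, a)
        td_error = r + gamma * q_next - features @ theta
        alpha = t ** -0.6  # illustrative schedule: sum diverges, sum of squares converges
        theta += alpha * td_error * features
        x = y
    return theta


def phi(x, a):
    """Hypothetical one-hot features for 2 states x 2 actions (the tabular special case)."""
    v = np.zeros(4)
    v[2 * x + a] = 1.0
    return v


def step(x, a, rng):
    """Hypothetical deterministic toy MDP: action a moves to state a; action 1 pays reward 1."""
    return (1.0 if a == 1 else 0.0), a


theta = q_learning_linear(phi, step, n_actions=2, gamma=0.5)
# With gamma = 0.5, Q*(x, 0) = 1 and Q*(x, 1) = 2 in both states,
# so theta should end up close to [1, 2, 1, 2].
```

One-hot features make this the tabular special case, where convergence is well established; replacing `phi` with a lower-dimensional feature map gives the general linear case, where convergence depends on conditions such as those the paper studies.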
Citations
•Posted Content
Provably Efficient Reinforcement Learning with Linear Function Approximation
TL;DR: This paper proves that an optimistic modification of Least-Squares Value Iteration (LSVI) achieves $\tilde{O}(\sqrt{d^3 H^3 T})$ regret, where d is the ambient dimension of the feature space, H is the length of each episode, and T is the total number of steps; the bound is independent of the number of states and actions.
A Machine Learning Approach to 5G Infrastructure Market Optimization
TL;DR: The paper designs a network slice admission control algorithm that ensures the service guarantees provided to tenants are always satisfied, together with a machine learning algorithm that can be deployed in practical settings and achieves close-to-optimal performance.
•Proceedings Article
Maxmin Q-learning: Controlling the Estimation Bias of Q-learning
Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White
- 30 Apr 2020
TL;DR: This paper proposes a generalization of Q-learning, called Maxmin Q-learning, which provides a parameter to flexibly control estimation bias, and empirically verifies that the algorithm controls this bias better in toy environments and achieves superior performance on several benchmark problems.
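The bias-control parameter described above can be sketched in tabular form. Based on our reading of the method, the target takes, for each next action, the minimum over N action-value estimates and then the maximum over actions; N = 1 recovers the ordinary Q-learning target, and larger N pushes the estimate downwards. The array layout, the function names and the choice to update a single randomly chosen estimate per step are assumptions made for this illustration.

```python
import numpy as np


def maxmin_target(q_tables, r, s_next, gamma, terminal=False):
    """Maxmin target: r + gamma * max_a' min_i Q_i(s_next, a').

    q_tables has shape (N, n_states, n_actions); N is the number of estimates,
    the parameter that controls estimation bias (N = 1 is ordinary Q-learning).
    """
    if terminal:
        return r
    q_min = q_tables[:, s_next, :].min(axis=0)  # elementwise minimum over the N estimates
    return r + gamma * q_min.max()


def maxmin_update(q_tables, s, a, r, s_next, alpha, gamma, rng, terminal=False):
    """Move one randomly chosen estimate towards the shared Maxmin target (in place)."""
    y = maxmin_target(q_tables, r, s_next, gamma, terminal)
    i = rng.integers(q_tables.shape[0])
    q_tables[i, s, a] += alpha * (y - q_tables[i, s, a])
    return i


q = np.array([[[1.0, 3.0]],
              [[2.0, 0.0]]])  # N = 2 estimates, 1 state, 2 actions
y_maxmin = maxmin_target(q, r=0.5, s_next=0, gamma=0.9)     # min over estimates: [1, 0]; max: 1
y_single = maxmin_target(q[:1], r=0.5, s_next=0, gamma=0.9)  # N = 1: ordinary target, max: 3
```

In this toy case the single-estimate target uses the largest value, 3, while the two-estimate target uses 1, which shows how adding estimates lowers the bootstrapped value.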
Adaptive Experience Engine for Serious Games
TL;DR: This work proposes a new design methodology for the sandbox serious games (SBSGs) class that decouples content from the delivery strategy during gameplay, and implements an Experience Engine (EE) module based on genetic computation and reinforcement learning on top of a state-of-the-art game engine.
Reinforcement Learning for Real-Time Optimization in NB-IoT Networks
TL;DR: The authors propose reinforcement-learning-based approaches that maximize, in an online fashion, the long-term average number of IoT devices served in each transmission time interval (TTI).
References
Technical Note: Q-Learning
Chris Watkins, Peter Dayan
TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on the one outlined in Watkins (1989), showing that Q-learning converges to the optimal action-values with probability 1 as long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
•Book
Markov Chains and Stochastic Stability
Sean P. Meyn, Richard L. Tweedie
- 01 Jan 1993
TL;DR: This second edition reflects the same discipline and style that marked out the original and helped it to become a classic: proofs are rigorous and concise, the range of applications is broad and knowledgeable, and key ideas are accessible to practitioners with limited mathematical background.
Learning to Predict by the Methods of Temporal Differences
TL;DR: This article introduces a class of incremental learning procedures specialized for prediction (that is, for using past experience with an incompletely known system to predict its future behavior), proves their convergence and optimality for special cases, and relates them to supervised-learning methods.
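TD(0), the λ = 0 member of the TD(λ) family introduced in this article, can be sketched in a few lines for a bounded random-walk prediction task (five nonterminal states, start in the middle, reward 1 only on exiting to the right). The episode format, the function names and the step size are illustrative assumptions, not taken from the article.

```python
import numpy as np


def td0_update(V, episode, alpha, gamma=1.0):
    """Apply TD(0) updates along one episode, in place.

    episode: list of (state, reward, next_state); next_state is None at termination.
    """
    for s, r, s_next in episode:
        v_next = 0.0 if s_next is None else V[s_next]
        # Move V[s] towards the one-step bootstrapped target r + gamma * V[s_next].
        V[s] += alpha * (r + gamma * v_next - V[s])
    return V


def random_walk_episode(rng, n_states=5, start=2):
    """Bounded random walk: step left or right with equal probability;
    exiting on the left pays 0, exiting on the right pays 1."""
    episode, s = [], start
    while True:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        if s_next < 0:
            episode.append((s, 0.0, None))
            return episode
        if s_next >= n_states:
            episode.append((s, 1.0, None))
            return episode
        episode.append((s, 0.0, s_next))
        s = s_next


rng = np.random.default_rng(0)
V = np.full(5, 0.5)  # initial predictions for the five nonterminal states
for _ in range(1000):
    td0_update(V, random_walk_episode(rng), alpha=0.05)
```

The true values of the five states are 1/6, 2/6, ..., 5/6; with a constant step size the estimates keep fluctuating around them rather than settling exactly, which is why convergence results for such methods typically require decreasing step sizes.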
Neuro-Dynamic Programming
Dimitri P. Bertsekas
- 01 Jan 2009
TL;DR: The authors present the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.
Related Papers
Richard S. Sutton, Andrew G. Barto
- 01 Jan 1988
Chris Watkins
- 01 Jan 1989