Efficient reinforcement learning using recursive least-squares methods

doi:10.1613/JAIR.946

Open Access•Journal Article•10.1613/JAIR.946

Xin Xu, +2 more

- 01 Jan 2002

- Journal of Artificial Intelligence Resea...

- Vol. 16, Iss: 1, pp 259-292

150

TL;DR: RLS methods are applied to reinforcement learning problems: two new algorithms using linear value function approximators, RLS-TD(λ) and Fast-AHC, are proposed and analyzed, and experiments show that using RLS methods in the learning-prediction process of the critic also improves the data efficiency of learning control.

Abstract: The recursive least-squares (RLS) algorithm is one of the most well-known algorithms used in adaptive filtering, system identification and adaptive control. Its popularity is mainly due to its fast convergence speed, which is considered to be optimal in practice. In this paper, RLS methods are used to solve reinforcement learning problems, where two new reinforcement learning algorithms using linear value function approximators are proposed and analyzed. The two algorithms are called RLS-TD(λ) and Fast-AHC (Fast Adaptive Heuristic Critic), respectively. RLS-TD(λ) can be viewed as the extension of RLS-TD(0) from λ = 0 to general 0 ≤ λ ≤ 1, so it is a multi-step temporal-difference (TD) learning algorithm using RLS methods. The convergence with probability one and the limit of convergence of RLS-TD(λ) are proved for ergodic Markov chains. Compared to the existing LS-TD(λ) algorithm, RLS-TD(λ) has advantages in computation and is more suitable for online learning. The effectiveness of RLS-TD(λ) is analyzed and verified by learning-prediction experiments on Markov chains with a wide range of parameter settings. The Fast-AHC algorithm is derived by applying the proposed RLS-TD(λ) algorithm in the critic network of the adaptive heuristic critic method. Unlike the conventional AHC algorithm, Fast-AHC makes use of RLS methods to improve the learning-prediction efficiency in the critic. Learning control experiments on the cart-pole balancing and the acrobot swing-up problems are conducted to compare the data efficiency of Fast-AHC with conventional AHC. The experimental results show that the data efficiency of learning control can also be improved by using RLS methods in the learning-prediction process of the critic. The performance of Fast-AHC is also compared with that of the AHC method using LS-TD(λ). Furthermore, it is demonstrated in the experiments that different initial values of the variance matrix in RLS-TD(λ) are required to get better performance not only in learning prediction but also in learning control. The experimental results are analyzed based on the existing theoretical work on the transient phase of forgetting factor RLS methods.
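To make the idea in the abstract concrete, the following is a minimal sketch of an RLS-style TD(λ) update for a linear value function. It is not taken from the paper: the update is the commonly used recursive form (a Sherman-Morrison rank-one update with an eligibility trace), and the function names, the two-state test chain, and the parameter names `delta` (scale of the initial variance matrix) and `mu` (forgetting factor) are assumptions introduced only for illustration.

```python
def rls_td_lambda_step(w, P, z, phi, phi_next, reward, gamma=0.9, lam=0.0, mu=1.0):
    """One RLS-style TD(lambda) update (illustrative sketch, not the paper's code).

    w: weight vector, P: variance matrix, z: eligibility trace,
    phi / phi_next: feature vectors of the current and next state.
    """
    n = len(w)
    # d = phi_t - gamma * phi_{t+1}
    d = [phi[i] - gamma * phi_next[i] for i in range(n)]
    # P z (column vector) and d^T P (row vector)
    Pz = [sum(P[i][j] * z[j] for j in range(n)) for i in range(n)]
    dP = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
    # Gain vector K = P z / (mu + d^T P z)
    denom = mu + sum(d[i] * Pz[i] for i in range(n))
    K = [Pz[i] / denom for i in range(n)]
    # Weight update driven by the residual r - d^T w
    err = reward - sum(d[i] * w[i] for i in range(n))
    w = [w[i] + K[i] * err for i in range(n)]
    # Rank-one update of the variance matrix: P <- (P - K d^T P) / mu
    P = [[(P[i][j] - K[i] * dP[j]) / mu for j in range(n)] for i in range(n)]
    # Eligibility trace: z <- gamma * lambda * z + phi_{t+1}
    z = [gamma * lam * z[i] + phi_next[i] for i in range(n)]
    return w, P, z


def evaluate_two_state_chain(lam=0.0, gamma=0.5, steps=200, delta=1000.0):
    """Hypothetical test problem: a deterministic cycle 0 -> 1 -> 0 with one-hot features."""
    features = [[1.0, 0.0], [0.0, 1.0]]
    rewards = [1.0, 0.0]  # reward received when leaving each state
    n = 2
    w = [0.0] * n
    # Initial variance matrix P_0 = delta * I
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    state = 0
    z = list(features[state])  # z_0 = phi(s_0)
    for _ in range(steps):
        nxt = 1 - state
        w, P, z = rls_td_lambda_step(
            w, P, z, features[state], features[nxt], rewards[state], gamma, lam
        )
        state = nxt
    return w
```

For this toy chain with γ = 0.5, the true values solve V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0), giving V(0) = 4/3 and V(1) = 2/3, so the returned weights can be checked against them. The choice of `delta` mirrors the abstract's remark that the initial value of the variance matrix affects performance; the value used here is arbitrary.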

Citations

Journal Article•10.1145/242224.242229

Machine learning

Thomas G. Dietterich

- 01 Dec 1996

- ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

14K

•Book

Algorithms for Reinforcement Learning

Csaba Szepesvári

- 25 Jun 2010

TL;DR: This book focuses on those reinforcement learning algorithms that build on the powerful theory of dynamic programming; it gives a fairly comprehensive catalog of learning problems and describes the core ideas, followed by a discussion of their theoretical properties and limitations.

1.2K

Journal Article•10.1109/TSMC.2014.2358639

Multiobjective Reinforcement Learning: A Comprehensive Overview

Chunming Liu, +2 more

- 01 Mar 2015

TL;DR: The basic architecture, research topics, and naïve solutions of MORL are introduced first; several representative MORL approaches and some important directions of recent research are then comprehensively reviewed.

408

Journal Article•10.1080/10798587.2004.10642862

Introducing Belbic: Brain Emotional Learning Based Intelligent Controller

Caro Lucas, +2 more

- 01 Jan 2004

- Intelligent Automation and Soft Computin...

TL;DR: This paper adapts a computational model based on the limbic system in the mammalian brain for control engineering applications, and applies the proposed controller to some SISO, MIMO and nonlinear systems.

367

•Journal Article•10.1109/TNN.2007.899161

Kernel-Based Least Squares Policy Iteration for Reinforcement Learning

Xin Xu, +2 more

- 01 Jul 2007

- IEEE Transactions on Neural Networks

TL;DR: The KLSPI algorithm provides a general RL method with generalization performance and convergence guarantee for large-scale Markov decision problems (MDPs) and can be applied to online learning control by incorporating an initial controller to ensure online performance.

332

References

•Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

- 01 Jan 1998

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

39.7K

Journal Article•10.1145/242224.242229

Machine learning

Thomas G. Dietterich

- 01 Dec 1996

- ACM Computing Surveys

TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.

14K

Journal Article•10.1016/0967-0661(96)82838-3

Adaptive Filter Theory

Steve Rogers

- 01 Nov 1996

- Control Engineering Practice

12.6K

•Journal Article•10.1007/BF00992698

Technical Note: Q-Learning

Chris Watkins, +1 more

- 01 May 1992

- Machine Learning

TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.

12K

•Journal Article•10.1613/JAIR.301

Reinforcement learning: a survey

Leslie Pack Kaelbling, +2 more

- 01 Jan 1996

- Journal of Artificial Intelligence Resea...

TL;DR: Central issues of reinforcement learning are discussed, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state.

9K
