Efficient reinforcement learning using recursive least-squares methods
Xin Xu, Han-gen He, Dewen Hu +2 more
TL;DR: The paper applies RLS methods to reinforcement learning, proposing and analyzing two new algorithms that use linear value function approximators, and shows that using RLS methods in the learning-prediction process of the critic also improves the data efficiency of learning control.
Abstract: The recursive least-squares (RLS) algorithm is one of the best-known algorithms in adaptive filtering, system identification and adaptive control. Its popularity is mainly due to its fast convergence speed, which is considered to be optimal in practice. In this paper, RLS methods are used to solve reinforcement learning problems, and two new reinforcement learning algorithms using linear value function approximators are proposed and analyzed. The two algorithms are called RLS-TD(λ) and Fast-AHC (Fast Adaptive Heuristic Critic), respectively. RLS-TD(λ) can be viewed as the extension of RLS-TD(0) from λ = 0 to general 0 ≤ λ ≤ 1, so it is a multi-step temporal-difference (TD) learning algorithm using RLS methods. The convergence with probability one and the limit of convergence of RLS-TD(λ) are proved for ergodic Markov chains. Compared to the existing LS-TD(λ) algorithm, RLS-TD(λ) has advantages in computation and is more suitable for online learning. The effectiveness of RLS-TD(λ) is analyzed and verified by learning-prediction experiments on Markov chains with a wide range of parameter settings.
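The abstract does not reproduce the update equations, so the following is only a minimal sketch of what an RLS-TD(λ) learner could look like, assuming the usual rank-one (Sherman-Morrison) form of RLS applied to the TD fixed-point equations with an eligibility trace. The class name `RLSTDLambda`, the parameter names (`mu` for the forgetting factor, `delta` for the scale of the initial matrix) and the 3-state cycle used in `cycle_chain_values` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


class RLSTDLambda:
    """Sketch of a recursive least-squares TD(lambda) learner (assumed form)."""

    def __init__(self, n_features, gamma=0.9, lam=0.5, mu=1.0, delta=1000.0):
        self.gamma, self.lam, self.mu = gamma, lam, mu
        self.theta = np.zeros(n_features)      # linear value-function weights
        self.P = delta * np.eye(n_features)    # initial matrix P_0 = delta * I
        self.z = np.zeros(n_features)          # eligibility trace

    def update(self, phi, reward, phi_next):
        # Decay the trace and add the features of the current state.
        self.z = self.gamma * self.lam * self.z + phi
        # Feature difference appearing in the TD fixed-point equations.
        d = phi - self.gamma * phi_next
        Pz = self.P @ self.z
        dP = d @ self.P
        gain = Pz / (self.mu + d @ Pz)
        # Correct the weights with the one-step prediction error.
        self.theta = self.theta + gain * (reward - d @ self.theta)
        # Rank-one update of P; no matrix inversion is needed.
        self.P = (self.P - np.outer(gain, dP)) / self.mu
        return self.theta


def cycle_chain_values(steps=3000, gamma=0.9, lam=0.5):
    """Run the learner on a hypothetical 3-state cycle with one-hot features.

    Transitions are 0 -> 1 -> 2 -> 0 and only leaving state 0 pays reward 1.
    """
    learner = RLSTDLambda(3, gamma=gamma, lam=lam)
    features = np.eye(3)
    state = 0
    for _ in range(steps):
        nxt = (state + 1) % 3
        reward = 1.0 if state == 0 else 0.0
        learner.update(features[state], reward, features[nxt])
        state = nxt
    return learner.theta
```

Each step costs O(n²) in the number of features because `P` is updated in place by a rank-one correction, which is the kind of computational advantage over solving a batch least-squares system that makes a recursive form attractive for online use. On the toy cycle, the learned weights can be compared with the exact values obtained from the Bellman equations.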
The Fast-AHC algorithm is derived by applying the proposed RLS-TD(λ) algorithm in the critic network of the adaptive heuristic critic method. Unlike the conventional AHC algorithm, Fast-AHC makes use of RLS methods to improve the learning-prediction efficiency in the critic. Learning control experiments on the cart-pole balancing and the acrobot swing-up problems are conducted to compare the data efficiency of Fast-AHC with that of conventional AHC. The experimental results show that the data efficiency of learning control can also be improved by using RLS methods in the learning-prediction process of the critic. The performance of Fast-AHC is also compared with that of the AHC method using LS-TD(λ). Furthermore, it is demonstrated in the experiments that different initial values of the variance matrix in RLS-TD(λ) are required to get better performance not only in learning prediction but also in learning control. The experimental results are analyzed based on the existing theoretical work on the transient phase of forgetting-factor RLS methods.
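One way to see why the initial matrix matters during the transient phase is the following sketch. It assumes a forgetting factor of 1, zero initial weights and an initial matrix P_0 = δI; under those assumptions the matrix inversion lemma implies that the recursive estimate after t transitions equals the batch solution of (I/δ + A_t) θ = b_t, so 1/δ acts like a regularization term that fades as data accumulates. The function names and the 3-state cycle are hypothetical, and the sketch illustrates only the prediction side on a toy chain, not the paper's control experiments.

```python
import numpy as np


def regularized_lstd_weights(delta, steps, gamma=0.9, lam=0.0):
    """Batch weights matching an RLS recursion started from P_0 = delta * I.

    Assumes forgetting factor 1 and zero initial weights. The chain is a
    hypothetical 3-state cycle 0 -> 1 -> 2 -> 0 with one-hot features and
    reward 1 for leaving state 0.
    """
    n = 3
    features = np.eye(n)
    A = np.eye(n) / delta          # contribution of the initial matrix
    b = np.zeros(n)
    z = np.zeros(n)                # eligibility trace
    state = 0
    for _ in range(steps):
        nxt = (state + 1) % n
        reward = 1.0 if state == 0 else 0.0
        z = gamma * lam * z + features[state]
        A += np.outer(z, features[state] - gamma * features[nxt])
        b += z * reward
        state = nxt
    return np.linalg.solve(A, b)


def transient_error(delta, steps=30, gamma=0.9):
    """Distance from the exact values of the cycle after a short run."""
    v0 = 1.0 / (1.0 - gamma ** 3)
    true_values = np.array([v0, gamma ** 2 * v0, gamma * v0])
    estimate = regularized_lstd_weights(delta, steps, gamma)
    return float(np.linalg.norm(estimate - true_values))
```

Comparing `transient_error(1000.0)` with `transient_error(0.1)` after the same 30 transitions shows that, on this toy chain, a small δ keeps the estimate pulled toward the zero initial weights while a large δ lets the data dominate almost immediately. Which choice is preferable in a control loop is a separate question that this sketch does not address.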