Planning with Expectation Models
Yi Wan, Muhammad Zaheer, Adam White, Martha White, Richard S. Sutton +4 more
- 01 Aug 2019
- pp 3649-3655
About: This article was published in the International Joint Conference on Artificial Intelligence on 01 Aug 2019 and is currently open access.
Citations
•Posted Content
When to use parametric models in reinforcement learning
TL;DR: It is hypothesised that, under suitable conditions, replay-based algorithms should be competitive with or better than model-based algorithms if the model is used only to generate fictional transitions from observed states for an update rule that is otherwise model-free.
48
•Proceedings Article
Forethought and Hindsight in Credit Assignment
Veronica Chelu, Doina Precup, Hado van Hasselt +2 more
- 01 Jan 2020
TL;DR: The problem of credit assignment in reinforcement learning is addressed, and fundamental questions are explored regarding how an agent can best use additional computation to propagate new information by planning with internal models of the world to improve its predictions.
21
Reinforcement Learning based Lane Change Decision-Making with Imaginary Sampling
Dong Li, Dongbin Zhao, Qichao Zhang +2 more
- 01 Dec 2019
TL;DR: The proposed two-stage control method includes a decision-making module that computes the high-level lane change action and a lateral control module that outputs the low-level steering angle, which can improve data efficiency and speed up the training process.
10
•Posted Content
Novelty Search in Representational Space for Sample Efficient Exploration.
TL;DR: In this paper, a low-dimensional encoding of the environment is learned with a combination of model-based and model-free objectives, and intrinsic rewards based on the distance to nearest neighbors in the low-dimensional representational space are used to gauge novelty.
3
Reward-Respecting Subtasks for Model-Based Reinforcement Learning
07 Feb 2022
TL;DR: In this paper, the authors propose subtasks that use the original reward plus a bonus based on a feature of the state at the time the option stops, and show that options and option models obtained from such reward-respecting subtasks are much more likely to be useful in planning and can be learned online and off-policy using existing learning algorithms.
References
•Book
Reinforcement Learning: An Introduction
Richard S. Sutton, Andrew G. Barto +1 more
- 01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Neuro-Dynamic Programming.
Dimitri P. Bertsekas
- 01 Jan 2009
TL;DR: In this article, the authors present the first textbook that fully explains the neuro-dynamic programming/reinforcement learning methodology, which is a recent breakthrough in the practical application of neural networks and dynamic programming to complex problems of planning, optimal decision making, and intelligent control.
4.7K
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
TL;DR: It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
3.9K
•Proceedings Article
PILCO: A Model-Based and Data-Efficient Approach to Policy Search
Marc Peter Deisenroth, Carl Edward Rasmussen +1 more
- 28 Jun 2011
TL;DR: PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way by learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning.
Efficient selectivity and backup operators in Monte-Carlo tree search
Rémi Coulom
- 29 May 2006
TL;DR: A new framework for combining tree search with Monte-Carlo evaluation is presented that does not separate a min-max phase from a Monte-Carlo phase; it provides fine-grained control of the tree growth at the level of individual simulations and allows efficient selectivity.