Book Chapter, DOI: 10.1007/978-3-030-78811-7_34
Value-Based Continuous Control Without Concrete State-Action Value Function.
Jin Zhu, Haixian Zhang, Zhen Pan +2 more
- 17 Jul 2021
- pp 352-364
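TL;DR: In value-based continuous control, the action with the higher expected return (the state-action value, Q) is selected as the action decision. Because deep Q functions are limited in what they can express, most existing methods add an independent policy function that approximates the preference of the Q function; these actor-critic methods achieve value-based continuous control in an effective but compromised way.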
Abstract: In value-based reinforcement learning for continuous control, actions with a higher expected return (the state-action value, also written Q) are selected as the action decision. However, limited by the expressiveness of the deep Q function, researchers mostly introduce an independent policy function to approximate the preference of the Q function. These methods, known as actor-critic methods, implement value-based continuous control in an effective but compromised way.