Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model
Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy P. Lillicrap, David Silver
TL;DR: The MuZero algorithm is presented, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.
Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games - the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled - our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
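The abstract's central idea, a learned model that is applied iteratively to predict reward, policy and value, can be illustrated with a minimal sketch. Everything below is hypothetical and is not the authors' implementation: the class `ToyLearnedModel`, its method names, the split into a representation step, a dynamics step and a prediction step, and the randomly initialised linear maps are all assumptions made for illustration. The sketch shows only the unrolling of a model along a sequence of actions; it includes neither training nor the tree-based search.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyLearnedModel:
    """Hypothetical stand-in for a learned model (random weights, untrained)."""

    def __init__(self, obs_dim, hidden_dim, num_actions, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                    for _ in range(rows)]

        self.num_actions = num_actions
        self.w_repr = rand_matrix(hidden_dim, obs_dim)
        self.w_dyn = rand_matrix(hidden_dim, hidden_dim + num_actions)
        self.w_reward = rand_matrix(1, hidden_dim + num_actions)
        self.w_policy = rand_matrix(num_actions, hidden_dim)
        self.w_value = rand_matrix(1, hidden_dim)

    def represent(self, observation):
        """Encode an observation into an internal hidden state."""
        return [math.tanh(x) for x in matvec(self.w_repr, observation)]

    def step(self, hidden, action):
        """Advance the hidden state by one action; also predict the reward."""
        one_hot = [1.0 if i == action else 0.0
                   for i in range(self.num_actions)]
        joint = hidden + one_hot  # concatenate state and action encoding
        reward = matvec(self.w_reward, joint)[0]
        next_hidden = [math.tanh(x) for x in matvec(self.w_dyn, joint)]
        return next_hidden, reward

    def predict(self, hidden):
        """Predict a policy (action distribution) and a scalar value."""
        policy = softmax(matvec(self.w_policy, hidden))
        value = matvec(self.w_value, hidden)[0]
        return policy, value


def unroll(model, observation, actions):
    """Apply the model iteratively along a hypothetical action sequence."""
    hidden = model.represent(observation)
    outputs = []
    for action in actions:
        hidden, reward = model.step(hidden, action)
        policy, value = model.predict(hidden)
        outputs.append((reward, policy, value))
    return outputs
```

A call such as `unroll(ToyLearnedModel(4, 8, 3), [0.1, -0.2, 0.3, 0.0], [0, 2, 1])` returns one `(reward, policy, value)` triple per action. Note that the model never reconstructs an observation: only the quantities named in the abstract are predicted, and a planner could query them without access to the environment's true dynamics.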
Citations
Towards biologically plausible Dreaming and Planning in recurrent spiking networks
Cristiano Capone, Pierpaolo Paolucci
- 20 May 2022
TL;DR: The authors propose a two-module (agent and model) spiking neural network in which "living new experiences in a model-based simulated environment" significantly boosts learning. They also explore "planning", an online alternative to dreaming that shows comparable performance.
E-MCTS: Deep Exploration in Model-Based Reinforcement Learning by Planning with Epistemic Uncertainty
Yaniv Oren, Matthijs T. J. Spaan, Wendelin Bohmer
- 21 Oct 2022
TL;DR: In this paper, the authors develop a methodology to propagate epistemic uncertainty in Monte-Carlo Tree Search (MCTS) and use the propagated uncertainty in a novel deep exploration algorithm that explicitly plans to explore.
Value-Based Continuous Control Without Concrete State-Action Value Function.
Jin Zhu, Haixian Zhang, Zhen Pan
- 17 Jul 2021
TL;DR: In this article, an actor-critic method is proposed that implements value-based continuous control in an effective, though compromised, way: actions with a higher expected return (the state-action value, also written Q) are selected as the action decision.
Proceedings Article
Expert Initialized Hybrid Model-Based and Model-Free Reinforcement Learning
Jeppe Langaa, Christoffer Sloth
- 13 Jun 2023
TL;DR: This paper presents a reinforcement learning algorithm that enables fast learning of control policies from a limited amount of training data by leveraging the attributes of both model-based and model-free algorithms.
Detection of disadvantageous individual decisions for a game with fantastic elements
TL;DR: In this paper, the authors propose an artificial-intelligence-based method for automatically detecting mistakes made in games. The method is intended to give players more insight into how individual game events affect the outcome, and to tell them what they could have done differently to improve their team's probability of winning.
References
ImageNet classification with deep convolutional neural networks
TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
- 03 Dec 2012
TL;DR: A deep convolutional neural network achieved state-of-the-art performance; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Book
Reinforcement Learning: An Introduction
Richard S. Sutton, Andrew G. Barto
- 01 Jan 1998
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis
TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis
TL;DR: Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.