Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

doi:10.1038/S41586-020-03051-4

Open Access•Journal Article•10.1038/S41586-020-03051-4

Julian Schrittwieser, +11 more

- 19 Nov 2019

- arXiv: Learning

1.4K

TL;DR: The MuZero algorithm is presented, which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics.

Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games - the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled - our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
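The abstract's description of a model that is "applied iteratively" to predict reward, policy, and value can be illustrated with a toy sketch. Everything named here is an assumption made for illustration and is not taken from this page: the split into `represent`, `step`, and `predict`, the class `ToyModel`, the random tanh layers standing in for trained networks, and the sizes. It shows only the unrolling of a model in hidden-state space, not training and not the tree search.

```python
import math
import random

HIDDEN, ACTIONS = 4, 3  # toy sizes, chosen only for illustration


def _matrix(rows, cols, rng):
    """Random weights standing in for trained network parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def _apply(weights, vec):
    """Matrix-vector product followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]


class ToyModel:
    """Hypothetical stand-in for a learned model (untrained, random weights)."""

    def __init__(self, obs_size, seed=0):
        rng = random.Random(seed)
        self.w_repr = _matrix(HIDDEN, obs_size, rng)
        self.w_dyn = _matrix(HIDDEN, HIDDEN + ACTIONS, rng)
        self.w_reward = _matrix(1, HIDDEN + ACTIONS, rng)
        self.w_policy = _matrix(ACTIONS, HIDDEN, rng)
        self.w_value = _matrix(1, HIDDEN, rng)

    def represent(self, observation):
        """Encode a real observation into a hidden state."""
        return _apply(self.w_repr, observation)

    def step(self, state, action):
        """Predict the reward and next hidden state for one action."""
        one_hot = [1.0 if a == action else 0.0 for a in range(ACTIONS)]
        inp = state + one_hot
        reward = _apply(self.w_reward, inp)[0]
        return reward, _apply(self.w_dyn, inp)

    def predict(self, state):
        """Predict a softmax policy and a scalar value from a hidden state."""
        logits = _apply(self.w_policy, state)
        exps = [math.exp(logit) for logit in logits]
        total = sum(exps)
        policy = [e / total for e in exps]
        value = _apply(self.w_value, state)[0]
        return policy, value


def unroll(model, observation, actions):
    """Apply the model iteratively along a sequence of hypothetical actions.

    The environment is consulted only once, for the initial observation;
    every later step works purely on the model's own hidden state.
    """
    state = model.represent(observation)
    outputs = []
    for action in actions:
        policy, value = model.predict(state)       # predictions before acting
        reward, state = model.step(state, action)  # imagined transition
        outputs.append((reward, policy, value))
    return outputs
```

Calling `unroll(ToyModel(obs_size=5), [0.1, 0.2, 0.3, 0.4, 0.5], [0, 2, 1])` returns one `(reward, policy, value)` triple per imagined action. In a planner, a search procedure would choose which action sequences to unroll; here they are supplied by hand.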

Citations

Journal Article•10.48550/arXiv.2305.04432

Goal-oriented inference of environment from redundant observations

Kazuki Takahashi, +3 more

- 08 May 2023

- arXiv.org

TL;DR: The authors propose a goal-oriented reinforcement learning method that efficiently learns state transition rules among reward-related "core states" from redundant observations. Starting with a small number of initial core states, it gradually adds new core states to the transition diagram until it achieves an optimal behavioral strategy consistent with the Bellman equation.

Learning to Initiate and Reason in Event-Driven Cascading Processes

Yuval Atzmon, +2 more

TL;DR: Cascade is a supervised learning setup in which an agent observes a system with known dynamics evolving from some initial state and must make an intervention that triggers a cascade of events, such that the system reaches an alternative (counterfactual) behavior.

Belief state modeling

Samuel Sokota, +2 more

TL;DR: The paper introduces belief fine-tuning (BFT), an inference-time improvement framework for parametric sequential generative modeling methods, which enables, for the first time, approximate public belief state search in imperfect-information games where the number of possible information states is too large to track tabularly.

•Posted Content

Planning with Expectation Models for Control.

Katya Kudashkina, +3 more

- 17 Apr 2021

- arXiv: Artificial Intelligence

TL;DR: This paper shows that planning with an expectation model must update a state-value function, not an action-value function as previously suggested (e.g., Sorg & Singh, 2010).

Journal Article•10.1109/icasi60819.2024.10547962

Incorporating Domain Knowledge Into Monte Carlo Tree Search in Dark Chess

Shi-Jim Yen, +3 more

- 17 Apr 2024

TL;DR: An MCTS-based Dark Chess program incorporates domain knowledge to improve node scoring, yielding a stronger program.

References

•Journal Article•10.1145/3065386

ImageNet classification with deep convolutional neural networks

Alex Krizhevsky, +2 more

- 24 May 2017

- Communications of The ACM

TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

98.2K

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: State-of-the-art performance was achieved by a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax.

88.4K

•Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

- 01 Jan 1988

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the history of the field's intellectual foundations to the most recent developments and applications.

39.7K

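Human-level control through deep reinforcement learning

Mastering the game of Go with deep neural networks and tree search
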
Related Papers (5)

Mastering the game of Go without human knowledge

Human-level control through deep reinforcement learning

Mastering the game of Go with deep neural networks and tree search

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

- 01 Jan 1988

Playing Atari with Deep Reinforcement Learning

Volodymyr Mnih, +6 more

- 19 Dec 2013

- arXiv: Learning