Probabilistic recursive reasoning for multi-agent reinforcement learning

Open Access•Proceedings Article

- 09 May 2019

133

TL;DR: In this paper, a probabilistic recursive reasoning (PR2) framework for multi-agent reinforcement learning is introduced, where each agent can reason about how the opponents would react to its future behaviors.

Abstract: Humans are capable of attributing latent mental contents, such as beliefs or intentions, to others. This social skill is critical in everyday life for reasoning about the potential consequences of one's behaviors so as to plan ahead. It is known that humans use this reasoning ability recursively, i.e., by considering what others believe about their own beliefs. In this paper, we start from level-1 recursion and introduce a probabilistic recursive reasoning (PR2) framework for multi-agent reinforcement learning. Our hypothesis is that it is beneficial for each agent to account for how the opponents would react to its future behaviors. Under the PR2 framework, we adopt variational Bayes methods to approximate the opponents' conditional policy, to which each agent finds the best response and then improves its own policy. We develop decentralized-training-decentralized-execution algorithms, PR2-Q and PR2-Actor-Critic, that are proved to converge in the self-play scenario when there is one Nash equilibrium. Our methods are tested on both a matrix game and a differential game, each of which has a non-trivial equilibrium where common gradient-based methods fail to converge. Our experiments show that it is critical to reason about what the opponents believe about what the agent believes. We expect our work to contribute a new idea of modeling the opponents to the multi-agent reinforcement learning community.
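
The level-1 idea in the abstract (an agent accounts for how the opponent would react to its own action, then best-responds) can be illustrated with a toy two-player matrix game. The sketch below is not the PR2-Q or PR2-Actor-Critic algorithm: it assumes the opponent's conditional policy is a softmax over the opponent's own payoffs, whereas the paper approximates that policy with variational Bayes methods. The payoff matrices, the temperature parameter, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def opponent_conditional_policy(opp_payoff, temperature=1.0):
    """Assumed model of rho(a_opp | a_self): a row-wise softmax over the
    opponent's payoffs. Row index = our action, column = opponent's action."""
    logits = opp_payoff / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def level1_action_values(own_payoff, rho):
    """Expected payoff of each of our actions when the opponent reacts
    to that action according to rho."""
    return (rho * own_payoff).sum(axis=1)


# Hypothetical 2x2 game: entry [a_self, a_opp] is the payoff for that joint action.
own_payoff = np.array([[0.0, 3.0],
                       [1.0, 1.0]])
opp_payoff = np.array([[2.0, 0.0],
                       [0.0, 1.0]])

rho = opponent_conditional_policy(opp_payoff)

# Level-1 reasoning: the opponent's action distribution depends on our action.
q_level1 = level1_action_values(own_payoff, rho)

# Baseline without recursion: the opponent is assumed to act uniformly at random.
q_level0 = own_payoff.mean(axis=1)

print("level-1 values:", q_level1, "best action:", int(np.argmax(q_level1)))
print("level-0 values:", q_level0, "best action:", int(np.argmax(q_level0)))
```

In this toy game the two agents disagree: the baseline is drawn to action 0 by the payoff of 3, while the level-1 agent anticipates that the opponent would mostly answer action 0 with its own action 0 (leaving a payoff of 0) and prefers action 1 instead.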

Citations

•Journal Article•10.1109/TAC.2020.3000819

Adaptive Event-Triggered Consensus of Multiagent Systems on Directed Graphs

Xianwei Li, +3 more

- 01 Apr 2021

- IEEE Transactions on Automatic Control

TL;DR: This article presents innovative adaptive event-triggered state-feedback protocols with novel composite event-triggering conditions that are applicable to linear MASs on general directed graphs, and the time-dependent term in the event triggers is allowed to be a class of positive $L_1$ functions.

301

•Posted Content

An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective

Yaodong Yang, +1 more

- 01 Nov 2020

- arXiv: Multiagent Systems

TL;DR: This work provides a self-contained assessment of the current state-of-the-art MARL techniques from a game theoretical perspective and expects this work to serve as a stepping stone for both new researchers who are about to enter this fast-growing domain and existing domain experts who want to obtain a panoramic view and identify new directions based on recent advances.

233

•Posted Content

SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving

Ming Zhou, +36 more

- 19 Oct 2020

- arXiv: Multiagent Systems

TL;DR: The design goals of SMARTS (Scalable Multi-Agent RL Training School) are described, its basic architecture and its key features are explained, and its use is illustrated through concrete multi-agent experiments on interactive scenarios.

183

•Posted Content

RODE: Learning Roles to Decompose Multi-Agent Tasks

Tonghan Wang, +5 more

- 04 Oct 2020

- arXiv: Learning

TL;DR: This work proposes to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents, and integrates information about action effects into the role policies to boost learning efficiency and policy generalization.

172

•Posted Content

ROMA: Multi-Agent Reinforcement Learning with Emergent Roles

Tonghan Wang, +3 more

- 18 Mar 2020

- arXiv: Multiagent Systems

TL;DR: Experiments show that the proposed role-oriented MARL framework (ROMA) can learn specialized, dynamic, and identifiable roles, which help the method push forward the state of the art on the StarCraft II micromanagement benchmark.

129

References

•Journal Article•10.3156/JSOFT.29.5_177_2

Generative Adversarial Nets

Ian Goodfellow, +7 more

- 08 Dec 2014

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

48.6K

•Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

- 01 Jan 1998

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

39.7K

•Journal Article•10.1017/S0140525X00076512

Does the chimpanzee have a theory of mind?

David Premack, +1 more

- 01 Dec 1978

- Behavioral and Brain Sciences

TL;DR: The authors showed an adult chimpanzee a series of videotaped scenes of a human actor struggling with a variety of problems. Some problems were simple, such as bananas vertically or horizontally out of reach, behind a box, and so forth; others were more complex, involving an actor unable to extricate himself from a locked cage, shivering because of a malfunctioning heater, or unable to play a phonograph because it was unplugged.

8K

•Journal Article•10.1073/PNAS.36.1.48

Equilibrium points in n-person games

John F. Nash

- 01 Jan 1950

- Proceedings of the National Academy of S...

TL;DR: This paper defines a concept of an n-person game in which each player has a finite set of pure strategies and in which a definite set of payments to the n players corresponds to each n-tuple of pure strategies, one strategy being taken for each player.

8K