Open Access Proceedings Article
Probabilistic recursive reasoning for multi-agent reinforcement learning
Ying Wen, Yaodong Yang, Rui Luo, Jun Wang, Wei Pan
09 May 2019
TL;DR: This paper introduces a probabilistic recursive reasoning (PR2) framework for multi-agent reinforcement learning in which each agent reasons about how its opponents would react to its future behaviors.
Abstract: Humans are capable of attributing latent mental contents such as beliefs or intentions to others. This social skill is critical in everyday life for reasoning about the potential consequences of one's behaviors so as to plan ahead. It is known that humans use this reasoning ability recursively, i.e., by considering what others believe about their own beliefs. In this paper, we start from level-1 recursion and introduce a probabilistic recursive reasoning (PR2) framework for multi-agent reinforcement learning. Our hypothesis is that it is beneficial for each agent to account for how the opponents would react to its future behaviors. Under the PR2 framework, we adopt variational Bayes methods to approximate the opponents' conditional policy, to which each agent finds the best response and then improves its own policy. We develop decentralized-training-decentralized-execution algorithms, PR2-Q and PR2-Actor-Critic, that are proved to converge in the self-play scenario when there is a single Nash equilibrium. Our methods are tested on both the matrix game and the differential game, which have a non-trivial equilibrium where common gradient-based methods fail to converge. Our experiments show that it is critical to reason about what the opponents believe about what the agent believes. We expect our work to contribute a new approach to opponent modeling to the multi-agent reinforcement learning community.
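The level-1 idea in the abstract, that an agent should account for how an opponent would react to its own action, can be illustrated with a minimal Python sketch for a two-action matrix game. This is not the PR2-Q or PR2-Actor-Critic algorithm: nothing is learned and no variational inference is performed. The payoff matrix, the opponent models, and the function names `level1_action_values` and `best_response` are hypothetical choices made only for illustration. The agent scores each of its own actions by averaging its payoff over a modeled opponent response conditioned on that action, p(a_opp | a_i), and then picks the best response.

```python
from typing import List


def level1_action_values(payoff: List[List[float]],
                         rho: List[List[float]]) -> List[float]:
    """Expected payoff of each own action a_i under a conditional opponent model.

    payoff[a_i][a_opp] is this agent's reward for the joint action (a_i, a_opp).
    rho[a_i][a_opp] is the modeled probability p(a_opp | a_i) that the opponent
    plays a_opp in response to this agent playing a_i (a hypothetical model).
    """
    values = []
    for a_i, row in enumerate(payoff):
        probs = rho[a_i]
        if abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError(f"rho[{a_i}] is not a probability distribution")
        # Marginalize the opponent's (conditional) reaction out of the payoff row.
        values.append(sum(p * r for p, r in zip(probs, row)))
    return values


def best_response(values: List[float]) -> int:
    """Index of the own action with the highest expected payoff."""
    return max(range(len(values)), key=lambda a: values[a])


if __name__ == "__main__":
    # Hypothetical 2x2 payoff for this agent: rows = own action, cols = opponent action.
    payoff = [[4.0, 0.0],
              [3.0, 3.0]]

    # Opponent model that ignores our action (both rows are identical).
    independent = [[0.5, 0.5],
                   [0.5, 0.5]]

    # Opponent model that reacts to our action (rows differ by own action).
    reactive = [[0.9, 0.1],
                [0.5, 0.5]]

    for name, rho in [("independent", independent), ("reactive", reactive)]:
        v = level1_action_values(payoff, rho)
        print(name, v, "best response:", best_response(v))
```

With these hypothetical numbers, the model that ignores the agent's action yields values [2.0, 3.0] and a best response of action 1, while the reactive model yields [3.6, 3.0] and a best response of action 0. Conditioning the opponent model on the agent's own action is what changes the decision.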
Citations
Adaptive Event-Triggered Consensus of Multiagent Systems on Directed Graphs
TL;DR: This article presents innovative adaptive event-triggered state-feedback protocols with novel composite event-triggering conditions that are applicable to linear MASs on general directed graphs; the time-dependent term in the event triggers is allowed to belong to a class of positive $L_1$ functions.
•Posted Content
An Overview of Multi-Agent Reinforcement Learning from Game Theoretical Perspective
Yaodong Yang, Jun Wang
TL;DR: This work provides a self-contained assessment of current state-of-the-art MARL techniques from a game-theoretical perspective and is expected to serve as a stepping stone both for new researchers about to enter this fast-growing domain and for existing domain experts who want to obtain a panoramic view and identify new directions based on recent advances.
•Posted Content
SMARTS: Scalable Multi-Agent Reinforcement Learning Training School for Autonomous Driving.
Ming Zhou, Jun Luo, Julian Villela, Yaodong Yang, David Rusu, Jiayu Miao, Weinan Zhang, Montgomery Alban, Iman Fadakar, Zheng Chen, Aurora Chongxi Huang, Ying Wen, Kimia Hassanzadeh, Daniel Graves, Dong Chen, Zhengbang Zhu, Nhat M. Nguyen, Mohamed A. Elsayed, Kun Shao, Sanjeevan Ahilan, Baokuan Zhang, Jiannan Wu, Zhengang Fu, Kasra Rezaee, Peyman Yadmellat, Mohsen Rohani, Nicolas Perez Nieves, Yihan Ni, Seyedershad Banijamali, Alexander Imani Cowen-Rivers, Zheng Tian, Daniel Palenicek, Haitham Bou-Ammar, Hongbo Zhang, Wulong Liu, Jianye Hao, Jun Wang
TL;DR: The design goals of SMARTS (Scalable Multi-Agent RL Training School) are described, its basic architecture and its key features are explained, and its use is illustrated through concrete multi-agent experiments on interactive scenarios.
•Posted Content
RODE: Learning Roles to Decompose Multi-Agent Tasks
TL;DR: This work proposes to first decompose joint action spaces into restricted role action spaces by clustering actions according to their effects on the environment and other agents, and then to integrate information about action effects into the role policies to boost learning efficiency and policy generalization.
•Posted Content
ROMA: Multi-Agent Reinforcement Learning with Emergent Roles
TL;DR: Experiments show that the proposed role-oriented MARL framework (ROMA) can learn specialized, dynamic, and identifiable roles, which help the method push forward the state of the art on the StarCraft II micromanagement benchmark.
References
Generative Adversarial Nets
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
08 Dec 2014
TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than from G.
•Book
Reinforcement Learning: An Introduction
Richard S. Sutton, Andrew G. Barto
1998
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, with coverage ranging from the history of the field's intellectual foundations to the most recent developments and applications.
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis
TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
Does the chimpanzee have a theory of mind?
David Premack, Guy Woodruff
TL;DR: This paper showed an adult chimpanzee a series of videotaped scenes of a human actor struggling with a variety of problems. Some were simple, such as bananas placed vertically or horizontally out of reach or behind a box; others were more complex, involving an actor unable to extricate himself from a locked cage, shivering because of a malfunctioning heater, or unable to play a phonograph because it was unplugged.
Equilibrium points in n-person games
TL;DR: A concept of an n-person game in which each player has a finite set of pure strategies and in which a definite set of payments to the n players corresponds to each n-tuple of pure strategies, one strategy being taken for each player.