A Comprehensive Survey of Multiagent Reinforcement Learning

doi:10.1109/TSMCC.2007.913919

Open Access•Journal Article•10.1109/TSMCC.2007.913919

Lucian Busoniu, +2 more

- 01 Mar 2008

- Vol. 38, Iss: 2, pp 156-172

2.2K

TL;DR: The benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied, and an outlook for the field is provided.

Abstract: Multiagent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must, instead, discover a solution on their own, using learning. A significant part of the research on multiagent learning concerns reinforcement learning techniques. This paper provides a comprehensive survey of multiagent reinforcement learning (MARL). A central issue in the field is the formal statement of the multiagent learning goal. Different viewpoints on this issue have led to the proposal of many different goals, among which two focal points can be distinguished: stability of the agents' learning dynamics, and adaptation to the changing behavior of the other agents. The MARL algorithms described in the literature aim, either explicitly or implicitly, at one of these two goals or at a combination of both, in a fully cooperative, fully competitive, or more general setting. A representative selection of these algorithms is discussed in detail in this paper, together with the specific issues that arise in each category. Additionally, the benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied. Finally, an outlook for the field is provided.

Figures

Fig. 4. (Left) An agent (◦) attempting to reach a goal (×) while avoiding capture by another agent (•). (Right) The Q-values of agent 1 for the state depicted to the left (Q2 = −Q1).

Fig. 1. Breakdown of MARL algorithms by the type of task they address.

TABLE I STABILITY AND ADAPTATION IN MARL

Fig. 2. MARL encompasses temporal-difference reinforcement learning, game theory, and direct policy search techniques.

Fig. 5. (Left) Two cleaning robots negotiating their assignment to different wings of a building. Both robots prefer to clean the smaller left wing. (Right) The Q-values of the two robots for the state depicted to the left.

TABLE II BREAKDOWN OF MARL ALGORITHMS BY TASK TYPE AND DEGREE OF AGENT AWARENESS

Citations

•Posted Content

Making sense of sensory input

Richard Evans, +5 more

- 05 Oct 2019

- arXiv: Artificial Intelligence

TL;DR: The Apperception Engine is a general-purpose system that was designed to make sense of any sensory sequence, and achieves human-level performance in sequence induction intelligence tests.

36

•Journal Article•10.1016/J.ESWA.2010.12.152

Speeding up learning automata based multi agent systems using the concepts of stigmergy and entropy

Behrooz Masoumi, +1 more

- 01 Jul 2011

- Expert Systems With Applications

TL;DR: The concepts of stigmergy and entropy are imported into learning automata based multi-agent systems with the purpose of providing a simple framework for interaction and coordination in multi-agent systems and speeding up the learning process.

36

•Journal Article•10.3389/FNCOM.2012.00087

Learning to use working memory: a reinforcement learning gating model of rule acquisition in rats

Kevin Lloyd, +3 more

- 30 Oct 2012

- Frontiers in Computational Neuroscience

TL;DR: Both gating models produced rule-acquisition behavior consistent with the experimental data, though only the SARSA gating model mirrored faster learning following rule reversal, a property which highlights the multi-agent nature of such models.

35

•Journal Article•10.1021/IE4031743

Data-based Suboptimal Neuro-control Design with Reinforcement Learning for Dissipative Spatially Distributed Processes

Biao Luo, +4 more

- 01 May 2014

- Industrial & Engineering Chemistry Resea...

TL;DR: This paper considers the partially unknown spatially distributed processes (SDPs) which are described by general highly dissipative nonlinear partial differential equations (PDEs) and develops a data-based adaptive suboptimal neuro-control method by introducing the thought of reinforcement learning (RL).

35

•Journal Article•10.3390/S20164546

Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance.

Weiwei Zhao, +7 more

- 13 Aug 2020

- Sensors

TL;DR: The paper presents an improved multiagent reinforcement learning algorithm—the multiagent joint proximal policy optimization (MAJPPO) algorithm with the centralized learning and decentralized execution, which enhances the collaboration and increases the sum of reward values obtained by the multiagent system.

35

References

•Book

Reinforcement Learning: An Introduction

Richard S. Sutton, +1 more

- 01 Jan 1998

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.

39.7K

•Book

Dynamic Programming and Optimal Control

Dimitri P. Bertsekas

- 01 May 1995

TL;DR: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.

12.9K

•Book

Markov Decision Processes: Discrete Stochastic Dynamic Programming

Martin L. Puterman

- 15 Apr 1994

TL;DR: Puterman provides a uniquely up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models, focusing primarily on infinite horizon discrete time models and models with discrete time spaces while also examining models with arbitrary state spaces, finite horizon models, and continuous time discrete state models.

12.3K

•Journal Article•10.1007/BF00992698

Technical Note: Q-Learning

Chris Watkins, +1 more

- 01 May 1992

- Machine Learning

TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.

12K

•Monograph•10.1002/9780470316887

Markov Decision Processes

P. Whittle, +1 more

- 15 Apr 1994

- Journal of The Royal Statistical Society...

TL;DR: Markov Decision Processes covers recent research advances in such areas as countable state space models with average reward criterion, constrained models, and models with risk sensitive optimality criteria, and explores several topics that have received little or no attention in other books.

11K