A Comprehensive Survey of Multiagent Reinforcement Learning
Lucian Busoniu, Robert Babuska, B. De Schutter
- 01 Mar 2008
- Vol. 38, Iss: 2, pp 156-172
TL;DR: The benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied, and an outlook for the field is provided.
Abstract: Multiagent systems are rapidly finding applications in a variety of domains, including robotics, distributed control, telecommunications, and economics. The complexity of many tasks arising in these domains makes them difficult to solve with preprogrammed agent behaviors. The agents must, instead, discover a solution on their own, using learning. A significant part of the research on multiagent learning concerns reinforcement learning techniques. This paper provides a comprehensive survey of multiagent reinforcement learning (MARL). A central issue in the field is the formal statement of the multiagent learning goal. Different viewpoints on this issue have led to the proposal of many different goals, among which two focal points can be distinguished: stability of the agents' learning dynamics, and adaptation to the changing behavior of the other agents. The MARL algorithms described in the literature aim, either explicitly or implicitly, at one of these two goals or at a combination of both, in a fully cooperative, fully competitive, or more general setting. A representative selection of these algorithms is discussed in detail in this paper, together with the specific issues that arise in each category. Additionally, the benefits and challenges of MARL are described along with some of the problem domains where the MARL techniques have been applied. Finally, an outlook for the field is provided.
Figures

Fig. 4. (Left) An agent (◦) attempting to reach a goal (×) while avoiding capture by another agent (•). (Right) The Q-values of agent 1 for the state depicted to the left (Q2 = −Q1).
Fig. 1. Breakdown of MARL algorithms by the type of task they address. 
TABLE I STABILITY AND ADAPTATION IN MARL 
Fig. 2. MARL encompasses temporal-difference reinforcement learning, game theory, and direct policy search techniques. 
Fig. 5. (Left) Two cleaning robots negotiating their assignment to different wings of a building. Both robots prefer to clean the smaller left wing. (Right) The Q-values of the two robots for the state depicted to the left. 
TABLE II BREAKDOWN OF MARL ALGORITHMS BY TASK TYPE AND DEGREE OF AGENT AWARENESS
Citations
•Posted Content
Making sense of sensory input
Richard Evans, José Hernández-Orallo, Johannes Welbl, Pushmeet Kohli, Marek Sergot
TL;DR: The Apperception Engine is a general-purpose system that was designed to make sense of any sensory sequence, and achieves human-level performance in sequence induction intelligence tests.
Speeding up learning automata based multi agent systems using the concepts of stigmergy and entropy
TL;DR: The concepts of stigmergy and entropy are imported into learning automata based multi-agent systems with the purpose of providing a simple framework for interaction and coordination in multi-agent systems and speeding up the learning process.
Learning to use working memory: a reinforcement learning gating model of rule acquisition in rats
TL;DR: Both gating models produced rule-acquisition behavior consistent with the experimental data, though only the SARSA gating model mirrored faster learning following rule reversal, a property which highlights the multi-agent nature of such models.
Data-based Suboptimal Neuro-control Design with Reinforcement Learning for Dissipative Spatially Distributed Processes
TL;DR: This paper considers the partially unknown spatially distributed processes (SDPs) which are described by general highly dissipative nonlinear partial differential equations (PDEs) and develops a data-based adaptive suboptimal neuro-control method by introducing the thought of reinforcement learning (RL).
Research on the Multiagent Joint Proximal Policy Optimization Algorithm Controlling Cooperative Fixed-Wing UAV Obstacle Avoidance.
Weiwei Zhao, Hairong Chu, Xikui Miao, Lihong Guo, Honghai Shen, Chenhao Zhu, Feng Zhang, Dongxin Liang
TL;DR: The paper presents an improved multiagent reinforcement learning algorithm—the multiagent joint proximal policy optimization (MAJPPO) algorithm with the centralized learning and decentralized execution, which enhances the collaboration and increases the sum of reward values obtained by the multiagent system.
References
•Book
Reinforcement Learning: An Introduction
Richard S. Sutton, Andrew G. Barto
- 01 Jan 1988
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
•Book
Dynamic Programming and Optimal Control
Dimitri P. Bertsekas
- 01 May 1995
TL;DR: The leading and most up-to-date textbook on the far-ranging algorithmic methodology of Dynamic Programming, which can be used for optimal control, Markovian decision problems, planning and sequential decision making under uncertainty, and discrete/combinatorial optimization.
•Book
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Martin L. Puterman
- 15 Apr 1994
TL;DR: This book provides a uniquely up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models, focusing primarily on infinite horizon discrete time models and models with discrete time spaces while also examining models with arbitrary state spaces, finite horizon models, and continuous time discrete state models.
Technical Note: Q-Learning
Chris Watkins, Peter Dayan
TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimum action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.
Markov Decision Processes
P. Whittle, M. L. Puterman
TL;DR: Markov Decision Processes covers recent research advances in such areas as countable state space models with average reward criterion, constrained models, and models with risk sensitive optimality criteria, and explores several topics that have received little or no attention in other books.