Solving large-scale multi-agent tasks via transfer learning with dynamic state representation

doi:10.1177/17298806231162440

Open Access•Journal Article•10.1177/17298806231162440

Lintao Dou, +2 more

- 01 Mar 2023

- International Journal of Advanced Roboti...

- Vol. 20, Iss: 2, pp 172988062311624-172988062311624

1

TL;DR: This paper uses a self-attention mechanism inside a dynamic state representation network to capture and represent agents' local observations, so that training with a large number of agents can reuse the knowledge accumulated by training with a small number of agents.

Abstract: Many research results on multi-agent reinforcement learning have emerged in the past decade. These include the successful application of asynchronous advantage actor-critic, double deep Q-network, and other algorithms in multi-agent environments, as well as more representative multi-agent training methods based on QMIX, the classical centralized-training, distributed-execution algorithm. However, in a large-scale multi-agent environment, training becomes a major challenge because the state-action space grows exponentially. In this article, we design a training scheme that progresses from small-scale to large-scale multi-agent training. We use transfer learning so that training with a large number of agents can reuse the knowledge accumulated by training with a small number of agents. We achieve policy transfer between tasks with different numbers of agents by designing a new dynamic state representation network, which uses a self-attention mechanism to capture and represent the local observations of agents. The dynamic state representation network makes it possible to expand the policy model from tasks with a few agents (4 or 10 agents) to tasks with many agents (16 or 50 agents). We conducted experiments in the real-time strategy game StarCraft II and on the multi-agent research platform MAgent, and we also set up unmanned aerial vehicle trajectory planning simulations. Experimental results show that our approach not only reduces the training time for tasks with a large number of agents but also improves the final training performance.
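The abstract does not spell out the network's layers, so the following is only a minimal sketch of the general idea, not the authors' architecture: self-attention over a variable number of per-entity observation vectors, followed by mean pooling, gives a fixed-size state vector regardless of how many agents are observed. The class name `AttentionStateEncoder`, the mean-pooling step, the single attention head, and the random untrained weights are all assumptions made for illustration.

```python
import math
import random


def linear(rows, weight):
    """Multiply each row vector (length d_in) by a d_in x d_out weight matrix."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weight)]
            for row in rows]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_weight(rng, d_in, d_out):
    """Random d_in x d_out matrix; stands in for trained parameters."""
    scale = 1.0 / math.sqrt(d_in)
    return [[rng.uniform(-scale, scale) for _ in range(d_out)]
            for _ in range(d_in)]


class AttentionStateEncoder:
    """Hypothetical encoder: single-head self-attention plus mean pooling."""

    def __init__(self, obs_dim, embed_dim, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        # Query, key, and value projections are shared by every entity,
        # so the parameter count does not depend on the number of agents.
        self.w_q = init_weight(rng, obs_dim, embed_dim)
        self.w_k = init_weight(rng, obs_dim, embed_dim)
        self.w_v = init_weight(rng, obs_dim, embed_dim)

    def encode(self, entity_obs):
        """Map n observation vectors (n may vary) to one embed_dim vector."""
        queries = linear(entity_obs, self.w_q)
        keys = linear(entity_obs, self.w_k)
        values = linear(entity_obs, self.w_v)
        scale = math.sqrt(self.embed_dim)
        attended = []
        for q in queries:
            # Scaled dot-product attention of this entity over all entities.
            weights = softmax(
                [sum(a * b for a, b in zip(q, k)) / scale for k in keys])
            attended.append(
                [sum(w * v[d] for w, v in zip(weights, values))
                 for d in range(self.embed_dim)])
        # Mean pooling (an assumption) yields a fixed-size, order-independent vector.
        n = len(attended)
        return [sum(row[d] for row in attended) / n
                for d in range(self.embed_dim)]


if __name__ == "__main__":
    rng = random.Random(2)
    encoder = AttentionStateEncoder(obs_dim=3, embed_dim=8, seed=1)
    few = [[rng.random() for _ in range(3)] for _ in range(4)]
    many = [[rng.random() for _ in range(3)] for _ in range(16)]
    # Both calls return a vector of length 8, whatever the entity count.
    print(len(encoder.encode(few)), len(encoder.encode(many)))
```

Because the projection weights are shared across entities and the output size is fixed, a policy head trained on top of such an encoder with 4 observed agents could in principle be reused unchanged with 16, which is the kind of transfer the abstract describes.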

Citations

Journal Article•10.1007/s10462-025-11395-4

Intelligent multi-robot exploration in non-exposed spaces: methods and challenges

Liuchun Li, +15 more

- 24 Oct 2025

- Artificial Intelligence Review

References

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.

94.2K

•Proceedings Article•10.1109/CVPR.2018.00813

Non-local Neural Networks

Xiaolong Wang, +3 more

- 18 Jun 2018

TL;DR: In this article, the non-local operation computes the response at a position as a weighted sum of the features at all positions, which can be used to capture long-range dependencies.

12.6K

•Proceedings Article

Asynchronous methods for deep reinforcement learning

Volodymyr Mnih, +7 more

- 19 Jun 2016

TL;DR: A conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.

9.2K

Deep reinforcement learning with double Q-learning

H Van Hasselt, +2 more

- 01 Jan 2015

TL;DR: In this article, the authors show that the DQN algorithm suffers from substantial overestimation in some games in the Atari 2600 domain, and they propose a specific adaptation to the algorithm and show that this algorithm not only reduces the observed overestimations, but also leads to much better performance on several games.

7.9K