Multi-Agent Task Assignment in the Bandit Framework

doi:10.1109/CDC.2006.377612

Open Access•Proceedings Article•10.1109/CDC.2006.377612

J. Le Ny, +2 more

- 01 Jan 2006

- pp 5281-5286

35

TL;DR: A systematic method, inspired by the work of Bertsimas and Nino-Mora on restless bandits, is presented for deriving a linear programming relaxation for locally decomposable MDPs; the relaxation provides an approximation of the cost-to-go that can be used online in conjunction with standard suboptimal stochastic control methods.

Abstract: We consider a task assignment problem for a fleet of UAVs in a surveillance/search mission. We formulate the problem as a restless bandits problem with switching costs and discounted rewards: there are N sites to inspect, each of them evolving as a Markov chain, with different transition probabilities depending on whether the site is inspected or not. The sites evolve independently of each other, there are transition costs c_ij for moving between sites i and j, with i, j ∈ {1, …, N}, and rewards for visiting the sites, and we maximize a mixed objective function of these costs and rewards. This problem is known to be PSPACE-hard. We present a systematic method, inspired by the work of Bertsimas and Nino-Mora (2000) on restless bandits, for deriving a linear programming relaxation for such locally decomposable MDPs. The relaxation is computable in polynomial time offline and provides a bound on the achievable performance, as well as an approximation of the cost-to-go that can be used online in conjunction with standard suboptimal stochastic control methods. In particular, the one-step lookahead policy based on this approximate cost-to-go reduces to computing the optimal value of a linear assignment problem of size N. We present numerical experiments in which we assess the quality of the heuristics using the performance bound.
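
The reduction of the one-step lookahead step to a linear assignment problem can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the `site_value` vector (a hypothetical stand-in for the immediate reward plus the approximate cost-to-go gain that the LP relaxation would supply), and the toy numbers are all assumptions made for illustration. The brute-force search is only workable for very small N; a polynomial-time solver such as the Hungarian algorithm would be used in practice.

```python
from itertools import permutations


def one_step_lookahead_assignment(positions, switch_cost, site_value):
    """Send each agent to a distinct site, maximizing value minus switching cost.

    positions[k]      : site currently occupied by agent k
    switch_cost[i][j] : cost c_ij of moving from site i to site j
    site_value[j]     : hypothetical stand-in for the immediate reward plus the
                        approximate cost-to-go gain of inspecting site j
                        (assumed given; the LP relaxation is not reproduced here)
    """
    n_sites = len(site_value)
    best_total, best_choice = float("-inf"), None
    # Enumerate every ordered choice of distinct destination sites.
    for choice in permutations(range(n_sites), len(positions)):
        total = sum(site_value[j] - switch_cost[i][j]
                    for i, j in zip(positions, choice))
        if total > best_total:
            best_total, best_choice = total, choice
    return list(best_choice), best_total


# Toy instance (made-up numbers): 3 sites, 2 agents located at sites 0 and 1.
costs = [[0, 1, 4],
         [1, 0, 1],
         [4, 1, 0]]
values = [2.0, 1.0, 5.0]
print(one_step_lookahead_assignment([0, 1], costs, values))  # ([0, 2], 6.0)
```

In this toy instance the agent at site 0 stays put while the agent at site 1 moves to the high-value site 2, since the switching cost from site 0 to site 2 outweighs the alternative.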

Citations

Proceedings Article•10.1109/CDC.2014.7039462

Surveillance in an abruptly changing world via multiarmed bandits

Vaibhav Srivastava, +2 more

- 01 Jan 2014

TL;DR: This work formulates the problem as a multiarmed bandit (MAB) problem with Gaussian rewards and change points, and addresses the fundamental tradeoff between learning the true event (exploration) and collecting the data that is most evidential about the true event (exploitation).

58

Proceedings Article•10.1109/ICC.2009.5199281

Optimal Network Selection in Heterogeneous Wireless Multimedia Networks

Pengbo Si, +3 more

- 14 Jun 2009

TL;DR: This paper formulates the integrated network as a restless bandit system and proposes an optimal distributed network selection scheme for heterogeneous wireless networks that takes multimedia application-layer QoS into account and is applicable to both tight-coupling and loose-coupling scenarios in the integration of heterogeneous wireless networks.

57

Journal Article•10.1109/TNSM.2010.1012.0362

A Hierarchical Identity Based Key Management Scheme in Tactical Mobile Ad Hoc Networks

F R Yu, +3 more

- 01 Dec 2010

- IEEE Transactions on Network and Service...

TL;DR: This paper proposes a distributed hierarchical key management scheme in which nodes can get their keys updated either from their parent nodes or a threshold of sibling nodes, and the dynamic node selection process is formulated as a stochastic problem.

57

Proceedings Article•10.1109/ICRA.2013.6631165

Multi-armed bandit formulation for autonomous mobile acoustic relay adaptive positioning

Mei Yi Cheung, +2 more

- 06 May 2013

TL;DR: Results from shallow-water field experiments conducted with autonomous surface vehicles and acoustic modems transmitting data through a one-way, two-hop network in the Charles River Basin, Boston are presented.

18

Journal Article•10.1109/TVT.2009.2031652

Distributed Multisource Transmission in Wireless Mobile Peer-to-Peer Networks: A Restless-Bandit Approach

Pengbo Si, +3 more

- 01 Jan 2010

- IEEE Transactions on Vehicular Technolog...

TL;DR: A distributed multisource sender-selection scheme is proposed to maximize the receiving data rate and minimize the energy consumption; its indexability property dramatically simplifies the computation and implementation of the policy.

14

References

Combinatorial optimization. Polyhedra and efficiency.

Alexander Schrijver

- 01 Jan 2003

4.5K

Book

Constrained Markov Decision Processes

Eitan Altman

- 30 Mar 1999

TL;DR: This book presents a unified approach to the study of constrained Markov decision processes with a countable state space and unbounded costs, in which a single controller has several objectives: the aim is to design a controller that minimizes one cost objective subject to inequality constraints on the other cost objectives.

1.9K

Journal Article•10.2307/3214163

Restless bandits: activity allocation in a changing world

Peter Whittle

- 01 Jan 1988

- Journal of Applied Probability

TL;DR: In this article, the Lagrange multiplier associated with a constraint on the expected number of projects in operation defines an index that reduces to the Gittins index when projects not being operated are static. Arguments are advanced to support the conjecture that, for m and n large in constant ratio, the policy of operating the m projects of largest current index is nearly optimal.

1.3K

Journal Article•10.1287/OPRE.51.6.850.24925

The Linear Programming Approach to Approximate Dynamic Programming

Daniela Pucci de Farias, +1 more

- 01 Nov 2003

- Operations Research

TL;DR: In this article, an efficient method based on linear programming is proposed for approximating solutions to large-scale stochastic control problems.

735

Book•10.1002/9780470980033

Multi-Armed Bandit Allocation Indices

John Gittins, +2 more

- 18 Mar 2011

655

Related Papers (5)

Restless Bandits, Linear Programming Relaxations, and a Primal-Dual Index Heuristic

Restless bandits: activity allocation in a changing world

Restless bandits with switching costs: linear programming relaxations, performance bounds and limited lookahead policies

Some aspects of the sequential design of experiments

Finite-time Analysis of the Multiarmed Bandit Problem