Multi-Agent Task Assignment in the Bandit Framework
J. Le Ny, Munther A. Dahleh, Eric Feron +2 more
- 01 Jan 2006
- pp 5281-5286
TL;DR: A systematic method, inspired by the work of Bertsimas and Niño-Mora on restless bandits, is presented for deriving a linear programming relaxation of locally decomposable MDPs; the relaxation provides an approximation of the cost-to-go that can be used online in conjunction with standard suboptimal stochastic control methods.
Abstract: We consider a task assignment problem for a fleet of UAVs in a surveillance/search mission. We formulate the problem as a restless bandits problem with switching costs and discounted rewards: there are $N$ sites to inspect, each one of them evolving as a Markov chain, with different transition probabilities depending on whether the site is inspected or not. The sites evolve independently of each other, there are transition costs $c_{ij}$ for moving between sites $i, j \in \{1, \dots, N\}$, rewards when visiting the sites, and we maximize a mixed objective function of these costs and rewards. This problem is known to be PSPACE-hard. We present a systematic method, inspired by the work of Bertsimas and Niño-Mora (2000) on restless bandits, for deriving a linear programming relaxation for such locally decomposable MDPs. The relaxation is computable in polynomial time offline, and provides a bound on the achievable performance as well as an approximation of the cost-to-go, which can be used online in conjunction with standard suboptimal stochastic control methods. In particular, the one-step lookahead policy based on this approximate cost-to-go reduces to computing the optimal value of a linear assignment problem of size $N$. We present numerical experiments, for which we assess the quality of the heuristics using the performance bound.
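The reduction of the one-step lookahead to a linear assignment problem can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `one_step_assignment`, the per-site `gain` vector (standing in for immediate reward plus discounted approximate cost-to-go), and the toy numbers are all assumptions made for illustration. The assignment is solved by brute force so that the example needs only the standard library.

```python
from itertools import permutations


def one_step_assignment(agent_sites, gain, switch_cost):
    """Assign each agent to a distinct site for the next step.

    agent_sites: current site index of each agent (hypothetical input).
    gain[j]: assumed value of inspecting site j next, e.g. immediate reward
             plus a discounted approximate cost-to-go term.
    switch_cost[i][j]: cost c_ij of moving from site i to site j.

    Returns (targets, value), maximizing sum(gain[j] - c_ij) over all
    assignments of agents to distinct sites. Brute force: exponential in
    the number of agents, so suitable for tiny instances only.
    """
    n_sites = len(gain)
    best_value, best_targets = float("-inf"), None
    for targets in permutations(range(n_sites), len(agent_sites)):
        value = sum(
            gain[j] - switch_cost[i][j] for i, j in zip(agent_sites, targets)
        )
        if value > best_value:
            best_value, best_targets = value, targets
    return best_targets, best_value


if __name__ == "__main__":
    # Toy instance: two agents currently at sites 0 and 2, three sites.
    costs = [[0, 1, 3], [1, 0, 1], [3, 1, 0]]
    print(one_step_assignment([0, 2], [1.0, 5.0, 4.0], costs))
```

For realistic values of $N$, the enumeration would be replaced by a polynomial-time assignment solver such as the Hungarian method, for example SciPy's `linear_sum_assignment`.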
Citations
Surveillance in an abruptly changing world via multiarmed bandits
Vaibhav Srivastava, Paul Reverdy, Naomi Ehrich Leonard +2 more
- 01 Jan 2014
TL;DR: This work formulates the surveillance problem as a multiarmed bandit (MAB) problem with Gaussian rewards and change points, and addresses the fundamental tradeoff between learning the true event (exploration) and collecting the data that is most evidential about the true event (exploitation).
58
Optimal Network Selection in Heterogeneous Wireless Multimedia Networks
Pengbo Si, F. R. Yu, Hong Ji, Victor C. M. Leung +3 more
- 14 Jun 2009
TL;DR: This paper formulates the integrated network as a restless bandit system and proposes an optimal distributed network selection scheme for heterogeneous wireless networks that considers multimedia application-layer QoS and is applicable to both tight-coupling and loose-coupling scenarios in the integration of heterogeneous wireless networks.
57
A Hierarchical Identity Based Key Management Scheme in Tactical Mobile Ad Hoc Networks
F R Yu, H Tang, P C Mason, Fei Wang +3 more
TL;DR: This paper proposes a distributed hierarchical key management scheme in which nodes can get their keys updated either from their parent nodes or a threshold of sibling nodes, and the dynamic node selection process is formulated as a stochastic problem.
57
Multi-armed bandit formulation for autonomous mobile acoustic relay adaptive positioning
Mei Yi Cheung, Joshua Leighton, Franz S. Hover +2 more
- 06 May 2013
TL;DR: Results from shallow-water field experiments conducted with autonomous surface vehicles and acoustic modems transmitting data through a one-way, two-hop network in the Charles River Basin, Boston are presented.
18
Distributed Multisource Transmission in Wireless Mobile Peer-to-Peer Networks: A Restless-Bandit Approach
TL;DR: A distributed multisource sender-selection scheme is proposed to maximize the receiving data rate and minimize energy consumption; the scheme has an indexability property that dramatically simplifies the computation and implementation of the policy.
14
References
Combinatorial optimization. Polyhedra and efficiency.
Alexander Schrijver
- 01 Jan 2003
TL;DR: A reference book on combinatorial optimization, with emphasis on polyhedral methods and efficient algorithms.
4.5K
•Book
Constrained Markov Decision Processes
Eitan Altman
- 30 Mar 1999
TL;DR: In this paper, a unified approach is presented for the study of constrained Markov decision processes with a countable state space and unbounded costs, in which a single controller has several objectives: the aim is to design a controller that minimizes one cost objective subject to inequality constraints on the other cost objectives.
Restless bandits: activity allocation in a changing world
TL;DR: In this article, the Lagrange multiplier associated with a constraint on the expected number of projects in operation defines an index which reduces to the Gittins index when projects not being operated are static, and arguments are advanced to support the conjecture that, for m and n large in constant ratio, the policy of operating the m projects of largest current index is nearly optimal.
1.3K
The Linear Programming Approach to Approximate Dynamic Programming
TL;DR: In this article, an efficient method based on linear programming for approximating solutions to large-scale stochastic control problems is proposed. Experimental results on queueing network control provide empirical support for the approach.