Efficient solution algorithms for factored MDPs

doi:10.1613/JAIR.1000

Open Access•Journal Article•10.1613/JAIR.1000

Carlos Guestrin, +3 more

- 01 Jul 2003

- Journal of Artificial Intelligence Resea...

- Vol. 19, Iss: 1, pp 399-468

536

TL;DR: This paper presents two approximate solution algorithms that exploit structure in factored MDPs by using an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables.

Abstract: This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and context-specific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomial-sized one. One algorithm uses approximate linear programming, and the second approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on max-norm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing state-of-the-art approach, showing, in some problems, exponential gains in computation time.
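The variable-elimination idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's LP decomposition; it only shows, under simplifying assumptions, how the maximum over all joint states of a sum of functions with small scopes can be computed one variable at a time instead of by enumerating every state. The names `eliminate_max` and `brute_force_max`, the binary domain, and the three example tables are all hypothetical choices made for illustration.

```python
from itertools import product


def eliminate_max(functions, order, domain=(0, 1)):
    """Return max over all joint assignments of the sum of local functions.

    functions: list of (scope, table) pairs, where scope is a tuple of
    variable names and table maps assignment tuples (in scope order) to
    numbers. order must list every variable that appears in any scope.
    """
    functions = list(functions)
    for var in order:
        # Split the functions into those that mention var and the rest.
        touching = [f for f in functions if var in f[0]]
        rest = [f for f in functions if var not in f[0]]
        # The replacement function depends on the neighbours of var only.
        new_scope = tuple(
            sorted({v for scope, _ in touching for v in scope if v != var})
        )
        new_table = {}
        for assignment in product(domain, repeat=len(new_scope)):
            ctx = dict(zip(new_scope, assignment))
            best = float("-inf")
            for val in domain:
                ctx[var] = val
                total = sum(
                    table[tuple(ctx[v] for v in scope)]
                    for scope, table in touching
                )
                best = max(best, total)
            new_table[assignment] = best
        functions = rest + [(new_scope, new_table)]
    # Every remaining function now has an empty scope.
    return sum(table[()] for _, table in functions)


def brute_force_max(functions, variables, domain=(0, 1)):
    """Reference answer: enumerate every joint assignment (exponential)."""
    best = float("-inf")
    for assignment in product(domain, repeat=len(variables)):
        ctx = dict(zip(variables, assignment))
        total = sum(
            table[tuple(ctx[v] for v in scope)] for scope, table in functions
        )
        best = max(best, total)
    return best


# Hypothetical example: a chain x1 - x2 - x3 - x4 of binary variables,
# with one small-scope function per neighbouring pair.
pairs = list(product((0, 1), repeat=2))
variables = ("x1", "x2", "x3", "x4")
functions = [
    (("x1", "x2"), {(a, b): 3 * a - 2 * a * b for a, b in pairs}),
    (("x2", "x3"), {(a, b): 2 * b - a for a, b in pairs}),
    (("x3", "x4"), {(a, b): a * b + 1 - a for a, b in pairs}),
]

print(eliminate_max(functions, order=variables))  # prints 6
```

With a chain like this one, each elimination step only builds a table over the neighbours of the eliminated variable, so the work grows with the size of the largest intermediate scope rather than with the number of joint states. The cost does depend on the elimination order, which the sketch takes as given.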

Citations

•Journal Article•10.1016/J.COGNITION.2009.07.005

Action understanding as inverse planning.

Chris L. Baker, +2 more

- 01 Dec 2009

- Cognition

TL;DR: A computational framework based on Bayesian inverse planning for modeling human action understanding represents an intuitive theory of intentional agents' behavior based on the principle of rationality, and provides quantitative evidence for an approximately rational inference mechanism in human goal inference within the simplified stimulus paradigm.

1K

•Posted Content

Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

Tejas Kulkarni, +3 more

- 20 Apr 2016

- arXiv: Learning

TL;DR: The hierarchical-DQN framework as discussed by the authors integrates hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning, allowing for flexible goal specifications, such as functions over entities and relations.

997

•Journal Article•10.1007/S10107-003-0499-Y

Uncertain convex programs: randomized solutions and confidence levels

Giuseppe Carlo Calafiore, +1 more

- 01 Jan 2005

- Mathematical Programming

TL;DR: This paper considers an alternative ‘randomized’ or ‘scenario’ approach for dealing with uncertainty in optimization, based on constraint sampling, and studies the constrained optimization problem resulting by taking into account only a finite set of N constraints, chosen at random among the possible constraint instances of the uncertain problem.

846

•Journal Article•10.1287/OPRE.51.6.850.24925

The Linear Programming Approach to Approximate Dynamic Programming

Daniela Pucci de Farias, +1 more

- 01 Nov 2003

- Operations Research

TL;DR: This article proposes an efficient method based on linear programming for approximating solutions to large-scale stochastic control problems.

735

•Proceedings Article•10.1145/1143844.1143963

Probabilistic inference for solving discrete and continuous state Markov Decision Processes

Marc Toussaint, +1 more

- 25 Jun 2006

TL;DR: An Expectation Maximization algorithm for computing optimal policies that actually optimizes the discounted expected future return for arbitrary reward functions and without assuming an ad hoc finite total time is presented.

626

References

•Book

Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference

Judea Pearl

- 01 Jan 1988

TL;DR: Probabilistic Reasoning in Intelligent Systems as mentioned in this paper is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty, and provides a coherent explication of probability as a language for reasoning with partial belief.

17.6K

•Book

Markov Decision Processes: Discrete Stochastic Dynamic Programming

Martin L. Puterman

- 15 Apr 1994

TL;DR: Puterman as discussed by the authors provides a uniquely up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models, focusing primarily on infinite horizon discrete time models and models with discrete time spaces while also examining models with arbitrary state spaces, finite horizon models, and continuous time discrete state models.

12.3K

•Monograph•10.1002/9780470316887

Markov Decision Processes

P. Whittle, +1 more

- 15 Apr 1994

- Journal of The Royal Statistical Society...

TL;DR: Markov Decision Processes covers recent research advances in such areas as countable state space models with average reward criterion, constrained models, and models with risk sensitive optimality criteria, and explores several topics that have received little or no attention in other books.

11K

•Book

Decisions with Multiple Objectives: Preferences and Value Trade-Offs

Ralph L. Keeney, +2 more

- 01 Jan 1976

TL;DR: In this article, a confused decision maker, who wishes to make a reasonable and responsible choice among alternatives, can systematically probe his true feelings in order to make those critically important, vexing trade-offs between incommensurable objectives.

9.2K

•Journal Article•10.1023/A:1022633531479

Learning to Predict by the Methods of Temporal Differences

Richard S. Sutton

- 01 Aug 1988

- Machine Learning

TL;DR: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – and proves their convergence and optimality for special cases and relation to supervised-learning methods.

5.2K

Related Papers (5)

Markov Decision Processes: Discrete Stochastic Dynamic Programming

Decision-theoretic planning: structural assumptions and computational leverage

Reinforcement Learning: An Introduction

A model for reasoning about persistence and causation

Planning and Acting in Partially Observable Stochastic Domains