Efficient solution algorithms for factored MDPs
TL;DR: This paper presents two approximate solution algorithms that exploit structure in factored MDPs by using an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables.
Abstract: This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and context-specific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomial-sized one. One algorithm uses approximate linear programming, and the second uses approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on max-norm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, which give a promising indication of the scalability of our approach, and compare our algorithm to an existing state-of-the-art approach, showing, in some problems, exponential gains in computation time.
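The abstract compares the LP decomposition to variable elimination. As a rough illustration of that underlying idea only (not the paper's LP construction), the Python sketch below maximizes a sum of functions that each depend on a small subset of binary variables, without enumerating every joint state. The names `eliminate_max` and `brute_force_max`, the chain-shaped example, and the table values are all hypothetical and chosen purely for illustration.

```python
from itertools import product


def eliminate_max(factors, order, domain=(0, 1)):
    """Max over all assignments of a sum of local factors, by variable elimination.

    factors: list of (scope, table); scope is a tuple of variable names and
    table maps a tuple of values (in scope order) to a number.
    order: elimination order; it must list every variable that appears.
    """
    factors = list(factors)
    for var in order:
        # Split the factors into those that mention `var` and the rest.
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        # The new factor depends on every other variable that co-occurs with `var`.
        new_scope = tuple(sorted({v for s, _ in touching for v in s} - {var}))
        new_table = {}
        for values in product(domain, repeat=len(new_scope)):
            ctx = dict(zip(new_scope, values))
            best = float("-inf")
            for x in domain:
                ctx[var] = x
                total = sum(t[tuple(ctx[v] for v in s)] for s, t in touching)
                best = max(best, total)
            new_table[values] = best  # `var` has been maximized out
        factors = rest + [(new_scope, new_table)]
    # Every variable is eliminated, so all remaining factors have empty scope.
    return sum(t[()] for _, t in factors)


def brute_force_max(factors, variables, domain=(0, 1)):
    """Reference answer: enumerate every joint assignment (exponential cost)."""
    best = float("-inf")
    for values in product(domain, repeat=len(variables)):
        x = dict(zip(variables, values))
        best = max(best, sum(t[tuple(x[v] for v in s)] for s, t in factors))
    return best


# Hypothetical example: a chain x0 - x1 - ... - x7 with one factor per adjacent pair.
variables = [f"x{i}" for i in range(8)]
factors = [
    (
        (f"x{i}", f"x{i + 1}"),
        {(a, b): (i + 1) * (a != b) - a for a, b in product((0, 1), repeat=2)},
    )
    for i in range(7)
]

print(eliminate_max(factors, variables) == brute_force_max(factors, variables))
```

The cost of the elimination version grows exponentially in the size of the largest intermediate scope rather than in the total number of variables, which is why restricting each function to a few variables matters.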
Citations
Action understanding as inverse planning.
TL;DR: A computational framework based on Bayesian inverse planning for modeling human action understanding represents an intuitive theory of intentional agents' behavior based on the principle of rationality, and provides quantitative evidence for an approximately rational inference mechanism in human goal inference within the simplified stimulus paradigm.
1K
•Posted Content
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
TL;DR: The hierarchical-DQN framework integrates hierarchical value functions, operating at different temporal scales, with intrinsically motivated deep reinforcement learning, allowing for flexible goal specifications, such as functions over entities and relations.
997
Uncertain convex programs: randomized solutions and confidence levels
TL;DR: This paper considers an alternative ‘randomized’ or ‘scenario’ approach for dealing with uncertainty in optimization, based on constraint sampling, and studies the constrained optimization problem that results from taking into account only a finite set of N constraints, chosen at random among the possible constraint instances of the uncertain problem.
The Linear Programming Approach to Approximate Dynamic Programming
TL;DR: This article proposes an efficient method based on linear programming for approximating solutions to large-scale stochastic control problems, with experimental results on queueing network control providing empirical support for the method.
Probabilistic inference for solving discrete and continuous state Markov Decision Processes
Marc Toussaint, Amos Storkey +1 more
- 25 Jun 2006
TL;DR: An Expectation Maximization algorithm for computing optimal policies that actually optimizes the discounted expected future return for arbitrary reward functions and without assuming an ad hoc finite total time is presented.
References
•Book
Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
Judea Pearl
- 01 Jan 1988
TL;DR: Probabilistic Reasoning in Intelligent Systems is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty, and provides a coherent explication of probability as a language for reasoning with partial belief.
17.6K
•Book
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Martin L. Puterman
- 15 Apr 1994
TL;DR: Puterman provides a uniquely up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models, focusing primarily on infinite horizon discrete time models and models with discrete state spaces while also examining models with arbitrary state spaces, finite horizon models, and continuous time discrete state models.
12.3K
Markov Decision Processes
P. Whittle, M. L. Puterman +1 more
TL;DR: Markov Decision Processes covers recent research advances in such areas as countable state space models with average reward criterion, constrained models, and models with risk sensitive optimality criteria, and explores several topics that have received little or no attention in other books.
•Book
Decisions with Multiple Objectives: Preferences and Value Trade-Offs
Ralph L. Keeney, Howard Raiffa, David W. Rajala +2 more
- 01 Jan 1976
TL;DR: This book describes how a confused decision maker, who wishes to make a reasonable and responsible choice among alternatives, can systematically probe his true feelings in order to make those critically important, vexing trade-offs between incommensurable objectives.
Learning to Predict by the Methods of Temporal Differences
TL;DR: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – and proves their convergence and optimality for special cases and relation to supervised-learning methods.