Journal Article, DOI: 10.48550/arXiv.2304.12966
Towards Theoretical Understanding of Inverse Reinforcement Learning
TL;DR: Inverse reinforcement learning (IRL) denotes a powerful family of algorithms for recovering a reward function that justifies the behavior demonstrated by an expert agent; this paper studies the sample complexity of estimating the feasible reward set in finite-horizon problems with a generative model.
Abstract: Inverse reinforcement learning (IRL) denotes a powerful family of algorithms for recovering a reward function that justifies the behavior demonstrated by an expert agent. A well-known limitation of IRL is the ambiguity in the choice of the reward function, since multiple rewards can explain the observed behavior. This limitation has recently been circumvented by formulating IRL as the problem of estimating the feasible reward set, i.e., the region of rewards compatible with the expert's behavior. In this paper, we take a step towards closing the theory gap of IRL for finite-horizon problems with a generative model. We start by formally introducing the problem of estimating the feasible reward set and the corresponding PAC requirement, and by discussing the properties of particular classes of rewards. Then, we provide the first minimax lower bound on the sample complexity of estimating the feasible reward set, of order $\Omega\Bigl( \frac{H^3SA}{\epsilon^2} \bigl( \log \bigl(\frac{1}{\delta}\bigr) + S \bigr)\Bigr)$, where $S$ and $A$ are the numbers of states and actions, respectively, $H$ is the horizon, $\epsilon$ is the desired accuracy, and $\delta$ is the confidence. We analyze the sample complexity of a uniform sampling strategy (US-IRL), proving a matching upper bound up to logarithmic factors. Finally, we outline several open questions in IRL and propose future research directions.
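To make the abstract's notions concrete, here is a minimal Python sketch under simplifying assumptions that are not taken from the paper: a tabular MDP with stationary transitions `P[s][a]`, step-dependent rewards `r[h][s][a]`, and a known deterministic expert policy `pi_E[h][s]`. Using the common definition that a reward is feasible when the expert policy is optimal under it, `is_feasible` checks membership by backward induction; `uniform_sampling_estimate` queries a hypothetical generative model `sample_next(s, a, rng)` equally often for every state-action pair, in the spirit of (but not reproducing) US-IRL; and `lower_bound_order` evaluates the lower-bound expression with constants omitted. All names here are illustrative.

```python
import math
import random


def optimal_q(r, P, H):
    """Backward induction for a finite-horizon tabular MDP.

    r[h][s][a]: reward at step h; P[s][a]: next-state probabilities
    (stationary transitions are an assumption of this sketch).
    Returns Q[h][s][a] for h = 0..H-1.
    """
    S, A = len(P), len(P[0])
    V_next = [0.0] * S  # value after the last step is zero
    Q = [None] * H
    for h in reversed(range(H)):
        Q[h] = [[r[h][s][a] + sum(p * v for p, v in zip(P[s][a], V_next))
                 for a in range(A)] for s in range(S)]
        V_next = [max(Q[h][s]) for s in range(S)]
    return Q


def is_feasible(r, P, pi_E, H, tol=1e-9):
    """True if the deterministic expert policy pi_E[h][s] is optimal under r
    at every step and state (simplified feasibility condition)."""
    Q = optimal_q(r, P, H)
    return all(Q[h][s][pi_E[h][s]] >= max(Q[h][s]) - tol
               for h in range(H) for s in range(len(P)))


def uniform_sampling_estimate(sample_next, S, A, n, rng):
    """Query a (hypothetical) generative model n times for every (s, a)
    and return the empirical transition model P_hat[s][a]."""
    P_hat = []
    for s in range(S):
        row = []
        for a in range(A):
            counts = [0] * S
            for _ in range(n):
                counts[sample_next(s, a, rng)] += 1
            row.append([c / n for c in counts])
        P_hat.append(row)
    return P_hat


def lower_bound_order(S, A, H, eps, delta):
    """H^3 S A / eps^2 * (log(1/delta) + S), constants omitted."""
    return H**3 * S * A / eps**2 * (math.log(1 / delta) + S)


# Hypothetical 2-state, 2-action MDP: action 0 stays, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
H = 3
pi_E = [[1, 0] for _ in range(H)]  # expert leaves state 0 and stays in state 1
r_good = [[[0.0, 0.0], [1.0, 1.0]] for _ in range(H)]  # reward in state 1
r_bad = [[[1.0, 1.0], [0.0, 0.0]] for _ in range(H)]   # reward in state 0
r_zero = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(H)]

print(is_feasible(r_good, P, pi_E, H))  # True
print(is_feasible(r_bad, P, pi_E, H))   # False
print(is_feasible(r_zero, P, pi_E, H))  # True: the trivial reward is feasible


def det_model(s, a, rng):
    """Hypothetical deterministic generative model matching P above."""
    return s if a == 0 else 1 - s


P_hat = uniform_sampling_estimate(det_model, 2, 2, 100, random.Random(0))
print(P_hat == P)  # True for this deterministic model
```

The zero reward passing the check illustrates the ambiguity the abstract mentions: many rewards, including trivial ones, explain the same expert behavior, which is why the target is the whole feasible set rather than a single reward.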
Citations
Provably Efficient Exploration in Inverse Constrained Reinforcement Learning
Bo Yue, Jian Li, Guiliang Liu (24 Sep 2024)
TL;DR: This paper introduces a strategic exploration framework for Inverse Constrained Reinforcement Learning, proposing two algorithms with guaranteed efficiency and tractable sample complexity, to recover optimal constraints from expert demonstrations in complex environments.
A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback
Kihyun Kim, Jiawei Zhang, Pablo A. Parrilo, Asuman Ozdaglar (20 May 2024)
TL;DR: This paper proposes a novel linear programming (LP) framework for offline reward learning from human demonstrations and feedback that estimates a feasible reward set and offers optimality guarantees.
Randomized algorithms and PAC bounds for inverse reinforcement learning in continuous spaces
Angeliki Kamoutsi, Peter Schmitt-Förster, Tobias Sutter, Volkan Cevher, John Lygeros (24 May 2024)
TL;DR: The paper studies inverse reinforcement learning in continuous state and action spaces, characterizing the set of solutions to the inverse problem and computing approximate solutions with randomized algorithms that come with PAC bounds.
Inverse Reinforcement Learning with the Average Reward Criterion
Feiyang Wu, Anqi Wu
TL;DR: This article studies inverse reinforcement learning (IRL) with an average-reward criterion and develops stochastic first-order methods for the IRL problem in this setting, which requires solving an average-reward Markov decision process (AMDP) as a subproblem.
How to Scale Inverse RL to Large State Spaces? A Provably Efficient Approach
Filippo Lazzati, Mirco Mutti, Alberto Maria Metelli (06 Jun 2024)
TL;DR: This paper proposes CATY-IRL, a sample-efficient online Inverse Reinforcement Learning algorithm for Linear Markov Decision Processes, achieving minimax optimality up to logarithmic factors, and improving the state-of-the-art lower bound for Reward-Free Exploration.
References
Apprenticeship learning via inverse reinforcement learning
Pieter Abbeel, Andrew Y. Ng (04 Jul 2004)
TL;DR: This work models the expert as trying to maximize a reward function expressible as a linear combination of known features, and gives an algorithm for learning the demonstrated task that uses inverse reinforcement learning to try to recover the unknown reward function.
Proceedings Article
Maximum entropy inverse reinforcement learning
Brian D. Ziebart, Andrew L. Maas, J. Andrew Bagnell, Anind K. Dey (13 Jul 2008)
TL;DR: This work develops a probabilistic approach based on the principle of maximum entropy that provides a well-defined, globally normalized distribution over decision sequences while offering the same performance guarantees as existing methods.
Algorithms for Inverse Reinforcement Learning
Andrew Y. Ng, Stuart Russell (29 Jun 2000)
TL;DR: This paper characterizes the set of all reward functions for which a given policy is optimal and derives three IRL algorithms: two for the case where the entire policy is known (tabulated rewards on a finite state space, and linear function approximation of the reward over a potentially infinite state space) and one for the case where the policy is known only through a finite set of observed trajectories.
Maximum margin planning
Nathan Ratliff, J. Andrew Bagnell, Martin Zinkevich (25 Jun 2006)
TL;DR: This work learns mappings from features to costs such that an optimal policy in an MDP with these costs mimics the expert's behavior, and demonstrates a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference.
Book
An Algorithmic Perspective on Imitation Learning
Takayuki Osa, Joni Pajarinen, Gerhard Neumann, J. Andrew Bagnell, Pieter Abbeel, Jan Peters (27 Mar 2018)
TL;DR: It is often easier for a teacher to demonstrate a desired behavior than to engineer it manually; learning from such demonstrations, and the study of algorithms to do so, is called imitation learning, which this work introduces from an algorithmic perspective.