Learning Parameterized Skills
Bruno da Silva, George Konidaris, Andrew G. Barto
- 26 Jun 2012
- pp 1443-1450
TL;DR: A method for constructing skills capable of solving tasks drawn from a distribution of parameterized reinforcement learning problems by predicting policy parameters from task parameters is introduced.
Abstract: We introduce a method for constructing skills capable of solving tasks drawn from a distribution of parameterized reinforcement learning problems. The method draws example tasks from a distribution of interest and uses the corresponding learned policies to estimate the topology of the lower-dimensional piecewise-smooth manifold on which the skill policies lie. This manifold models how policy parameters change as task parameters vary. The method identifies the number of charts that compose the manifold and then applies non-linear regression in each chart to construct a parameterized skill by predicting policy parameters from task parameters. We evaluate our method on an underactuated simulated robotic arm tasked with learning to accurately throw darts at a parameterized target location.
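The pipeline described in the abstract (sample tasks, collect their learned policy parameters, split the data into charts, fit a non-linear regressor per chart) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' procedure: `true_policy_params` is a hypothetical toy stand-in for policies learned by reinforcement learning, the chart detector is a simple jump-threshold heuristic that only works for a one-dimensional task parameter, and the regressor is an RBF kernel ridge model.

```python
import numpy as np

def true_policy_params(tau):
    # Hypothetical stand-in for "learn a policy for task tau":
    # a piecewise-smooth map with a discontinuity at tau = 0.5.
    if tau < 0.5:
        return np.array([np.sin(2 * np.pi * tau), tau ** 2])
    return np.array([3.0 + np.cos(2 * np.pi * tau), 2.0 - tau])

def find_charts(taus, thetas, factor=4.0):
    # Assumed heuristic: sort tasks by task parameter and start a new
    # chart wherever consecutive policy parameters jump unusually far.
    order = np.argsort(taus)
    jumps = np.linalg.norm(np.diff(thetas[order], axis=0), axis=1)
    breaks = jumps > factor * np.median(jumps)
    labels = np.empty(len(taus), dtype=int)
    labels[order] = np.concatenate([[0], np.cumsum(breaks)])
    return labels

def fit_krr(x, Y, gamma, lam):
    # RBF kernel ridge regression from task parameter to policy parameters.
    mean = Y.mean(axis=0)
    K = np.exp(-gamma * (x[:, None] - x[None, :]) ** 2)
    alpha = np.linalg.solve(K + lam * np.eye(len(x)), Y - mean)

    def predict(tau):
        k = np.exp(-gamma * (tau - x) ** 2)
        return mean + k @ alpha

    return predict

class ParameterizedSkill:
    """Maps a task parameter to predicted policy parameters."""

    def __init__(self, taus, thetas, gamma=50.0, lam=1e-6):
        self.taus = taus
        self.labels = find_charts(taus, thetas)
        self.models = {}
        for c in np.unique(self.labels):
            mask = self.labels == c
            self.models[int(c)] = fit_krr(taus[mask], thetas[mask], gamma, lam)

    def __call__(self, tau):
        # Assign the new task to the chart of its nearest training task.
        c = int(self.labels[np.argmin(np.abs(self.taus - tau))])
        return self.models[c](tau)

# Example tasks drawn from the task distribution (an even grid here).
taus = np.linspace(0.0, 1.0, 40)
thetas = np.array([true_policy_params(t) for t in taus])
skill = ParameterizedSkill(taus, thetas)
print(len(np.unique(skill.labels)), skill(0.25), skill(0.75))
```

Fitting one regressor per chart, rather than a single global one, keeps the model from smoothing across the discontinuity at the chart boundary.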
Citations
End-to-End Driving Via Conditional Imitation Learning
Felipe Codevilla, Matthias Müller, Antonio M. López, Vladlen Koltun, Alexey Dosovitskiy
- 21 May 2018
TL;DR: This work evaluates different architectures for conditional imitation learning in vision-based driving and conducts experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area.
1.3K
•Proceedings Article
Universal Value Function Approximators
Tom Schaul, Dan Horgan, Karol Gregor, David Silver
- 06 Jul 2015
TL;DR: An efficient technique is developed for supervised learning of universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g, and it is demonstrated that a UVFA can successfully generalise to previously unseen goals.
•Proceedings Article
Probabilistic Movement Primitives
Alexandros Paraschos, Christian Daniel, Jan Peters, Gerhard Neumann
- 05 Dec 2013
TL;DR: This work presents a probabilistic formulation of the MP concept that maintains a distribution over trajectories, and analytically derives a stochastic feedback controller that reproduces the given trajectory distribution for robot movement control.
Active learning of inverse models with intrinsically motivated goal exploration in robots
TL;DR: The Self-Adaptive Goal Generation Robust Intelligent Adaptive Curiosity (SAGG-RIAC) architecture is introduced as an intrinsically motivated goal exploration mechanism which allows active learning of inverse models in high-dimensional redundant robots.
582
Preparing for the Unknown: Learning a Universal Policy with Online System Identification
Wenhao Yu, Jie Tan, C. Karen Liu, Greg Turk
- 12 Jul 2017
TL;DR: In this paper, the authors present a new method of learning control policies that successfully operate under unknown dynamic models by leveraging a large number of training examples that are generated using a physical simulator.
References
•Book
The Nature of Statistical Learning Theory
Vladimir Vapnik
- 01 Jan 1995
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
46K
A global geometric framework for nonlinear dimensionality reduction.
TL;DR: An approach to solving dimensionality reduction problems that uses easily measured local metric information to learn the underlying global geometry of a data set and efficiently computes a globally optimal solution, and is guaranteed to converge asymptotically to the true structure.
Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning
TL;DR: It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
3.9K
•Journal Article
Large Margin Methods for Structured and Interdependent Output Variables
TL;DR: This paper proposes to appropriately generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation and presents a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems.
Movement imitation with nonlinear dynamical systems in humanoid robots
Auke Jan Ijspeert, Jun Nakanishi, Stefan Schaal
- 07 Aug 2002
TL;DR: The results demonstrate that multi-joint human movements can be encoded successfully by the CPs, that a learned movement policy can readily be reused to produce robust trajectories towards different targets, and that the parameter space which encodes a policy is suitable for measuring to which extent two trajectories are qualitatively similar.