Improving generalization for temporal difference learning: The successor representation

doi:10.1162/NECO.1993.5.4.613

Journal Article (Open Access)

Peter Dayan

- 01 Jul 1993

- Neural Computation

- Vol. 5, Iss: 4, pp 613-624

TL;DR: This paper shows how temporal difference (TD) machinery can be used to learn good function approximators or representations, and illustrates, using a navigation task, the appropriately distributed nature of the result.
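As a minimal sketch of the idea behind the successor representation (SR), the code below learns an SR matrix with a TD(0) update and checks it against the closed form. It is an illustration under stated assumptions, not the paper's navigation experiment: the environment is a hypothetical 5-state ring with an unbiased left/right random walk, and the function names `td_successor_representation` and `analytic_sr` as well as the values of `gamma`, `alpha` and `n_steps` are chosen only for this example. Each row `M[s]` estimates the expected discounted number of future visits to every state when starting from `s`, so it is updated toward a one-hot "occupancy" term plus the discounted SR of the next state.

```python
import numpy as np

def td_successor_representation(n_states=5, gamma=0.5, alpha=0.005,
                                n_steps=200_000, seed=0):
    """Learn the SR matrix M by TD(0) from one long random walk on a ring.

    Hypothetical toy setup. M[s, s2] estimates
    E[sum_t gamma**t * 1(s_t = s2) | s_0 = s] under the random-walk policy.
    """
    rng = np.random.default_rng(seed)
    eye = np.eye(n_states)
    M = np.zeros((n_states, n_states))
    moves = rng.choice([-1, 1], size=n_steps)  # unbiased left/right steps
    s = 0
    for step in moves:
        s_next = (s + step) % n_states  # wrap around the ring
        # TD error for row s: one-hot occupancy of the current state
        # plus the discounted SR of the successor state, minus the estimate
        td_error = eye[s] + gamma * M[s_next] - M[s]
        M[s] += alpha * td_error
        s = s_next
    return M

def analytic_sr(n_states=5, gamma=0.5):
    """Closed-form SR for the same ring: M = (I - gamma * P)^-1."""
    P = np.zeros((n_states, n_states))
    for s in range(n_states):
        P[s, (s - 1) % n_states] = 0.5
        P[s, (s + 1) % n_states] = 0.5
    return np.linalg.inv(np.eye(n_states) - gamma * P)

if __name__ == "__main__":
    M_td = td_successor_representation()
    M_exact = analytic_sr()
    print("max |M_td - M_exact|:", np.abs(M_td - M_exact).max())
    # Values are linear in the SR: V = M @ r for any reward vector r
    r = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # hypothetical reward at state 2
    print("values from learned SR:", M_td @ r)
```

Because every row of the exact SR sums to 1 / (1 - gamma), the learned rows should hover near 2.0 here, and the value of any reward vector follows from a single matrix-vector product. This is the sense in which the SR serves as a reusable representation rather than a value function for one fixed reward.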

Citations

•Journal Article•10.1016/S0004-3702(99)00052-1

Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning

Richard S. Sutton, +2 more

- 01 Aug 1999

- Artificial Intelligence

TL;DR: It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way, and that they may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.

•Journal Article•10.1109/MSP.2017.2743240

Deep Reinforcement Learning: A Brief Survey

Kai Arulkumaran, +3 more

- 09 Nov 2017

- IEEE Signal Processing Magazine

TL;DR: Deep reinforcement learning (DRL) is poised to revolutionize the field of artificial intelligence (AI) and represents a step toward building autonomous systems with a higher-level understanding of the visual world.

•Journal Article•10.1109/MSP.2017.2743240

A brief survey of deep reinforcement learning

Kai Arulkumaran, +3 more

- 09 Nov 2017

- arXiv: Learning

TL;DR: This survey will cover central algorithms in deep RL, including the deep Q-network (DQN), trust region policy optimization (TRPO), and asynchronous advantage actor critic, and highlight the unique advantages of deep neural networks, focusing on visual understanding via RL.

•Posted Content

Reinforcement Learning with Unsupervised Auxiliary Tasks

Max Jaderberg, +6 more

- 16 Nov 2016

- arXiv: Learning

TL;DR: The proposed agent significantly outperforms the previous state of the art on Atari, averaging 880% of expert human performance, and on a challenging suite of first-person, three-dimensional Labyrinth tasks, where it achieves a mean speedup in learning of 10× and averages 87% of expert human performance.

•Journal Article•10.1016/J.NEURON.2013.09.007

Goals and habits in the brain.

Raymond J. Dolan, +1 more

- 16 Oct 2013

- Neuron

TL;DR: This work reviews four generations of experimental work in cognitive neuroscience on goal-directed and habitual control, and provides pointers to the forefront of the field’s fifth generation.

References

•Journal Article•10.1007/BF00992698

Technical Note: Q-Learning

Chris Watkins, +1 more

- 01 May 1992

- Machine Learning

TL;DR: This paper presents and proves in detail a convergence theorem for Q-learning based on that outlined in Watkins (1989), showing that Q-learning converges to the optimal action-values with probability 1 so long as all actions are repeatedly sampled in all states and the action-values are represented discretely.

Learning from delayed rewards

Chris Watkins

- 01 Jan 1989

•Journal Article•10.1023/A:1022633531479

Learning to Predict by the Methods of Temporal Differences

Richard S. Sutton

- 01 Aug 1988

- Machine Learning

TL;DR: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – and proves their convergence and optimality for special cases and relation to supervised-learning methods.

•Journal Article•10.1016/0921-8890(95)00026-C

Learning from delayed rewards

Ben Kröse

- 01 Oct 1995

- Robotics and Autonomous Systems

•Journal Article•10.1147/RD.33.0210

Some studies in machine learning using the game of checkers

A. L. Samuel

- 01 Jul 1959

- IBM Journal of Research and Development

TL;DR: In this article, two machine learning procedures have been investigated in some detail using the game of checkers, and enough work has been done to verify the fact that a computer can be programmed so that it will lear...
