TL;DR: The P300, a stimulus-locked electrophysiological response previously associated with adjustments in learning behavior, is shown to relate to learning conditionally on the source of surprise, providing a surprise signal that downstream learning processes interpret differently according to statistical context.
Abstract: Learning should be adjusted according to the surprise associated with observed outcomes but calibrated according to statistical context. For example, when occasional changepoints are expected, surprising outcomes should be weighted heavily to speed learning. In contrast, when uninformative outliers are expected to occur occasionally, surprising outcomes should be less influential. Here we dissociate surprising outcomes from the degree to which they demand learning using a predictive inference task and computational modeling. We show that the P300, a stimulus-locked electrophysiological response previously associated with adjustments in learning behavior, relates to learning conditionally on the source of surprise. Larger P300 signals predicted greater learning in a changing context, but less learning in a context where surprise was indicative of a one-off outlier (oddball). Our results suggest that the P300 provides a surprise signal that is interpreted by downstream learning processes differentially according to statistical context in order to appropriately calibrate learning across complex environments.
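The normative principle in this abstract (surprise should speed learning when it signals a changepoint but slow learning when it signals an oddball) can be illustrated with a minimal Python sketch. It assumes a delta-rule learner whose learning rate is modulated by the posterior probability that an outcome came from a rare uniform process rather than the expected Gaussian. The outcome range (0 to 300), noise `sigma`, `hazard` rate, `base_rate`, and both learning-rate formulas are hypothetical choices for illustration, not the model fitted in this study.

```python
import math


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def surprise_prob(outcome: float, prediction: float, sigma: float,
                  hazard: float, lo: float = 0.0, hi: float = 300.0) -> float:
    """Posterior probability that the outcome came from the rare uniform
    process (a changepoint or an oddball) rather than N(prediction, sigma^2).
    The uniform range [lo, hi] is a hypothetical task parameter."""
    p_rare = hazard / (hi - lo)
    p_expected = (1.0 - hazard) * gaussian_pdf(outcome, prediction, sigma)
    return p_rare / (p_rare + p_expected)


def update(prediction: float, outcome: float, context: str,
           sigma: float = 20.0, hazard: float = 0.1, base_rate: float = 0.1):
    """One delta-rule step whose learning rate depends on surprise and context.
    All parameter values and both learning-rate formulas are illustrative
    assumptions, not the authors' model."""
    s = surprise_prob(outcome, prediction, sigma, hazard)
    if context == "changepoint":
        # Surprise signals a new state of the world: learn more.
        alpha = base_rate + (1.0 - base_rate) * s
    elif context == "oddball":
        # Surprise signals a one-off outlier: learn less.
        alpha = base_rate * (1.0 - s)
    else:
        raise ValueError(f"unknown context: {context!r}")
    return prediction + alpha * (outcome - prediction), s, alpha


if __name__ == "__main__":
    for ctx in ("changepoint", "oddball"):
        new_pred, s, alpha = update(150.0, 250.0, ctx)
        print(f"{ctx:12s} surprise={s:.4f} alpha={alpha:.4f} new prediction={new_pred:.2f}")
```

With a prediction of 150 and a surprising outcome of 250, the same surprise value drives the changepoint learner almost all the way to the new outcome while the oddball learner barely moves; for small prediction errors the two contexts yield similar learning rates.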
TL;DR: This paper describes the learning agents and their performance, and summarizes the learning algorithms and the lessons I learned from this study.
Abstract: The purpose of this work is to investigate and evaluate different reinforcement learning frameworks using connectionist networks. I study four frameworks, which are adopted from the ideas developed by Rich Sutton and his colleagues. The four frameworks are based on two learning procedures: the Temporal Difference methods for solving the credit assignment problem, and the backpropagation algorithm for developing appropriate internal representations. Two of them also involve learning a world model and using it to speed learning. To evaluate their performance, I design a dynamic environment and implement different learning agents, using the different frameworks, to survive in it. The environment is nontrivial and nondeterministic. Surprisingly, all of the agents can learn to survive fairly well in a reasonable time frame. This paper describes the learning agents and their performance, and summarizes the learning algorithms and the lessons I learned from this study. This research was supported by NASA under Contract NAGW-1175. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of NASA.
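The frameworks above rely on Temporal Difference (TD) methods for credit assignment. As a hedged illustration of what a TD update does, here is a minimal tabular TD(0) sketch in Python. The agents in this study use connectionist networks trained by backpropagation instead of a lookup table, so the table, the five-state chain task, and the parameter values (`alpha`, `gamma`) are hypothetical simplifications.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one TD(0) update to the value table in place and return the TD error."""
    # Bootstrapped target: reward plus the discounted estimate of the next state's value.
    target = reward if terminal else reward + gamma * values[next_state]
    td_error = target - values[state]
    values[state] += alpha * td_error
    return td_error


def run_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9):
    """Hypothetical toy task: the agent walks a chain 0 -> 1 -> ... -> n-1 and
    receives reward 1 only when it leaves the last state, ending the episode."""
    values = [0.0] * n_states
    for _ in range(episodes):
        for s in range(n_states):
            terminal = s == n_states - 1
            reward = 1.0 if terminal else 0.0
            next_state = None if terminal else s + 1
            td0_update(values, s, reward, next_state, alpha, gamma, terminal)
    return values


if __name__ == "__main__":
    print([round(v, 4) for v in run_chain()])
```

On this chain each state's value converges toward `gamma` raised to its distance from the reward, showing how TD propagates credit backward from a delayed outcome without waiting for the episode to finish.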
TL;DR: Two techniques are described, minimal ascension and metamapping, that enable analogies to be drawn even when comparing descriptions using different relational vocabularies, and evidence for the effectiveness of these techniques is provided by a large-scale external evaluation.
Abstract: We report on a series of transfer learning experiments in game domains, in which we use structural analogy from one learned game to speed learning of another related game. We find that a major benefit of analogy is that it reduces the extent to which the source domain must be generalized before transfer. We describe two techniques in particular, minimal ascension and metamapping, that enable analogies to be drawn even when comparing descriptions using different relational vocabularies. Evidence for the effectiveness of these techniques is provided by a large-scale external evaluation, involving a substantial number of novel distant analogs.
TL;DR: This paper introduces a formal model of a learning problem as a distribution of Markov Decision Problems (MDPs), and finds that temporally extended options can facilitate transfer for both algorithms studied, with different sets of options trading off learning speed against asymptotic performance.
Abstract: One of the original motivations for the use of temporally extended actions, or options, in reinforcement learning was to enable the transfer of learned value functions or policies to new problems. Many experimenters have used options to speed learning on single problems, but options have not been studied in depth as a tool for transfer. In this paper we introduce a formal model of a learning problem as a distribution of Markov Decision Problems (MDPs). Each MDP represents a task the agent will have to solve. Our model can also be viewed as a partially observable Markov decision problem (POMDP), with a special structure that we describe. We study two learning algorithms, one which keeps a single value function that generalizes across tasks, and an incremental POMDP-inspired method maintaining separate value functions for each task. We evaluate the learning algorithms on an extension of the Mountain Car domain, in terms of both learning speed and asymptotic performance. Empirically, we find that temporally extended options can facilitate transfer for both algorithms. In our domain, the single value function algorithm has much better learning speed because it generalizes its experience more broadly across tasks. We also observe that different sets of options can achieve tradeoffs of learning speed versus asymptotic performance.
TL;DR: The results indicate that searchers with higher levels of perceptual speed will learn additional search vocabulary, and use that vocabulary to complete higher-quality searches, when they use a system designed to optimize scanning of subject descriptors; this supports the idea that cognitive abilities influence information system usability.
Abstract: Although the cognitive ability “perceptual speed” is known to influence search performance by end-users, previous research has not established the mechanism by which this influence occurs. Results from educational psychology suggest that learning that occurs during searching is likely to be influenced by perceptual speed. An experiment was designed to test how this cognitive ability would interact with a system feature designed to enhance learning of search vocabulary, specifically, presenting subject descriptors as the first element in the display of a reference. Results showed significant interactions between perceptual speed and the order of presentation of data elements in predicting both vocabulary learning and search performance. These results indicate that searchers with higher levels of perceptual speed will learn additional search vocabulary, and use that vocabulary to complete higher-quality searches, when they use a system designed to optimize scanning of subject descriptors. This outcome supports the idea that cognitive abilities influence information system usability, and that usability is determined by interactions between characteristics of users and system features. The findings also suggest that system features that enhance the learning of search vocabulary, such as query expansion mechanisms, can have a significant positive effect on the quality of end-user searching.