TL;DR: In this paper, the authors demonstrate that certain classical problems associated with the notion of the teacher in supervised learning can be solved by judicious use of learned internal models as components of the adaptive system.
TL;DR: This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning, enabling its application to complex robot-learning problems.
Abstract: Reinforcement learning agents are adaptive, reactive, and self-supervised. The aim of this dissertation is to extend the state of the art of reinforcement learning and enable its applications to complex robot-learning problems. In particular, it focuses on two issues. First, learning from sparse and delayed reinforcement signals is hard and in general a slow process. Techniques for reducing learning time must be devised. Second, most existing reinforcement learning methods assume that the world is a Markov decision process. This assumption is too strong for many robot tasks of interest.
This dissertation demonstrates how we can possibly overcome the slow learning problem and tackle non-Markovian environments, making reinforcement learning more practical for realistic robot tasks: (1) Reinforcement learning can be naturally integrated with artificial neural networks to obtain high-quality generalization, resulting in a significant learning speedup. Neural networks are used in this dissertation, and they generalize effectively even in the presence of noise and a large number of binary and real-valued inputs. (2) Reinforcement learning agents can save many learning trials by using an action model, which can be learned on-line. With a model, an agent can mentally experience the effects of its actions without actually executing them. Experience replay is a simple technique that implements this idea, and is shown to be effective in reducing the number of action executions required. (3) Reinforcement learning agents can take advantage of instructive training instances provided by human teachers, resulting in a significant learning speedup. Teaching can also help learning agents avoid local optima during the search for optimal control. Simulation experiments indicate that even a small amount of teaching can save agents many learning trials. (4) Reinforcement learning agents can significantly reduce learning time by hierarchical learning--they first solve elementary learning problems and then combine solutions to the elementary problems to solve a complex problem. Simulation experiments indicate that a robot with hierarchical learning can solve a complex problem, which otherwise is hardly solvable within a reasonable time. (5) Reinforcement learning agents can deal with a wide range of non-Markovian environments by having a memory of their past. Three memory architectures are discussed. They work reasonably well for a variety of simple problems. One of them is also successfully applied to a nontrivial non-Markovian robot task.
The results of this dissertation rely on computer simulation, including (1) an agent operating in a dynamic and hostile environment and (2) a mobile robot operating in a noisy and non-Markovian environment. The robot simulator is physically realistic. This dissertation concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning.
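Point (2) of the abstract above, experience replay, can be illustrated with a minimal sketch. It is a tabular simplification and not the dissertation's neural-network implementation: the five-state chain, the functions `step`, `q_update`, and `train`, and all constants are hypothetical choices made only for this example. Stored transitions are re-used for additional Q-learning backups without executing further actions in the environment.

```python
import random

N_STATES = 5          # hypothetical chain of states 0..4; state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
ALPHA, GAMMA = 0.5, 0.9

def step(state, action):
    """Deterministic toy environment (an assumption of this sketch)."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_update(q, s, a, r, s2, done):
    """One-step Q-learning backup applied to a single transition."""
    target = r if done else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
    q[(s, a)] += ALPHA * (target - q[(s, a)])

def train(episodes=20, replays=200, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    memory = []  # every transition actually executed so far
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)  # purely random behaviour policy
            nxt, reward, done = step(state, action)
            memory.append((state, action, reward, nxt, done))
            q_update(q, state, action, reward, nxt, done)
            state = nxt
        # Replay: repeat backups on remembered transitions
        # without taking any new action in the environment.
        for _ in range(replays):
            q_update(q, *rng.choice(memory))
    return q, len(memory)

q_table, real_steps = train()
```

In this toy setting the replayed backups propagate the goal reward back along the chain, so the greedy policy prefers moving right in every non-goal state even though `real_steps` stays small.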
TL;DR: The material presented in this book addresses the analysis and design of learning control systems using a system-theoretic approach, and the application of artificial neural networks to the learning control problem.
Abstract: The material presented in this book addresses the analysis and design of learning control systems. It begins with an introduction to the concept of learning control, including a comprehensive literature review. The text follows with a complete and unifying analysis of the learning control problem for linear time-invariant (LTI) systems using a system-theoretic approach which offers insight into the nature of the solution of the learning control problem. Additionally, several design methods are given for LTI learning control, incorporating a technique based on parameter estimation and a one-step learning control algorithm for finite-horizon problems. Further chapters focus upon learning control for deterministic nonlinear systems, and a time-varying learning controller is presented which can be applied to a class of nonlinear systems, including the models of typical robotic manipulators. The book concludes with the application of artificial neural networks to the learning control problem. Three specific ways to use neural nets for this purpose are discussed, including two methods which use backpropagation training and reinforcement learning.
TL;DR: A system architecture and a network computational approach compatible with the goal of devising a general-purpose artificial neural network computer are described and the functionalities of supervised learning and optimization are illustrated.
Abstract: A system architecture and a network computational approach compatible with the goal of devising a general-purpose artificial neural network computer are described. The functionalities of supervised learning and optimization are illustrated, and cluster analysis and associative recall are briefly mentioned.
TL;DR: The “Gibbs sampling” simulation procedure for “sigmoid” and “noisy-OR” varieties of probabilistic belief networks can support maximum-likelihood learning from empirical data through local gradient ascent.
TL;DR: For a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, inconsistency in rating among experts was observed, with fuzzy c-means approaches being slightly preferred over feedforward cascade correlation results.
Abstract: Magnetic resonance (MR) brain section images are segmented and then synthetically colored to give visual representations of the original data with three approaches: the literal and approximate fuzzy c-means unsupervised clustering algorithms, and a supervised computational neural network. Initial clinical results are presented on normal volunteers and selected patients with brain tumors surrounded by edema. Supervised and unsupervised segmentation techniques provide broadly similar results. Unsupervised fuzzy algorithms were visually observed to show better segmentation when compared with raw image data for volunteer studies. For a more complex segmentation problem with tumor/edema or cerebrospinal fluid boundary, where the tissues have similar MR relaxation behavior, inconsistency in rating among experts was observed, with fuzzy c-means approaches being slightly preferred over feedforward cascade correlation results. Various facets of both approaches, such as supervised versus unsupervised learning, time complexity, and utility for the diagnostic process, are compared.
TL;DR: Data preprocessing for pictorial pattern recognition: preprocessing in the spatial domain; pictorial data preprocessing and shape analysis; transforms and image processing in the transform domain; wavelets and wavelet transforms.
Abstract: Pattern recognition: supervised and unsupervised learning in pattern recognition; nonparametric decision theoretic classification; nonparametric (distribution-free) training of discriminant functions; statistical discriminant functions; clustering analysis and unsupervised learning; dimensionality reduction and feature selection. Neural networks for pattern recognition: multilayer perceptron; radial basis function networks; Hamming net and Kohonen self-organizing feature map; the Hopfield model. Data preprocessing for pictorial pattern recognition: preprocessing in the spatial domain; pictorial data preprocessing and shape analysis; transforms and image processing in the transform domain; wavelets and wavelet transforms. Applications: exemplary applications. Practical concerns of image processing and pattern recognition: computer system architectures for image processing and pattern recognition. Appendices: digital images; image model and discrete mathematics; digital image fundamentals; matrix manipulation; eigenvectors and eigenvalues of an operator; notation.
TL;DR: It is proved that for all finite deterministic domains, reinforcement learning using a directed technique can always be performed in polynomial time, demonstrating the important role of exploration in reinforcement learning.
Abstract: Exploration plays a fundamental role in any active learning system. This study evaluates the role of exploration in active learning and describes several local techniques for exploration in finite, discrete domains, embedded in a reinforcement learning framework (delayed reinforcement). This paper distinguishes between two families of exploration schemes: undirected and directed exploration. While the former family is closely related to random walk exploration, directed exploration techniques memorize exploration-specific knowledge which is used for guiding the exploration search. In many finite deterministic domains, any learning technique based on undirected exploration is inefficient in terms of learning time, i.e., learning time is expected to scale exponentially with the size of the state space. We prove that for all these domains, reinforcement learning using a directed technique can always be performed in polynomial time, demonstrating the important role of exploration in reinforcement learning. (The proof is given for one specific directed exploration technique named counter-based exploration.) Subsequently, several exploration techniques found in recent reinforcement learning and connectionist adaptive control literature are described. In order to trade off efficiently between exploration and exploitation --- a trade-off which characterizes many real-world active learning tasks --- combination methods are described which explore and avoid costs simultaneously. This includes a selective attention mechanism, which allows smooth switching between exploration and exploitation. All techniques are evaluated and compared on a discrete reinforcement learning task (robot navigation). The empirical evaluation is followed by an extensive discussion of benefits and limitations of this work.
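The difference between undirected and directed exploration described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's algorithm or proof: the ten-state chain, the known deterministic `step` function, and the rule "move to the least-visited successor" are hypothetical stand-ins for counter-based exploration. The chain only shows the mechanism and does not reproduce the exponential versus polynomial gap discussed in the abstract.

```python
import random

N = 10  # hypothetical chain of states 0..N-1

def step(state, action):
    # Deterministic transitions: action -1 moves left, +1 moves right.
    return min(max(state + action, 0), N - 1)

def explore(choose, max_steps=100_000):
    """Count the steps needed until every state has been visited."""
    counts = [0] * N
    state, steps = 0, 0
    counts[state] = 1
    while min(counts) == 0 and steps < max_steps:
        state = step(state, choose(state, counts))
        counts[state] += 1
        steps += 1
    return steps

def counter_based(state, counts):
    # Directed: use the stored visit counters to prefer the action
    # whose successor state has been visited least often.
    return min((-1, 1), key=lambda a: counts[step(state, a)])

def make_random_walk(seed=0):
    # Undirected: ignore the counters and act uniformly at random.
    rng = random.Random(seed)
    return lambda state, counts: rng.choice((-1, 1))

directed_steps = explore(counter_based)
undirected_steps = explore(make_random_walk())
```

On this chain the directed explorer never revisits a state, so it covers all states in `N - 1` steps, while the random walk can never need fewer.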
TL;DR: A novel general principle for unsupervised learning of distributed nonredundant internal representations of input patterns based on two opposing forces that has a potential for removing not only linear but also nonlinear output redundancy.
Abstract: I propose a novel general principle for unsupervised learning of distributed nonredundant internal representations of input patterns. The principle is based on two opposing forces. For each representational unit there is an adaptive predictor, which tries to predict the unit from the remaining units. In turn, each unit tries to react to the environment such that it minimizes its predictability. This encourages each unit to filter "abstract concepts" out of the environmental input such that these concepts are statistically independent of those on which the other units focus. I discuss various simple yet potentially powerful implementations of the principle that aim at finding binary factorial codes (Barlow et al. 1989), i.e., codes where the probability of the occurrence of a particular input is simply the product of the probabilities of the corresponding code symbols. Such codes are potentially relevant for (1) segmentation tasks, (2) speeding up supervised learning, and (3) novelty detection. Methods for...
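The factorial-code condition quoted in the abstract can be written compactly. The notation is introduced here only for illustration and is not taken from the paper: an input $x$ is mapped to binary code symbols $y_1(x), \dots, y_n(x)$.

```latex
P(x) = \prod_{i=1}^{n} P\bigl(y_i = y_i(x)\bigr)
```

Read this way, the condition says that the code symbols are statistically independent of one another, which is what each unit's effort to be unpredictable from the remaining units is meant to encourage.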
TL;DR: An objective function formulation of the Bienenstock, Cooper, and Munro (BCM) theory of visual cortical plasticity is presented that permits the connection between the unsupervised BCM learning procedure and various statistical methods, in particular, that of Projection Pursuit.
TL;DR: A fuzzy Kohonen clustering network which integrates the fuzzy c-means (FCM) model into the learning rate and updating strategies of the Kohonen network is proposed, and it is proved that the proposed scheme is equivalent to the c-means algorithms.
Abstract: The authors propose a fuzzy Kohonen clustering network which integrates the fuzzy c-means (FCM) model into the learning rate and updating strategies of the Kohonen network. This yields an optimization problem related to FCM, and the numerical results show improved convergence as well as reduced labeling errors. It is proved that the proposed scheme is equivalent to the c-means algorithms. The new method can be viewed as a Kohonen type of FCM, but it is self-organizing, since the size of the update neighborhood and the learning rate in the competitive layer are automatically adjusted during learning. Anderson's IRIS data were used to illustrate this method. The results are compared with the standard Kohonen approach.
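For readers unfamiliar with the FCM model that this network builds on, the following is a minimal sketch of the standard fuzzy c-means iteration on one-dimensional data. It is not the fuzzy Kohonen network itself; the function name `fcm_1d`, the sample points, and the initial centers are hypothetical, and the fuzzifier `m = 2` is just a common default.

```python
def fcm_1d(points, centers, m=2.0, iters=50):
    """Alternate the standard FCM membership and center updates."""
    centers = list(centers)
    memberships = []
    power = 2.0 / (m - 1.0)
    for _ in range(iters):
        memberships = []
        for x in points:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # The point coincides with a center: split full
                # membership among the coinciding centers.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                # u_i(x) = 1 / sum_j (d_i / d_j) ** (2 / (m - 1))
                row = [1.0 / sum((di / dj) ** power for dj in dists)
                       for di in dists]
            memberships.append(row)
        # Each center becomes the mean of all points weighted by u ** m.
        centers = [
            sum(u[i] ** m * x for u, x in zip(memberships, points))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centers))
        ]
    return centers, memberships

centers, memberships = fcm_1d([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], [1.0, 4.0])
```

With two well-separated groups, the centers settle close to the group means and each point's memberships sum to one.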
TL;DR: It is shown that the feedforward network (FFN) pattern learning rule is a first-order approximation of the FFN-batch learning rule, and is valid for nonlinear activation networks provided the learning rate is small.
Abstract: Four types of neural net learning rules are discussed for dynamic system identification. It is shown that the feedforward network (FFN) pattern learning rule is a first-order approximation of the FFN-batch learning rule. As a result, pattern learning is valid for nonlinear activation networks provided the learning rate is small. For recurrent types of networks (RecNs), RecN-pattern learning is different from RecN-batch learning. However, the difference can be controlled by using small learning rates. While RecN-batch learning is strict in a mathematical sense, RecN-pattern learning is simple to implement and can be implemented in a real-time manner. Simulation results agree very well with the theorems derived. It is shown by simulation that for system identification problems, recurrent networks are less sensitive to noise.
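The first-order relationship between pattern and batch learning can be checked numerically on a toy case. The sketch below is an illustration under assumptions, not the paper's setting: it uses a single static tanh unit with one weight instead of a dynamic system, and the data, the starting weight, and the function names are hypothetical.

```python
import math

DATA = [(1.0, 0.5), (-0.5, -0.2), (2.0, 0.9)]  # hypothetical (input, target) pairs

def grad(w, x, d):
    # Derivative with respect to w of 0.5 * (tanh(w * x) - d) ** 2.
    y = math.tanh(w * x)
    return (y - d) * (1.0 - y * y) * x

def batch_epoch(w, lr):
    # Batch learning: all gradients are evaluated at the same weight.
    return w - lr * sum(grad(w, x, d) for x, d in DATA)

def pattern_epoch(w, lr):
    # Pattern learning: the weight is updated after every pattern.
    for x, d in DATA:
        w -= lr * grad(w, x, d)
    return w

def gap(lr, w0=0.3):
    """Distance between the two rules after one pass over the data."""
    return abs(pattern_epoch(w0, lr) - batch_epoch(w0, lr))
```

Because the two rules agree to first order in the learning rate, the gap shrinks roughly quadratically: reducing the rate tenfold reduces the gap by far more than tenfold.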
TL;DR: A parallel stochastic algorithm is investigated for error-descent learning and optimization in deterministic networks of arbitrary topology based on the model-free distributed learning mechanism of Dembo and Kailath and supported by a modified parameter update rule.
Abstract: A parallel stochastic algorithm is investigated for error-descent learning and optimization in deterministic networks of arbitrary topology. No explicit information about internal network structure is needed. The method is based on the model-free distributed learning mechanism of Dembo and Kailath. A modified parameter update rule is proposed by which each individual parameter vector perturbation contributes a decrease in error. A substantially faster learning speed is hence allowed. Furthermore, the modified algorithm supports learning time-varying features in dynamical networks. We analyze the convergence and scaling properties of the algorithm, and present simulation results for dynamic trajectory learning in recurrent networks.
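The abstract does not spell out the update rule, so the following is only a sketch of one common form of perturbation-based error descent, assumed here for illustration: all parameters are perturbed in parallel by random signs, the resulting change in a scalar error is measured, and the parameters move against the perturbation in proportion to that change. The quadratic `error`, the constants `sigma` and `mu`, and the function names are hypothetical.

```python
import random

TARGET = [1.0, -2.0, 0.5]  # hypothetical optimum of a toy error function

def error(params):
    # The learner only observes this scalar; no gradient and no
    # knowledge of the internal structure is used.
    return sum((p - t) ** 2 for p, t in zip(params, TARGET))

def descend(steps=500, sigma=0.1, mu=5.0, seed=0):
    rng = random.Random(seed)
    params = [0.0] * len(TARGET)
    for _ in range(steps):
        # Perturb every parameter at once by a random +/- sigma.
        pert = [rng.choice((-sigma, sigma)) for _ in params]
        delta_e = error([p + d for p, d in zip(params, pert)]) - error(params)
        # Step against the perturbation, scaled by the observed error change.
        params = [p - mu * delta_e * d for p, d in zip(params, pert)]
    return params

initial_error = error([0.0] * len(TARGET))
final_error = error(descend())
```

To first order in `sigma`, the error change produced by each step is proportional to the negative square of the measured change, which is why every perturbation tends to contribute a decrease.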
TL;DR: An accelerated stochastic approximation algorithm is developed for the identification of the weighting function associated with boundary control in one-dimensional linear distributed-parameter systems.
Abstract: An accelerated stochastic approximation algorithm is developed for the identification of the weighting function associated with boundary control in one-dimensional linear distributed-parameter systems. The weighting function is assumed to be variable separable, and each variable is approximated by a finite number of orthonormal polynomials. In the absence of noise, this algorithm will converge in a finite number of steps. For adaptive control, on-line weighting function estimators are developed which use the optimal control function as input. These estimators are functional gradient algorithms based on a least-squares approach. They can be used for estimating the weighting function associated with either boundary or distributed control.
TL;DR: This paper studies three connectionist approaches which learn to use history to handle perceptual aliasing: the window-Q, recurrent-Q, and recurrent-model architectures.
Abstract: Reinforcement learning is a type of unsupervised learning for sequential decision making. Q-learning is probably the best-understood reinforcement learning algorithm. In Q-learning, the agent learns a mapping from states and actions to their utilities. An important assumption of Q-learning is the Markovian environment assumption, meaning that any information needed to determine the optimal actions is reflected in the agent's state representation. Consider an agent whose state representation is based solely on its immediate perceptual sensations. When its sensors are not able to make essential distinctions among world states, the Markov assumption is violated, causing a problem called perceptual aliasing. For example, when facing a closed box, an agent based on its current visual sensation cannot act optimally if the optimal action depends on the contents of the box. There are two basic approaches to addressing this problem -- using more sensors or using history to figure out the current world state. This paper studies three connectionist approaches which learn to use history to handle perceptual aliasing: the window-Q, recurrent-Q, and recurrent-model architectures. Empirical study of these architectures is presented. Their relative strengths and weaknesses are also discussed.
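The idea of using history to resolve perceptual aliasing can be sketched with a sliding window over recent observations and actions. This is only an illustration of the window idea: in the paper the window feeds a connectionist Q-function, whereas here it merely produces a hashable state key. The class `HistoryWindow`, the helper `final_state`, and the observation names are hypothetical.

```python
from collections import deque

class HistoryWindow:
    """Keeps the last `size` (observation, action) pairs as a hashable state."""

    def __init__(self, size):
        self.buffer = deque(maxlen=size)  # oldest entries drop out automatically

    def push(self, observation, action=None):
        self.buffer.append((observation, action))
        return self.state()

    def state(self):
        return tuple(self.buffer)

def final_state(observations, size):
    """State key seen by the agent after a sequence of observations."""
    window = HistoryWindow(size)
    state = window.state()
    for obs in observations:
        state = window.push(obs)
    return state

full_then_closed = ["box_open_full", "box_closed"]
empty_then_closed = ["box_open_empty", "box_closed"]

# With a window of one, both histories end in the same state (aliasing).
aliased = final_state(full_then_closed, 1) == final_state(empty_then_closed, 1)
# With a window of two, the earlier glimpse of the contents separates them.
resolved = final_state(full_then_closed, 2) != final_state(empty_then_closed, 2)
```

The window size is a design choice: too small and the aliasing remains, too large and the state space grows with every extra remembered step.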
TL;DR: KBANN (Knowledge-Based Artificial Neural Networks), a three-part hybrid learning system built on top of "neural" learning techniques, is shown to be an effective combination of explanation-based and empirical learning.
Abstract: Explanation-based and empirical learning are two largely complementary methods of machine learning. These approaches to machine learning both have serious problems which preclude their being a general purpose learning method. However, a "hybrid" learning method that combines explanation-based with empirical learning may be able to use the strengths of one learning method to address the weaknesses of the other method. Hence, a system that effectively combines the two approaches to learning can be expected to be superior to either approach in isolation. This thesis describes a hybrid system called KBANN which is shown to be an effective combination of these two learning methods.
KBANN (Knowledge-Based Artificial Neural Networks) is a three-part hybrid learning system built on top of "neural" learning techniques. The first part uses a set of approximately-correct rules to determine the structure and initial link weights of an artificial neural network, thereby making the rules accessible for modification by neural learning. The second part of KBANN modifies the resulting network using essentially standard neural learning techniques. The third part of KBANN extracts refined rules from trained networks.
KBANN is evaluated by empirical tests in the domain of molecular biology. Networks created by KBANN are shown to be superior, in terms of their ability to correctly classify unseen examples, to a wide variety of learning systems as well as techniques proposed by experts in the problems investigated. In addition, empirical tests show that KBANN is robust to errors in the initial rules and insensitive to problems resulting from the presence of extraneous input features.
The third part of KBANN, which extracts rules from trained networks, addresses a significant problem in the use of neural networks--understanding what a neural network learns. Empirical tests of the proposed rule-extraction method show that it simplifies understanding of trained networks by reducing the number of: consequents (hidden units), antecedents (weighted links), and possible antecedent weights. Surprisingly, the extracted rules are often more accurate at classifying examples not seen during training than the trained network from which they came.
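The first stage, turning an approximately-correct rule into a unit with initial weights, can be sketched as follows. The abstract does not give the mapping, so this assumes one commonly described scheme for conjunctive rules: a weight of +omega for each positive antecedent, -omega for each negated one, and a bias of -(P - 0.5) * omega, where P is the number of positive antecedents. The value of `OMEGA`, the rule `A :- B, C, not D`, and the function names are hypothetical.

```python
import math

OMEGA = 4.0  # assumed magnitude for the initial link weights

def rule_to_unit(positive, negated):
    """Map a conjunctive rule to (weights, bias) for one sigmoid unit."""
    weights = {name: OMEGA for name in positive}
    weights.update({name: -OMEGA for name in negated})
    # The bias is set so the unit fires only when every positive
    # antecedent is on and no negated antecedent is on.
    bias = -(len(positive) - 0.5) * OMEGA
    return weights, bias

def activate(unit, inputs):
    weights, bias = unit
    net = bias + sum(w * inputs[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-net))

# Hypothetical rule: A :- B, C, not D
unit = rule_to_unit(positive=["B", "C"], negated=["D"])
rule_satisfied = activate(unit, {"B": 1, "C": 1, "D": 0})
missing_antecedent = activate(unit, {"B": 1, "C": 0, "D": 0})
negation_violated = activate(unit, {"B": 1, "C": 1, "D": 1})
```

Because the rule is encoded in ordinary weights and a bias, subsequent gradient-based training can adjust it like any other part of the network.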
TL;DR: An understanding of the sample complexity of learning in several existing models is provided and a systematic investigation and comparison of two fundamental quantities in learning and information theory is undertaken.
Abstract: Summary form only given. A Bayesian or average-case model of concept learning is given. The model provides more precise characterizations of the learning curve (sample complexity) behaviour that depends on properties of both the prior distribution over concepts and the sequence of instances seen by the learner. It unites in a common framework statistical physics and VC dimension theories of learning curves. A systematic investigation and comparison of two fundamental quantities in learning and information theory is undertaken. These are the probability of an incorrect prediction for an optimal learning algorithm, and the Shannon information gain. This paper provides an understanding of the sample complexity of learning in several existing models.
TL;DR: A real-time supervised structure and parameter learning algorithm for constructing fuzzy neural networks (FNNs) automatically and dynamically is proposed which combines the backpropagation learning scheme for the parameter learning and a novel fuzzy similarity measure for the structure learning.
Abstract: The authors propose a real-time supervised structure and parameter learning algorithm for constructing fuzzy neural networks (FNNs) automatically and dynamically. This algorithm combines the backpropagation learning scheme for the parameter learning and a novel fuzzy similarity measure for the structure learning. The fuzzy similarity measure is a new tool to determine the degree to which two fuzzy sets are equal. The FNN is a feedforward multilayered network which integrates the basic elements and functions of a traditional fuzzy logic controller into a connectionist structure which has distributed learning abilities. The structure learning decides the proper connection types and the number of hidden units which represent fuzzy logic rules and the number of fuzzy partitions. The parameter learning adjusts the node and link parameters which represent the membership functions. The proposed supervised learning algorithm provides an efficient way of constructing a FNN in real time. Simulation results are presented to illustrate the performance and applicability of the proposed learning algorithm.
TL;DR: It is argued that for certain types of problems direct (model-free) methods, of which reinforcement learning is an example, can yield faster, more reliable learning, while indirect (model-based) methods are relatively inefficient.
Abstract: Learning control involves modifying a controller's behavior to improve its performance as measured by some predefined index of performance (IP). If control actions that improve performance as measured by the IP are known, supervised learning methods, or methods for learning from examples, can be used to train the controller. But when such control actions are not known a priori, appropriate control behavior has to be inferred from observations of the IP. One can distinguish between two classes of methods for training controllers under such circumstances. Indirect methods involve constructing a model of the problem's IP and using the model to obtain training information for the controller. On the other hand, direct, or model-free, methods obtain the requisite training information by observing the effects of perturbing the controlled process on the IP. Despite its reputation for inefficiency, we argue that for certain types of problems the latter approach, of which reinforcement learning is an example, can yield faster, more reliable learning. Using several control problems as examples, we illustrate how the complexity of model construction can often exceed that of solving the original control problem using direct reinforcement learning methods, making indirect methods relatively inefficient. These results indicate the importance of considering direct reinforcement learning methods as tools for learning to solve control problems. We also present several techniques for augmenting the power of reinforcement learning methods. 
These include (1) the use of local models to guide assigning credit to the components of a reinforcement learning system, (2) implementing a procedure from experimental psychology called "shaping" to improve the efficiency of learning, thereby making more complex problems amenable to solution, and (3) implementing a multi-level learning architecture designed for exploiting task decomposability by using previously-learned behaviors as primitives for learning more complex tasks.
TL;DR: It is concluded that the incorporation of psychologically and biologically plausible structural and functional characteristics, like modularity, unsupervised (competitive) learning, and a novelty dependent learning rate, may contribute to solving some of the problems often encountered in connectionist modeling.
TL;DR: This important new work recognizes the advanced nature of today's artificial neural networks, uniquely emphasizing a modular approach to neural network learning, and covers the full range of conceivable approaches to the modularization of learning.
Abstract: From the Publisher:
This important new work recognizes the advanced nature of today's artificial neural networks, uniquely emphasizing a modular approach to neural network learning. By breaking down the learning task into relatively independent parts of lower complexity, Modular Learning in Neural Networks demonstrates how neural network learning can be made more powerful and efficient. The book's modular approach, unlike the monolithic viewpoint, admits intermediary solution stages whose success can be independently verified, as in other engineering fields. Each stage can be evaluated before moving on to the subsequent one, and the reason for possible failures can be analyzed, ultimately leading to the improved development and engineering of applications. The modular approach also takes into account growing network complexity, reducing the difficulty of such inevitable problems as scaling and convergence. Modular Learning in Neural Networks' modular approach is also fully in step with important psychological and neurobiological research. Studies in developmental psychology demonstrate the incremental nature of human learning, in which the success of each stage is conditioned by the successful accomplishment of the previous stage, while neurobiology has depicted the human brain as a complex structure of cooperating modules. Modular Learning in Neural Networks covers the full range of conceivable approaches to the modularization of learning, including decomposition of learning into modules using supervised and unsupervised learning types; decomposition of the function to be mapped into linear and nonlinear parts; decomposition of the neural network to minimize harmful interferences between a large number of network parameters during learning; decomposition of the application task into subtasks that are learned separately; decomposition into a knowledge-based part and a learning part. The book attempts to show that modular learning based on these approaches is helpful in improving t
TL;DR: The current paper considers how to specify targets by sets of constraints, rather than as particular vectors, to allow supervised learning algorithms to make use of flexibility in training.
TL;DR: An approach to combining symbolic and connectionist approaches to machine learning is described; a three-stage framework is presented, and the research of several groups is reviewed with respect to this framework.
Abstract: This article describes an approach to combining symbolic and connectionist approaches to machine learning. A three-stage framework is presented and the research of several groups is reviewed with respect to this framework. The first stage involves the insertion of symbolic knowledge into neural networks, the second addresses the refinement of this prior knowledge in its neural representation, while the third concerns the extraction of the refined symbolic knowledge. Experimental results and open research issues are discussed.
TL;DR: It is demonstrated that the network can learn, entirely unsupervised, to classify an ensemble of several patterns by observing pattern trajectories, even though there are abrupt transitions from one object to another between trajectories.
Abstract: The invariance of an object's identity as it is transformed over time provides a powerful cue for perceptual learning. We present an unsupervised learning procedure which maximizes the mutual information between the representations adopted by a feed-forward network at consecutive time steps. We demonstrate that the network can learn, entirely unsupervised, to classify an ensemble of several patterns by observing pattern trajectories, even though there are abrupt transitions from one object to another between trajectories. The same learning procedure should be widely applicable to a variety of perceptual learning tasks.
TL;DR: The demonstrations show that Competitive Hebbian Learning is effective in finding structure in the correlations of input vector components, in separating differing, but nonorthogonal input vectors, in finding useful single-layer functions which could be applied to the solution of Boolean-algebra problems, and in finding solutions to an approximate image-compression task.
TL;DR: An application of an artificial neural network model, the Adaptive Resonance Theory (ART), to Chinese character classification is described, and experimental results indicate that the classifier is able to achieve a high classification rate.
TL;DR: A plant controller using reinforcement learning includes action and critic networks with enhanced learning for generating a plant control signal; learning within the critic network is enhanced by using a distance parameter which represents the difference between the actual and desired states of the quantitative performance, or output, of the plant when generating the reinforcement signal for the action network.
Abstract: A plant controller using reinforcement learning for controlling a plant includes action and critic networks with enhanced learning for generating a plant control signal. Learning is enhanced within the action network by using a neural network configured to operate according to unsupervised learning techniques based upon a Kohonen Feature Map. Learning is enhanced within the critic network by using a distance parameter which represents the difference between the actual and desired states of the quantitative performance, or output, of the plant when generating the reinforcement signal for the action network.
TL;DR: FLORA2 is a program for supervised learning of concepts that are subject to concept drift; it keeps in memory not only valid descriptions of the concepts as they are derived from the objects currently present in the window, but also 'candidate descriptions' that may turn into valid descriptions in the future.
Abstract: FLORA2 is a program for supervised learning of concepts that are subject to concept drift. The learning process is incremental in that the examples are processed one by one. A special feature of our program consists in keeping in memory a subset of examples, a window. Over time, new examples are added to the window while other ones are considered outdated and are forgotten. In order to track the concept drift, the system keeps in memory not only valid descriptions of the concepts as they are derived from the objects currently present in the window, but also 'candidate descriptions' that may turn into valid descriptions in the future.
TL;DR: A neural network mechanism built from reciprocally inhibited and excited neurons is proposed to modify, using a sensory input, the rhythmic motion of a two-legged robot when walking on sloping surfaces.
Abstract: A neural network mechanism is proposed to modify the rhythmic motion (gait) of a two-legged robot when walking on sloping surfaces using a sensory input. The robot starts walking on a terrain with no previous knowledge, but accumulates walking experience during walking, thus constantly improving its walking gait. The proposed network consists of 20 reciprocally inhibited and excited neurons. An unsupervised learning rule was implemented using reinforcement signals. Two learning algorithms are introduced. The primary concern in the first algorithm was the speed of gait modification, whereas the second algorithm provided a solution with minimum energy consumption. A static learning approach where learning takes place only at prespecified moments is proposed.