Top 259 papers published in the topic of Unsupervised learning in 1996

Showing papers on "Unsupervised learning" published in 1996

Emergence of simple-cell receptive field properties by learning a sparse code for natural images

Bruno A. Olshausen¹,², David J. Field¹•Institutions (2)

Cornell University¹, University of California, Davis²

13 Jun 1996-Nature

TL;DR: It is shown that a learning algorithm that attempts to find sparse linear codes for natural scenes will develop a complete family of localized, oriented, bandpass receptive fields, similar to those found in the primary visual cortex.

Abstract: The receptive fields of simple cells in mammalian primary visual cortex can be characterized as being spatially localized, oriented and bandpass (selective to structure at different spatial scales), comparable to the basis functions of wavelet transforms. One approach to understanding such response properties of visual neurons has been to consider their relationship to the statistical structure of natural images in terms of efficient coding. Along these lines, a number of studies have attempted to train unsupervised learning algorithms on natural images in the hope of developing receptive fields with similar properties, but none has succeeded in producing a full set that spans the image space and contains all three of the above properties. Here we investigate the proposal that a coding strategy that maximizes sparseness is sufficient to account for these properties. We show that a learning algorithm that attempts to find sparse linear codes for natural scenes will develop a complete family of localized, oriented, bandpass receptive fields, similar to those found in the primary visual cortex. The resulting sparse image code provides a more efficient representation for later stages of processing because it possesses a higher degree of statistical independence among its outputs.
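
Illustrative sketch (not taken from the paper): the abstract describes a search for sparse linear codes, which can be read as trading reconstruction error against a sparseness penalty on the coefficients. The toy example below assumes a penalty of the form log(1 + a²); the basis vectors, the weight `lam` and the function names are hypothetical. It only shows that, among codes that reconstruct an image equally well, the one with fewer active coefficients gets the lower cost.

```python
import math

def reconstruct(basis, coeffs):
    """Linear image model: sum_i coeffs[i] * basis[i], pixel by pixel."""
    n_pixels = len(basis[0])
    return [sum(a * phi[p] for a, phi in zip(coeffs, basis)) for p in range(n_pixels)]

def sparse_coding_cost(image, basis, coeffs, lam=0.5, sigma=1.0):
    """Squared reconstruction error plus lam * sum_i log(1 + (a_i / sigma)^2).

    The log penalty is an assumed choice of sparseness function.
    """
    recon = reconstruct(basis, coeffs)
    error = sum((x - r) ** 2 for x, r in zip(image, recon))
    penalty = sum(math.log(1.0 + (a / sigma) ** 2) for a in coeffs)
    return error + lam * penalty

# Overcomplete toy basis for 2-pixel "images" (hypothetical values).
basis = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
image = [1.0, 1.0]

dense_code = [1.0, 1.0, 0.0]   # two active coefficients
sparse_code = [0.0, 0.0, 1.0]  # one active coefficient

# Both codes reconstruct the image exactly; only the penalty differs.
dense_cost = sparse_coding_cost(image, basis, dense_code)
sparse_cost = sparse_coding_cost(image, basis, sparse_code)
```

A learning procedure in this spirit would alternate between minimizing the cost over the coefficients for each image and adjusting the basis vectors; that loop is omitted here.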

6,586 citations

Book•

Elements of artificial neural networks

Kishan G. Mehrotra¹, Chilukuri K. Mohan¹, Sanjay Ranka²•Institutions (2)

Syracuse University¹, University of Florida²

11 Oct 1996

TL;DR: A textbook on artificial neural networks covering their history, supervised learning in single-layer and multilayer networks, unsupervised learning, associative models and optimization methods.

Abstract: History of neural networks; supervised learning: single-layer networks; supervised learning: multilayer networks I; supervised learning: multilayer networks II; unsupervised learning; associative models; optimization methods; a little math; data.

1,095 citations

Journal Article•10.1109/5.537105•

Engineering applications of the self-organizing map

Teuvo Kohonen¹, Erkki Oja¹, Olli Simula¹, Ari Visa¹, Jari Kangas¹•Institutions (1)

Helsinki University of Technology¹

1 Oct 1996

TL;DR: The self-organizing map method, which converts complex, nonlinear statistical relationships between high-dimensional data into simple geometric relationships on a low-dimensional display, can be utilized for many tasks: reduction of the amount of training data, speeding up learning nonlinear interpolation and extrapolation, generalization, and effective compression of information for its transmission.

Abstract: The self-organizing map (SOM) method is a new, powerful software tool for the visualization of high-dimensional data. It converts complex, nonlinear statistical relationships between high-dimensional data into simple geometric relationships on a low-dimensional display. As it thereby compresses information while preserving the most important topological and metric relationships of the primary data elements on the display, it may also be thought to produce some kind of abstractions. The term self-organizing map signifies a class of mappings defined by error-theoretic considerations. In practice they result in certain unsupervised, competitive learning processes, computed by simple-looking SOM algorithms. Many industries have found the SOM-based software tools useful. The most important property of the SOM, orderliness of the input-output mapping, can be utilized for many tasks: reduction of the amount of training data, speeding up learning nonlinear interpolation and extrapolation, generalization, and effective compression of information for its transmission.
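
Illustrative sketch (not taken from the paper): the unsupervised, competitive learning process mentioned in the abstract is commonly written as "find the best-matching unit, then pull it and its map neighbours toward the input". The snippet below assumes a one-dimensional map, a Gaussian neighbourhood function and fixed `learning_rate` and `sigma`; all names and values are hypothetical.

```python
import math

def som_step(weights, x, learning_rate=0.5, sigma=1.0):
    """One update of a 1-D self-organizing map.

    weights: list of weight vectors, one per map unit (units ordered along the map).
    Returns (index of the best-matching unit, updated copy of the weights).
    """
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

    # Competition: the unit whose weight vector is closest to the input wins.
    bmu = min(range(len(weights)), key=lambda j: sq_dist(weights[j]))

    new_weights = []
    for j, w in enumerate(weights):
        # Cooperation: units near the winner on the map are updated more strongly.
        h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
        new_weights.append([wi + learning_rate * h * (xi - wi) for wi, xi in zip(w, x)])
    return bmu, new_weights

weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
bmu, updated = som_step(weights, [1.2, 0.9])
```

In practice the learning rate and neighbourhood width are usually decreased over many such steps; that schedule is left out of this sketch.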

921 citations

Journal Article•10.1109/91.493905•

Validity-guided (re)clustering with applications to image segmentation

A. Bensaid¹, Lawrence O. Hall², James C. Bezdek³, Laurence P. Clarke², Martin L. Silbiger², John A. Arrington², Reed Murtagh²•Institutions (3)

Al Akhawayn University¹, University of South Florida², University of West Florida³

01 May 1996-IEEE Transactions on Fuzzy Systems

TL;DR: The validity-guided VGC algorithm uses cluster-validity information to guide a fuzzy (re)clustering process toward better solutions, and VGC's performance approaches that of the (supervised) k-nearest-neighbors algorithm.

Abstract: When clustering algorithms are applied to image segmentation, the goal is to solve a classification problem. However, these algorithms do not directly optimize classification quality. As a result, they are susceptible to two problems: 1) the criterion they optimize may not be a good estimator of "true" classification quality, and 2) they often admit many (suboptimal) solutions. This paper introduces an algorithm that uses cluster validity to mitigate problems 1 and 2. The validity-guided (re)clustering (VGC) algorithm uses cluster-validity information to guide a fuzzy (re)clustering process toward better solutions. It starts with a partition generated by a soft or fuzzy clustering algorithm. Then it iteratively alters the partition by applying (novel) split-and-merge operations to the clusters. Partition modifications that result in improved partition validity are retained. VGC is tested on both synthetic and real-world data. For magnetic resonance image (MRI) segmentation, evaluations by radiologists show that VGC outperforms the (unsupervised) fuzzy c-means algorithm, and VGC's performance approaches that of the (supervised) k-nearest-neighbors algorithm.

470 citations

Journal Article•10.1007/BF00114731•

Incremental multi-step Q-learning

Jing Peng¹, Ronald J. Williams²•Institutions (2)

University of California, Riverside¹, Northeastern University²

01 Jan 1996-Machine Learning

TL;DR: A novel incremental algorithm that combines Q-learning with the TD(λ) return estimation process, which is typically used in actor-critic learning, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization.

Abstract: This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
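
Illustrative sketch (not the paper's incremental algorithm): one way to see how λ distributes credit along a sequence of actions is to compute λ-returns offline for a recorded trajectory. The function below uses the standard backward recursion G_t = r_t + γ[(1 − λ)·v_{t+1} + λ·G_{t+1}], where v_{t+1} stands for the greedy value max_a Q(s_{t+1}, a). Treating that as the quantity estimated by Q(λ)-learning is an assumption here, and all names and numbers are hypothetical.

```python
def lambda_returns(rewards, next_values, gamma=0.9, lam=0.8):
    """Offline lambda-returns for one trajectory.

    rewards[t]     : reward received after the action taken at step t.
    next_values[t] : bootstrap value of the state reached after step t,
                     e.g. max_a Q(s_{t+1}, a); use 0.0 for a terminal state.
    """
    g = next_values[-1]  # return beyond the recorded steps is the bootstrap value
    returns = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap with the longer return computed so far.
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        returns[t] = g
    return returns

rewards = [0.0, 0.0, 1.0]
next_values = [0.5, 0.2, 0.0]  # last state is terminal

one_step = lambda_returns(rewards, next_values, gamma=0.9, lam=0.0)
full_return = lambda_returns(rewards, next_values, gamma=0.9, lam=1.0)
```

With λ = 0 each target is the ordinary one-step Q-learning target; with λ = 1 the final reward is propagated, discounted, all the way back to the first step. An incremental method reaches a similar effect online with eligibility traces rather than by storing the trajectory.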

387 citations

Journal Article•10.1016/0893-6080(96)83696-3•

Structural learning with forgetting

Masumi Ishikawa¹•Institutions (1)

Kyushu Institute of Technology¹

01 Apr 1996-Neural Networks

TL;DR: Results demonstrate the effectiveness of structural learning with forgetting, applied to various examples: the discovery of Boolean functions, classification of irises, discovery of recurrent networks, prediction of time series and rule extraction from mushroom data.

334 citations

Book•10.1007/978-1-4612-2404-4•

Learning from Data

Doug Fisher, Hans-J. Lenz

1 Jan 1996

TL;DR: Machine learning involves methods for computers to use data to “learn” how to perform complex tasks; these methods are applied to a wide range of problems: recommender systems, medical diagnosis, stock market prediction, game playing agents, robot locomotion, engineering design, speech recognition, spam detection, etc.

Abstract: Machine Learning (ML) is one of the most exciting areas in computer science today. ML involves methods for computers to use data to “learn” how to perform complex tasks. ML techniques are highly interdisciplinary, and are applied to a wide range of tasks: recommender systems, medical diagnosis, stock market prediction, game playing agents (AIs), robot locomotion, engineering design, speech recognition, spam detection, etc.

325 citations

Journal Article•10.1109/81.542280•

Robust neural networks with on-line learning for blind identification and blind separation of sources

Andrzej Cichocki, Rolf Unbehauen¹•Institutions (1)

University of Erlangen-Nuremberg¹

01 Nov 1996-IEEE Transactions on Circuits and Systems I-regular Papers

TL;DR: Two unsupervised, self-normalizing, adaptive learning algorithms are developed for robust blind identification and/or blind separation of independent source signals from a linear mixture of them and are suitable for real-time implementations.

Abstract: Two unsupervised, self-normalizing, adaptive learning algorithms are developed for robust blind identification and/or blind separation of independent source signals from a linear mixture of them. One of these algorithms is developed for on-line learning of a single-layer feed-forward neural network model and a second one for a feedback (fully recurrent) neural network model. The proposed algorithms are robust, efficient, fast and suitable for real-time implementations. Moreover, they ensure the separation of extremely weak or badly scaled stationary signals, as well as a successful separation even if the mixture matrix is very ill-conditioned (near singular). The performance of the proposed algorithms is illustrated by computer simulation experiments.

280 citations

Proceedings Article•

Discovering Structure in Multiple Learning Tasks: The TC Algorithm.

Sebastian Thrun, Joseph O'Sullivan

1 Jan 1996

TL;DR: The task-clustering algorithm TC clusters learning tasks into classes of mutually related tasks, and outperforms its non-selective counterpart in situations where only a small number of tasks is relevant.

Abstract: Recently, there has been an increased interest in “lifelong” machine learning methods, that transfer knowledge across multiple learning tasks. Such methods have repeatedly been found to outperform conventional, single-task learning algorithms when the learning tasks are appropriately related. To increase robustness of such approaches, methods are desirable that can reason about the relatedness of individual learning tasks, in order to avoid the danger arising from tasks that are unrelated and thus potentially misleading. This paper describes the task-clustering (TC) algorithm. TC clusters learning tasks into classes of mutually related tasks. When facing a new learning task, TC first determines the most related task cluster, then exploits information selectively from this task cluster only. An empirical study carried out in a mobile robot domain shows that TC outperforms its non-selective counterpart in situations where only a small number of tasks is relevant.

268 citations

Dissertation•

Unsupervised language acquisition

Carl G. De Marcken, Robert C. Berwick

1 Jan 1996

TL;DR: This thesis presents a computational theory of unsupervised language acquisition that is based heavily on concepts borrowed from machine learning and statistical estimation, and that has application to data compression, speech recognition, machine translation, information retrieval, and other tasks that rely on either structural or stochastic descriptions of language.

Abstract: Children are exposed to speech and other environmental evidence, from which they learn language. How do they do this? More specifically, how do children map from complex, physical signals to grammars that enable them to generate and interpret new utterances from their language? This thesis presents a computational theory of unsupervised language acquisition. By computational we mean that the theory precisely defines procedures for learning language, procedures that have been implemented and tested in the form of computer programs. By unsupervised we mean that the theory explains how language learning can take place with no explicit help from a teacher, but only exposure to ordinary spoken or written utterances. The theory requires very little of the learning environment. For example, it predicts that much knowledge of language can be acquired even in situations where the learner has no access to the meaning of utterances. In this way the theory is extremely conservative, making few or no assumptions that are not obviously true of the situation children learn in. The theory is based heavily on concepts borrowed from machine learning and statistical estimation. In particular, learning takes place by fitting a stochastic, generative model of language to the evidence. Thus, the goal of the learner is to acquire a grammar under which the evidence is "typical", in a statistical sense. Much of the thesis is devoted to explaining conditions that must hold for this learning strategy to arrive at the desired form of grammar. The thesis introduces a variety of technical innovations, among them a common representation for evidence and grammars that has many linguistically and statistically desirable properties. In this representation, both utterances and parameters in the grammar are represented by composing parameters. A second contribution is a learning strategy that separates the "content" of linguistic parameters from their representation. 
Algorithms based on it suffer from few of the search problems that have plagued other computational approaches to language acquisition. The theory has been tested on problems of learning lexicons (vocabularies) from text and speech signals. It performs extremely well on various objective criteria, acquiring knowledge that causes it to assign almost exactly the same linguistic structure to utterances as humans do. The theory has application to data compression, speech recognition, machine translation, information retrieval, and other tasks that rely on either structural or stochastic descriptions of language. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

202 citations

Journal Article•10.1109/59.544638•

A neural network based technique for short-term forecasting of anomalous load periods

Regina Lamedica, Alberto Prudenzi, M. Sforna, Maurizio Caciotta, V.O. Cencelli

01 Nov 1996-IEEE Transactions on Power Systems

TL;DR: The paper illustrates a part of the research activity conducted by the authors in the field of electric short term load forecasting (STLF) based on artificial neural network (ANN) architectures; the unconventional use of information deriving from the classification stage permits the proposed procedure to obtain a relevant enhancement of the forecast accuracy for anomalous load situations.

Abstract: The paper illustrates a part of the research activity conducted by the authors in the field of electric short term load forecasting (STLF) based on artificial neural network (ANN) architectures. Previous experiences with basic ANN architectures have shown that, even though these architectures provide results comparable with those obtained by human operators for most normal days, they evidence some accuracy deficiencies when applied to "anomalous" load conditions occurring during holidays and long weekends. For these periods a specific procedure based upon a combined (unsupervised/supervised) approach has been proposed. The unsupervised stage provides a preventive classification of the historical load data by means of a Kohonen's self-organizing map (SOM). The supervised stage, performing the proper forecasting activity, is obtained by using a multi-layer perceptron with a backpropagation learning algorithm similar to the ones mentioned above. The unconventional use of information deriving from the classification stage permits the proposed procedure to obtain a relevant enhancement of the forecast accuracy for anomalous load situations.

Proceedings Article•

Imputation of missing data using machine learning techniques

Kamakshi Lakshminarayan¹, Steven A. Harp¹, Robert P. Goldman¹, Tariq Samad¹•Institutions (1)

Honeywell¹

2 Aug 1996

TL;DR: It is argued that the choice between unsupervised and supervised classification techniques should be influenced by the motivation for solving the missing data problem, and potential applications for the procedures developed are discussed.

Abstract: A serious problem in mining industrial data bases is that they are often incomplete, and a significant amount of data is missing, or erroneously entered. This paper explores the use of machine-learning based alternatives to standard statistical data completion (data imputation) methods, for dealing with missing data. We have approached the data completion problem using two well-known machine learning techniques. The first is an unsupervised clustering strategy which uses a Bayesian approach to cluster the data into classes. The classes so obtained are then used to predict multiple choices for the attribute of interest. The second technique involves modeling missing variables by supervised induction of a decision tree-based classifier. This predicts the most likely value for the attribute of interest. Empirical tests using extracts from industrial databases maintained by Honeywell customers have been done in order to compare the two techniques. These tests show both approaches are useful and have advantages and disadvantages. We argue that the choice between unsupervised and supervised classification techniques should be influenced by the motivation for solving the missing data problem, and discuss potential applications for the procedures we are developing.

Book•

Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing

Stefan Wermter, Ellen Riloff, Gabriele Scheler

15 Mar 1996

TL;DR: A collection of chapters on connectionist, statistical and symbolic learning approaches to natural language processing, with topics ranging from grammatical inference and information extraction to text categorization and machine translation.

Abstract: Learning approaches for natural language processing; Separating learning and representation; Natural language grammatical inference: A comparison of recurrent neural networks and machine learning methods; Extracting rules for grammar recognition from Cascade-2 networks; Generating English plural determiners from semantic representations: A neural network learning approach; Knowledge acquisition in concept and document spaces by using self-organizing neural networks; Using hybrid connectionist learning for speech/language analysis; SKOPE: A connectionist/symbolic architecture of spoken Korean processing; Integrating different learning approaches into a multilingual spoken language translation system; Learning language using genetic algorithms; A statistical syntactic disambiguation program and what it learns; Training stochastic grammars on semantical categories; Learning restricted probabilistic link grammars; Learning PP attachment from corpus statistics; A minimum description length approach to grammar inference; Automatic classification of dialog acts with Semantic Classification Trees and Polygrams; Sample selection in natural language learning; Learning information extraction patterns from examples; Implications of an automatic lexical acquisition system; Using learned extraction patterns for text classification; Issues in inductive learning of domain-specific text extraction rules; Applying machine learning to anaphora resolution; Embedded machine learning systems for natural language processing: A general framework; Acquiring and updating hierarchical knowledge for machine translation based on a clustering technique; Applying an existing machine learning algorithm to text categorization; Comparative results on using inductive logic programming for corpus-based parser construction; Learning the past tense of English verbs using inductive logic programming; A dynamic approach to paradigm-driven analogy; Can punctuation help learning?; Using parsed corpora for circumventing parsing; A symbolic and surgical acquisition of terms through variation; A revision learner to acquire verb selection rules from human-made rules and examples; Learning from texts - A terminological metareasoning perspective

Journal Article•10.1016/S0893-6080(96)00009-3•

Varieties of Helmholtz machine

Peter Dayan, Geoffrey E. Hinton

01 Nov 1996-Neural Networks

TL;DR: A number of different varieties of Helmholtz machines are suggested, each with its own strengths and weaknesses, and are related to cortical information processing.

Proceedings Article•10.1109/ROBOT.1996.506507•

Unsupervised learning of probabilistic models for robot navigation

Sven Koenig¹, Reid Simmons¹•Institutions (1)

Carnegie Mellon University¹

22 Apr 1996

TL;DR: An algorithm is described that adjusts the probabilities of the initial Markov model by passively observing the robot's interactions with its environment and learns good Markov models with a small amount of training data.

Abstract: Navigation methods for office delivery robots need to take various sources of uncertainty into account in order to get robust performance. In previous work, we developed a reliable navigation technique that uses partially observable Markov models to represent metric, actuator and sensor uncertainties. This paper describes an algorithm that adjusts the probabilities of the initial Markov model by passively observing the robot's interactions with its environment. The learned probabilities more accurately reflect the actual uncertainties in the environment, which ultimately leads to improved navigation performance. The algorithm, an extension of the Baum-Welch algorithm, learns without a teacher and addresses the issues of limited memory and the cost of collecting training data. Empirical results show that the algorithm learns good Markov models with a small amount of training data.

Nonlinear Blind Source Separation by Self-Organizing Maps

Petteri Pajunen, Aapo Hyvärinen, Juha Karhunen

1 Jan 1996

TL;DR: Sources are separated by constructing mappings that make the components of the output vectors independent, and it is shown that such a mapping can be approximately realized using self-organizing maps with rectangular map topology.

Journal Article•10.1613/JAIR.308•

Learning first-order definitions of functions

J. R. Quinlan¹•Institutions (1)

University of Sydney¹

01 Aug 1996-Journal of Artificial Intelligence Research

TL;DR: This paper shows how a particular first-order learning system is modified to customize it for finding definitions of functional relations, which leads to faster learning times and, in some cases, to definitions that have higher predictive accuracy.

Abstract: First-order learning involves finding a clause-form definition of a relation from examples of the relation and relevant background information. In this paper, a particular first-order learning system is modified to customize it for finding definitions of functional relations. This restriction leads to faster learning times and, in some cases, to definitions that have higher predictive accuracy. Other first-order learning systems might benefit from similar specialization.

Book Chapter•10.1007/3-540-61863-5_42•

Boosting First-Order Learning

J. Ross Quinlan¹•Institutions (1)

University of Sydney¹

23 Oct 1996

TL;DR: Early experimental results from applying boosting to ffoil, a first-order system that constructs definitions of functional relations, suggest that boosting will also prove beneficial for first-order induction.

Abstract: Several empirical studies have confirmed that boosting classifier-learning systems can lead to substantial improvements in predictive accuracy. This paper reports early experimental results from applying boosting to ffoil, a first-order system that constructs definitions of functional relations. Although the evidence is less convincing than that for propositional-level learning systems, it suggests that boosting will also prove beneficial for first-order induction.

Book•

Data Analysis for Chemists: Applications to QSAR and Chemical Product Design

David J. Livingstone

18 Jan 1996

TL;DR: 1. Chemical properties and chemical structure 2. Experimental design - compound and parameter selection 3. Data pre-treatment 4. Data display 5. Unsupervised learning 6. Regression analysis 7. Supervised learning 8. Treatment of multiple dependent variables 9. Artificial intelligence

Abstract: 1. Chemical properties and chemical structure 2. Experimental design - compound and parameter selection 3. Data pre-treatment 4. Data display 5. Unsupervised learning 6. Regression analysis 7. Supervised learning 8. Treatment of multiple dependent variables 9. Artificial intelligence Appendix 1: Software Appendix 2: List of abbreviations

Book Chapter•10.1007/978-3-662-03295-4_11•

Just-in-Time Learning and Estimation

George Cybenko

1 Jan 1996

10.7916/D8445ZNN•

An extensible meta-learning approach for scalable and accurate inductive learning

Philip K. Chan, Salvatore J. Stolfo

1 Jan 1996

TL;DR: A meta-learning approach to integrating the results of multiple learning processes that can obtain accurate classifiers from inaccurate classifiers trained from data subsets and utilizes machine learning to guide the integration.

Abstract: Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of ubiquitous network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real world importance. Some learning algorithms assume that the entire data set fits into main memory, which is not feasible for massive amounts of data, especially for applications in data mining. One approach to handling a large data set is to partition the data set into subsets, run the learning algorithm on each of the subsets, and combine the results. Moreover, data can be inherently distributed across multiple sites on the network and merging all the data in one location can be expensive or prohibitive. In this thesis we propose, investigate, and evaluate a meta-learning approach to integrating the results of multiple learning processes. Our approach utilizes machine learning to guide the integration. We identified two main meta-learning strategies: combiner and arbiter. Both strategies are independent of the learning algorithms used in generating the classifiers. The combiner strategy attempts to reveal relationships among the learned classifiers' prediction patterns. The arbiter strategy tries to determine the correct prediction when the classifiers have different opinions. Various schemes under these two strategies have been developed. Empirical results show that our schemes can obtain accurate classifiers from inaccurate classifiers trained from data subsets. We also implemented and analyzed the schemes in a parallel and distributed environment to demonstrate their scalability.
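
Illustrative sketch (not taken from the thesis): the arbiter idea, settling cases where classifiers trained on different data subsets disagree, can be pictured as a vote in which a separately trained arbiter also takes part and wins any tie it is involved in. That voting rule, the function name and the labels below are assumptions made for illustration; the thesis describes several schemes that are not reproduced here.

```python
from collections import Counter

def arbitrate(base_predictions, arbiter_prediction):
    """Combine base-classifier predictions with an arbiter's prediction.

    Every classifier, including the arbiter, casts one vote. If the arbiter's
    label is among the most-voted labels, it wins; otherwise the most-voted
    label that was seen first is returned.
    """
    votes = Counter(base_predictions)
    votes[arbiter_prediction] += 1
    top = max(votes.values())
    winners = [label for label, count in votes.items() if count == top]
    if arbiter_prediction in winners:
        return arbiter_prediction
    return winners[0]

# Two classifiers trained on different subsets disagree; the arbiter decides.
decision = arbitrate(["spam", "ham"], "ham")
```

A combiner, by contrast, would be a learned classifier that takes the base predictions themselves as input features.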

Proceedings Article•

Bayesian Unsupervised Learning of Higher Order Structure

Michael S. Lewicki¹, Terrence J. Sejnowski¹•Institutions (1)

Howard Hughes Medical Institute¹

3 Dec 1996

TL;DR: This work presents an algorithm that efficiently discovers higher order structure using EM and Gibbs sampling and can be interpreted as a stochastic recurrent network in which ambiguity in lower-level states is resolved through feedback from higher levels.

Abstract: Multilayer architectures such as those used in Bayesian belief networks and Helmholtz machines provide a powerful framework for representing and learning higher order statistical relations among inputs. Because exact probability calculations with these models are often intractable, there is much interest in finding approximate algorithms. We present an algorithm that efficiently discovers higher order structure using EM and Gibbs sampling. The model can be interpreted as a stochastic recurrent network in which ambiguity in lower-level states is resolved through feedback from higher levels. We demonstrate the performance of the algorithm on benchmark problems.

Journal Article•10.1162/NECO.1996.8.8.1677•

Hebbian learning of context in recurrent neural networks

Nicolas Brunel¹•Institutions (1)

Sapienza University of Rome¹

01 Nov 1996-Neural Computation

TL;DR: A simple unsupervised learning dynamics is presented that produces a recurrent synaptic matrix, which is shown to convert temporal correlations during training into spatial correlations between attractors; the probability distribution of synaptic efficacies is calculated explicitly as a function of the training protocol, that is, the order in which stimuli are presented to the network.

Abstract: Single electrode recordings in the inferotemporal cortex of monkeys during delayed visual memory tasks provide evidence for attractor dynamics in the observed region. The persistent elevated delay activities could be internal representations of features of the learned visual stimuli shown to the monkey during training. When uncorrelated stimuli are presented during training in a fixed sequence, these experiments display significant correlations between the internal representations. Recently a simple model of attractor neural network has reproduced quantitatively the measured correlations. An underlying assumption of the model is that the synaptic matrix formed during the training phase contains in its efficacies information about the contiguity of persistent stimuli in the training sequence. We present here a simple unsupervised learning dynamics that produces such a synaptic matrix if sequences of stimuli are repeatedly presented to the network at fixed order. The resulting matrix is then shown to convert temporal correlations during training into spatial correlations between attractors. The scenario is that, in the presence of selective delay activity, at the presentation of each stimulus, the activity distribution in the neural assembly contains information of both the current stimulus and the previous one (carried by the attractor). Thus the recurrent synaptic matrix can code not only for each of the stimuli presented to the network but also for their context. We combine the idea that for learning to be effective, synaptic modification should be stochastic, with the fact that attractors provide learnable information about two consecutive stimuli. We calculate explicitly the probability distribution of synaptic efficacies as a function of training protocol, that is, the order in which stimuli are presented to the network. 
We then solve for the dynamics of a network composed of integrate-and-fire excitatory and inhibitory neurons with a matrix of synaptic collaterals resulting from the learning dynamics. The network has a stable spontaneous activity, and stable delay activity develops after a critical learning stage. The availability of a learning dynamics makes possible a number of experimental predictions for the dependence of the delay activity distributions and the correlations between them, on the learning stage and the learning protocol. In particular it makes specific predictions for pair-associates delay experiments.

Journal Article•10.1109/72.501737•

Local linear perceptrons for classification

Ethem Alpaydin¹, Michael I. Jordan•Institutions (1)

Boğaziçi University¹

01 May 1996-IEEE Transactions on Neural Networks

TL;DR: A structure composed of local linear perceptrons for approximating global class discriminants is investigated, and it is concluded that even on a high-dimensional problem such local models are promising: they generalize much better than RBFs and use much less memory.

Abstract: A structure composed of local linear perceptrons for approximating global class discriminants is investigated. Such local linear models may be combined in a cooperative or competitive way. In the cooperative model, a weighted sum of the outputs of the local perceptrons is computed where the weight is a function of the distance between the input and the position of the local perceptron. In the competitive model, the cost function dictates a mixture model where only one of the local perceptrons gives output. Learning of the local models' positions and the linear mappings they implement are coupled and both supervised. We show that this is preferable to the uncoupled case where the positions are trained in an unsupervised manner before the separate, supervised training of mappings. We use goodness criteria based on the cross-entropy and give learning equations for both the cooperative and competitive cases. The coupled and uncoupled versions of cooperative and competitive approaches are compared among themselves and with multilayer perceptrons of sigmoidal hidden units and radial basis functions (RBFs) of Gaussian units on the application of recognition of handwritten digits. The criteria of comparison are the generalization accuracy, learning time, and the number of free parameters. We conclude that even on such a high-dimensional problem, such local models are promising. They generalize much better than RBFs and use much less memory. When compared with multilayer perceptrons, we note that local models learn much faster and generalize as well and sometimes better with a comparable number of parameters.
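
The difference between the cooperative and competitive combinations described above can be illustrated with a minimal sketch. It is not the authors' implementation: the normalized Gaussian weighting, the nearest-position winner rule, and the names `gate_weights`, `local_output`, `cooperative`, and `competitive` are all assumptions chosen for illustration, and no learning equations are shown.

```python
import math


def gate_weights(x, centers, width=1.0):
    """Normalized weights that decay with squared distance to each center.

    The Gaussian form is an assumption; the abstract only says the weight
    is a function of the distance between the input and the unit position.
    """
    d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
    d2_min = min(d2)  # subtract the minimum so the largest score is exp(0) = 1
    scores = [math.exp(-(d - d2_min) / (2.0 * width ** 2)) for d in d2]
    total = sum(scores)
    return [s / total for s in scores]


def local_output(x, weights, bias):
    """Linear map of one local perceptron: one score per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def cooperative(x, centers, maps, width=1.0):
    """Distance-weighted sum of the outputs of all local perceptrons."""
    g = gate_weights(x, centers, width)
    outs = [local_output(x, W, b) for W, b in maps]
    return [sum(gj * out[k] for gj, out in zip(g, outs))
            for k in range(len(outs[0]))]


def competitive(x, centers, maps, width=1.0):
    """Only the unit with the largest weight (assumed winner rule) gives output."""
    g = gate_weights(x, centers, width)
    winner = max(range(len(g)), key=g.__getitem__)
    W, b = maps[winner]
    return local_output(x, W, b)


# Two hypothetical local units in a 2-D input space with two class scores each.
centers = [[0.0, 0.0], [4.0, 4.0]]
maps = [([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
        ([[0.0, 0.0], [0.0, 0.0]], [1.0, -1.0])]
blended = cooperative([2.0, 2.0], centers, maps)   # both units contribute equally
single = competitive([4.0, 4.0], centers, maps)    # only the second unit responds
```

In the cooperative case every unit contributes near a boundary between regions, which smooths the discriminant; in the competitive case the output switches abruptly from one local map to another.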

Proceedings Article•10.1109/IROS.1996.569014•

Reasonable performance in less learning time by real robot based on incremental state space segmentation

Yasutake Takahashi¹, Minoru Asada, Koh Hosoda•Institutions (1)

Osaka University¹

4 Nov 1996

TL;DR: A method by which a robot learns purposive behavior within less learning time by incrementally segmenting the sensor space based on the experiences of the robot.

Abstract: Reinforcement learning has recently been receiving increased attention as a method for robot learning with little or no a priori knowledge and a higher capability for reactive and adaptive behaviors. However, there are two major problems in applying it to real robot tasks: how to construct the state space, and how to reduce the learning time. This paper presents a method by which a robot learns purposive behavior in less learning time by incrementally segmenting the sensor space based on the experiences of the robot. The incremental segmentation is performed by constructing local models in the state space, based on function approximation of the sensor outputs to reduce the learning time and on the reinforcement signal so that a purposive behavior emerges. The method is applied to a soccer robot that tries to shoot a ball into a goal. Experiments with computer simulations and a real robot are shown. As a result, our real robot learned a shooting behavior in less than one hour of training by incrementally segmenting the state space.

Journal Article•10.1162/NECO.1996.8.2.416•

A self-organizing neural network for the traveling salesman problem that is competitive with simulated annealing

Marco Budinich

01 Feb 1996-Neural Computation

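TL;DR: Unsupervised learning applied to an unstructured neural network gives approximate solutions to the traveling salesman problem. For 50 cities the algorithm performs like the elastic net of Durbin and Willshaw (1987), and it improves as the number of cities increases, becoming better than simulated annealing for problems with more than 500 cities.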

Abstract: Unsupervised learning applied to an unstructured neural network can give approximate solutions to the traveling salesman problem. For 50 cities in the plane, this algorithm performs like the elastic net of Durbin and Willshaw (1987), and it improves as the number of cities increases, becoming better than simulated annealing for problems with more than 500 cities. In all the tests this algorithm requires a fraction of the time taken by simulated annealing.

Proceedings Article•

Adaptive On-line Learning in Changing Environments

Noboru Murata, Klaus-Robert Müller, Andreas Ziehe, Shun-ichi Amari

3 Dec 1996

TL;DR: An adaptive on-line algorithm extending the learning of learning idea can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available.

Abstract: An adaptive on-line algorithm extending the learning of learning idea is proposed and theoretically motivated. Relying only on gradient flow information it can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available. Its efficiency is demonstrated for a non-stationary blind separation task of acoustic signals.

Journal Article•10.1016/0378-4754(96)88223-1•

Developments and applications of the self-organizing map and related algorithms

Jari Kangas¹, Teuvo Kohonen¹•Institutions (1)

Helsinki University of Technology¹

01 Jun 1996-Mathematics and Computers in Simulation

TL;DR: The basic principles and developments of an unsupervised learning algorithm, the self-organizing map (SOM), and a supervised learning algorithm, learning vector quantization (LVQ), are explained, along with some practical applications of the algorithms.

Journal Article•

The identification of context-sensitive features: A formal definition of context for concept learning

Peter D. Turney¹•Institutions (1)

National Research Council¹

01 Jan 1996-arXiv: Learning

TL;DR: This paper formally distinguishes three types of features (primary, contextual, and irrelevant) and formally defines what it means for one feature to be context-sensitive to another feature.

Abstract: A large body of research in machine learning is concerned with supervised learning from examples. The examples are typically represented as vectors in a multi-dimensional feature space (also known as attribute-value descriptions). A teacher partitions a set of training examples into a finite number of classes. The task of the learning algorithm is to induce a concept from the training examples. In this paper, we formally distinguish three types of features: primary, contextual, and irrelevant features. We also formally define what it means for one feature to be context-sensitive to another feature. Context-sensitive features complicate the task of the learner and potentially impair the learner's performance. Our formal definitions make it possible for a learner to automatically identify context-sensitive features. After context-sensitive features have been identified, there are several strategies that the learner can employ for managing the features; however, a discussion of these strategies is outside of the scope of this paper. The formal definitions presented here correct a flaw in previously proposed definitions. We discuss the relationship between our work and a formal definition of relevance.

Journal Article•10.1109/72.536304•

Repairs to GLVQ: a new family of competitive learning schemes

Nicolaos B. Karayiannis¹, James C. Bezdek, Nikhil R. Pal, Richard J. Hathaway, Pin-I Pai•Institutions (1)

University of Houston¹

01 Sep 1996-IEEE Transactions on Neural Networks

TL;DR: This work identifies an algorithmic defect of the generalized learning vector quantization (GLVQ) scheme that causes it to behave erratically for a certain scaling of the input data and proposes a new family of models, the GLVQ-F family, that remedies the problem.

Abstract: First, we identify an algorithmic defect of the generalized learning vector quantization (GLVQ) scheme that causes it to behave erratically for a certain scaling of the input data. We show that GLVQ can behave incorrectly because its learning rates are reciprocally dependent on the sum of squares of distances from an input vector to the node weight vectors. Finally, we propose a new family of models, the GLVQ-F family, that remedies the problem. We derive competitive learning algorithms for each member of the GLVQ-F model and prove that they are invariant to all scalings of the data. We show that GLVQ-F offers a wide range of learning models since it reduces to LVQ as its weighting exponent (a parameter of the algorithm) approaches one from above. As this parameter increases, GLVQ-F then transitions to a model in which either all nodes may be excited according to their (inverse) distances from an input or in which the winner is excited while losers are penalized. And as this parameter increases without limit, GLVQ-F updates all nodes equally. We illustrate the failure of GLVQ and success of GLVQ-F with the IRIS data.
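
The limiting behavior described above can be illustrated with a weighting that depends only on ratios of squared distances. The sketch below uses fuzzy c-means style memberships as that weighting. This is an assumption made for illustration, not the paper's exact GLVQ-F update rule, and the names `fcm_memberships` and `update` are hypothetical.

```python
def fcm_memberships(x, prototypes, m):
    """Fuzzy c-means style membership of x in each prototype (assumed weighting).

    m is the weighting exponent and must be greater than 1. Values of m
    extremely close to 1 can overflow the power below.
    """
    if m <= 1:
        raise ValueError("weighting exponent m must be greater than 1")
    d2 = [sum((xi - vi) ** 2 for xi, vi in zip(x, v)) for v in prototypes]
    # If x coincides with one or more prototypes, they share all the weight.
    zeros = [i for i, d in enumerate(d2) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d2))]
    p = 1.0 / (m - 1.0)
    # Only ratios of squared distances appear, so rescaling the data
    # (inputs and prototypes together) leaves the weights unchanged.
    return [1.0 / sum((di / dj) ** p for dj in d2) for di in d2]


def update(x, prototypes, m, rate):
    """Move every prototype toward x in proportion to its membership."""
    u = fcm_memberships(x, prototypes, m)
    return [[vi + rate * ui * (xi - vi) for xi, vi in zip(x, v)]
            for ui, v in zip(u, prototypes)]


x = [0.0, 0.0]
prototypes = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]  # squared distances 1, 4, 9

near_one = fcm_memberships(x, prototypes, m=1.01)   # almost winner-take-all
large_m = fcm_memberships(x, prototypes, m=1000.0)  # almost uniform
scaled = fcm_memberships(x, [[10.0 * vi for vi in v] for v in prototypes], m=2.0)
unscaled = fcm_memberships(x, prototypes, m=2.0)    # same weights as scaled
```

With `m` close to one, nearly all weight goes to the nearest prototype, which matches the winner-take-all behavior of LVQ. With very large `m`, the weights approach 1/c for c prototypes, so all nodes are updated almost equally.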
