Showing papers on "Unsupervised learning" published in 1997
Journal Article•10.1109/TNN.1997.641482•
The Nature Of Statistical Learning Theory

Vladimir Cherkassky1•
University of Minnesota1
01 Nov 1997-IEEE Transactions on Neural Networks

3,364 citations

Journal Article•10.1109/34.598227•
Probabilistic visual learning for object representation

Baback Moghaddam1, Alex Pentland1•
Massachusetts Institute of Technology1
01 Jul 1997-IEEE Transactions on Pattern Analysis and Machine Intelligence
TL;DR: An unsupervised technique for visual learning is presented, which is based on density estimation in high-dimensional spaces using an eigenspace decomposition and is applied to the probabilistic visual modeling, detection, recognition, and coding of human faces and nonrigid objects.
Abstract: We present an unsupervised technique for visual learning, which is based on density estimation in high-dimensional spaces using an eigenspace decomposition. Two types of density estimates are derived for modeling the training data: a multivariate Gaussian (for unimodal distributions) and a mixture-of-Gaussians model (for multimodal distributions). Those probability densities are then used to formulate a maximum-likelihood estimation framework for visual search and target detection for automatic object recognition and coding. Our learning technique is applied to the probabilistic visual modeling, detection, recognition, and coding of human faces and nonrigid objects, such as hands.

1,694 citations

Proceedings Article•
A Framework for Multiple-Instance Learning

Oded Maron1, Tomás Lozano-Pérez1•
Massachusetts Institute of Technology1
1 Dec 1997
TL;DR: A new general framework, called Diverse Density, is described, which is applied to learn a simple description of a person from a series of images containing that person, to a stock selection problem, and to the drug activity prediction problem.
Abstract: Multiple-instance learning is a variation on supervised learning, where the task is to learn a concept given positive and negative bags of instances. Each bag may contain many instances, but a bag is labeled positive even if only one of the instances in it falls within the concept. A bag is labeled negative only if all the instances in it are negative. We describe a new general framework, called Diverse Density, for solving multiple-instance learning problems. We apply this framework to learn a simple description of a person from a series of images (bags) containing that person, to a stock selection problem, and to the drug activity prediction problem.

1,544 citations
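
The bag-labeling rule stated in the abstract above can be sketched in a few lines of Python. This is only an illustration of the labeling rule, not of the Diverse Density framework itself; the function names and the interval-shaped concept are hypothetical and chosen purely for the example.

```python
from typing import Callable, Sequence


def bag_label(bag: Sequence[float], in_concept: Callable[[float], bool]) -> bool:
    """Return True if at least one instance in the bag falls within the concept.

    A bag is therefore negative only when every instance in it is negative.
    """
    return any(in_concept(x) for x in bag)


def concept(x: float) -> bool:
    # Hypothetical concept (an assumption for this sketch):
    # an instance is positive when it lies in the interval [0.4, 0.6].
    return 0.4 <= x <= 0.6


bags = [[0.1, 0.5, 0.9], [0.0, 0.2], [0.45]]
labels = [bag_label(b, concept) for b in bags]
print(labels)  # [True, False, True]
```

The learner in a multiple-instance setting sees only the bags and their labels, not which instance made a positive bag positive.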

Proceedings Article•
Reinforcement Learning with Hierarchies of Machines

Ronald Parr1, Stuart Russell1•
University of California, Berkeley1
1 Dec 1997
TL;DR: This work presents provably convergent algorithms for problem-solving and learning with hierarchical machines and demonstrates their effectiveness on a problem with several thousand states.
Abstract: We present a new approach to reinforcement learning in which the policies considered by the learning process are constrained by hierarchies of partially specified machines. This allows for the use of prior knowledge to reduce the search space and provides a framework in which knowledge can be transferred across problems and in which component solutions can be recombined to solve larger and more complicated problems. Our approach can be seen as providing a link between reinforcement learning and "behavior-based" or "teleo-reactive" approaches to control. We present provably convergent algorithms for problem-solving and learning with hierarchical machines and demonstrate their effectiveness on a problem with several thousand states.

862 citations

Journal Article•10.1023/A:1008819414322•
Reinforcement learning in the multi-robot domain

Maja J. Matarić1•
Brandeis University1
27 Mar 1997-Autonomous Robots
TL;DR: A formulation of reinforcement learning is described that enables learning in noisy, dynamic environments, such as the complex concurrent multi-robot learning domain, and the approach is experimentally validated on a group of four mobile robots learning a foraging task.
Abstract: This paper describes a formulation of reinforcement learning that enables learning in noisy, dynamic environments such as in the complex concurrent multi-robot learning domain. The methodology involves minimizing the learning space through the use of behaviors and conditions, and dealing with the credit assignment problem through shaped reinforcement in the form of heterogeneous reinforcement functions and progress estimators. We experimentally validate the approach on a group of four mobile robots learning a foraging task.

513 citations

Journal Article•10.1016/S0925-2312(97)00045-3•
The nonlinear PCA learning rule in independent component analysis

Erkki Oja1•
Helsinki University of Technology1
30 Sep 1997-Neurocomputing
TL;DR: It has been verified experimentally that when nonlinear Principal Component Analysis (PCA) learning rules are used for the weights of a neural layer, the neurons have signal separation capabilities and can be used for image and speech signal separation.

249 citations

Journal Article•10.1145/263479.263481•
A multilevel approach to intelligent information filtering: model, system, and evaluation

Javed Mostafa1, Snehasis Mukhopadhyay2, Mathew J. Palakal2, Wai Lam3•
Indiana University1, Purdue University2, The Chinese University of Hong Kong3
01 Oct 1997-ACM Transactions on Information Systems
TL;DR: A filtering model is proposed that decomposes the overall task into subsystem functionalities and highlights the need for multiple adaptation techniques to cope with uncertainties.
Abstract: In information-filtering environments, uncertainties associated with changing interests of the user and the dynamic document stream must be handled efficiently. In this article, a filtering model is proposed that decomposes the overall task into subsystem functionalities and highlights the need for multiple adaptation techniques to cope with uncertainties. A filtering system, SIFTER, has been implemented based on the model, using established techniques in information retrieval and artificial intelligence. These techniques include document representation by a vector-space model, document classification by unsupervised learning, and user modeling by reinforcement learning. The system can filter information based on content and a user's specific interests. The user's interests are automatically learned with only limited user intervention in the form of optional relevance feedback for documents. We also describe experimental studies conducted with SIFTER to filter computer and information science documents collected from the Internet and commercial database services. The experimental results demonstrate that the system performs very well in filtering documents in a realistic problem setting.

220 citations

Proceedings Article•10.1109/TAI.1997.632300•
Dimensionality reduction of unsupervised data

Manoranjan Dash1, Huan Liu1, Jun Yao1•
National University of Singapore1
3 Nov 1997
TL;DR: This paper proposes an entropy measure for ranking features and conducts extensive experiments showing that the method finds the important features and compares well with a similar feature ranking method that, unlike this one, requires class information.
Abstract: Dimensionality reduction is an important problem for efficient handling of large databases. Many feature selection methods exist for supervised data having class information. Little work has been done for dimensionality reduction of unsupervised data in which class information is not available. Principal component analysis (PCA) is often used. However, PCA creates new features, and it is difficult to obtain an intuitive understanding of the data using the new features only. We are concerned with the problem of determining and choosing the important original features for unsupervised data. Our method is based on the observation that removing an irrelevant feature from the feature set may not change the underlying concept of the data, whereas removing a relevant feature may. We propose an entropy measure for ranking features, and conduct extensive experiments to show that our method is able to find the important features. It also compares well with a similar feature ranking method (Relief) that, unlike our method, requires class information.

202 citations

Book Chapter•10.1007/978-94-011-5014-9_18•
An information-theoretic analysis of hard and soft assignment methods for clustering

Michael Kearns1, Yishay Mansour2, Andrew Y. Ng3•
AT&T Labs1, Tel Aviv University2, Carnegie Mellon University3
1 Aug 1997
TL;DR: A simple decomposition of the expected distortion is presented, showing that K-means (and its extension for inferring general parametric densities from unlabeled sample data) must implicitly manage a trade-off between how similar the data assigned to each cluster are, and how the data are balanced among the clusters.
Abstract: Assignment methods are at the heart of many algorithms for unsupervised learning and clustering -- in particular, the well-known K-means and Expectation-Maximization (EM) algorithms. In this work, we study several different methods of assignment, including the "hard" assignments used by K-means and the "soft" assignments used by EM. While it is known that K-means minimizes the distortion on the data and EM maximizes the likelihood, little is known about the systematic differences of behavior between the two algorithms. Here we shed light on these differences via an information-theoretic analysis. The cornerstone of our results is a simple decomposition of the expected distortion, showing that K-means (and its extension for inferring general parametric densities from unlabeled sample data) must implicitly manage a trade-off between how similar the data assigned to each cluster are, and how the data are balanced among the clusters. How well the data are balanced is measured by the entropy of the partition defined by the hard assignments. In addition to letting us predict and verify systematic differences between K-means and EM on specific examples, the decomposition allows us to give a rather general argument showing that K-means will consistently find densities with less "overlap" than EM. We also study a third natural assignment method that we call posterior assignment, that is close in spirit to the soft assignments of EM, but leads to a surprisingly different algorithm.

172 citations
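
The distinction between "hard" and "soft" assignments in the abstract above can be made concrete with a small Python sketch. It assumes a one-dimensional mixture of two Gaussians with fixed, hypothetical parameters; it shows only the assignment step and the entropy of the resulting hard partition, not the paper's decomposition or the third "posterior assignment" method.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def soft_assignment(x, means, stds, weights):
    """EM-style assignment: posterior probability of each component given x."""
    joint = [w * gaussian_pdf(x, m, s) for m, s, w in zip(means, stds, weights)]
    total = sum(joint)
    return [j / total for j in joint]


def hard_assignment(x, means):
    """K-means-style assignment: index of the nearest center."""
    return min(range(len(means)), key=lambda k: (x - means[k]) ** 2)


def partition_entropy(labels, k):
    """Entropy in bits of the partition defined by the hard assignments."""
    n = len(labels)
    fractions = [labels.count(c) / n for c in range(k)]
    return -sum(p * math.log2(p) for p in fractions if p > 0)


# Hypothetical parameters and data, chosen only for illustration.
means, stds, weights = [0.0, 4.0], [1.0, 1.0], [0.5, 0.5]
data = [-0.5, 0.3, 1.9, 2.1, 3.8, 4.4]

hard = [hard_assignment(x, means) for x in data]
soft = [soft_assignment(x, means, stds, weights) for x in data]

print(hard)                        # [0, 0, 0, 1, 1, 1]
print(partition_entropy(hard, 2))  # 1.0
```

A point near the midpoint between the two centers (such as 1.9 or 2.1) is committed entirely to one cluster by the hard rule, while the soft rule splits it almost evenly between the two components.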

Journal Article•10.1023/A:1009735908398•
Mathematical Programming in Data Mining

Olvi L. Mangasarian1•
University of Wisconsin-Madison1
21 Jan 1997-Data Mining and Knowledge Discovery
TL;DR: Mathematical programming approaches to feature selection, clustering, and robust representation are described; a resulting k-Median Algorithm is used to discover survival curves for breast cancer patients from a medical database, and a novel approach is proposed that purposely tolerates a small error in the training process in order to avoid overfitting data that may contain errors.
Abstract: Mathematical programming approaches to three fundamental problems will be described: feature selection, clustering and robust representation. The feature selection problem considered is that of discriminating between two sets while recognizing irrelevant and redundant features and suppressing them. This creates a lean model that often generalizes better to new unseen data. Computational results on real data confirm improved generalization of leaner models. Clustering is exemplified by the unsupervised learning of patterns and clusters that may exist in a given database and is a useful tool for knowledge discovery in databases (KDD). A mathematical programming formulation of this problem is proposed that is theoretically justifiable and computationally implementable in a finite number of steps. A resulting k-Median Algorithm is utilized to discover very useful survival curves for breast cancer patients from a medical database. Robust representation is concerned with minimizing trained model degradation when applied to new problems. A novel approach is proposed that purposely tolerates a small error in the training process in order to avoid overfitting data that may contain errors. Examples of applications of these concepts are given.

148 citations

Proceedings Article•
Distinguishing Word Senses in Untagged Text

Ted Pedersen, Rebecca Bruce
1 Jan 1997
TL;DR: This paper presents three unsupervised learning algorithms that are able to distinguish among the known senses (i.e., as defined in some dictionary) of a word, based only on features that can be automatically extracted from untagged text.
Abstract: This paper describes an experimental comparison of three unsupervised learning algorithms that distinguish the sense of an ambiguous word in untagged text. The methods described in this paper, McQuitty's similarity analysis, Ward's minimum-variance method, and the EM algorithm, assign each instance of an ambiguous word to a known sense definition based solely on the values of automatically identifiable features in text. These methods and feature sets are found to be more successful in disambiguating nouns than adjectives or verbs. Overall, the most accurate of these procedures is McQuitty's similarity analysis in combination with a high dimensional feature set.
1 Introduction
Statistical methods for natural language processing are often dependent on the availability of costly knowledge sources such as manually annotated text or semantic networks. This limits the applicability of such approaches to domains where this hard-to-acquire knowledge is already available. This paper presents three unsupervised learning algorithms that are able to distinguish among the known senses (i.e., as defined in some dictionary) of a word, based only on features that can be automatically extracted from untagged text. The object of unsupervised learning is to determine the class membership of each observation (i.e., each object to be classified) in a sample without using training examples of correct classifications. We discuss three algorithms, McQuitty's similarity analysis (McQuitty, 1966), Ward's minimum-variance method (Ward, 1963) and the EM algorithm (Dempster, Laird, and Rubin, 1977), that can be used to distinguish among the known senses of an ambiguous word without the aid of disambiguated examples. The EM algorithm produces maximum likelihood estimates of the parameters of a probabilistic model, where that model has been specified in advance. Both Ward's and McQuitty's methods are agglomerative clustering algorithms that form classes of unlabeled observations that minimize their respective distance measures between class members. The rest of this paper is organized as follows. First, we present introductions to Ward's and McQuitty's methods (Section 2) and the EM algorithm (Section 3). We discuss the thirteen words (Section 4) and the three feature sets (Section 5) used in our experiments. We present our experimental results (Section 6) and close with a discussion of related work (Section 7).
2 Agglomerative Clustering
In general, clustering methods rely on the assumption that classes occupy distinct regions in the feature space. The distance between two points in a multi-dimensional space can be measured using any of a wide variety of metrics (see, e.g. (Devijver and Kittler, 1982)). Observations are grouped in the manner that minimizes the distance between the members of each class. Ward's and McQuitty's methods are agglomerative clustering algorithms that differ primarily in how they compute the distance between clusters. All such algorithms begin by placing each observation in a unique cluster, i.e. a cluster of one. The two closest clusters are merged to form a new cluster that replaces the two merged clusters. Merging of the two closest clusters continues until only some specified number of clusters remain. However, our data does not immediately lend itself to a distance-based interpretation. Our features represent part-of-speech (POS) tags, morphological characteristics, and word co-occurrence; such features are nominal and their values do not have scale. Given a POS feature, for example, we could choose noun = 1, verb = 2, adjective = 3, and adverb = 4. That adverb is represented by a larger number than noun is purely coincidental and implies nothing about the relationship between nouns and adverbs. Thus, before we employ either clustering algo...
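
The generic agglomerative procedure described in the excerpt above (start with singleton clusters, repeatedly merge the two closest, stop at a specified number of clusters) can be sketched as follows. This is a minimal illustration under stated assumptions: the observations are plain numbers rather than the nominal features discussed in the paper, and the `average_link` distance is a simple stand-in chosen for the example, not Ward's or McQuitty's exact measure. All function names are hypothetical.

```python
def agglomerate(points, n_clusters, cluster_distance):
    """Merge the two closest clusters until only n_clusters remain."""
    # Every observation starts in its own cluster of one.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        # The merged cluster replaces the two clusters it was built from.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def average_link(a, b):
    """Mean pairwise absolute difference between two clusters (illustrative only)."""
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))


observations = [1.0, 1.2, 5.0, 5.1, 10.0]
clusters3 = agglomerate(observations, 3, average_link)
clusters2 = agglomerate(observations, 2, average_link)
print(sorted(sorted(c) for c in clusters3))  # [[1.0, 1.2], [5.0, 5.1], [10.0]]
```

Swapping in a different `cluster_distance` function changes which pair is merged at each step, which is the sense in which such algorithms differ primarily in how they compute the distance between clusters.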
Journal Article•10.1023/A:1007365809034•
Tracking Context Changes through Meta-Learning

Gerhard Widmer1•
Austrian Research Institute for Artificial Intelligence1
01 Jun 1997-Machine Learning
TL;DR: A general two-level learning model is presented that effectively adjusts to changing contexts by trying to detect (via ‘meta-learning’) contextual clues and using this information to focus the learning process.
Abstract: The article deals with the problem of learning incrementally (‘on-line’) in domains where the target concepts are context-dependent, so that changes in context can produce more or less radical changes in the associated concepts. In particular, we concentrate on a class of learning tasks where the domain provides explicit clues as to the current context (e.g., attributes with characteristic values). A general two-level learning model is presented that effectively adjusts to changing contexts by trying to detect (via ‘meta-learning’) contextual clues and using this information to focus the learning process. Context learning and detection occur during regular on-line learning, without separate training phases for context recognition. Two operational systems based on this model are presented that differ in the underlying learning algorithm and in the way they use contextual information: METAL(B) combines meta-learning with a Bayesian classifier, while METAL(IB) is based on an instance-based learning algorithm. Experiments with synthetic domains as well as a number of ‘real-world’ problems show that the algorithms are robust in a variety of dimensions, and that meta-learning can produce substantial increases in accuracy over simple object-level learning in situations with changing contexts.
Book•
Reinforcement Learning and Distributed Local Model Synthesis

Tomas Landelius
1 Jan 1997
TL;DR: It is argued that local models have the potential to help solve problems in high-dimensional spaces, whereas global models do not; a linear approximation of the system dynamics and a quadratic function describing the long-term reward are suggested as a suitable local model.
Abstract: Reinforcement learning is a general and powerful way to formulate complex learning problems and acquire good system behaviour. The goal of a reinforcement learning system is to maximize a long term ...
Proceedings Article•
Learning Continuous Attractors in Recurrent Networks

H. Sebastian Seung1•
Alcatel-Lucent1
1 Dec 1997
TL;DR: If an object has a continuous family of instantiations, it should be represented by a continuous attractor, and this idea is illustrated with a network that learns to complete patterns.
Abstract: One approach to invariant object recognition employs a recurrent neural network as an associative memory. In the standard depiction of the network's state space, memories of objects are stored as attractive fixed points of the dynamics. I argue for a modification of this picture: if an object has a continuous family of instantiations, it should be represented by a continuous attractor. This idea is illustrated with a network that learns to complete patterns. To perform the task of filling in missing information, the network develops a continuous attractor that models the manifold from which the patterns are drawn. From a statistical view-point, the pattern completion task allows a formulation of unsupervised learning in terms of regression rather than density estimation.
Proceedings Article•10.1109/ICASSP.1997.599569•
Applications of neural blind separation to signal and image processing

Juha Karhunen1, Aapo Hyvärinen1, Ricardo Vigário, Jarmo Hurri1, Erkki Oja1 •
Helsinki University of Technology1
21 Apr 1997
TL;DR: Neural blind separation techniques developed in the laboratory are applied to the extraction of features from natural images and to the separation of medical EEG signals, which yields features that describe the underlying data better than for example classical principal component analysis.
Abstract: In blind source separation one tries to separate statistically independent unknown source signals from their linear mixtures without knowing the mixing coefficients. Such techniques are currently studied actively both in statistical signal processing and unsupervised neural learning. We apply neural blind separation techniques developed in our laboratory to the extraction of features from natural images and to the separation of medical EEG signals. The new analysis method yields features that describe the underlying data better than for example classical principal component analysis. We discuss difficulties related with real-world applications of blind signal processing, too.
Journal Article•10.1016/S0167-739X(97)00017-4•
Clustering techniques

Pierre Michaud
1 Nov 1997
TL;DR: The clustering problem, also known as unsupervised learning, is the problem of partitioning a population into clusters (or classes).
Journal Article•10.1023/A:1007355226281•
Explanation-Based Learning and Reinforcement Learning: A Unified View

Thomas G. Dietterich1, Nicholas S. Flann2•
Oregon State University1, Utah State University2
01 Sep 1997-Machine Learning
TL;DR: This paper shows how to develop dynamic programming versions of EBL, called region-based dynamic programming or Explanation-Based Reinforcement Learning (EBRL), and compares batch and online versions of EBRL to batch and online versions of point-based dynamic programming and to standard EBL.
Abstract: In speedup-learning problems, where full descriptions of operators are known, both explanation-based learning (EBL) and reinforcement learning (RL) methods can be applied. This paper shows that both methods involve fundamentally the same process of propagating information backward from the goal toward the starting state. Most RL methods perform this propagation on a state-by-state basis, while EBL methods compute the weakest preconditions of operators, and hence, perform this propagation on a region-by-region basis. Barto, Bradtke, and Singh (1995) have observed that many algorithms for reinforcement learning can be viewed as asynchronous dynamic programming. Based on this observation, this paper shows how to develop dynamic programming versions of EBL, which we call region-based dynamic programming or Explanation-Based Reinforcement Learning (EBRL). The paper compares batch and online versions of EBRL to batch and online versions of point-based dynamic programming and to standard EBL. The results show that region-based dynamic programming combines the strengths of EBL (fast learning and the ability to scale to large state spaces) with the strengths of reinforcement learning algorithms (learning of optimal policies). Results are shown in chess endgames and in synthetic maze tasks.
Journal Article•10.1109/72.572091•
A methodology for constructing fuzzy algorithms for learning vector quantization

Nicolaos B. Karayiannis1•
University of Houston1
01 May 1997-IEEE Transactions on Neural Networks
TL;DR: Two quantitative measures are introduced which establish a relationship between the formulation that led to FALVQ algorithms and the competition between the prototypes during the learning process and are tested and evaluated using the IRIS data set.
Abstract: This paper presents a general methodology for the development of fuzzy algorithms for learning vector quantization (FALVQ). The design of specific FALVQ algorithms according to existing approaches reduces to the selection of the membership function assigned to the weight vectors of an LVQ competitive neural network, which represent the prototypes. The development of a broad variety of FALVQ algorithms can be accomplished by selecting the form of the interference function that determines the effect of the nonwinning prototypes on the attraction between the winning prototype and the input of the network. The proposed methodology provides the basis for extending the existing FALVQ 1, FALVQ 2, and FALVQ 3 families of algorithms. This paper also introduces two quantitative measures which establish a relationship between the formulation that led to FALVQ algorithms and the competition between the prototypes during the learning process. The proposed algorithms and competition measures are tested and evaluated using the IRIS data set. The significance of the proposed competition measure is illustrated using FALVQ algorithms to perform segmentation of magnetic resonance images of the brain.
Journal Article•10.1109/25.554747•
Environment-adaptation mobile radio propagation prediction using radial basis function neural networks

Po-Rong Chang1, Wen-Hao Yang•
National Chiao Tung University1
01 Feb 1997-IEEE Transactions on Vehicular Technology
TL;DR: This paper investigates the application of a radial basis function (RBF) neural network to the prediction of field strength based on topographical and morphographical data, using a hybrid competitive and recursive least squares learning algorithm that significantly enhances the real-time or adaptive capability of the RBF-based prediction model.
Abstract: This paper investigates the application of a radial basis function (RBF) neural network to the prediction of field strength based on topographical and morphographical data. The RBF neural network is a two-layer localized receptive field network whose output nodes form a combination of radial activation functions computed by the hidden layer nodes. Appropriate centers and connection weights in the RBF network lead to a network that is capable of forming the best approximation to any continuous nonlinear mapping up to an arbitrary resolution. Such an approximation introduces best nonlinear approximation capability into the prediction model in order to accurately predict propagation loss over an arbitrary environment based on adaptive learning from measurement data. The adaptive learning employs hybrid competitive and recursive least squares algorithms. The unsupervised competitive algorithm adjusts the centers while the recursive least squares (RLS) algorithm estimates the connection weights. Because these two learning rules are both linear, rapid convergence is guaranteed. This hybrid algorithm significantly enhances the real-time or adaptive capability of the RBF-based prediction model. The applications to Okumura's (1968) data are included to demonstrate the effectiveness of the RBF neural network approach.
Book•
Symbolic visual learning

Katsushi Ikeuchi1, Manuela Veloso2•
University of Tokyo1, Carnegie Mellon University2
1 May 1997
TL;DR: Chapters include The Visual Learning Problem; MULTI-HASH: Learning Object Attributes and Hash Tables for Fast 3D Object Recognition; and Explanation Based Learning for Mobile Robot Perception.
Abstract:
  1. The Visual Learning Problem
  2. MULTI-HASH: Learning Object Attributes and Hash Tables for Fast 3D Object Recognition
  3. Learning Control Strategies for Object Recognition
  4. PADO: A New Learning Architecture for Object Recognition
  5. Learning Organization Hierarchies of Large Modelbases for Fast Recognition
  6. Application of Machine Learning in Function-Based Recognition
  7. Learning a Visual Model and an Image Processing Strategy from a Series of Silhouette Images on MIRACLE-IV
  8. Assembly Plan from Observation
  9. Visual Event Perception
  10. A Knowledge Framework for Seeing and Learning
  11. Explanation Based Learning for Mobile Robot Perception
  12. Navigation with Landmarks: Computing Goal Locations from Place Codes
Proceedings Article•
Stacked Density Estimation

Padhraic Smyth1, David H. Wolpert2•
California Institute of Technology1, Ames Research Center2
1 Dec 1997
TL;DR: The technique of stacking, previously only used for supervised learning, is applied to unsupervised learning and used for non-parametric multivariate density estimation, to combine finite mixture model and kernel density estimators.
Abstract: In this paper, the technique of stacking, previously only used for supervised learning, is applied to unsupervised learning. Specifically, it is used for non-parametric multivariate density estimation, to combine finite mixture model and kernel density estimators. Experimental results on both simulated data and real world data sets clearly demonstrate that stacked density estimation outperforms other strategies such as choosing the single best model based on cross-validation, combining with uniform weights, and even the single best model chosen by "cheating" by looking at the data used for independent testing.
Journal Article•10.1109/78.650113•
Multiresolution learning paradigm and signal prediction

Yao Liang1, E.W. Page2•
Clemson University1, Alcatel-Lucent2
01 Nov 1997-IEEE Transactions on Signal Processing
TL;DR: A new learning concept and paradigm for neural networks, called multiresolution learning, is presented; it is based on multiresolution analysis in wavelet theory and can significantly improve the generalization performance of neural networks.
Abstract: Current neural network learning processes, regardless of the learning algorithm and preprocessing used, are sometimes inadequate for difficult problems. We present a new learning concept and paradigm for neural networks, called multiresolution learning, based on multiresolution analysis in wavelet theory. The multiresolution learning paradigm can significantly improve the generalization performance of neural networks.
Journal Article•10.1016/S0893-6080(97)00005-1•
Efficient partition of learning data sets for neural network training

Igor V. Tetko1, Alessandro E. P. Villa1•
University of Lausanne1
01 Nov 1997-Neural Networks
TL;DR: This study investigates the emerging possibilities of combining unsupervised and supervised learning in neural network ensembles using an efficient partition of a noisy input data set in order to focus the training of neural networks on the most complex and informative domains of the data set and accelerate the learning phase.
Proceedings Article•
Beyond concise and colorful: learning intelligible rules

Michael J. Pazzani1, Subramani Mani1, W.Rodman Shankle1•
University of California, Irvine1
14 Aug 1997
TL;DR: It is shown that one factor that influences the intelligibility of learned models is consistency with existing knowledge and a learning algorithm is described that creates concepts with this goal in mind.
Abstract: A variety of techniques from statistics, signal processing, pattern recognition, machine learning, and neural networks have been proposed to understand data by discovering useful categories. However, research in data mining has not paid attention to the cognitive factors that make learned categories intelligible to human users. We show that one factor that influences the intelligibility of learned models is consistency with existing knowledge and describe a learning algorithm that creates concepts with this goal in mind.
Journal Article•10.1162/NECO.1997.9.4.895•
Activation functions, computational goals, and learning rules for local processors with contextual guidance

[...]

Jim Kay1, William A. Phillips2•
University of Glasgow1, University of Stirling2
15 May 1997-Neural Computation
TL;DR: The basic capabilities of a local processor with two distinct classes of inputs: receptive field inputs that provide the primary drive and contextual inputs that modulate their effects are studied.
Abstract: Information about context can enable local processors to discover latent variables that are relevant to the context within which they occur, and it can also guide short-term processing. For example, Becker and Hinton (1992) have shown how context can guide learning, and Hummel and Biederman (1992) have shown how it can guide processing in a large neural net for object recognition. This article studies the basic capabilities of a local processor with two distinct classes of inputs: receptive field inputs that provide the primary drive and contextual inputs that modulate their effects. The contextual predictions are used to guide processing without confusing them with receptive field inputs. The processor's transfer function must therefore distinguish these two roles. Given these two classes of input, the information in the output can be decomposed into four disjoint components to provide a space of possible goals in which the unsupervised learning of Linsker (1988) and the internally supervised learning of...
Proceedings Article•10.1049/CP:19970429•
Fraud detection and management in mobile telecommunications networks

[...]

Peter Burge1, John Shawe-Taylor1, C Cooke1, Yves Moreau1, B. Preneel1, C Stoermann1•
University of London1
28 Apr 1997
TL;DR: The paper discusses the status of research on detection of fraud undertaken as part of the European Commission-funded ACTS ASPeCT (Advanced security for personal communications technologies) project, and explores the detection of fraudulent behaviour based on a combination of absolute and differential usage.
Abstract: The paper discusses the status of research on detection of fraud undertaken as part of the European Commission-funded ACTS ASPeCT (Advanced security for personal communications technologies) project. A first task has been the identification of possible fraud scenarios and of typical fraud indicators which can be mapped to data in toll tickets. Currently, the project is exploring the detection of fraudulent behaviour based on a combination of absolute and differential usage. Three approaches are being investigated: a rule based approach and two approaches based on neural networks, where both supervised and unsupervised learning are considered. Special attention is being paid to the feasibility of the implementations.
Journal Article•10.1162/NECO.1997.9.4.883•
Optimal, unsupervised learning in invariant object recognition

[...]

Guy Wallis1, Roland J. Baddeley2•
Max Planck Society1, University of Oxford2
15 May 1997-Neural Computation
TL;DR: Simulations of a competitive network trained on a character recognition task are used to highlight the success of an optimal linear learning rule in relation to simple Hebbian learning and to show that the theory can give accurate quantitative predictions for the optimal parameters for such networks.
Abstract: A means for establishing transformation-invariant representations of objects is proposed and analyzed, in which different views are associated on the basis of the temporal order of the presentation of these views, as well as their spatial similarity. Assuming knowledge of the distribution of presentation times, an optimal linear learning rule is derived. Simulations of a competitive network trained on a character recognition task are then used to highlight the success of this learning rule in relation to simple Hebbian learning and to show that the theory can give accurate quantitative predictions for the optimal parameters for such networks.
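The idea of associating views by temporal order can be illustrated with a generic trace-style Hebbian update. This is a minimal sketch under stated assumptions, not the optimal linear rule derived in the paper: the function names, the exponential trace, and the parameters `eta` and `alpha` are all hypothetical choices made for illustration.

```python
def trace_update(weights, x, y, prev_trace, eta=0.5, alpha=0.1):
    """One trace-style Hebbian step for a single unit (illustrative only)."""
    # Blend the current activation with a decaying memory of past activity.
    trace = (1.0 - eta) * y + eta * prev_trace
    # Move the weights toward the current input in proportion to the trace.
    new_weights = [w + alpha * trace * (xi - w) for w, xi in zip(weights, x)]
    return new_weights, trace


def plain_hebb_update(weights, x, y, alpha=0.1):
    """Same update without memory: only the current activation matters."""
    return [w + alpha * y * (xi - w) for w, xi in zip(weights, x)]


# Two successive views of one object; the unit responds only to the first.
view_a, view_b = [1.0, 0.0], [0.0, 1.0]
w0 = [0.0, 0.0]
w1, t1 = trace_update(w0, view_a, 1.0, 0.0)
w2, t2 = trace_update(w1, view_b, 0.0, t1)    # trace carries over from view_a
w2_plain = plain_hebb_update(w1, view_b, 0.0)  # no activity, so no learning
```

In this toy setting the trace lets the unit pick up some weight on the second view even though it did not respond to it, whereas the plain Hebbian step leaves the weights unchanged.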
Patent•
System and method for combining multiple learning agents to produce a prediction method

[...]

Lawrence E. Hunter
23 May 1997
TL;DR: A method is presented for improving the performance of learning agents, such as neural networks, genetic algorithms, and decision trees, that derive prediction methods from a training set of data; the input representations of the learning agents are modified by including a feature combination extracted from another learning agent.
Abstract: System and method for improving the performance of learning agents such as neural networks, genetic algorithms and decision trees that derive prediction methods from a training set of data. In part of the method, a population of learning agents of different classes is trained on the data set, each agent producing in response a prediction method based on the agent's input representation. Feature combinations are extracted from the prediction methods produced by the learning agents. The input representations of the learning agents are then modified by including therein a feature combination extracted from another learning agent. In another part of a method, the parameter values of the learning agents are changed to improve the accuracy of the prediction method. A fitness measure is determined for each learning agent based on the prediction method the agent produces. Parameter values of a learning agent are then selected based on the agent's fitness measure. Variation is introduced into the selected parameter values, and another learning agent of the same class is defined using the varied parameter values. The learning agents are then again trained on the data set to cause a learning agent to produce a prediction method based on the derived feature combinations and varied parameter values.
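The parameter-selection part of the abstract (measure fitness, select parameter values, introduce variation, define a new agent of the same class) can be sketched as a small evolutionary loop. This is an illustrative sketch only, not the patented method: the one-parameter threshold "agent", the keep-the-better-half selection, the Gaussian variation, and all names are assumptions, and the feature-combination exchange between agents is not modeled.

```python
import random


def fitness(threshold, data):
    """Fraction of (value, label) pairs a threshold rule classifies correctly."""
    return sum((v > threshold) == label for v, label in data) / len(data)


def evolve(data, population, generations=20, sigma=0.1, seed=0):
    """Evolve threshold parameters; returns the fittest one found."""
    rng = random.Random(seed)
    for _ in range(generations):
        # Rank agents by fitness and keep the better half as parents.
        ranked = sorted(population, key=lambda t: fitness(t, data), reverse=True)
        parents = ranked[: max(1, len(ranked) // 2)]
        # Define new agents of the same class from varied parent parameters.
        children = [p + rng.gauss(0.0, sigma) for p in parents]
        population = parents + children
    return max(population, key=lambda t: fitness(t, data))


data = [(0.1, False), (0.2, False), (0.8, True), (0.9, True)]
start = [0.0, 0.05, 0.95, 1.0]
best = evolve(data, start)
```

Because the parents are carried over unchanged into each new generation, the best fitness in the population can never decrease from one generation to the next.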
Journal Article•10.1037//0278-7393.23.3.638•
Event category learning

[...]

Alan W. Kersten1, Dorrit Billman•
Indiana University1
01 Jan 1997-Journal of Experimental Psychology: Learning, Memory and Cognition
TL;DR: This paper investigated the learning of event categories, in particular, categories of simple animated events, each involving a causal interaction between two characters, and found that correlations among attributes of events are easier to learn when they form part of a rich correlational structure than when they are independent of other correlations.
Abstract: This research investigated the learning of event categories, in particular, categories of simple animated events, each involving a causal interaction between 2 characters. Four experiments examined whether correlations among attributes of events are easier to learn when they form part of a rich correlational structure than when they are independent of other correlations. Event attributes (e.g., state change, path of motion) were chosen to reflect distinctions made by verbs. Participants were presented with an unsupervised learning task and were then tested on whether the organization of correlations affected learning. Correlations forming part of a system of correlations were found to be better learned than isolated correlations. This finding of facilitation from correlational structure is explained in terms of a model that generates internal feedback to adjust the salience of attributes. These experiments also provide evidence regarding the role of object information in events, suggesting that this role is mediated by object category representations.
Journal Article•10.1016/S0360-8352(96)00310-5•
An unsupervised learning neural algorithm for identifying process behavior on control charts and a comparison with supervised learning approaches

[...]

Amjed M. Al-Ghanim1•
An-Najah National University1
01 Jul 1997-Computers & Industrial Engineering
TL;DR: A new approach to detecting and identifying unnatural patterns on control charts is presented, using an unsupervised self-organizing neural paradigm based on ART1 networks.