Top 289 papers published in the topic of Unsupervised learning in 2000

Showing papers on "Unsupervised learning" published in 2000

Journal Article•10.1023/A:1007692713085•

Text Classification from Labeled and Unlabeled Documents using EM

Kamal Nigam¹, Andrew McCallum², Sebastian Thrun¹, Tom M. Mitchell¹•Institutions (2)

Carnegie Mellon University¹, Jordan University of Science and Technology²

01 May 2000-Machine Learning

TL;DR: This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents, and presents two extensions to the algorithm that improve classification accuracy under these conditions.

Abstract: This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and a naive Bayes classifier. The algorithm first trains a classifier using the available labeled documents, and probabilistically labels the unlabeled documents. It then trains a new classifier using the labels for all the documents, and iterates to convergence. This basic EM procedure works well when the data conform to the generative assumptions of the model. However these assumptions are often violated in practice, and poor performance can result. We present two extensions to the algorithm that improve classification accuracy under these conditions: (1) a weighting factor to modulate the contribution of the unlabeled data, and (2) the use of multiple mixture components per class. Experimental results, obtained using text from three different real-world tasks, show that the use of unlabeled data reduces classification error by up to 30%.

3,477 citations
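
The EM loop described in the abstract above (train on labeled documents, probabilistically label the unlabeled ones, retrain on everything, repeat) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the Laplace smoothing, the fixed iteration count, and the `unlabeled_weight` parameter (standing in for the weighting factor the abstract mentions) are all assumptions, and the multiple-mixture-components extension is left out.

```python
import math
from collections import Counter


def train_nb(docs, soft_labels, weights, classes, vocab):
    """M-step: fit naive Bayes from (possibly fractional) class memberships."""
    prior = {c: 1.0 for c in classes}                      # Laplace smoothing
    word = {c: {w: 1.0 for w in vocab} for c in classes}   # Laplace smoothing
    for doc, probs, lam in zip(docs, soft_labels, weights):
        counts = Counter(doc)
        for c in classes:
            r = lam * probs[c]          # weighted responsibility of class c
            prior[c] += r
            for w, n in counts.items():
                word[c][w] += r * n
    z = sum(prior.values())
    log_prior = {c: math.log(prior[c] / z) for c in classes}
    log_word = {}
    for c in classes:
        total = sum(word[c].values())
        log_word[c] = {w: math.log(v / total) for w, v in word[c].items()}
    return log_prior, log_word


def posterior(doc, model, classes):
    """E-step for one document: P(class | doc) under the current model."""
    log_prior, log_word = model
    scores = {}
    for c in classes:
        # Words outside the training vocabulary are ignored.
        scores[c] = log_prior[c] + sum(
            log_word[c][w] for w in doc if w in log_word[c])
    m = max(scores.values())
    exp = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}


def em_naive_bayes(labeled, unlabeled, unlabeled_weight=1.0, iterations=10):
    """labeled: list of (tokens, label); unlabeled: list of token lists."""
    classes = sorted({y for _, y in labeled})
    vocab = sorted({w for d, _ in labeled for w in d}
                   | {w for d in unlabeled for w in d})
    lab_docs = [d for d, _ in labeled]
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    # Initial classifier: labeled documents only.
    model = train_nb(lab_docs, hard, [1.0] * len(labeled), classes, vocab)
    all_docs = lab_docs + unlabeled
    weights = [1.0] * len(labeled) + [unlabeled_weight] * len(unlabeled)
    for _ in range(iterations):
        soft = [posterior(d, model, classes) for d in unlabeled]       # E-step
        model = train_nb(all_docs, hard + soft, weights, classes, vocab)  # M-step
    return model, classes
```

Setting `unlabeled_weight` below 1 reduces the influence of the unlabeled pool, which is the kind of safeguard the abstract motivates for data that violate the model's generative assumptions.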

Proceedings Article•10.1145/354756.354805•

Analyzing the effectiveness and applicability of co-training

Kamal Nigam¹, Rayid Ghani¹•Institutions (1)

Carnegie Mellon University¹

6 Nov 2000

TL;DR: It is demonstrated that when learning from labeled and unlabeled data, algorithms that explicitly leverage a natural independent split of the features outperform algorithms that do not, and that co-training algorithms which manufacture a feature split may outperform algorithms that use no split.

Abstract: Recently there has been significant interest in supervised learning algorithms that combine labeled and unlabeled data for text learning tasks. The co-training setting [1] applies to datasets that have a natural separation of their features into two disjoint sets. We demonstrate that when learning from labeled and unlabeled data, algorithms explicitly leveraging a natural independent split of the features outperform algorithms that do not. When a natural split does not exist, co-training algorithms that manufacture a feature split may out-perform algorithms not using a split. These results help explain why co-training algorithms are both discriminative in nature and robust to the assumptions of their embedded classifiers.

1,270 citations

Book Chapter•10.1007/3-540-45054-8_2•

Unsupervised Learning of Models for Recognition

M. Weber¹, Max Welling¹, Pietro Perona¹,²•Institutions (2)

California Institute of Technology¹, University of Padua²

26 Jun 2000

TL;DR: A method to learn object class models from unlabeled and unsegmented cluttered scenes for the purpose of visual object recognition that achieves very good classification results on human faces and rear views of cars.

Abstract: We present a method to learn object class models from unlabeled and unsegmented cluttered scenes for the purpose of visual object recognition. We focus on a particular type of model where objects are represented as flexible constellations of rigid parts (features). The variability within a class is represented by a joint probability density function (pdf) on the shape of the constellation and the output of part detectors. In a first stage, the method automatically identifies distinctive parts in the training set by applying a clustering algorithm to patterns selected by an interest operator. It then learns the statistical shape model using expectation maximization. The method achieves very good classification results on human faces and rear views of cars.

793 citations

Proceedings Article•

Solving the Multiple-Instance Problem: A Lazy Learning Approach

Jun Wang¹, Jean-Daniel Zucker•Institutions (1)

University of Illinois at Urbana–Champaign¹

29 Jun 2000

TL;DR: This paper investigates the use of lazy learning and Hausdorff distance to approach the multiple-instance problem, and presents two variants of the K-nearest neighbor algorithm, called Bayesian-KNN and Citation-KNN, solving the multiple-instance problem.

Abstract: As opposed to traditional supervised learning, multiple-instance learning concerns the problem of classifying a bag of instances, given bags that are labeled by a teacher as being overall positive or negative. Current research mainly concentrates on adapting traditional concept learning to solve this problem. In this paper we investigate the use of lazy learning and Hausdorff distance to approach the multiple-instance problem. We present two variants of the K-nearest neighbor algorithm, called Bayesian-KNN and Citation-KNN, solving the multiple-instance problem. Experiments on the Drug discovery benchmark data show that both algorithms are competitive with the best ones conceived in the concept learning framework. Further work includes exploring of a combination of lazy and eager multiple-instance problem classifiers.

681 citations
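
To make the bag-level distance in the abstract above concrete, here is a small sketch of Hausdorff distances between bags of instances, plus a plain majority-vote nearest-neighbor rule over bags. It is only an illustration under assumptions: the Euclidean base metric, the function names, and the choice of the minimal variant as the default are hypothetical, and the voting rule is ordinary kNN, not the Bayesian-KNN or Citation-KNN variants the paper introduces.

```python
import math


def euclid(a, b):
    """Euclidean distance between two instances given as numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def directed(bag_a, bag_b):
    """Largest distance from an instance in bag_a to its closest one in bag_b."""
    return max(min(euclid(a, b) for b in bag_b) for a in bag_a)


def max_hausdorff(bag_a, bag_b):
    """Classical (maximal) Hausdorff distance between two bags."""
    return max(directed(bag_a, bag_b), directed(bag_b, bag_a))


def min_hausdorff(bag_a, bag_b):
    """Minimal variant: distance between the closest pair of instances."""
    return min(euclid(a, b) for a in bag_a for b in bag_b)


def knn_bag_label(query, bags, k=1, dist=min_hausdorff):
    """Majority vote over the k labeled bags nearest to the query bag.

    bags is a list of (bag, is_positive) pairs; ties count as negative.
    """
    nearest = sorted(bags, key=lambda item: dist(query, item[0]))[:k]
    votes = sum(1 if positive else -1 for _, positive in nearest)
    return votes > 0
```

The minimal variant is less sensitive to a single stray instance in a bag than the maximal one, which matters when only some instances in a positive bag are responsible for its label.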

Learning with Labeled and Unlabeled Data

Matthias Seeger

1 Jan 2000

TL;DR: A rigorous definition of the problem is given and the crucial role of prior knowledge is put forward, and the important notion of input-dependent regularization is discussed.

Abstract: In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as well as numerous suggestions for potential future work. Therefore, this work contains more speculative and partly subjective material than the reader might expect from a literature review. We give a rigorous definition of the problem and relate it to supervised and unsupervised learning. The crucial role of prior knowledge is put forward, and we discuss the important notion of input-dependent regularization. We postulate a number of baseline methods, being algorithms or algorithmic schemes which can more or less straightforwardly be applied to the problem, without the need for genuinely new concepts. However, some of them might serve as basis for a genuine method. In the literature review, we try to cover the wide variety of (recent) work and to classify this work into meaningful categories. We also mention work done on related problems and suggest some ideas towards synthesis. Finally, we discuss some caveats and tradeoffs of central importance to the problem.

589 citations

Proceedings Article•10.1145/347090.347160•

On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms

Kenji Yamanishi¹, Jun'ichi Takeuchi¹, Graham J. Williams², Peter A. Milne²•Institutions (2)

NEC¹, Commonwealth Scientific and Industrial Research Organisation²

1 Aug 2000

TL;DR: An experimental application to network intrusion detection shows that SmartSifter was able to identify data with high scores that corresponded to attacks, with low computational costs.

Abstract: Outlier detection is a fundamental issue in data mining, specifically in fraud detection, network intrusion detection, network monitoring, etc. SmartSifter is an outlier detection engine addressing this problem from the viewpoint of statistical learning theory. This paper provides a theoretical basis for SmartSifter and empirically demonstrates its effectiveness. SmartSifter detects outliers in an on-line process through the on-line unsupervised learning of a probabilistic model (using a finite mixture model) of the information source. Each time a datum is input SmartSifter employs an on-line discounting learning algorithm to learn the probabilistic model. A score is given to the datum based on the learned model with a high score indicating a high possibility of being a statistical outlier. The novel features of SmartSifter are: (1) it is adaptive to non-stationary sources of data; (2) a score has a clear statistical/information-theoretic meaning; (3) it is computationally inexpensive; and (4) it can handle both categorical and continuous variables. An experimental application to network intrusion detection shows that SmartSifter was able to identify data with high scores that corresponded to attacks, with low computational costs. Further experimental application has identified a number of meaningful rare cases in actual health insurance pathology data from Australia's Health Insurance Commission.

489 citations
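
The score-then-update cycle described in the abstract above can be illustrated with a heavily simplified sketch. Everything here is an assumption made for illustration: a single one-dimensional Gaussian stands in for the finite mixture model, the exponentially discounted mean and variance updates stand in for the paper's discounting learning algorithm, the score is the logarithmic loss of the datum under the model before the update, and the class and parameter names are hypothetical. It is not SmartSifter itself.

```python
import math


class DiscountedGaussian:
    """One-dimensional Gaussian whose estimates gradually forget old data."""

    def __init__(self, discount=0.05, mean=0.0, var=1.0):
        self.r = discount   # larger values forget the past faster
        self.mean = mean
        self.var = var

    def score(self, x):
        """Logarithmic loss of x under the current model; higher = more unusual."""
        return (0.5 * math.log(2.0 * math.pi * self.var)
                + (x - self.mean) ** 2 / (2.0 * self.var))

    def update(self, x):
        """Exponentially discounted update of mean and variance."""
        delta = x - self.mean
        self.mean += self.r * delta
        self.var = (1.0 - self.r) * (self.var + self.r * delta * delta)
        self.var = max(self.var, 1e-6)   # guard against a degenerate variance

    def score_and_update(self, x):
        """Score the datum against the model learned so far, then learn from it."""
        s = self.score(x)
        self.update(x)
        return s
```

Because old data are discounted, the model can track a slowly drifting source, which corresponds to the adaptivity to non-stationary data that the abstract lists as a feature.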

Journal Article•10.3758/BF03200258•

An elemental model of associative learning: I. Latent inhibition and perceptual learning

Ian P. L. McLaren¹, N. J. Mackintosh¹•Institutions (1)

University of Cambridge¹

01 Sep 2000-Animal Learning & Behavior

TL;DR: The model is applied in outline fashion to some of the basic phenomena of simple conditioning and, in greater detail, to the phenomena of latent inhibition and perceptual learning.

Abstract: This paper presents a brief, informal outline followed by a formal statement of an elemental associative learning model first described by McLaren, Kaye, and Mackintosh (1989). The model assumes representation of stimuli by sets of elements (i.e., microfeatures) and a set of associative algorithms that incorporate the following: real-time simulation of learning; an error-correcting learning rule; weight decay that distinguishes between transient and permanent associations; and modulation of associative learning that gives high salience to and, hence, promotes rapid learning with novel, unpredicted stimuli and reduces the salience for a stimulus as its error term declines. The model is applied in outline fashion to some of the basic phenomena of simple conditioning and, in greater detail, to the phenomena of latent inhibition and perceptual learning. A detailed account of generalization and discrimination will be provided in a later paper.

471 citations

Improving the Rprop Learning Algorithm

Christian Igel, Michael Hüsken¹•Institutions (1)

Ruhr University Bochum¹

1 Jan 2000

TL;DR: Modifications of the Rprop algorithm are introduced that improve its learning speed and the resulting speedup is experimentally shown for a set of neural network learning tasks as well as for artificial error surfaces.

Abstract: The Rprop algorithm proposed by Riedmiller and Braun is one of the best performing first-order learning methods for neural networks. We introduce modifications of the algorithm that improve its learning speed. The resulting speedup is experimentally shown for a set of neural network learning tasks as well as for artificial error surfaces.

466 citations

Introduction to statistical learning theory and support vector machines

Zhang Xuegong

1 Jan 2000

TL;DR: Introduces Statistical Learning Theory (SLT), a new framework for the general learning problem, and the Support Vector Machine (SVM), a powerful learning method that handles small-sample learning problems better.

Abstract: Data-based machine learning covers a wide range of topics from pattern recognition to function regression and density estimation. Most of the existing methods are based on traditional statistics, which provides conclusions only for the situation where the sample size tends to infinity, so they may not work in practical cases of limited samples. Statistical Learning Theory (SLT) is a small-sample statistics by Vapnik et al., which mainly concerns statistical principles when samples are limited, especially the properties of the learning procedure in such cases. SLT provides a new framework for the general learning problem and a novel, powerful learning method called the Support Vector Machine (SVM), which can solve small-sample learning problems better. It is believed that the study of SLT and SVM is becoming a new hot area in the field of machine learning. This review introduces the basic ideas of SLT and SVM, their major characteristics, and some current research trends.

434 citations

Proceedings Article•

Support Vector Machine Active Learning with Applications to Text Classification

Simon Tong, Daphne Koller

29 Jun 2000

405 citations

Journal Article•10.1109/34.841759•

Learning and design of principal curves

Balázs Kégl¹, Adam Krzyżak², Tamas Linder¹, Kenneth Zeger³•Institutions (3)

Queen's University¹, Concordia University², University of California, San Diego³

01 Mar 2000-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work defines principal curves as continuous curves of a given length which minimize the expected squared distance between the curve and points of the space randomly chosen according to a given distribution, making it possible to theoretically analyze principal curve learning from training data and it also leads to a new practical construction.

Abstract: Principal curves have been defined as "self-consistent" smooth curves which pass through the "middle" of a d-dimensional probability distribution or data cloud. They give a summary of the data and also serve as an efficient feature extraction tool. We take a new approach by defining principal curves as continuous curves of a given length which minimize the expected squared distance between the curve and points of the space randomly chosen according to a given distribution. The new definition makes it possible to theoretically analyze principal curve learning from training data and it also leads to a new practical construction. Our theoretical learning scheme chooses a curve from a class of polygonal lines with k segments and with a given total length to minimize the average squared distance over n training points drawn independently. Convergence properties of this learning scheme are analyzed and a practical version of this theoretical algorithm is implemented. In each iteration of the algorithm, a new vertex is added to the polygonal line and the positions of the vertices are updated so that they minimize a penalized squared distance criterion. Simulation results demonstrate that the new algorithm compares favorably with previous methods, both in terms of performance and computational complexity, and is more robust to varying data models.

Proceedings Article•

Meta-Learning by Landmarking Various Learning Algorithms

Bernhard Pfahringer, Hilan Bensusan, Christophe Giraud-Carrier

29 Jun 2000

TL;DR: Experiments show that landmarking selects, with moderate but reasonable level of success, the best performing of a set of learning algorithms.

Abstract: Landmarking is a novel approach to describing tasks in meta-learning. Previous approaches to meta-learning mostly considered only statistics-inspired measures of the data as a source for the definition of metaattributes. Contrary to such approaches, landmarking tries to determine the location of a specific learning problem in the space of all learning problems by directly measuring the performance of some simple and efficient learning algorithms themselves. In the experiments reported we show how such a use of landmark values can help to distinguish between areas of the learning space favouring different learners. Experiments, both with artificial and real-world databases, show that landmarking selects, with moderate but reasonable level of success, the best performing of a set of learning algorithms.

Proceedings Article•10.1145/347090.347169•

Feature selection in unsupervised learning via evolutionary search

YeongSeog Kim¹, W. Nick Street¹, Filippo Menczer¹•Institutions (1)

University of Iowa¹

1 Aug 2000

TL;DR: ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions approximating the Pareto front in a multidimensional objective space, is used for feature selection and shows promise in identifying the right features and the correct number of clusters.

Abstract: Feature subset selection is an important problem in knowledge discovery, not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and possibly, accuracy of the resulting models. In this paper we consider the problem of feature selection for unsupervised learning. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions that approximate the Pareto front in a multidimensional objective space. Each evolved solution represents a feature subset and a number of clusters; a standard K-means algorithm is applied to form the given number of clusters based on the selected features. Preliminary results on both real and synthetic data show promise in finding Pareto-optimal solutions through which we can identify the significant features and the correct number of clusters.

Proceedings Article•10.1109/CVPR.2000.854754•

Towards automatic discovery of object categories

M. Weber¹, Max Welling¹, Pietro Perona²•Institutions (2)

California Institute of Technology¹, University of Padua²

15 Jun 2000

TL;DR: A method to learn heterogeneous models of object classes for visual recognition that automatically identifies distinctive features in the training set and learns the set of model parameters using expectation maximization.

Abstract: We propose a method to learn heterogeneous models of object classes for visual recognition. The training images contain a preponderance of clutter and learning is unsupervised. Our models represent objects as probabilistic constellations of rigid parts (features). The variability within a class is represented by a joint probability density function on the shape of the constellation and the appearance of the parts. Our method automatically identifies distinctive features in the training set. The set of model parameters is then learned using expectation maximization. When trained on different, unlabeled and unsegmented views of a class of objects, each component of the mixture model can adapt to represent a subset of the views. Similarly, different component models can also "specialize" on sub-classes of an object class. Experiments on images of human heads, leaves from different species of trees, and motor-cars demonstrate that the method works well over a wide variety of objects.

Journal Article•10.1109/83.847834•

Sonar image segmentation using an unsupervised hierarchical MRF model

Max Mignotte¹, Christophe Collet¹, Patrick Pérez², Patrick Bouthemy²•Institutions (2)

École Navale¹, French Institute for Research in Computer Science and Automation²

01 Jul 2000-IEEE Transactions on Image Processing

TL;DR: A new method of segmentation, called the scale causal multigrid (SCM) algorithm, has been successfully applied to real sonar images and seems to be well suited to the segmentation of very noisy images.

Abstract: This paper is concerned with hierarchical Markov random field (MRF) models and their application to sonar image segmentation. We present an original hierarchical segmentation procedure devoted to images given by a high-resolution sonar. The sonar image is segmented into two kinds of regions: shadow (corresponding to a lack of acoustic reverberation behind each object lying on the sea-bed) and sea-bottom reverberation. The proposed unsupervised scheme takes into account the variety of the laws in the distribution mixture of a sonar image, and it estimates both the parameters of noise distributions and the parameters of the Markovian prior. For the estimation step, we use an iterative technique which combines a maximum likelihood approach (for noise model parameters) with a least-squares method (for MRF-based prior). In order to model more precisely the local and global characteristics of image content at different scales, we introduce a hierarchical model involving a pyramidal label field. It combines coarse-to-fine causal interactions with a spatial neighborhood structure. This new method of segmentation, called the scale causal multigrid (SCM) algorithm, has been successfully applied to real sonar images and seems to be well suited to the segmentation of very noisy images. The experiments reported in this paper demonstrate that the discussed method performs better than other hierarchical schemes for sonar image segmentation.

Proceedings Article•

Acquisition of Stand-up Behavior by a Real Robot using Hierarchical Reinforcement Learning

Jun Morimoto¹, Kenji Doya¹•Institutions (1)

Nara Institute of Science and Technology¹

29 Jun 2000

TL;DR: In this paper, a hierarchical reinforcement learning architecture is proposed to learn a discrete sequence of sub-goals in a low-dimensional state space for achieving the main goal of the task.

Abstract: In this paper, we propose a hierarchical reinforcement learning architecture that realizes practical learning speed in real hardware control tasks. In order to enable learning in a practical number of trials, we introduce a low-dimensional representation of the state of the robot for higher-level planning. The upper level learns a discrete sequence of sub-goals in a low-dimensional state space for achieving the main goal of the task. The lower-level modules learn local trajectories in the original high-dimensional state space to achieve the sub-goal specified by the upper level. We applied the hierarchical architecture to a three-link, two-joint robot for the task of learning to stand up by trial and error. The upper-level learning was implemented by Q-learning, while the lower-level learning was implemented by a continuous actor–critic method. The robot successfully learned to stand up within 750 trials in simulation and then in an additional 170 trials using real hardware. The effects of the setting of the search steps in the upper level and the use of a supplementary reward for achieving sub-goals are also tested in simulation.

Proceedings Article•

Algorithm Selection using Reinforcement Learning

Michail G. Lagoudakis¹, Michael L. Littman²•Institutions (2)

Duke University¹, AT&T²

29 Jun 2000

TL;DR: A kind of MDP that models the algorithm selection problem by allowing multiple state transitions is introduced, and the well known Q-learning algorithm is adapted for this case in a way that combines both Monte-Carlo and Temporal Difference methods.

Abstract: Many computational problems can be solved by multiple algorithms, with different algorithms fastest for different problem sizes, input distributions, and hardware characteristics. We consider the problem of algorithm selection: dynamically choose an algorithm to attack an instance of a problem with the goal of minimizing the overall execution time. We formulate the problem as a kind of Markov decision process (MDP), and use ideas from reinforcement learning to solve it. This paper introduces a kind of MDP that models the algorithm selection problem by allowing multiple state transitions. The well known Q-learning algorithm is adapted for this case in a way that combines both Monte-Carlo and Temporal Difference methods. Also, this work uses, and extends in a way to control problems, the Least-Squares Temporal Difference algorithm (LSTD(0)) of Boyan. The experimental study focuses on the classic problems of order statistic selection and sorting. The encouraging results reveal the potential of applying learning methods to traditional computational problems.

Proceedings Article•10.1145/347090.347110•

Active learning using adaptive resampling

Vijay S. Iyengar¹, Chidanand Apte¹, Tong Zhang¹•Institutions (1)

IBM¹

1 Aug 2000

TL;DR: An active learning method is presented that uses adaptive resampling in a natural way to significantly reduce the size of the required labeled set and generates a classification model that achieves the high accuracies possible with current adaptive resampling methods.

Abstract: Classification modeling (a.k.a. supervised learning) is an extremely useful analytical technique for developing predictive and forecasting applications. The explosive growth in data warehousing and internet usage has made large amounts of data potentially available for developing classification models. For example, natural language text is widely available in many forms (e.g., electronic mail, news articles, reports, and web page contents). Categorization of data is a common activity which can be automated to a large extent using supervised learning methods. Examples of this include routing of electronic mail, satellite image classification, and character recognition. However, these tasks require labeled data sets of sufficiently high quality with adequate instances for training the predictive models. Much of the on-line data, particularly the unstructured variety (e.g., text), is unlabeled. Labeling is usually an expensive manual process done by domain experts. Active learning is an approach to solving this problem and works by identifying a subset of the data that needs to be labeled and uses this subset to generate classification models. We present an active learning method that uses adaptive resampling in a natural way to significantly reduce the size of the required labeled set and generates a classification model that achieves the high accuracies possible with current adaptive resampling methods.

Book Chapter•10.1007/978-1-4471-0443-8_6•

Bayesian Non-Linear Independent Component Analysis by Multi-Layer Perceptrons

Harri Lappalainen, Antti Honkela

1 Jan 2000

TL;DR: A nonlinear extension to independent component analysis is developed that avoids problems with overlearning which would otherwise be severe in unsupervised learning with flexible nonlinear models.

Abstract: In this chapter, a non-linear extension to independent component analysis is developed. The non-linear mapping from source signals to observations is modelled by a multi-layer perceptron network and the distributions of source signals are modelled by mixture-of-Gaussians. The observations are assumed to be corrupted by Gaussian noise and therefore the method is more adequately described as non-linear independent factor analysis. The non-linear mapping, the source distributions and the noise level are estimated from the data. Bayesian approach to learning avoids problems with overlearning which would otherwise be severe in unsupervised learning with flexible non-linear models.

Journal Article•10.1016/S0262-8856(00)00042-1•

Unsupervised and adaptive Gaussian skin-color model

Luis M. Bergasa¹, Manuel Mazo¹, Alfredo Gardel¹, Miguel Angel Sotelo¹, Luciano Boquete¹•Institutions (1)

University of Alcalá¹

01 Sep 2000-Image and Vision Computing

TL;DR: A segmentation method is described for the face skin of people of any race in real time, in an adaptive and unsupervised way, based on a Gaussian model of the skin color, that will be referred to as Unsupervised and Adaptive Gaussian Skin-Color Model, UAGM.

An Adaptive Self-Organizing Color Segmentation Algorithm with Application to Robust Real-time Human Hand Localization

Ying Wu¹, Qiong Liu, Thomas S. Huang¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

1 Jan 2000

TL;DR: An adaptive self-organizing color segmentation algorithm and a transductive learning algorithm used to localize human hand in video sequences and color cue and motion cue are integrated in the localization system, in which motion cue is employed to focus the attention of the system.

Abstract: In Proc. Asian Conf. on Computer Vision, Taiwan, 2000. This paper describes an adaptive self-organizing color segmentation algorithm and a transductive learning algorithm used to localize the human hand in video sequences. The color distribution at each time frame is approximated by the proposed 1-D self-organizing map (SOM), in which schemes of growing, pruning and merging are facilitated to find an appropriate number of color clusters automatically. Due to the dynamic backgrounds and changing lighting conditions, the distribution of color over time may not be stationary. An algorithm of SOM transduction is proposed to learn the nonstationary color distribution in HSI color space by combining supervised and unsupervised learning paradigms. Color cue and motion cue are integrated in the localization system, in which motion cue is employed to focus the attention of the system. This approach is also applied to other tasks such as human face tracking and color indexing. Our localization system implemented on an SGI O2 R10000 workstation is reliable and efficient at 20-30Hz.

Journal Article•10.1023/A:1008110632619•

Statistical Learning Theory: A Primer

Theodoros Evgeniou¹, Massimiliano Pontil¹, Tomaso Poggio¹•Institutions (1)

Massachusetts Institute of Technology¹

30 Jun 2000-International Journal of Computer Vision

TL;DR: The main concepts of Statistical Learning Theory are overviewed, a framework in which learning from examples can be studied in a principled way and well known as well as emerging learning techniques such as Regularization Networks and Support Vector Machines are discussed.

Abstract: In this paper we first overview the main concepts of Statistical Learning Theory, a framework in which learning from examples can be studied in a principled way. We then briefly discuss well known as well as emerging learning techniques such as Regularization Networks and Support Vector Machines which can be justified in terms of the same induction principle.

Proceedings Article•10.1109/IGARSS.2000.860361•

Unsupervised hyperspectral image analysis using independent component analysis

Shao-Shan Chiang¹, Chein-I Chang, I.W. Ginsberg•Institutions (1)

University of Baltimore¹

30 Jun 2000

TL;DR: In this paper, an ICA-based approach is proposed for hyperspectral image analysis, which can be viewed as a random version of the commonly used linear spectral mixture analysis, in which the abundance fractions in a linear mixture model are considered to be unknown independent signal sources.

Abstract: In this paper, an ICA-based approach is proposed for hyperspectral image analysis. It can be viewed as a random version of the commonly used linear spectral mixture analysis, in which the abundance fractions in a linear mixture model are considered to be unknown independent signal sources. It does not require the full rank of the separating matrix or orthogonality as most ICA methods do. More importantly, the learning algorithm is designed based on the independency of the material abundance vector rather than the independency of the separating matrix generally used to constrain the standard ICA. As a result, the learning algorithm is able to converge to non-orthogonal independent components. This is particularly useful in hyperspectral image analysis since many materials extracted from a hyperspectral image may have similar spectral signatures and may not be orthogonal. The AVIRIS experiments have demonstrated that the proposed ICA provides an effective unsupervised technique for hyperspectral image classification.

Book•

Statistics and neural networks: advances at the interface

Jim Kay, D. M. Titterington

6 Apr 2000

TL;DR: Topics covered include flexible discriminant and mixture models; neural networks for unsupervised learning based on information theory; radial basis function networks and statistics; robust prediction in many-parameter models; and latent variable models and data visualisation.

Abstract: Flexible discriminant and mixture models; Neural networks for unsupervised learning based on information theory; Radial basis function networks and statistics; Robust prediction in many-parameter models; Density networks; Latent variable models and data visualisation; Analysis of latent structure models with multidimensional latent variables; Artificial neural networks and multivariate statistics.

A Comparative Study on Chinese Text Categorization Methods.

Ji He¹, Ah-Hwee Tan¹, Chew Lim Tan•Institutions (1)

National University of Singapore¹

1 Jan 2000

TL;DR: Comparison of three machine learning methods on Chinese text categorization reveals that all three methods produce satisfactory performance on the test corpus while ARAM exhibits a marginally better generalization capability, especially from relatively small and noisy training sets.

Abstract: This paper reports our comparative evaluation of three machine learning methods on Chinese text categorization. Whereas a wide range of methods have been applied to English text categorization, relatively few studies have been done on Chinese text categorization. Based on a People's Daily news corpus, a series of controlled experiments evaluate three machine learning methods, namely k Nearest Neighbor (kNN) algorithm, Support Vector Machines (SVM), and Adaptive Resonance Associative Map (ARAM), in terms of their capabilities in mining categorization knowledge from high dimensional, sparse, and relatively noisy document feature vectors. Experiments reveal that all three methods produce satisfactory performance on the test corpus while ARAM exhibits a marginally better generalization capability, especially from relatively small and noisy training sets.
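
As a rough illustration of the simplest of the three methods named above, the following is a minimal k Nearest Neighbor sketch over sparse bag-of-words vectors using cosine similarity. It is not the authors' implementation: the function names, the toy English documents, and the choice of cosine similarity with majority voting are all assumptions made for illustration, and real Chinese text would first need word segmentation.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (dict-like)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, training, k=3):
    """Return the majority label among the k training documents most similar to query.

    `training` is a list of (term-count vector, label) pairs.
    """
    ranked = sorted(training, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus; whitespace tokenisation stands in for real segmentation.
train = [
    (Counter("stock market shares rise".split()), "finance"),
    (Counter("bank interest rate market".split()), "finance"),
    (Counter("team wins football match".split()), "sports"),
    (Counter("coach praises team after match".split()), "sports"),
]

label = knn_classify(Counter("market shares fall".split()), train, k=3)
print(label)
```

Sparse dictionaries are used here because document vectors of the kind the abstract describes are high dimensional and mostly zero, so only the terms that actually occur need to be stored or compared.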

Journal Article•10.1016/S0304-3800(00)00312-4•

Determining temporal pattern of community dynamics by using unsupervised learning algorithms

Tae-Soo Chon¹, Young-Seuk Park¹, June Ho Park¹•Institutions (1)

Pusan National University¹

30 Jul 2000-Ecological Modelling

TL;DR: Analysis of patterns of temporal variation in community dynamics was conducted by combining two unsupervised artificial neural networks, the Adaptive Resonance Theory (ART) and the Kohonen network.

Dissertation•

Unsupervised learning of models for object recognition

M. Weber, Pietro Perona

1 Jan 2000

TL;DR: A method to learn object class models from unlabeled and unsegmented cluttered scenes for the purpose of visual object recognition achieves very good classification results on human faces, cars, leaves, handwritten letters, and cartoon characters.

Abstract: A method is presented to learn object class models from unlabeled and unsegmented cluttered scenes for the purpose of visual object recognition. The variability across a class of objects is modeled in a principled way, treating objects as flexible constellations of rigid parts (features). Variability is represented by a joint probability density function (pdf) on the shape of the constellation and the output of part detectors. Corresponding “constellation models” can be learned in a completely unsupervised fashion. In a first stage, the learning method automatically identifies distinctive parts in the training set by applying a clustering algorithm to patterns selected by an interest operator. It then learns the statistical shape model using expectation maximization. Mixtures of constellation models can be defined and applied to “discover” object categories in an unsupervised manner. The method achieves very good classification results on human faces, cars, leaves, handwritten letters, and cartoon characters.

Journal Article•10.1145/355483.355486•

A neuroidal architecture for cognitive computation

Leslie G. Valiant¹•Institutions (1)

Harvard University¹

01 Sep 2000-Journal of the ACM

TL;DR: The main claims are that the basic learning and deduction tasks are provably tractable and tractable learning offers viable approaches to a range of issues that have been previously identified as problematic for artificial intelligence systems that are programmed.

Abstract: An architecture is described for designing systems that acquire and manipulate large amounts of unsystematized, or so-called commonsense, knowledge. Its aim is to exploit to the full those aspects of computational learning that are known to offer powerful solutions in the acquisition and maintenance of robust knowledge bases. The architecture makes explicit the requirements on the basic computational tasks that are to be performed and is designed to make this computationally tractable even for very large databases. The main claims are that (i) the basic learning and deduction tasks are provably tractable and (ii) tractable learning offers viable approaches to a range of issues that have been previously identified as problematic for artificial intelligence systems that are programmed. Among the issues that learning offers to resolve are robustness to inconsistencies, robustness to incomplete information and resolving among alternatives. Attribute-efficient learning algorithms, which allow learning from few examples in large dimensional systems, are fundamental to the approach. Underpinning the overall architecture is a new principled approach to manipulating relations in learning systems. This approach, of independently quantified arguments, allows propositional learning algorithms to be applied systematically to learning relational concepts in polynomial time and in modular fashion.

Machine learning and natural language processing

Lluís Màrquez Villodre

1 Jul 2000

TL;DR: Four algorithms for supervised learning, which belong to different families, are compared in a benchmark corpus for the WSD task and both qualitative and quantitative conclusions are drawn.

Abstract: In this report, some collaborative work between the fields of Machine Learning (ML) and Natural Language Processing (NLP) is presented. The document is structured in two parts. The first part includes a superficial but comprehensive survey covering the state-of-the-art of machine learning techniques applied to natural language learning tasks. In the second part, a particular problem, namely Word Sense Disambiguation (WSD), is studied in more detail. In doing so, four algorithms for supervised learning, which belong to different families, are compared in a benchmark corpus for the WSD task. Both qualitative and quantitative conclusions are drawn.

Proceedings Article•10.3115/990820.990848•

Explaining away ambiguity: learning verb selectional preference with Bayesian networks

Massimiliano Ciaramita¹, Mark Johnson¹•Institutions (1)

Brown University¹

31 Jul 2000

TL;DR: On a word sense disambiguation test, the model performed better than other state-of-the-art systems for unsupervised learning of selectional preferences; methods for implementing "explaining away" in other graphical frameworks are also discussed.

Abstract: This paper presents a Bayesian model for unsupervised learning of verb selectional preferences. For each verb the model creates a Bayesian network whose architecture is determined by the lexical hierarchy of WordNet and whose parameters are estimated from a list of verb-object pairs found in a corpus. "Explaining away", a well-known property of Bayesian networks, helps the model deal in a natural fashion with word sense ambiguity in the training data. On a word sense disambiguation test our model performed better than other state-of-the-art systems for unsupervised learning of selectional preferences. Computational complexity problems, ways of improving this approach and methods for implementing "explaining away" in other graphical frameworks are discussed.
