Top 385 papers published on the topic of unsupervised learning in 2002

Showing papers on "Unsupervised learning" published in 2002

Book•

Least Squares Support Vector Machines

Johan A. K. Suykens¹, Tony Van Gestel, Jos De Brabanter, Bart De Moor, Joos Vandewalle•Institutions (1)

Katholieke Universiteit Leuven¹

12 Nov 2002

Abstract (chapter titles): Support Vector Machines; Basic Methods of Least Squares Support Vector Machines; Bayesian Inference for LS-SVM Models; Robustness; Large Scale Problems; LS-SVM for Unsupervised Learning; LS-SVM for Recurrent Networks and Control.
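
The listing gives only the book's chapter titles. To make the core idea concrete, here is a minimal NumPy sketch of LS-SVM regression in its usual dual form, where equality constraints and a squared-error loss turn training into a single linear system instead of a quadratic program. The RBF kernel, the function names, and the values of `gamma` (regularization) and `sigma` (kernel width) are illustrative choices of ours, not code from the book.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    # Gaussian kernel matrix between the rows of a and the rows of b
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] @ [b, alpha] = [0, y] for bias b and dual weights alpha."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Usage: fit a noise-free sine curve
X = np.linspace(-3.0, 3.0, 40)[:, None]
y = np.sin(X[:, 0])
b, alpha = lssvm_fit(X, y, gamma=100.0, sigma=0.5)
pred = lssvm_predict(X, b, alpha, X, sigma=0.5)
```

Unlike a standard SVM, essentially every training point receives a nonzero `alpha`, so the solution is not sparse; that is the trade-off for replacing the QP with one linear solve.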

3,626 citations

Journal Article•10.1109/34.990138•

Unsupervised learning of finite mixture models

Mário A. T. Figueiredo, Anil K. Jain¹•Institutions (1)

Michigan State University¹

01 Mar 2002-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The novelty of the approach is that it does not use a model selection criterion to choose one among a set of preestimated candidate models; instead, it seamlessly integrates estimation and model selection in a single algorithm.

Abstract: This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective "unsupervised" is justified by two properties of the algorithm: 1) it is capable of selecting the number of components, and 2) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of preestimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify to the good performance of our approach.
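
The key mechanism can be sketched in one dimension: start with deliberately too many components and use a penalized weight update that drives components with too little support to exactly zero weight, so fitting and choosing the number of components happen in the same loop. The penalty `max(0, n_m - N/2)`, with `N` free parameters per component and components updated one at a time, reflects our reading of the paper's minimum-message-length criterion. The paper's outer loop (repeatedly removing the weakest component and keeping the model with the shortest message length) is omitted, and `mml_mixture_1d`, `k_max`, `n_iter` and the quantile initialization are illustrative choices.

```python
import numpy as np

def gauss_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def mml_mixture_1d(x, k_max=10, n_iter=100):
    n_par = 2                                   # free parameters per component: mean, variance
    mu = np.quantile(x, np.linspace(0.05, 0.95, k_max))
    var = np.full(k_max, x.var())
    alpha = np.full(k_max, 1.0 / k_max)
    for _ in range(n_iter):
        for m in range(k_max):                  # update one component at a time
            if alpha[m] == 0.0:
                continue                        # component already annihilated
            dens = alpha[:, None] * gauss_pdf(x[None, :], mu[:, None], var[:, None])
            w = dens / dens.sum(axis=0, keepdims=True)   # responsibilities (E-step)
            # Penalized weight: support below n_par / 2 effective points gives weight 0
            support = np.maximum(0.0, w.sum(axis=1) - n_par / 2.0)
            alpha[m] = support[m] / support.sum()
            alpha /= alpha.sum()
            if alpha[m] > 0.0:                  # M-step for the surviving component
                mu[m] = (w[m] * x).sum() / w[m].sum()
                var[m] = max((w[m] * (x - mu[m]) ** 2).sum() / w[m].sum(), 1e-6)
    keep = alpha > 0.0
    return alpha[keep], mu[keep], var[keep]

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5.0, 1.0, 300), rng.normal(5.0, 1.0, 300)])
alpha, mu, var = mml_mixture_1d(x)
print(len(alpha), np.round(mu, 2))
```

Surviving components split between the two clusters, but this sketch alone need not stop at exactly two; that is the role of the outer annihilation loop described in the paper.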

2,464 citations

Journal Article•10.1162/089976602317318938•

Slow feature analysis: unsupervised learning of invariances

Laurenz Wiskott¹, Terrence J. Sejnowski¹•Institutions (1)

Salk Institute for Biological Studies¹

01 Apr 2002-Neural Computation

TL;DR: Slow feature analysis (SFA) is a new method for learning invariant or slowly varying features from a vectorial input signal; it is guaranteed to find the optimal solution within a family of functions directly and can learn to extract a large number of decorrelated features, which are ordered by their degree of invariance.

Abstract: Invariant features of temporally varying signals are useful for analysis and classification. Slow feature analysis (SFA) is a new method for learning invariant or slowly varying features from a vectorial input signal. It is based on a nonlinear expansion of the input signal and application of principal component analysis to this expanded signal and its time derivative. It is guaranteed to find the optimal solution within a family of functions directly and can learn to extract a large number of decorrelated features, which are ordered by their degree of invariance. SFA can be applied hierarchically to process high-dimensional input signals and extract complex features. SFA is applied first to complex cell tuning properties based on simple cell output, including disparity and motion. Then more complicated input-output functions are learned by repeated application of SFA. Finally, a hierarchical network of SFA modules is presented as a simple model of the visual system. The same unstructured network can learn translation, size, rotation, contrast, or, to a lesser degree, illumination invariance for one-dimensional objects, depending on only the training stimulus. Surprisingly, only a few training objects suffice to achieve good generalization to new objects. The generated representation is suitable for object recognition. Performance degrades if the network is trained to learn multiple invariances simultaneously.
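
The abstract describes SFA as a nonlinear expansion followed by principal component analysis of the expanded signal and of its time derivative. A minimal NumPy sketch of that pipeline, assuming a quadratic expansion, finite differences as the time derivative, and a toy input of our own choosing (a slow sine hidden in a nonlinear mixture with a fast cosine):

```python
import numpy as np

def quadratic_expansion(x):
    # x has shape (T, n); return the n linear terms plus all products x_i * x_j with i <= j
    i, j = np.triu_indices(x.shape[1])
    return np.hstack([x, x[:, i] * x[:, j]])

def sfa(x, n_out=1):
    z = quadratic_expansion(x)
    z = z - z.mean(axis=0)
    # Step 1: PCA-based whitening of the expanded signal (near-degenerate directions dropped)
    evals, evecs = np.linalg.eigh(np.cov(z, rowvar=False))
    keep = evals > 1e-10 * evals.max()
    zw = z @ (evecs[:, keep] / np.sqrt(evals[keep]))
    # Step 2: PCA on the time derivative (finite differences); the directions with the
    # smallest derivative variance are the slowest features (eigh sorts ascending)
    d_evals, d_evecs = np.linalg.eigh(np.cov(np.diff(zw, axis=0), rowvar=False))
    return zw @ d_evecs[:, :n_out], d_evals[:n_out]

# Toy input: the slow signal sin(t) is hidden in x1 = sin(t) + cos(11 t)^2, x2 = cos(11 t)
t = np.linspace(0.0, 2.0 * np.pi, 5000)
x = np.column_stack([np.sin(t) + np.cos(11.0 * t) ** 2, np.cos(11.0 * t)])
y, slowness = sfa(x, n_out=1)
```

Because `x1 - x2**2 = sin(t)`, the slow signal is a linear function of the quadratic expansion but not of the raw inputs, which is why the nonlinear expansion step matters here.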

1,649 citations

Journal Article•10.1109/TNN.2002.1000150•

Mercer kernel-based clustering in feature space

Mark Girolami¹•Institutions (1)

Helsinki University of Technology¹

01 May 2002-IEEE Transactions on Neural Networks

TL;DR: It is shown that the eigenvectors of a kernel matrix which defines the implicit mapping provide a means to estimate the number of clusters inherent within the data, and a computationally simple iterative procedure is presented for the subsequent feature space partitioning of the data.

Abstract: The article presents a method for both the unsupervised partitioning of a sample of data and the estimation of the possible number of inherent clusters which generate the data. This work exploits the notion that performing a nonlinear data transformation into some high dimensional feature space increases the probability of the linear separability of the patterns within the transformed space and therefore simplifies the associated data structure. It is shown that the eigenvectors of a kernel matrix which defines the implicit mapping provide a means to estimate the number of clusters inherent within the data, and a computationally simple iterative procedure is presented for the subsequent feature space partitioning of the data.
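
One way to see how the eigenvectors reveal the cluster count: writing the kernel matrix as `K = sum_i lambda_i u_i u_i^T` gives `1^T K 1 / N^2 = sum_i lambda_i (1^T u_i / N)^2`, and for well-separated clusters roughly one term per cluster dominates. The sketch below assumes an RBF kernel; the kernel width, the synthetic blobs, and the 0.05 cut-off are arbitrary illustration values, and the paper's iterative feature-space partitioning step is not shown.

```python
import numpy as np

def rbf_gram(X, width=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * width ** 2))

def cluster_count_terms(K):
    """Per-eigenvector terms lambda_i * (1^T u_i / N)^2 of 1^T K 1 / N^2, largest first."""
    N = K.shape[0]
    evals, evecs = np.linalg.eigh(K)
    terms = evals * (evecs.sum(axis=0) / N) ** 2
    return np.sort(terms)[::-1]

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 0.1, size=(60, 2)),
               rng.normal([8.0, 0.0], 0.1, size=(40, 2))])
K = rbf_gram(X, width=1.0)
terms = cluster_count_terms(K)
print(np.round(terms[:4], 3))  # two large terms, then near-zero ones
```

The clusters are given different sizes on purpose: with identical clusters the two leading eigenvalues can coincide, and an eigensolver may then return any rotation of the corresponding eigenvectors.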

1,024 citations

Posted Content•

Principal Manifolds and Nonlinear Dimension Reduction via Local Tangent Space Alignment

Zhenyue Zhang, Hongyuan Zha

07 Dec 2002-arXiv: Learning

TL;DR: A new algorithm for manifold learning and nonlinear dimension reduction is presented: from a set of unorganized data points sampled with noise from the manifold, it represents the local geometry using tangent spaces learned by fitting an affine subspace in a neighborhood of each data point.

Abstract: Nonlinear manifold learning from unorganized data points is a very challenging unsupervised learning and data visualization problem with a great variety of applications. In this paper we present a new algorithm for manifold learning and nonlinear dimension reduction. Based on a set of unorganized data points sampled with noise from the manifold, we represent the local geometry of the manifold using tangent spaces learned by fitting an affine subspace in a neighborhood of each data point. Those tangent spaces are aligned to give the internal global coordinates of the data points with respect to the underlying manifold by way of a partial eigendecomposition of the neighborhood connection matrix. We present a careful error analysis of our algorithm and show that the reconstruction errors are of second-order accuracy. We illustrate our algorithm using curves and surfaces both in 2D/3D and higher dimensional Euclidean spaces, and 64-by-64 pixel face images with various pose and lighting conditions. We also address several theoretical and algorithmic issues for further research and improvements.
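
The alignment step can be sketched in NumPy for a one-dimensional manifold: fit a local tangent space to each k-nearest-neighbour patch with an SVD, accumulate the local alignment matrices, and read the global coordinates off the eigenvectors with the smallest eigenvalues, skipping the constant one. This follows the general LTSA recipe but uses brute-force neighbour search and a dense eigensolver, so it only suits small examples; the quarter-circle data and `n_neighbors=8` are our own choices. For real use, scikit-learn exposes LTSA through `LocallyLinearEmbedding(method="ltsa")`.

```python
import numpy as np

def ltsa(X, n_neighbors=8, d=1):
    """Local tangent space alignment for a small point cloud X of shape (N, D)."""
    N = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    B = np.zeros((N, N))
    for i in range(N):
        idx = np.argsort(sq[i])[:n_neighbors]             # neighbourhood, including point i
        Xi = X[idx] - X[idx].mean(axis=0)
        U, _, _ = np.linalg.svd(Xi, full_matrices=False)  # local tangent coordinates
        G = np.hstack([np.full((n_neighbors, 1), 1.0 / np.sqrt(n_neighbors)), U[:, :d]])
        B[np.ix_(idx, idx)] += np.eye(n_neighbors) - G @ G.T  # local alignment term
    evals, evecs = np.linalg.eigh(B)
    return evecs[:, 1:d + 1]  # skip the constant eigenvector (eigenvalue 0)

# Quarter circle: a one-dimensional manifold embedded in the plane
t = np.linspace(0.0, np.pi / 2, 100)
X = np.column_stack([np.cos(t), np.sin(t)])
coords = ltsa(X, n_neighbors=8, d=1)
```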

669 citations

Journal Article•10.1145/601858.601867•

Foundations of statistical natural language processing

Gerhard Weikum

1 Sep 2002

TL;DR: The book is about natural language processing (NLP), covering the entire spectrum from parsing and disambiguation, sentence tagging, and machine translation, all the way to text analysis, information extraction, and document retrieval, and there is plenty of material that fits right in the mainstream of an IR course.

Abstract: It is a pleasure to write this review of an excellent textbook. Pedagogically this is a real gem, and it is a great source of teaching material for a variety of courses. I discovered the book when I was looking for textbook material that I could use in my class on information retrieval (IR), which is an advanced undergraduate course at my university. Being myself a "core database person", I committed myself to teaching the IR class with the intention to learn a new subject and the expectation that there should be plenty of good textbooks to choose from. I easily found a few decent books, but none of them seemed adequate as the main literature for my class (which is also a matter of personal taste, however). I ended up taking material from different sources, textbooks as well as conference and journal papers (the latter especially for Web search), and the one source to which I referred most and which both my students and I appreciated most was the book by Manning and Schütze. The book is about natural language processing (NLP), covering the entire spectrum from parsing and disambiguation, sentence tagging, and machine translation, all the way to text analysis, information extraction, and document retrieval. So the book's main topic is not really IR, but among the 15 chapters of the 680-page book there is plenty of material that fits right in the mainstream of an IR course. In addition, I discovered a substantial fraction of the core NLP material to be extremely insightful and relevant for text mining and other aspects of IR in a broader sense. In particular, I realized the value of the foundational topics that relate to information theory and statistical learning. Of course, I had been aware of the connection between IR and NLP, but there is a big difference between abstract knowledge and really seeing the relationships in detail when preparing a class. The book was a real eye-opener to me in this regard.
The book is organized in four parts: preliminaries, words, grammar, and applications and techniques. The first part, preliminaries, provides general motivation, linguistic basics, and mathematical foundations from elementary probability and information theory including such widely useful concepts as the Kullback-Leibler divergence. Part two, words, discusses different kinds of word occurrence statistics, the problem of word sense ambiguity and techniques for disambiguation, and a suite of lexical analysis techniques …

640 citations

Book Chapter•10.1007/978-1-4471-0123-9_3•

The Supervised Learning No-Free-Lunch Theorems

David H. Wolpert¹•Institutions (1)

Ames Research Center¹

1 Jan 2002

TL;DR: This paper reviews the supervised learning versions of the no-free-lunch theorems in a simplified form, and discusses the significance of those theorems and their relation to other aspects of supervised learning.

Abstract: This paper reviews the supervised learning versions of the no-free-lunch theorems in a simplified form. It also discusses the significance of those theorems and their relation to other aspects of supervised learning.

559 citations

Journal Article•10.1109/TNN.2002.804221•

The growing hierarchical self-organizing map: exploratory analysis of high-dimensional data

Andreas Rauber¹, Dieter Merkl¹, Michael Dittenbach¹•Institutions (1)

Vienna University of Technology¹

01 Nov 2002-IEEE Transactions on Neural Networks

TL;DR: The motivation was to provide a model that adapts its architecture during its unsupervised training process according to the particular requirements of the input data, and by providing a global orientation of the independently growing maps in the individual layers of the hierarchy, navigation across branches is facilitated.

Abstract: The self-organizing map (SOM) is a very popular unsupervised neural-network model for the analysis of high-dimensional input data as in data mining applications. However, at least two limitations have to be noted, which are related to the static architecture of this model as well as to the limited capabilities for the representation of hierarchical relations of the data. With our novel growing hierarchical SOM (GHSOM) we address both limitations. The GHSOM is an artificial neural-network model with hierarchical architecture composed of independent growing SOMs. The motivation was to provide a model that adapts its architecture during its unsupervised training process according to the particular requirements of the input data. Furthermore, by providing a global orientation of the independently growing maps in the individual layers of the hierarchy, navigation across branches is facilitated. The benefits of this novel neural network are a problem-dependent architecture and the intuitive representation of hierarchical relations in the data. This is especially appealing in explorative data mining applications, allowing the inherent structure of the data to unfold in a highly intuitive fashion.

550 citations

Journal Article•10.1016/S0893-6080(02)00078-3•

A self-organising network that grows when required

Stephen Marsland¹, Jonathan Shapiro¹, Ulrich Nehmzow²•Institutions (2)

University of Manchester¹, University of Essex²

01 Oct 2002-Neural Networks

TL;DR: In this paper, the authors propose the Grow When Required (GWR) network. Unlike the growing neural gas (GNG) algorithm, which adds nodes at fixed intervals, GWR can add nodes whenever the network in its current state does not sufficiently match the input, but stops growing once the network has matched the data.
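
The growth rule can be sketched as a single training step: a node is added only when the best-matching node both responds weakly to the input (low activity, taken here as `exp(-distance)`) and has already been trained (low habituation); otherwise the winner is adapted. This is a deliberately reduced sketch: edges, neighbour updates, and the paper's habituation dynamics are left out, habituation is modelled as a hypothetical multiplicative decay, and the thresholds `a_T` and `h_T` are arbitrary values.

```python
import numpy as np

def gwr_step(W, H, x, a_T=0.8, h_T=0.3, eps_b=0.1, decay=0.9):
    """One reduced GWR-style step on node weights W (n, d) and habituation values H (n,)."""
    dists = np.linalg.norm(W - x, axis=1)
    b = int(np.argmin(dists))                 # best-matching node
    activity = np.exp(-dists[b])
    if activity < a_T and H[b] < h_T:
        # Winner matches poorly and is already well trained: insert a node halfway to x
        return np.vstack([W, (W[b] + x) / 2.0]), np.append(H, 1.0)
    # Otherwise move the winner toward x; fresher (less habituated) nodes move more
    W, H = W.copy(), H.copy()
    W[b] += eps_b * H[b] * (x - W[b])
    H[b] *= decay                             # hypothetical habituation rule
    return W, H

W, H = np.zeros((1, 2)), np.ones(1)
for _ in range(20):
    W, H = gwr_step(W, H, np.array([5.0, 0.0]))
print(len(W))  # more than one node: growth starts once the winner has habituated
```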

446 citations

Proceedings Article•

Active + Semi-supervised Learning = Robust Multi-View Learning

Ion Muslea¹, Steven Minton, Craig A. Knoblock¹•Institutions (1)

Information Sciences Institute¹

8 Jul 2002

TL;DR: A new multi-view algorithm, Co-EMT, which combines semi-supervised and active learning is introduced, which outperforms the other algorithms both on the parameterized problems and on two additional real world domains.

Abstract: In a multi-view problem, the features of the domain can be partitioned into disjoint subsets (views) that are sufficient to learn the target concept. Semi-supervised, multi-view algorithms, which reduce the amount of labeled data required for learning, rely on the assumptions that the views are compatible and uncorrelated (i.e., every example is identically labeled by the target concepts in each view; and, given the label of any example, its descriptions in each view are independent). As these assumptions are unlikely to hold in practice, it is crucial to understand the behavior of multi-view algorithms on problems with incompatible, correlated views. We address this issue by studying several algorithms on a parameterized family of text classification problems in which we control both view correlation and incompatibility. We first show that existing semi-supervised algorithms are not robust over the whole spectrum of parameterized problems. Then we introduce a new multi-view algorithm, Co-EMT, which combines semi-supervised and active learning. Co-EMT outperforms the other algorithms both on the parameterized problems and on two additional real world domains. Our experiments suggest that Co-EMT’s robustness comes from active learning compensating for the correlation of the views.

348 citations

Journal Article•10.1109/TPAMI.2002.1033211•

Constructing boosting algorithms from SVMs: an application to one-class classification

Gunnar Rätsch¹, Sebastian Mika, Bernhard Schölkopf², Klaus-Robert Müller³•Institutions (3)

Australian National University¹, Max Planck Society², University of Potsdam³

01 Sep 2002-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work shows via an equivalence of mathematical programs that a support vector algorithm can be translated into an equivalent boosting-like algorithm and vice versa, and exemplifies this translation procedure for a new algorithm, one-class leveraging, starting from the one-class support vector machine (1-SVM).

Abstract: We show via an equivalence of mathematical programs that a support vector (SV) algorithm can be translated into an equivalent boosting-like algorithm and vice versa. We exemplify this translation procedure for a new algorithm: one-class leveraging, starting from the one-class support vector machine (1-SVM). This is a first step toward unsupervised learning in a boosting framework. Building on so-called barrier methods known from the theory of constrained optimization, it returns a function, written as a convex combination of base hypotheses, that characterizes whether a given test point is likely to have been generated from the distribution underlying the training data. Simulations on one-class classification problems demonstrate the usefulness of our approach.

Book Chapter•10.1007/3-540-47979-1_52•

Adjustment Learning and Relevant Component Analysis

Noam Shental¹, Tomer Hertz¹, Daphna Weinshall¹, Misha Pavel²•Institutions (2)

Hebrew University of Jerusalem¹, Oregon Health & Science University²

28 May 2002

TL;DR: A new learning approach for image retrieval is proposed, which is called adjustment learning, and this scheme uses the information in chunklets to reduce irrelevant variability in the data while amplifying relevant variability.

Abstract: We propose a new learning approach for image retrieval, which we call adjustment learning, and demonstrate its use for face recognition and color matching. Our approach is motivated by a frequently encountered problem, namely, that variability in the original data representation which is not relevant to the task may interfere with retrieval and make it very difficult. Our key observation is that in real applications of image retrieval, data sometimes comes in small chunks - small subsets of images that come from the same (but unknown) class. This is the case, for example, when a query is presented via a short video clip. We call these groups chunklets, and we call the paradigm which uses chunklets for unsupervised learning adjustment learning. Within this paradigm we propose a linear scheme, which we call Relevant Component Analysis; this scheme uses the information in such chunklets to reduce irrelevant variability in the data while amplifying relevant variability. We provide results using our method on two problems: face recognition (using a database publicly available on the web), and visual surveillance (using our own data). In the latter application chunklets are obtained automatically from the data without the need of supervision.
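
Relevant Component Analysis can be illustrated as a whitening transform estimated only from chunklets: pool the within-chunklet deviations, estimate their covariance `C`, and map the data through `C^(-1/2)`, which shrinks directions dominated by irrelevant within-class variability. This sketch reflects our reading of the linear scheme named in the abstract; the function name, the synthetic chunklets, and the omission of any dimensionality-reduction step are our own assumptions.

```python
import numpy as np

def rca_transform(chunklets):
    """Return W = C^(-1/2), where C is the pooled within-chunklet covariance."""
    deviations = np.vstack([c - c.mean(axis=0) for c in chunklets])
    C = deviations.T @ deviations / len(deviations)
    evals, evecs = np.linalg.eigh(C)
    return evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T

rng = np.random.default_rng(2)
# Hypothetical chunklets: 10 groups of 5 points with different centres but the same
# elongated within-group spread (standard deviation 3 along x, 0.2 along y)
chunklets = [rng.normal(rng.uniform(-10, 10, size=2), [3.0, 0.2], size=(5, 2))
             for _ in range(10)]
W = rca_transform(chunklets)
transformed = [c @ W.T for c in chunklets]
```

After the transform, the pooled within-chunklet covariance is the identity, so distances no longer favour the direction in which same-class points happen to vary most.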

Journal Article•10.3758/BF03196342•

Comparing supervised and unsupervised category learning.

Bradley C. Love¹•Institutions (1)

University of Texas at Austin¹

01 Dec 2002-Psychonomic Bulletin & Review

TL;DR: The approach allows for direct comparisons of unsupervised learning data with the Shepard, Hovland, and Jenkins (1961) seminal studies in supervised classification learning.

Abstract: Two unsupervised learning modes (incidental and intentional unsupervised learning) and their relation to supervised classification learning are examined. The approach allows for direct comparisons of unsupervised learning data with the seminal studies of Shepard, Hovland, and Jenkins (1961) in supervised classification learning. Unlike supervised classification learning, unsupervised learning (especially under incidental conditions) favors linear category structures over compact nonlinear category structures. Unsupervised learning is shown to be multifaceted in that performance varies with task conditions. In comparison with incidental unsupervised learning, intentional unsupervised learning is more rule-like, but is no more accurate. The acquisition and application of knowledge is also more laborious under intentional unsupervised learning.

Journal Article•10.1109/72.991428•

Unsupervised clustering with spiking neurons by sparse temporal coding and multilayer RBF networks

Sander M. Bohte, H. La Poutre, Joost N. Kok¹•Institutions (1)

Leiden University¹

01 Mar 2002-IEEE Transactions on Neural Networks

TL;DR: In this paper, a spiking neural network based on spike-time coding and Hebbian learning is proposed for unsupervised clustering on real-world data, and temporal synchrony in a multilayer network can induce hierarchical clustering.

Abstract: We demonstrate that spiking neural networks encoding information in the timing of single spikes are capable of computing and learning clusters from realistic data. We show how a spiking neural network based on spike-time coding and Hebbian learning can successfully perform unsupervised clustering on real-world data, and we demonstrate how temporal synchrony in a multilayer network can induce hierarchical clustering. We develop a temporal encoding of continuously valued data to obtain adjustable clustering capacity and precision with an efficient use of neurons: input variables are encoded in a population code by neurons with graded and overlapping sensitivity profiles. We also discuss methods for enhancing scale-sensitivity of the network and show how the induced synchronization of neurons within early RBF layers allows for the subsequent detection of complex clusters.
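
The population code described in the abstract can be sketched as follows: each input neuron has a Gaussian tuning curve over the input range, neighbouring curves overlap, and stronger activation produces an earlier spike. The evenly spaced centres, the width rule, `t_max`, and the silence threshold `min_activation` are hypothetical choices for illustration, not the parameters used in the paper.

```python
import numpy as np

def population_encode(value, lo, hi, n_neurons=8, t_max=10.0, min_activation=0.1):
    """Spike times for one input value; np.inf marks a neuron that stays silent."""
    centers = np.linspace(lo, hi, n_neurons)  # evenly spaced preferred values
    width = (hi - lo) / (n_neurons - 1)       # neighbouring tuning curves overlap
    activation = np.exp(-0.5 * ((value - centers) / width) ** 2)
    times = t_max * (1.0 - activation)        # stronger activation -> earlier spike
    times[activation < min_activation] = np.inf
    return times

print(population_encode(0.3, 0.0, 1.0))
```

Because several neighbouring neurons respond to each value with graded timing, the input is represented with finer resolution than the spacing of the centres alone would give.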

Proceedings Article•10.5555/777092.777145•

Reinforcement learning of coordination in cooperative multi-agent systems

Spiros Kapetanakis¹, Daniel Kudenko¹•Institutions (1)

University of York¹

28 Jul 2002

TL;DR: This investigation of reinforcement learning techniques for the learning of coordination in cooperative multi-agent systems focuses on a novel action selection strategy for Q-learning (Watkins 1989), and demonstrates empirically that this extension causes the agents to converge almost always to the optimal joint action even in these difficult cases.

Abstract: We report on an investigation of reinforcement learning techniques for the learning of coordination in cooperative multi-agent systems. Specifically, we focus on a novel action selection strategy for Q-learning (Watkins 1989). The new technique is applicable to scenarios where mutual observation of actions is not possible. To date, reinforcement learning approaches for such independent agents did not guarantee convergence to the optimal joint action in scenarios with high miscoordination costs. We improve on previous results (Claus & Boutilier 1998) by demonstrating empirically that our extension causes the agents to converge almost always to the optimal joint action even in these difficult cases.

Journal Article•10.1613/JAIR.946•

Efficient reinforcement learning using recursive least-squares methods

Xin Xu¹, Han-gen He¹, Dewen Hu¹•Institutions (1)

National University of Defense Technology¹

01 Jan 2002-Journal of Artificial Intelligence Research

TL;DR: RLS methods are used to solve reinforcement learning problems, where two new reinforcement learning algorithms using linear value function approximators are proposed and analyzed and it is shown that the data efficiency of learning control can also be improved by using RLS methods in the learning-prediction process of the critic.

Abstract: The recursive least-squares (RLS) algorithm is one of the most well-known algorithms used in adaptive filtering, system identification and adaptive control. Its popularity is mainly due to its fast convergence speed, which is considered to be optimal in practice. In this paper, RLS methods are used to solve reinforcement learning problems, where two new reinforcement learning algorithms using linear value function approximators are proposed and analyzed. The two algorithms are called RLS-TD(λ) and Fast-AHC (Fast Adaptive Heuristic Critic), respectively. RLS-TD(λ) can be viewed as the extension of RLS-TD(0) from λ = 0 to general 0 ≤ λ ≤ 1, so it is a multi-step temporal-difference (TD) learning algorithm using RLS methods. The convergence with probability one and the limit of convergence of RLS-TD(λ) are proved for ergodic Markov chains. Compared to the existing LS-TD(λ) algorithm, RLS-TD(λ) has advantages in computation and is more suitable for online learning. The effectiveness of RLS-TD(λ) is analyzed and verified by learning prediction experiments of Markov chains with a wide range of parameter settings. The Fast-AHC algorithm is derived by applying the proposed RLS-TD(λ) algorithm in the critic network of the adaptive heuristic critic method. Unlike the conventional AHC algorithm, Fast-AHC makes use of RLS methods to improve the learning-prediction efficiency in the critic. Learning control experiments of the cart-pole balancing and the acrobot swing-up problems are conducted to compare the data efficiency of Fast-AHC with that of conventional AHC. From the experimental results, it is shown that the data efficiency of learning control can also be improved by using RLS methods in the learning-prediction process of the critic. The performance of Fast-AHC is also compared with that of the AHC method using LS-TD(λ).
Furthermore, it is demonstrated in the experiments that different initial values of the variance matrix in RLS-TD(λ) are required to get better performance not only in learning prediction but also in learning control. The experimental results are analyzed based on the existing theoretical work on the transient phase of forgetting factor RLS methods.
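
The RLS-TD(λ) recursion can be sketched as recursive least squares applied to the LS-TD(λ) normal equations. Below, the gain and `P` updates follow from the Sherman-Morrison formula with forgetting factor `mu`, and `P` starts as `delta * I`, corresponding to the initial variance matrix whose value the abstract reports as performance-relevant. The notation is ours and may differ from the paper's; the three-state chain with a reward of 1 per step is a hypothetical test problem whose true values are 3, 2 and 1.

```python
import numpy as np

class RLSTD:
    """RLS-TD(lambda) policy evaluation for a linear value function V(s) = phi(s) @ theta."""

    def __init__(self, n_features, lam=0.5, gamma=1.0, delta=1e4, mu=1.0):
        self.theta = np.zeros(n_features)
        self.P = delta * np.eye(n_features)  # initial variance matrix P_0 = delta * I
        self.z = np.zeros(n_features)        # eligibility trace
        self.lam, self.gamma, self.mu = lam, gamma, mu  # mu = 1 means no forgetting

    def start_episode(self):
        self.z = np.zeros_like(self.z)

    def update(self, phi, reward, phi_next):
        self.z = self.lam * self.gamma * self.z + phi
        d = phi - self.gamma * phi_next       # TD feature difference
        Pz = self.P @ self.z
        K = Pz / (self.mu + d @ Pz)           # gain vector
        self.theta = self.theta + K * (reward - d @ self.theta)
        self.P = (self.P - np.outer(K, d @ self.P)) / self.mu

# Usage: deterministic chain 0 -> 1 -> 2 -> terminal, reward 1 on every step
phis = np.eye(3)  # tabular features
agent = RLSTD(3, lam=0.5, gamma=1.0)
for _ in range(20):
    agent.start_episode()
    for s in range(3):
        phi_next = phis[s + 1] if s < 2 else np.zeros(3)  # terminal state: zero features
        agent.update(phis[s], 1.0, phi_next)
print(np.round(agent.theta, 3))  # close to the true values [3, 2, 1]
```

With `mu=1` and a large `delta`, the recursion tracks the batch LS-TD(λ) solution up to a ridge term of size `1/delta`; smaller `delta` values regularize more strongly, which is one way to see why the initial variance matrix matters.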

Journal Article•10.1109/83.988960•

Unsupervised image classification, segmentation, and enhancement using ICA mixture models

Te-Won Lee¹, Michael S. Lewicki²•Institutions (2)

University of California, San Diego¹, Carnegie Mellon University²

01 Mar 2002-IEEE Transactions on Image Processing

TL;DR: This paper demonstrates that the unsupervised classification method, derived by modeling observed data as a mixture of several mutually exclusive classes that are each described by linear combinations of independent, non-Gaussian densities, was effective in classifying complex image textures such as natural scenes and text.

Abstract: An unsupervised classification algorithm is derived by modeling observed data as a mixture of several mutually exclusive classes that are each described by linear combinations of independent, non-Gaussian densities. The algorithm estimates the data density in each class by using parametric nonlinear functions that fit to the non-Gaussian structure of the data. This improves classification accuracy compared with standard Gaussian mixture models. When applied to images, the algorithm can learn efficient codes (basis functions) for images that capture the statistically significant structure intrinsic in the images. We apply this technique to the problem of unsupervised classification, segmentation, and denoising of images. We demonstrate that this method was effective in classifying complex image textures such as natural scenes and text. It was also useful for denoising and filling in missing pixels in images with complex structures. The advantage of this model is that image codes can be learned with increasing numbers of classes thus providing greater flexibility in modeling structure and in finding more image features than in either Gaussian mixture models or standard independent component analysis (ICA) algorithms.

Journal Article•10.1016/S0167-8655(02)00032-6•

A k-segments algorithm for finding principal curves

Jakob Verbeek¹, Nikos Vlassis¹, Ben Kröse¹•Institutions (1)

University of Amsterdam¹

01 Jun 2002-Pattern Recognition Letters

TL;DR: In this article, an incremental method to find principal curves is proposed, where line segments are fitted and connected to form polygonal lines, and new segments are inserted until a performance criterion is met.

Proceedings Article•10.1109/IJCNN.2002.1007592•

Learning from labeled and unlabeled data

Ravi Kothari¹, Vivek Jain¹•Institutions (1)

Indian Institutes of Technology¹

7 Aug 2002

TL;DR: This paper uses an evolutionary strategy to iteratively adjust the class membership of the patterns in the unlabeled sample so that the class conditional distribution obtained from such a labeling allows a maximum a posteriori classification with minimum classification error on the labeled patterns.

Abstract: Due to the considerable time and expense required in labeling data, a challenge is to propose learning algorithms that can learn from a small amount of labeled data and a much larger amount of unlabeled data. In this paper, we propose one such algorithm which uses an evolutionary strategy to iteratively adjust the class membership of the patterns in the unlabeled sample. The iterative adjustment is done so that the class conditional distribution obtained from such a labeling allows a maximum a posteriori classification with minimum classification error on the labeled patterns. We detail the algorithm and provide results obtained by the proposed algorithm on 5 different datasets.

Proceedings Article•

Combining Labeled and Unlabeled Data for MultiClass Text Categorization

Rayid Ghani

8 Jul 2002

TL;DR: This paper develops a framework to incorporate unlabeled data in the Error-Correcting Output Coding (ECOC) setup by first decomposing multiclass problems into multiple binary problems and then using Co-Training to learn the individual binary classification problems.

Abstract: Supervised learning techniques for text classification often require a large number of labeled examples to learn accurately. One way to reduce the amount of labeled data required is to develop algorithms that can learn effectively from a small number of labeled examples augmented with a large number of unlabeled examples. Current text learning techniques for combining labeled and unlabeled data, such as EM and Co-Training, are mostly applicable for classification tasks with a small number of classes and do not scale up well for large multiclass problems. In this paper, we develop a framework to incorporate unlabeled data in the Error-Correcting Output Coding (ECOC) setup by first decomposing multiclass problems into multiple binary problems and then using Co-Training to learn the individual binary classification problems. We show that our method is especially useful for text classification tasks involving a large number of categories and outperforms other semi-supervised learning techniques such as EM and Co-Training. In addition to being highly accurate, this method utilizes the Hamming distance from ECOC to provide high-precision results. We also present results with algorithms other than co-training in this framework and show that co-training is uniquely suited to work well within ECOC.
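
The Hamming-distance step mentioned in the abstract can be sketched for a four-class problem: each class has a binary codeword with one bit per binary classifier, and a predicted bit vector is assigned to the nearest codeword. The code matrix below is a hypothetical example whose codewords all differ in 4 positions, so any single wrong bit is still decoded correctly; the optional `max_distance` abstention rule is our illustration of how the distance can trade coverage for precision. The co-trained binary classifiers that would produce the bits are not shown.

```python
import numpy as np

# Hypothetical code matrix: one codeword (row) per class, one column per binary classifier.
# Every pair of rows differs in 4 positions.
CODE = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1],
])

def ecoc_decode(bits, code=CODE, max_distance=None):
    """Nearest codeword in Hamming distance; None if it is farther than max_distance."""
    dists = (code != np.asarray(bits)).sum(axis=1)
    best = int(np.argmin(dists))
    if max_distance is not None and dists[best] > max_distance:
        return None  # abstain rather than risk a wrong label
    return best

noisy = CODE[3].copy()
noisy[0] ^= 1                      # one binary classifier got it wrong
print(ecoc_decode(noisy))          # still class 3
print(ecoc_decode(np.ones(7, dtype=int), max_distance=1))  # None: no codeword is close
```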

Journal Article•10.3233/IDA-2002-6605•

Evolutionary model selection in unsupervised learning

Yong Seog Kim¹, W. Nick Street¹, Filippo Menczer¹•Institutions (1)

Utah State University¹

1 Dec 2002

TL;DR: The method uses ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions approximating the Pareto front in a multi-dimensional objective space, and yields models with better and clearer semantic relevance.

Abstract: Feature subset selection is important not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and possibly, accuracy of the resulting models. Feature selection has traditionally been studied in supervised learning situations, with some estimate of accuracy used to evaluate candidate subsets. However, we often cannot apply supervised learning for lack of a training signal. For these cases, we propose a new feature selection approach based on clustering. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions that approximate the Pareto front in a multi-dimensional objective space. Each evolved solution represents a feature subset and a number of clusters; two representative clustering algorithms, K-means and EM, are applied to form the given number of clusters based on the selected features. Experimental results on both real and synthetic data show that the method can consistently find approximate Pareto-optimal solutions through which we can identify the significant features and an appropriate number of clusters. This results in models with better and clearer semantic relevance.
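
ELSA itself is an evolutionary algorithm, but the notion of Pareto optimality it approximates is easy to state in code: a candidate (for example, a feature subset with a chosen number of clusters) is kept if no other candidate is at least as good on every objective and strictly better on one. The scores below are made-up values for two hypothetical clustering-quality criteria, both to be maximized.

```python
import numpy as np

def pareto_front(scores):
    """Indices of the non-dominated rows of `scores` (all objectives maximized)."""
    scores = np.asarray(scores, dtype=float)
    front = []
    for i, s in enumerate(scores):
        # Row j dominates s if it is >= s everywhere and > s somewhere.
        dominated = np.any(np.all(scores >= s, axis=1) & np.any(scores > s, axis=1))
        if not dominated:
            front.append(i)
    return front

# Made-up scores for five candidate feature subsets on two criteria
scores = [[0.9, 0.2], [0.5, 0.5], [0.4, 0.4], [0.2, 0.9], [0.1, 0.1]]
print(pareto_front(scores))  # [0, 1, 3]
```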

Journal Article•10.1145/601858.601866•

Introduction to constraint databases

Bart Kuijpers

1 Sep 2002

TL;DR: This book presents the constraint database model as a powerful extension of the relational model that allows a user or programmer to work easily with infinite data and shows that the constraint model provides an elegant tool for data modeling and querying in application areas such as geographic information systems, spatiotemporal data management, bioinformatics, genome databases and computer vision.

Abstract: This book is the first textbook on constraint databases. Its author, together with P. Kanellakis and G. Kuper, introduced constraint databases in 1990 as a powerful generalization of the relational database model. Constraints, such as linear or polynomial equalities and inequalities, are used to finitely represent possibly infinite sets of points. They provide an elegant way to combine classical relational data with, for instance, spatial or temporal data. Since the early 1990s, the topic of constraint databases has received considerable research interest, both theoretical and towards systems development, and has been present at most database conferences during the past decade. It turned out to be a rich area in which a combination of techniques from, e.g., logic, (finite) model theory, algebraic and computational geometry, topology, query languages and symbolic computation is applied. A comprehensive survey of the research results in this field appeared two years ago (Constraint databases, edited by G. Kuper, L. Libkin and J. Paredaens, Springer, 2000). Whereas this survey mainly addresses researchers, the present book aims at making the topic of constraint databases accessible to advanced undergraduate and beginning graduate students and mainly addresses constraint databases from a developer's point of view. This book will certainly contribute to the exposure of constraint databases to a wider audience and hopefully also to its proliferation in a broader database practice.

Summary of the book

This textbook presents the constraint database model as a powerful extension of the relational model that allows a user or programmer to work easily with infinite data. It covers a wide range of constraint formalisms and shows that the constraint model provides an elegant tool for data modeling and querying in application areas such as geographic information systems (GIS), spatiotemporal data management, bioinformatics, genome databases and computer vision.
This book covers a substantial part of constraint-database theory, emphasizes several developer issues, and presents a number of sample constraint database systems. The author starts by developing the constraint data model from the relational one. Next, he shows how familiar query languages, such as the relational algebra, SQL, and various forms of Datalog, carry over to the constraint model. A third broad part of the book focuses on query evaluation and addresses theoretical topics such as quantifier-elimination algorithms for several constraint languages and the complexity of query evaluation in these languages. More specific data models and query languages, e.g., for spatiotemporal database applications, are also addressed. Next, the book describes a sample linear constraint database system, a Boolean constraint database system, and a spatiotemporal database system. A final part of the book presents a number of sample applications.
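
The central idea, constraints that finitely represent possibly infinite sets of points, can be illustrated with linear constraints over two real variables. This is a hypothetical sketch: the class and function names are invented here and are not taken from the book or its sample systems, and it shows only membership testing and intersection, not quantifier elimination or a query language.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class LinearConstraint:
    """The half-space a . x <= b over real-valued variables."""
    coeffs: Tuple[float, ...]
    bound: float

    def holds(self, p: Point) -> bool:
        return sum(a * x for a, x in zip(self.coeffs, p)) <= self.bound


# A generalized tuple is a conjunction of constraints; a relation is a finite
# union (disjunction) of generalized tuples, so it can describe infinitely many points.
GeneralizedTuple = Tuple[LinearConstraint, ...]
Relation = List[GeneralizedTuple]


def contains(rel: Relation, p: Point) -> bool:
    return any(all(c.holds(p) for c in t) for t in rel)


def intersect(r: Relation, s: Relation) -> Relation:
    """Intersection stays finitely representable: conjoin every pair of tuples."""
    return [t + u for t in r for u in s]


def box(x0: float, x1: float, y0: float, y1: float) -> GeneralizedTuple:
    """x0 <= x <= x1 and y0 <= y <= y1, written as four <= constraints."""
    return (
        LinearConstraint((-1.0, 0.0), -x0),
        LinearConstraint((1.0, 0.0), x1),
        LinearConstraint((0.0, -1.0), -y0),
        LinearConstraint((0.0, 1.0), y1),
    )
```

For example, `intersect([box(0, 10, 0, 10)], [box(5, 20, -5, 5)])` describes the infinite set of points in the overlap of two rectangles using a single tuple of eight constraints.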

Proceedings Article•10.1109/IJCNN.2002.1007796•

Reinforcement learning for adaptive routing

Leonid Peshkin¹, Virginia Savova²•Institutions (2)

Massachusetts Institute of Technology¹, Johns Hopkins University School of Medicine²

7 Aug 2002

TL;DR: An application of a gradient ascent algorithm for reinforcement learning to a complex domain of packet routing in network communication is presented and the performance of this algorithm is compared to other routing methods on a benchmark problem.

Abstract: Reinforcement learning means learning a policy (a mapping of observations into actions) based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. We present an application of a gradient ascent algorithm for reinforcement learning to a complex domain of packet routing in network communication and compare the performance of this algorithm to other routing methods on a benchmark problem.
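
Gradient ascent on a stochastic routing policy can be sketched with a REINFORCE-style update. This is a simplified stand-in, not the authors' algorithm or benchmark: it models a single router choosing between outgoing links with hypothetical mean delays, uses reward = negative delay, and the function name `train_router` is invented for illustration.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def train_router(link_delays, episodes=2000, lr=0.05, seed=0):
    """Learn a stochastic next-hop policy for one router by gradient ascent.

    link_delays: hypothetical mean delay of each outgoing link; reward = -delay.
    Returns the learned probability of sending a packet on each link.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(link_delays))  # one preference per outgoing link
    baseline = 0.0  # running average reward, used to reduce gradient variance
    for _ in range(episodes):
        probs = softmax(theta)
        link = rng.choice(len(theta), p=probs)
        reward = -link_delays[link] + rng.normal(0.0, 0.1)  # noisy observed delay
        grad_log_pi = -probs  # gradient of log softmax(theta)[link] w.r.t. theta ...
        grad_log_pi[link] += 1.0  # ... equals onehot(link) - probs
        theta += lr * (reward - baseline) * grad_log_pi
        baseline += 0.05 * (reward - baseline)
    return softmax(theta)
```

With delays `[1.0, 3.0]`, the policy shifts most of its probability to the faster first link. Keeping the policy stochastic is what makes the log-probability gradient well defined.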

Proceedings Article•

Stability-Based Model Selection

Tilman Lange¹, Mikio L. Braun¹, Volker Roth¹, Joachim M. Buhmann¹•Institutions (1)

University of Bonn¹

1 Jan 2002

TL;DR: A new model assessment scheme based on a notion of stability is introduced; the stability measure yields an upper bound to cross-validation in the supervised case but also extends to semi-supervised and unsupervised problems.

Abstract: Model selection is linked to model assessment, which is the problem of comparing different models, or model parameters, for a specific learning task. For supervised learning, the standard practical technique is cross-validation, which is not applicable for semi-supervised and unsupervised settings. In this paper, a new model assessment scheme is introduced which is based on a notion of stability. The stability measure yields an upper bound to cross-validation in the supervised case, but extends to semi-supervised and unsupervised problems. In the experimental part, the performance of the stability measure is studied for model order selection in comparison to standard techniques in this area.
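
A stability score for choosing the number of clusters can be sketched as follows: split the data in half, cluster one half, transfer that solution to the other half with a classifier, and compare the result with clustering the other half directly, minimizing over label permutations. This is a sketch under stated assumptions. It uses k-means with farthest-first initialization and a nearest-centroid classifier, and it omits the normalization by random labelings described in the paper. Lower values mean a more stable choice of `k`.

```python
import itertools

import numpy as np


def assign(X, centers):
    """Nearest-centroid labels; also the classifier that transfers a clustering."""
    return np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)


def kmeans(X, k, rng, iters=50):
    # Farthest-first initialization (an assumption made for this sketch).
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = assign(X, centers)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return centers


def instability(X, k, n_splits=10, seed=0):
    """Average disagreement between transferred and direct clusterings of a held-out half."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        idx = rng.permutation(len(X))
        A, B = X[idx[: len(X) // 2]], X[idx[len(X) // 2:]]
        transferred = assign(B, kmeans(A, k, rng))  # B's labels predicted from A's solution
        direct = assign(B, kmeans(B, k, rng))  # B clustered on its own
        # Cluster indices are arbitrary, so score the best relabeling (fine for small k).
        disagreement = min(
            np.mean(np.array(perm)[transferred] != direct)
            for perm in itertools.permutations(range(k))
        )
        scores.append(disagreement)
    return float(np.mean(scores))
```

Brute-force permutation matching is used only to keep the sketch short. For larger `k`, an assignment solver such as the Hungarian method would replace it.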

Posted Content•

Unsupervised Language Acquisition: Theory and Practice

Alexander Clark

10 Dec 2002-arXiv: Computation and Language

TL;DR: This thesis presents various algorithms for the unsupervised machine learning of aspects of natural languages using a variety of statistical models, and examines the interaction between the various components to show how these algorithms can form the basis for an empiricist model of language acquisition.

Abstract: In this thesis I present various algorithms for the unsupervised machine learning of aspects of natural languages using a variety of statistical models. The scientific object of the work is to examine the validity of the so-called Argument from the Poverty of the Stimulus, advanced in favour of the proposition that humans have language-specific innate knowledge. I start by examining an a priori argument, based on Gold's theorem, that purports to prove that natural languages cannot be learned, and some formal issues related to the choice of statistical grammars rather than symbolic grammars. I present three novel algorithms for learning various parts of natural languages: first, an algorithm for the induction of syntactic categories from unlabelled text using distributional information, which can deal with ambiguous and rare words; secondly, a set of algorithms for learning morphological processes in a variety of languages, including languages such as Arabic with non-concatenative morphology; thirdly, an algorithm for the unsupervised induction of a context-free grammar from tagged text. I carefully examine the interaction between the various components, and show how these algorithms can form the basis for an empiricist model of language acquisition. I therefore conclude that the Argument from the Poverty of the Stimulus is unsupported by the evidence.

Proceedings Article•10.1145/564376.564397•

The use of unlabeled data to improve supervised learning for text summarization

Massih-Reza Amini, Patrick Gallinari

11 Aug 2002

TL;DR: This work proposes new semi-supervised algorithms for training classification models for text summarization, and analyzes their performances on two data sets - the Reuters news-wire corpus and the Computation and Language collection of TIPSTER SUMMAC.

Abstract: With the huge amount of information available electronically, there is an increasing demand for automatic text summarization systems. The use of machine learning techniques for this task allows one to adapt summaries to the user needs and to the corpus characteristics. These desirable properties have motivated an increasing amount of work in this field over the last few years. Most approaches attempt to generate summaries by extracting sentence segments and adopt the supervised learning paradigm, which requires labeling documents at the text span level. This is a costly process, which puts strong limitations on the applicability of these methods. We investigate here the use of semi-supervised algorithms for summarization. These techniques make use of a small amount of labeled data together with a larger amount of unlabeled data. We propose new semi-supervised algorithms for training classification models for text summarization. We analyze their performance on two data sets: the Reuters news-wire corpus and the Computation and Language (cmp_lg) collection of TIPSTER SUMMAC. We perform comparisons with a non-learning baseline system and a reference trainable summarizer system.

Proceedings Article•

Pitch accent prediction using ensemble machine learning.

Xuejing Sun¹•Institutions (1)

Northwestern University¹

1 Jan 2002

TL;DR: Improved performance was achieved by ensemble learning in all experiments and the best result was obtained in the third task, in which the overall correct rate increases from 84.26% to 87.17%.

Abstract: In this study, we applied ensemble machine learning to predict pitch accents. With a decision tree as the baseline algorithm, two popular ensemble learning methods, bagging and boosting, were evaluated under three experimental conditions: using acoustic features only, using text-based features only, and using both acoustic and text-based features. F0-related acoustic features are derived from underlying pitch targets. Models of four ToBI pitch accent types (High, Down-stepped high, Low, and Unaccented) are built at the syllable level. Results showed that improved performance was achieved by ensemble learning in all experiments. The best result was obtained in the third task, in which the overall correct rate increased from 84.26% to 87.17%.
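
Bagging, one of the two ensemble methods evaluated, can be sketched as follows: train each base model on a bootstrap resample and predict by majority vote. This is an illustration under stated assumptions, not the study's setup. Depth-1 decision trees (stumps) stand in for the full decision-tree baseline, labels are binary rather than the four ToBI classes, boosting is not shown, and the function names are hypothetical.

```python
import numpy as np


def fit_stump(X, y):
    """Best single-feature threshold rule (a depth-1 decision tree) for 0/1 labels."""
    best = (np.inf, 0, 0.0, 1)  # (errors, feature, threshold, polarity)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for polarity in (1, -1):
                pred = (polarity * (X[:, f] - t) >= 0).astype(int)
                err = np.sum(pred != y)
                if err < best[0]:
                    best = (err, f, t, polarity)
    return best[1:]


def predict_stump(stump, X):
    f, t, polarity = stump
    return (polarity * (X[:, f] - t) >= 0).astype(int)


def bagging(X, y, n_models=25, seed=0):
    rng = np.random.default_rng(seed)
    stumps = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample, with replacement
        stumps.append(fit_stump(X[idx], y[idx]))

    def predict(Xq):
        votes = np.mean([predict_stump(s, Xq) for s in stumps], axis=0)
        return (votes >= 0.5).astype(int)  # majority vote (ties go to class 1)

    return predict
```

Because each stump sees a different bootstrap sample, their thresholds vary slightly, and averaging the votes smooths out that variance. This variance reduction is the usual motivation for bagging unstable learners such as decision trees.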

Journal Article•10.1109/TNN.2002.1021888•

Unsupervised speaker recognition based on competition between self-organizing maps

Itshak Lapidot, Hugo Guterman¹, Arnon D. Cohen¹•Institutions (1)

Ben-Gurion University of the Negev¹

01 Jul 2002-IEEE Transactions on Neural Networks

TL;DR: A method is presented for clustering the speakers in an unlabeled and unsegmented conversation (with a known number of speakers) when no a priori knowledge about the identity of the participants is given.

Abstract: We present a method for clustering the speakers in an unlabeled and unsegmented conversation (with a known number of speakers) when no a priori knowledge about the identity of the participants is given. Each speaker was modeled by a self-organizing map (SOM). The SOMs were randomly initialized. An iterative algorithm allows the data to move from one model to another and adjusts the SOMs. The restriction that the data can move only in small groups, rather than as individual feature vectors, forces the SOMs to adjust to speakers (instead of phonemes or other vocal events). This method was applied to high-quality conversations with two to five participants and to two-speaker telephone-quality conversations. The results for two speakers (both high- and telephone-quality) and for three speakers were over 80% correct segmentation. The problem becomes even harder when the number of participants is also unknown. Based on the iterative clustering algorithm, a validity criterion was also developed to estimate the number of speakers. In 16 out of 17 high-quality conversations with two or three participants, the number of participants was estimated correctly. For telephone-quality conversations, the results were poorer.
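
The key constraint, that data move between speaker models only in small groups of frames, can be sketched as segment-level competition. This is a deliberately simplified stand-in. Each SOM is replaced by a single prototype vector, each model is initialized from a randomly chosen segment, segments are fixed blocks of 20 frames, and the validity criterion for estimating the number of speakers is not shown. The names `segment_distortion` and `cluster_speakers` are hypothetical.

```python
import numpy as np


def segment_distortion(segs, protos):
    """Mean squared distance of each segment's frames to each model: (n_segments, n_models)."""
    return ((segs[:, None, :, :] - protos[None, :, None, :]) ** 2).sum(axis=-1).mean(axis=-1)


def cluster_speakers(frames, n_speakers, seg_len=20, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    n = len(frames) // seg_len
    segs = frames[: n * seg_len].reshape(n, seg_len, -1)  # small groups of consecutive frames
    start = rng.choice(n, size=n_speakers, replace=False)
    protos = segs[start].mean(axis=1)  # each model starts from one random segment
    for _ in range(iters):
        # Whole segments compete for models; single frames never move on their own.
        owner = segment_distortion(segs, protos).argmin(axis=1)
        protos = np.array([segs[owner == j].reshape(-1, segs.shape[-1]).mean(axis=0)
                           if np.any(owner == j) else protos[j]
                           for j in range(n_speakers)])
    return segment_distortion(segs, protos).argmin(axis=1)
```

Scoring whole segments averages out frame-level variation such as individual phonemes, so the models tend to separate by speaker. In the paper, the models are full SOMs rather than the single prototypes used here.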

Patent•

Method for image region classification using unsupervised and supervised learning

Alexander C. Loui¹, Sanjiv Kumar¹•Institutions (1)

Eastman Kodak Company¹

7 Feb 2002

TL;DR: A method for classification of image regions by probabilistic merging of a class probability map and a cluster probability map is described; its first step is extracting one or more features from an input image composed of image pixels.

Abstract: A method for classification of image regions by probabilistic merging of a class probability map and a cluster probability map includes the steps of a) extracting one or more features from an input image composed of image pixels; b) performing unsupervised learning based on the extracted features to obtain a cluster probability map of the image pixels; c) performing supervised learning based on the extracted features to obtain a class probability map of the image pixels; and d) combining the cluster probability map from unsupervised learning and the class probability map from supervised learning to generate a modified class probability map to determine the semantic class of the image regions. In one embodiment the extracted features include color and textual features.
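
Step d), combining the cluster map with the class map, can be illustrated as follows. The abstract does not say how the two maps are merged, so the rule here is an assumption: estimate P(class | cluster) by pooling the supervised class map over each soft cluster, then recompute each pixel's class distribution as the membership-weighted sum over clusters. The function name `merge_maps` is hypothetical.

```python
import numpy as np


def merge_maps(class_prob: np.ndarray, cluster_prob: np.ndarray) -> np.ndarray:
    """Combine a supervised class map (H, W, C) with an unsupervised cluster map (H, W, K).

    Assumed rule: new P(class | pixel) = sum_k P(cluster k | pixel) * P(class | cluster k),
    where P(class | cluster k) pools the supervised map over cluster k's members.
    """
    weights = cluster_prob.reshape(-1, cluster_prob.shape[-1])  # (N, K)
    classes = class_prob.reshape(-1, class_prob.shape[-1])  # (N, C)
    class_given_cluster = weights.T @ classes  # (K, C) membership-weighted class mass
    totals = class_given_cluster.sum(axis=1, keepdims=True)
    class_given_cluster = class_given_cluster / np.maximum(totals, 1e-12)
    merged = weights @ class_given_cluster  # (N, C), each row still sums to 1
    return merged.reshape(class_prob.shape)
```

Under this rule, a pixel whose supervised prediction is ambiguous, such as `[0.5, 0.5]`, inherits the dominant class of the region it was clustered into. This regional smoothing is one plausible reading of "modified class probability map".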

Book Chapter•10.1007/3-540-46146-9_53•

Efficient Data Mining Based on Formal Concept Analysis

Gerd Stumme¹•Institutions (1)

Karlsruhe Institute of Technology¹

2 Sep 2002

TL;DR: The notion of iceberg concept lattices is introduced and its use in Knowledge Discovery in Databases is shown; iceberg lattices are designed for analyzing very large databases and serve as a condensed representation of frequent patterns as known from association rule mining.

Abstract: Formal Concept Analysis is an unsupervised learning technique for conceptual clustering. We introduce the notion of iceberg concept lattices and show their use in Knowledge Discovery in Databases (KDD). Iceberg lattices are designed for analyzing very large databases. In particular they serve as a condensed representation of frequent patterns as known from association rule mining. In order to show the interplay between Formal Concept Analysis and association rule mining, we discuss the algorithm TITANIC. We show that iceberg concept lattices are a starting point for computing condensed sets of association rules without loss of information, and are a visualization method for the resulting rules.
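
The concepts in an iceberg lattice are those whose intents are frequent closed itemsets. For a tiny formal context, these can be computed by brute force. This sketch is not TITANIC, which is designed to avoid enumerating every itemset. It only shows what is being computed: each frequent itemset is mapped to its closure (the attributes shared by all objects containing it), and a frequent itemset and its closure have the same support. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def closure(items: FrozenSet[str], rows: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Attributes shared by every object (row) that has all of `items`."""
    extent = [r for r in rows if items <= r]
    if not extent:
        return frozenset().union(*rows)  # empty extent: closure is the full attribute set
    return frozenset.intersection(*extent)


def iceberg_intents(rows: List[FrozenSet[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Map each frequent closed itemset (iceberg concept intent) to its support."""
    attrs = sorted(frozenset().union(*rows))
    result: Dict[FrozenSet[str], float] = {}
    for size in range(len(attrs) + 1):
        for combo in combinations(attrs, size):  # brute force: only for tiny contexts
            items = frozenset(combo)
            support = sum(items <= r for r in rows) / len(rows)
            if support >= min_support:
                # A frequent itemset and its closure share the same support.
                result[closure(items, rows)] = support
    return result
```

With objects `{a,b,c}`, `{a,b}`, `{a,c}`, `{a}` and a minimum support of 0.5, the iceberg intents are `{a}` (support 1.0), `{a,b}` and `{a,c}` (support 0.5 each). The infrequent `{b,c}` is cut off below the "waterline".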
