Showing papers on "Unsupervised learning" published in 2002
Book•
Least Squares Support Vector Machines

Johan A. K. Suykens1, Tony Van Gestel, Jos De Brabanter, Bart De Moor, Joos Vandewalle •
Katholieke Universiteit Leuven1
12 Nov 2002
TL;DR: The book covers support vector machines, basic methods of least squares support vector machines (LS-SVM), Bayesian inference for LS-SVM models, robustness, large scale problems, LS-SVM for unsupervised learning, and LS-SVM for recurrent networks and control.
Abstract: Support Vector Machines; Basic Methods of Least Squares Support Vector Machines; Bayesian Inference for LS-SVM Models; Robustness; Large Scale Problems; LS-SVM for Unsupervised Learning; LS-SVM for Recurrent Networks and Control.

3,626 citations

Journal Article•10.1109/34.990138•
Unsupervised learning of finite mixture models

Mário A. T. Figueiredo, Anil K. Jain1•
Michigan State University1
01 Mar 2002-IEEE Transactions on Pattern Analysis and Machine Intelligence
TL;DR: The novelty of the approach is that it does not use a model selection criterion to choose one among a set of preestimated candidate models; instead, it seamlessly integrates estimation and model selection in a single algorithm.
Abstract: This paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective "unsupervised" is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectation-maximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of preestimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify for the good performance of our approach.

2,464 citations

Journal Article•10.1162/089976602317318938•
Slow feature analysis: unsupervised learning of invariances

Laurenz Wiskott1, Terrence J. Sejnowski1•
Salk Institute for Biological Studies1
01 Apr 2002-Neural Computation
TL;DR: Slow feature analysis (SFA) is a new method for learning invariant or slowly varying features from a vectorial input signal. It is guaranteed to find the optimal solution within a family of functions directly and can learn to extract a large number of decorrelated features, which are ordered by their degree of invariance.
Abstract: Invariant features of temporally varying signals are useful for analysis and classification. Slow feature analysis (SFA) is a new method for learning invariant or slowly varying features from a vectorial input signal. It is based on a nonlinear expansion of the input signal and application of principal component analysis to this expanded signal and its time derivative. It is guaranteed to find the optimal solution within a family of functions directly and can learn to extract a large number of decorrelated features, which are ordered by their degree of invariance. SFA can be applied hierarchically to process high-dimensional input signals and extract complex features. SFA is applied first to complex cell tuning properties based on simple cell output, including disparity and motion. Then more complicated input-output functions are learned by repeated application of SFA. Finally, a hierarchical network of SFA modules is presented as a simple model of the visual system. The same unstructured network can learn translation, size, rotation, contrast, or, to a lesser degree, illumination invariance for one-dimensional objects, depending on only the training stimulus. Surprisingly, only a few training objects suffice to achieve good generalization to new objects. The generated representation is suitable for object recognition. Performance degrades if the network is trained to learn multiple invariances simultaneously.

1,649 citations
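
The SFA abstract above describes the method as principal component analysis applied to an expanded signal and to its time derivative. A minimal sketch of that idea, assuming the purely linear case (the nonlinear expansion step is omitted) and a full-rank input covariance, could look as follows. The function name `linear_sfa` and the toy sine-wave signals are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def linear_sfa(x, n_features):
    """Linear slow feature analysis sketch.

    x: array of shape (T, d), one row per time step.
    Returns the n_features slowest output signals and the projection matrix.
    """
    x = x - x.mean(axis=0)                      # zero-mean input
    # Step 1: whiten the signal with PCA (assumes full-rank covariance).
    cov = x.T @ x / len(x)
    evals, evecs = np.linalg.eigh(cov)
    whitener = evecs / np.sqrt(evals)           # scale each eigenvector column
    z = x @ whitener
    # Step 2: PCA on the time derivative (finite differences).
    dz = np.diff(z, axis=0)
    dcov = dz.T @ dz / len(dz)
    _, devecs = np.linalg.eigh(dcov)            # eigenvalues in ascending order
    # Directions with the smallest derivative variance vary most slowly.
    w = whitener @ devecs[:, :n_features]
    return x @ w, w

# Toy data: a slow and a fast sine wave, linearly mixed (hypothetical example).
t = np.linspace(0, 2 * np.pi, 2000)
slow = np.sin(t)
fast = np.sin(37 * t)
mixed = np.column_stack([slow + 0.5 * fast, 0.3 * slow - fast])

y, _ = linear_sfa(mixed, 1)
corr = abs(np.corrcoef(y[:, 0], slow)[0, 1])    # sign of the feature is arbitrary
print(f"|correlation| with the slow source: {corr:.4f}")
```

To approximate the nonlinear variant described in the abstract, one would first expand `x` (for example with quadratic monomials) and then apply the same two steps to the expanded signal.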

Journal Article•10.1109/TNN.2002.1000150•
Mercer kernel-based clustering in feature space

Mark Girolami1•
Helsinki University of Technology1
01 May 2002-IEEE Transactions on Neural Networks
TL;DR: It is shown that the eigenvectors of a kernel matrix which defines the implicit mapping provide a means to estimate the number of clusters inherent within the data, and a computationally simple iterative procedure is presented for the subsequent feature space partitioning of the data.
Abstract: The article presents a method for both the unsupervised partitioning of a sample of data and the estimation of the possible number of inherent clusters which generate the data. This work exploits the notion that performing a nonlinear data transformation into some high dimensional feature space increases the probability of the linear separability of the patterns within the transformed space and therefore simplifies the associated data structure. It is shown that the eigenvectors of a kernel matrix which defines the implicit mapping provide a means to estimate the number of clusters inherent within the data, and a computationally simple iterative procedure is presented for the subsequent feature space partitioning of the data.

1,024 citations

Posted Content•
Principal Manifolds and Nonlinear Dimension Reduction via Local Tangent Space Alignment

Zhenyue Zhang, Hongyuan Zha
07 Dec 2002-arXiv: Learning
TL;DR: A new algorithm for manifold learning and nonlinear dimension reduction is presented based on a set of unorganized data points sampled with noise from the manifold using tangent spaces learned by fitting an affine subspace in a neighborhood of each data point.
Abstract: Nonlinear manifold learning from unorganized data points is a very challenging unsupervised learning and data visualization problem with a great variety of applications. In this paper we present a new algorithm for manifold learning and nonlinear dimension reduction. Based on a set of unorganized data points sampled with noise from the manifold, we represent the local geometry of the manifold using tangent spaces learned by fitting an affine subspace in a neighborhood of each data point. Those tangent spaces are aligned to give the internal global coordinates of the data points with respect to the underlying manifold by way of a partial eigendecomposition of the neighborhood connection matrix. We present a careful error analysis of our algorithm and show that the reconstruction errors are of second-order accuracy. We illustrate our algorithm using curves and surfaces both in 2D/3D and higher dimensional Euclidean spaces, and 64-by-64 pixel face images with various pose and lighting conditions. We also address several theoretical and algorithmic issues for further research and improvements.

669 citations

Journal Article•10.1145/601858.601867•
Foundations of statistical natural language processing

Gerhard Weikum
1 Sep 2002
TL;DR: The book is about natural language processing (NLP), covering the entire spectrum from parsing and disambiguation, sentence tagging, and machine translation, all the way to text analysis, information extraction, and document retrieval, and there is plenty of material that fits right in the mainstream of an IR course.
Abstract: It is a pleasure to write this review of an excellent textbook. Pedagogically this is a real gem, and it is a great source of teaching material for a variety of courses. I discovered the book when I was looking for textbook material that I could use in my class on information retrieval (IR), which is an advanced undergraduate course at my university. Being myself a "core database person", I committed myself to teaching the IR class with the intention to learn a new subject and the expectation that there should be plenty of good textbooks to choose from. I easily found a few decent books, but none of them seemed adequate as the main literature for my class (which is also a matter of personal taste, however). I ended up taking material from different sources, textbooks as well as conference and journal papers (the latter especially for Web search), and the one source to which I referred most and which both my students and myself appreciated most was the book by Manning and Schütze. The book is about natural language processing (NLP), covering the entire spectrum from parsing and disambiguation, sentence tagging, and machine translation, all the way to text analysis, information extraction, and document retrieval. So the book's main topic is not really IR, but among the 15 chapters of the 680-page book there is plenty of material that fits right in the mainstream of an IR course. In addition, I discovered a substantial fraction of the core NLP material to be extremely insightful and relevant for text mining and other aspects of IR in a broader sense. In particular, I realized the value of the foundational topics that relate to information theory and statistical learning. Of course, I had been aware of the connection between IR and NLP, but there is a big difference between abstract knowledge and really seeing the relationships in detail when preparing a class. The book was a real eye-opener to me in this regard.
The book is organized in four parts: preliminaries, words, grammar, and applications and techniques. The first part, preliminaries, provides general motivation, linguistic basics, and mathematical foundations from elementary probability and information theory including such widely useful concepts as the Kullback-Leibler divergence. Part two, words, discusses different kinds of word occurrence statistics, the problem of word sense ambiguity and techniques for disambiguation, and a suite of lexical analysis techniques …

640 citations

Book Chapter•10.1007/978-1-4471-0123-9_3•
The Supervised Learning No-Free-Lunch Theorems

David H. Wolpert1•
Ames Research Center1
1 Jan 2002
TL;DR: This paper reviews the supervised learning versions of the no-free-lunch theorems in a simplified form, and discusses the significance of those theorems and their relation to other aspects of supervised learning.
Abstract: This paper reviews the supervised learning versions of the no-free-lunch theorems in a simplified form. It also discusses the significance of those theorems, and their relation to other aspects of supervised learning.

559 citations

Journal Article•10.1109/TNN.2002.804221•
The growing hierarchical self-organizing map: exploratory analysis of high-dimensional data

Andreas Rauber1, Dieter Merkl1, Michael Dittenbach1•
Vienna University of Technology1
01 Nov 2002-IEEE Transactions on Neural Networks
TL;DR: The motivation was to provide a model that adapts its architecture during its unsupervised training process according to the particular requirements of the input data, and by providing a global orientation of the independently growing maps in the individual layers of the hierarchy, navigation across branches is facilitated.
Abstract: The self-organizing map (SOM) is a very popular unsupervised neural-network model for the analysis of high-dimensional input data as in data mining applications. However, at least two limitations have to be noted, which are related to the static architecture of this model as well as to the limited capabilities for the representation of hierarchical relations of the data. With our novel growing hierarchical SOM (GHSOM) we address both limitations. The GHSOM is an artificial neural-network model with hierarchical architecture composed of independent growing SOMs. The motivation was to provide a model that adapts its architecture during its unsupervised training process according to the particular requirements of the input data. Furthermore, by providing a global orientation of the independently growing maps in the individual layers of the hierarchy, navigation across branches is facilitated. The benefits of this novel neural network are a problem-dependent architecture and the intuitive representation of hierarchical relations in the data. This is especially appealing in explorative data mining applications, allowing the inherent structure of the data to unfold in a highly intuitive fashion.

550 citations

Journal Article•10.1016/S0893-6080(02)00078-3•
A self-organising network that grows when required

Stephen Marsland1, Jonathan Shapiro1, Ulrich Nehmzow2•
University of Manchester1, University of Essex2
01 Oct 2002-Neural Networks
TL;DR: The authors propose a self-organising network that grows when required: it can add nodes whenever the network in its current state does not sufficiently match the input, but stops growing once the network has matched the data.

446 citations

Proceedings Article•
Active + Semi-supervised Learning = Robust Multi-View Learning

Ion Muslea1, Steven Minton, Craig A. Knoblock1•
Information Sciences Institute1
8 Jul 2002
TL;DR: A new multi-view algorithm, Co-EMT, which combines semi-supervised and active learning is introduced, which outperforms the other algorithms both on the parameterized problems and on two additional real world domains.
Abstract: In a multi-view problem, the features of the domain can be partitioned into disjoint subsets (views) that are sufficient to learn the target concept. Semi-supervised, multi-view algorithms, which reduce the amount of labeled data required for learning, rely on the assumptions that the views are compatible and uncorrelated (i.e., every example is identically labeled by the target concepts in each view; and, given the label of any example, its descriptions in each view are independent). As these assumptions are unlikely to hold in practice, it is crucial to understand the behavior of multi-view algorithms on problems with incompatible, correlated views. We address this issue by studying several algorithms on a parameterized family of text classification problems in which we control both view correlation and incompatibility. We first show that existing semi-supervised algorithms are not robust over the whole spectrum of parameterized problems. Then we introduce a new multi-view algorithm, Co-EMT, which combines semi-supervised and active learning. Co-EMT outperforms the other algorithms both on the parameterized problems and on two additional real world domains. Our experiments suggest that Co-EMT’s robustness comes from active learning compensating for the correlation of the views.

348 citations

Journal Article•10.1109/TPAMI.2002.1033211•
Constructing boosting algorithms from SVMs: an application to one-class classification

Gunnar Rätsch1, Sebastian Mika, Bernhard Schölkopf2, Klaus-Robert Müller3•
Australian National University1, Max Planck Society2, University of Potsdam3
01 Sep 2002-IEEE Transactions on Pattern Analysis and Machine Intelligence
TL;DR: This work shows via an equivalence of mathematical programs that a support vector algorithm can be translated into an equivalent boosting-like algorithm and vice versa, and exemplifies this translation procedure for a new algorithm: one-class leveraging, starting from the one-class support vector machine (1-SVM).
Abstract: We show via an equivalence of mathematical programs that a support vector (SV) algorithm can be translated into an equivalent boosting-like algorithm and vice versa. We exemplify this translation procedure for a new algorithm: one-class leveraging, starting from the one-class support vector machine (1-SVM). This is a first step toward unsupervised learning in a boosting framework. Building on so-called barrier methods known from the theory of constrained optimization, it returns a function, written as a convex combination of base hypotheses, that characterizes whether a given test point is likely to have been generated from the distribution underlying the training data. Simulations on one-class classification problems demonstrate the usefulness of our approach.
Book Chapter•10.1007/3-540-47979-1_52•
Adjustment Learning and Relevant Component Analysis

Noam Shental1, Tomer Hertz1, Daphna Weinshall1, Misha Pavel2•
Hebrew University of Jerusalem1, Oregon Health & Science University2
28 May 2002
TL;DR: A new learning approach for image retrieval is proposed, which is called adjustment learning, and this scheme uses the information in chunklets to reduce irrelevant variability in the data while amplifying relevant variability.
Abstract: We propose a new learning approach for image retrieval, which we call adjustment learning, and demonstrate its use for face recognition and color matching. Our approach is motivated by a frequently encountered problem, namely, that variability in the original data representation which is not relevant to the task may interfere with retrieval and make it very difficult. Our key observation is that in real applications of image retrieval, data sometimes comes in small chunks - small subsets of images that come from the same (but unknown) class. This is the case, for example, when a query is presented via a short video clip. We call these groups chunklets, and we call the paradigm which uses chunklets for unsupervised learning adjustment learning. Within this paradigm we propose a linear scheme, which we call Relevant Component Analysis; this scheme uses the information in such chunklets to reduce irrelevant variability in the data while amplifying relevant variability. We provide results using our method on two problems: face recognition (using a database publicly available on the web), and visual surveillance (using our own data). In the latter application chunklets are obtained automatically from the data without the need of supervision.
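
The abstract above says that Relevant Component Analysis is a linear scheme that uses chunklets to reduce irrelevant variability. A minimal sketch, assuming the common formulation in which the data are whitened with the average within-chunklet covariance (an assumption here; the abstract itself does not spell out the formula), might look like this. The helper names and the synthetic two-class data are hypothetical.

```python
import numpy as np

def within_chunklet_cov(chunklets):
    """Average covariance of points around their own chunklet mean."""
    centered = np.vstack([c - c.mean(axis=0) for c in chunklets])
    return centered.T @ centered / len(centered)

def rca_transform(chunklets):
    """Return W = C^(-1/2), where C is the within-chunklet covariance.

    chunklets: list of (n_j, d) arrays, each drawn from one (unknown) class.
    Assumes C has full rank.
    """
    evals, evecs = np.linalg.eigh(within_chunklet_cov(chunklets))
    return evecs @ np.diag(evals ** -0.5) @ evecs.T

# Synthetic data: axis 0 carries large irrelevant noise, axis 1 separates classes.
rng = np.random.default_rng(0)
noise_scale = np.array([10.0, 0.5])
class_means = [np.array([0.0, 0.0]), np.array([0.0, 3.0])]
chunklets = [m + rng.normal(size=(5, 2)) * noise_scale
             for m in class_means for _ in range(20)]

W = rca_transform(chunklets)
transformed = [c @ W.T for c in chunklets]
cov_after = within_chunklet_cov(transformed)   # close to the identity matrix
print(np.round(W, 3))
```

In this toy setup the transform shrinks the noisy axis and stretches the informative one, which is the qualitative behaviour the abstract describes. No class labels are used, only the grouping into chunklets.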
Journal Article•10.3758/BF03196342•
Comparing supervised and unsupervised category learning.

Bradley C. Love1•
University of Texas at Austin1
01 Dec 2002-Psychonomic Bulletin & Review
TL;DR: The approach allows for direct comparisons of unsupervised learning data with the Shepard, Hovland, and Jenkins (1961) seminal studies in supervised classification learning.
Abstract: Two unsupervised learning modes (incidental and intentional unsupervised learning) and their relation to supervised classification learning are examined. The approach allows for direct comparisons of unsupervised learning data with the Shepard, Hovland, and Jenkins (1961) seminal studies in supervised classification learning. Unlike supervised classification learning, unsupervised learning (especially under incidental conditions) favors linear category structures over compact nonlinear category structures. Unsupervised learning is shown to be multifaceted in that performance varies with task conditions. In comparison with incidental unsupervised learning, intentional unsupervised learning is more rule like, but is no more accurate. The acquisition and application of knowledge is also more laborious under intentional unsupervised learning.
Journal Article•10.1109/72.991428•
Unsupervised clustering with spiking neurons by sparse temporal coding and multilayer RBF networks

Sander M. Bohte, H. La Poutre, Joost N. Kok1•
Leiden University1
01 Mar 2002-IEEE Transactions on Neural Networks
TL;DR: A spiking neural network based on spike-time coding and Hebbian learning is shown to perform unsupervised clustering on real-world data, and temporal synchrony in a multilayer network is shown to induce hierarchical clustering.
Abstract: We demonstrate that spiking neural networks encoding information in the timing of single spikes are capable of computing and learning clusters from realistic data. We show how a spiking neural network based on spike-time coding and Hebbian learning can successfully perform unsupervised clustering on real-world data, and we demonstrate how temporal synchrony in a multilayer network can induce hierarchical clustering. We develop a temporal encoding of continuously valued data to obtain adjustable clustering capacity and precision with an efficient use of neurons: input variables are encoded in a population code by neurons with graded and overlapping sensitivity profiles. We also discuss methods for enhancing scale-sensitivity of the network and show how the induced synchronization of neurons within early RBF layers allows for the subsequent detection of complex clusters.
Proceedings Article•10.5555/777092.777145•
Reinforcement learning of coordination in cooperative multi-agent systems

Spiros Kapetanakis1, Daniel Kudenko1•
University of York1
28 Jul 2002
TL;DR: This investigation of reinforcement learning techniques for the learning of coordination in cooperative multi-agent systems focuses on a novel action selection strategy for Q-learning (Watkins 1989), and demonstrates empirically that this extension causes the agents to converge almost always to the optimal joint action even in these difficult cases.
Abstract: We report on an investigation of reinforcement learning techniques for the learning of coordination in cooperative multi-agent systems. Specifically, we focus on a novel action selection strategy for Q-learning (Watkins 1989). The new technique is applicable to scenarios where mutual observation of actions is not possible. To date, reinforcement learning approaches for such independent agents did not guarantee convergence to the optimal joint action in scenarios with high miscoordination costs. We improve on previous results (Claus & Boutilier 1998) by demonstrating empirically that our extension causes the agents to converge almost always to the optimal joint action even in these difficult cases.
Journal Article•10.1613/JAIR.946•
Efficient reinforcement learning using recursive least-squares methods

Xin Xu1, Han-gen He1, Dewen Hu1•
National University of Defense Technology1
01 Jan 2002-Journal of Artificial Intelligence Research
TL;DR: RLS methods are used to solve reinforcement learning problems, where two new reinforcement learning algorithms using linear value function approximators are proposed and analyzed and it is shown that the data efficiency of learning control can also be improved by using RLS methods in the learning-prediction process of the critic.
Abstract: The recursive least-squares (RLS) algorithm is one of the most well-known algorithms used in adaptive filtering, system identification and adaptive control. Its popularity is mainly due to its fast convergence speed, which is considered to be optimal in practice. In this paper, RLS methods are used to solve reinforcement learning problems, where two new reinforcement learning algorithms using linear value function approximators are proposed and analyzed. The two algorithms are called RLS-TD(λ) and Fast-AHC (Fast Adaptive Heuristic Critic), respectively. RLS-TD(λ) can be viewed as the extension of RLS-TD(0) from λ =0 to general 0≤ λ ≤1, so it is a multi-step temporal-difference (TD) learning algorithm using RLS methods. The convergence with probability one and the limit of convergence of RLS-TD(λ) are proved for ergodic Markov chains. Compared to the existing LS-TD(λ) algorithm, RLS-TD(λ) has advantages in computation and is more suitable for online learning. The effectiveness of RLS-TD(λ) is analyzed and verified by learning prediction experiments of Markov chains with a wide range of parameter settings. The Fast-AHC algorithm is derived by applying the proposed RLS-TD(λ) algorithm in the critic network of the adaptive heuristic critic method. Unlike conventional AHC algorithm, Fast-AHC makes use of RLS methods to improve the learning-prediction efficiency in the critic. Learning control experiments of the cart-pole balancing and the acrobot swing-up problems are conducted to compare the data efficiency of Fast-AHC with conventional AHC. From the experimental results, it is shown that the data efficiency of learning control can also be improved by using RLS methods in the learning-prediction process of the critic. The performance of Fast-AHC is also compared with that of the AHC method using LS-TD(λ). 
Furthermore, it is demonstrated in the experiments that different initial values of the variance matrix in RLS-TD(λ) are required to get better performance not only in learning prediction but also in learning control. The experimental results are analyzed based on the existing theoretical work on the transient phase of forgetting factor RLS methods.
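
The abstract above describes RLS-TD(λ) as a multi-step temporal-difference algorithm that uses recursive least squares with a linear value function approximator. A minimal sketch of one common form of the update, assuming an eligibility trace `z`, a forgetting factor `mu`, and an initial variance matrix `delta * I` (the exact recursion and parameter names are assumptions here, not quoted from the paper), is given below. The three-state deterministic chain is a hypothetical test problem.

```python
import numpy as np

def rls_td_lambda(transitions, n_features, gamma, lam, delta=1000.0, mu=1.0):
    """RLS-TD(lambda) sketch for a linear value function V(x) = phi(x) @ theta.

    transitions: iterable of (phi, reward, phi_next) tuples along one trajectory.
    delta: scale of the initial variance matrix P = delta * I (assumed name).
    mu: forgetting factor, 1.0 means no forgetting (assumed name).
    """
    theta = np.zeros(n_features)
    P = delta * np.eye(n_features)
    z = np.zeros(n_features)                     # eligibility trace
    for phi, reward, phi_next in transitions:
        z = gamma * lam * z + phi                # accumulate the trace
        d = phi - gamma * phi_next               # temporal-difference feature vector
        Pz = P @ z
        K = Pz / (mu + d @ Pz)                   # gain vector
        theta = theta + K * (reward - d @ theta)
        P = (P - np.outer(K, d @ P)) / mu        # rank-one update of P
    return theta

# Hypothetical chain: states 0 -> 1 -> 2 -> 0 with one-hot features.
rewards = np.array([1.0, 0.0, 2.0])
gamma = 0.9
eye = np.eye(3)
transitions = [(eye[s % 3], rewards[s % 3], eye[(s + 1) % 3]) for s in range(3000)]

theta = rls_td_lambda(transitions, n_features=3, gamma=gamma, lam=0.5)

# Reference values from the Bellman equation V = r + gamma * P_chain @ V.
P_chain = np.roll(eye, 1, axis=1)                # row s has a 1 in column s + 1
true_v = np.linalg.solve(eye - gamma * P_chain, rewards)
print(np.round(theta, 3), np.round(true_v, 3))
```

The `delta` argument corresponds to the initial value of the variance matrix, which the abstract reports as having a noticeable effect on performance. The value used here is only a placeholder.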
Journal Article•10.1109/83.988960•
Unsupervised image classification, segmentation, and enhancement using ICA mixture models

Te-Won Lee1, Michael S. Lewicki2•
University of California, San Diego1, Carnegie Mellon University2
01 Mar 2002-IEEE Transactions on Image Processing
TL;DR: This paper demonstrates that the unsupervised classification method, derived by modeling observed data as a mixture of several mutually exclusive classes that are each described by linear combinations of independent, non-Gaussian densities, was effective in classifying complex image textures such as natural scenes and text.
Abstract: An unsupervised classification algorithm is derived by modeling observed data as a mixture of several mutually exclusive classes that are each described by linear combinations of independent, non-Gaussian densities. The algorithm estimates the data density in each class by using parametric nonlinear functions that fit to the non-Gaussian structure of the data. This improves classification accuracy compared with standard Gaussian mixture models. When applied to images, the algorithm can learn efficient codes (basis functions) for images that capture the statistically significant structure intrinsic in the images. We apply this technique to the problem of unsupervised classification, segmentation, and denoising of images. We demonstrate that this method was effective in classifying complex image textures such as natural scenes and text. It was also useful for denoising and filling in missing pixels in images with complex structures. The advantage of this model is that image codes can be learned with increasing numbers of classes thus providing greater flexibility in modeling structure and in finding more image features than in either Gaussian mixture models or standard independent component analysis (ICA) algorithms.
Journal Article•10.1016/S0167-8655(02)00032-6•
A k-segments algorithm for finding principal curves

Jakob Verbeek1, Nikos Vlassis1, Ben Kröse1•
University of Amsterdam1
01 Jun 2002-Pattern Recognition Letters
TL;DR: In this article, an incremental method to find principal curves is proposed, where line segments are fitted and connected to form polygonal lines, and new segments are inserted until a performance criterion is met.
Proceedings Article•10.1109/IJCNN.2002.1007592•
Learning from labeled and unlabeled data

Ravi Kothari1, Vivek Jain1•
Indian Institutes of Technology1
7 Aug 2002
TL;DR: This paper uses an evolutionary strategy to iteratively adjust the class membership of the patterns in the unlabeled sample so that the class conditional distribution obtained from such a labeling allows a maximum a posteriori classification with minimum classification error on the labeled patterns.
Abstract: Due to the considerable time and expense required in labeling data, a challenge is to propose learning algorithms that can learn from a small amount of labeled data and a much larger amount of unlabeled data. In this paper, we propose one such algorithm which uses an evolutionary strategy to iteratively adjust the class membership of the patterns in the unlabeled sample. The iterative adjustment is done so that the class conditional distribution obtained from such a labeling allows a maximum a posteriori classification with minimum classification error on the labeled patterns. We detail the algorithm and provide results obtained by the proposed algorithm on 5 different datasets.
Proceedings Article•
Combining Labeled and Unlabeled Data for MultiClass Text Categorization

Rayid Ghani
8 Jul 2002
TL;DR: This paper develops a framework to incorporate unlabeled data in the Error-Correcting Output Coding (ECOC) setup by first decomposing multiclass problems into multiple binary problems and then using Co-Training to learn the individual binary classification problems.
Abstract: Supervised learning techniques for text classification often require a large number of labeled examples to learn accurately. One way to reduce the amount of labeled data required is to develop algorithms that can learn effectively from a small number of labeled examples augmented with a large number of unlabeled examples. Current text learning techniques for combining labeled and unlabeled, such as EM and Co-Training, are mostly applicable for classification tasks with a small number of classes and do not scale up well for large multiclass problems. In this paper, we develop a framework to incorporate unlabeled data in the Error-Correcting Output Coding (ECOC) setup by first decomposing multiclass problems into multiple binary problems and then using Co-Training to learn the individual binary classification problems. We show that our method is especially useful for text classification tasks involving a large number of categories and outperforms other semi-supervised learning techniques such as EM and Co-Training. In addition to being highly accurate, this method utilizes the hamming distance from ECOC to provide high-precision results. We also present results with algorithms other than co-training in this framework and show that co-training is uniquely suited to work well within ECOC.
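
The abstract above mentions decomposing a multiclass problem into binary problems and using the Hamming distance from ECOC for high-precision results. The decoding step alone can be sketched as follows, assuming a small hand-written 4-class, 7-bit code matrix and an optional abstention threshold; the class names, the matrix, and the `max_distance` parameter are hypothetical, and the binary classifiers (Co-Training in the paper) are not modelled here.

```python
def hamming(a, b):
    """Number of positions where two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def ecoc_decode(code_matrix, predicted_bits, max_distance=None):
    """Pick the class whose codeword is nearest to the predicted bits.

    Returns (label, distance). If max_distance is given and the nearest
    codeword is farther away than that, the label is None (abstain), which
    trades recall for precision.
    """
    distances = {label: hamming(code, predicted_bits)
                 for label, code in code_matrix.items()}
    label = min(distances, key=distances.get)
    if max_distance is not None and distances[label] > max_distance:
        return None, distances[label]
    return label, distances[label]

# Hypothetical code matrix: each column defines one binary problem.
# Any two codewords differ in 4 bits, so one wrong bit can be corrected.
CODE_MATRIX = {
    "class0": (1, 1, 1, 1, 1, 1, 1),
    "class1": (0, 0, 0, 0, 1, 1, 1),
    "class2": (0, 0, 1, 1, 0, 0, 1),
    "class3": (0, 1, 0, 1, 0, 1, 0),
}

# Outputs of the seven binary classifiers, with the first bit of class2 flipped.
noisy_bits = (1, 0, 1, 1, 0, 0, 1)
print(ecoc_decode(CODE_MATRIX, noisy_bits))

# An ambiguous prediction is rejected when a strict threshold is set.
ambiguous_bits = (0, 0, 1, 1, 1, 1, 1)
print(ecoc_decode(CODE_MATRIX, ambiguous_bits, max_distance=1))
```

Abstaining on far-from-any-codeword predictions is one plausible reading of how the Hamming distance can yield high-precision output; the paper's actual procedure may differ.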
Journal Article•10.3233/IDA-2002-6605•
Evolutionary model selection in unsupervised learning

Yong Seog Kim1, W. Nick Street1, Filippo Menczer1•
Utah State University1
1 Dec 2002
TL;DR: Feature selection for clustering is performed with ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions approximating the Pareto front in a multi-dimensional objective space; this results in models with better and clearer semantic relevance.
Abstract: Feature subset selection is important not only for the insight gained from determining relevant modeling variables but also for the improved understandability, scalability, and possibly, accuracy of the resulting models. Feature selection has traditionally been studied in supervised learning situations, with some estimate of accuracy used to evaluate candidate subsets. However, we often cannot apply supervised learning for lack of a training signal. For these cases, we propose a new feature selection approach based on clustering. A number of heuristic criteria can be used to estimate the quality of clusters built from a given feature subset. Rather than combining such criteria, we use ELSA, an evolutionary local selection algorithm that maintains a diverse population of solutions that approximate the Pareto front in a multi-dimensional objective space. Each evolved solution represents a feature subset and a number of clusters; two representative clustering algorithms, K-means and EM, are applied to form the given number of clusters based on the selected features. Experimental results on both real and synthetic data show that the method can consistently find approximate Pareto-optimal solutions through which we can identify the significant features and an appropriate number of clusters. This results in models with better and clearer semantic relevance.
Journal Article•10.1145/601858.601866•
Introduction to constraint databases

Bart Kuijpers
1 Sep 2002
TL;DR: This book presents the constraint database model as a powerful extension of the relational model that allows a user or programmer to work easily with infinite data and shows that the constraint model provides an elegant tool for data modeling and querying in application areas such as geographic information systems, spatiotemporal data management, bioinformatics, genome databases and computer vision.
Abstract: This book is the first textbook on constraint databases. Its author, together with P. Kanellakis and G. Kuper, introduced constraint databases in 1990 as a powerful generalization of the relational database model. Constraints, such as linear or polynomial equalities and inequalities, are used to finitely represent possibly infinite sets of points. They provide an elegant way to combine classical relational data with, for instance, spatial or temporal data. Since the early 1990s, the topic of constraint databases has received considerable research interest, both theoretical and towards systems development, and has been present at most database conferences during the past decade. It turned out to be a rich area in which a combination of techniques from, e.g., logic, (finite) model theory, algebraic and computational geometry, topology, query languages and symbolic computation are applied. A comprehensive survey of the research results in this field appeared two years ago (Constraint databases, edited by G. Kuper, L. Libkin and J. Paredaens, Springer, 2000). Whereas this survey mainly addresses researchers, the present book aims at making the topic of constraint databases accessible to advanced undergraduate and beginning graduate students and mainly addresses constraint databases from a developer's point of view. This book will certainly contribute to the exposure of constraint databases to a wider audience and hopefully also to its proliferation in a broader database practice.
Summary of the book: This textbook presents the constraint database model as a powerful extension of the relational model that allows a user or programmer to work easily with infinite data. It covers a wide range of constraint formalisms and shows that the constraint model provides an elegant tool for data modeling and querying in application areas such as geographic information systems (GIS), spatiotemporal data management, bioinformatics, genome databases and computer vision.
This book covers a substantial part of constraint-database theory, emphasizes several developer's issues, and presents a number of sample constraint database systems. The author starts by developing the constraint data model from the relational one. Next, he shows how familiar query languages, such as the relational algebra, SQL, and various forms of Datalog, carry over to the constraint model. A third broad part of the book focuses on query evaluation and addresses theoretical topics such as quantifier-elimination algorithms for several constraint languages and the complexity of query evaluation in these languages. More specific data models and query languages are also addressed, e.g., for spatiotemporal database applications. Next, the book describes a sample linear constraint database system, a Boolean constraint database system, and a spatiotemporal database system. The last part of the book presents a number of sample applications.
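To make the phrase "finitely represent possibly infinite sets of points" concrete, here is a small sketch under simplifying assumptions: a constraint tuple is a conjunction of linear inequalities over two variables, and a constraint relation is a finite disjunction of such tuples. The class and function names (`LinearConstraint`, `contains`, `intersect`, `box`) are hypothetical and do not come from the book or from any constraint database system.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LinearConstraint:
    """Represents the inequality a*x + b*y <= c."""
    a: float
    b: float
    c: float

    def holds(self, x: float, y: float) -> bool:
        return self.a * x + self.b * y <= self.c


# A constraint tuple is a conjunction of constraints;
# a constraint relation is a disjunction of constraint tuples.
ConstraintTuple = Tuple[LinearConstraint, ...]
ConstraintRelation = List[ConstraintTuple]


def contains(relation: ConstraintRelation, x: float, y: float) -> bool:
    """A point belongs to the relation if it satisfies every constraint
    of at least one constraint tuple."""
    return any(all(c.holds(x, y) for c in tup) for tup in relation)


def intersect(r1: ConstraintRelation, r2: ConstraintRelation) -> ConstraintRelation:
    """Intersection is computed symbolically by conjoining tuples pairwise."""
    return [t1 + t2 for t1 in r1 for t2 in r2]


def box(x_min: float, x_max: float, y_min: float, y_max: float) -> ConstraintTuple:
    """Axis-aligned rectangle written as four linear inequalities."""
    return (
        LinearConstraint(-1, 0, -x_min),  # x >= x_min
        LinearConstraint(1, 0, x_max),    # x <= x_max
        LinearConstraint(0, -1, -y_min),  # y >= y_min
        LinearConstraint(0, 1, y_max),    # y <= y_max
    )


park = [box(0, 4, 0, 4)]                     # infinitely many points, 4 constraints
half_plane = [(LinearConstraint(1, 1, 5),)]  # x + y <= 5, an unbounded region
both = intersect(park, half_plane)

print(contains(park, 3.5, 3.5))  # True
print(contains(both, 3.5, 3.5))  # False, since 3.5 + 3.5 > 5
print(contains(both, 1.0, 1.0))  # True
```

The point of the sketch is that both the stored data and the query result stay finite descriptions, even though the point sets they denote are infinite.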
Proceedings Article•10.1109/IJCNN.2002.1007796•
Reinforcement learning for adaptive routing

[...]

Leonid Peshkin1, Virginia Savova2•
Massachusetts Institute of Technology1, Johns Hopkins University School of Medicine2
7 Aug 2002
TL;DR: An application of a gradient ascent algorithm for reinforcement learning to a complex domain of packet routing in network communication is presented and the performance of this algorithm is compared to other routing methods on a benchmark problem.
Abstract: Reinforcement learning means learning a policy (a mapping of observations into actions) based on feedback from the environment. The learning can be viewed as browsing a set of policies while evaluating them by trial through interaction with the environment. We present an application of a gradient ascent algorithm for reinforcement learning to a complex domain of packet routing in network communication and compare the performance of this algorithm to other routing methods on a benchmark problem.
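As a rough illustration of gradient ascent on a policy, the sketch below applies a generic REINFORCE-style update with a running baseline to a toy one-hop routing choice. It is not the algorithm or the benchmark from the paper: the two routes, their fixed delays, the softmax parameterisation, and all hyperparameters are assumptions made for the example.

```python
import math
import random
from typing import Dict

# Hypothetical per-route delivery delays; the reward is the negative delay.
DELAYS = {"fast": 1.0, "slow": 3.0}
ROUTES = list(DELAYS)


def softmax(prefs: Dict[str, float]) -> Dict[str, float]:
    """Turn route preferences into a probability distribution."""
    m = max(prefs.values())
    exps = {r: math.exp(v - m) for r, v in prefs.items()}
    total = sum(exps.values())
    return {r: e / total for r, e in exps.items()}


def train(steps: int = 2000, lr: float = 0.5,
          baseline_rate: float = 0.1, seed: int = 0) -> Dict[str, float]:
    rng = random.Random(seed)
    prefs = {r: 0.0 for r in ROUTES}
    # Initialise the baseline by trying each route once (an assumption
    # of this sketch, so that the first update is already informative).
    baseline = sum(-DELAYS[r] for r in ROUTES) / len(ROUTES)
    for _ in range(steps):
        probs = softmax(prefs)
        route = rng.choices(ROUTES, weights=[probs[r] for r in ROUTES])[0]
        reward = -DELAYS[route]
        advantage = reward - baseline
        # Gradient of log pi(route) w.r.t. each preference is 1[r == route] - pi(r).
        for r in ROUTES:
            grad_log = (1.0 if r == route else 0.0) - probs[r]
            prefs[r] += lr * advantage * grad_log
        # Move the baseline towards the observed reward.
        baseline += baseline_rate * (reward - baseline)
    return softmax(prefs)


policy = train()
print(policy)
```

With these settings, every update shifts probability toward the lower-delay route, so the learned policy prefers `fast`; a real routing problem would add observations, multiple hops, and stochastic delays.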
Proceedings Article•
Stability-Based Model Selection

[...]

Tilman Lange1, Mikio L. Braun1, Volker Roth1, Joachim M. Buhmann1•
University of Bonn1
1 Jan 2002
TL;DR: A new model assessment scheme is introduced which is based on a notion of stability, which yields an upper bound to cross-validation in the supervised case, but extends to semi-supervised and unsupervised problems.
Abstract: Model selection is linked to model assessment, which is the problem of comparing different models, or model parameters, for a specific learning task. For supervised learning, the standard practical technique is cross-validation, which is not applicable for semi-supervised and unsupervised settings. In this paper, a new model assessment scheme is introduced which is based on a notion of stability. The stability measure yields an upper bound to cross-validation in the supervised case, but extends to semi-supervised and unsupervised problems. In the experimental part, the performance of the stability measure is studied for model order selection in comparison to standard techniques in this area.
Posted Content•
Unsupervised Language Acquisition: Theory and Practice

[...]

Alexander Clark
10 Dec 2002-arXiv: Computation and Language
TL;DR: This thesis presents various algorithms for the unsupervised machine learning of aspects of natural languages using a variety of statistical models, and examines the interaction between the various components to show how these algorithms can form the basis for an empiricist model of language acquisition.
Abstract: In this thesis I present various algorithms for the unsupervised machine learning of aspects of natural languages using a variety of statistical models. The scientific object of the work is to examine the validity of the so-called Argument from the Poverty of the Stimulus advanced in favour of the proposition that humans have language-specific innate knowledge. I start by examining an a priori argument based on Gold's theorem, that purports to prove that natural languages cannot be learned, and some formal issues related to the choice of statistical grammars rather than symbolic grammars. I present three novel algorithms for learning various parts of natural languages: first, an algorithm for the induction of syntactic categories from unlabelled text using distributional information, that can deal with ambiguous and rare words; secondly, a set of algorithms for learning morphological processes in a variety of languages, including languages such as Arabic with non-concatenative morphology; thirdly, an algorithm for the unsupervised induction of a context-free grammar from tagged text. I carefully examine the interaction between the various components, and show how these algorithms can form the basis for an empiricist model of language acquisition. I therefore conclude that the Argument from the Poverty of the Stimulus is unsupported by the evidence.
Proceedings Article•10.1145/564376.564397•
The use of unlabeled data to improve supervised learning for text summarization

[...]

Massih-Reza Amini, Patrick Gallinari
11 Aug 2002
TL;DR: This work proposes new semi-supervised algorithms for training classification models for text summarization, and analyzes their performance on two data sets: the Reuters news-wire corpus and the Computation and Language collection of TIPSTER SUMMAC.
Abstract: With the huge amount of information available electronically, there is an increasing demand for automatic text summarization systems. The use of machine learning techniques for this task allows one to adapt summaries to the user needs and to the corpus characteristics. These desirable properties have motivated an increasing amount of work in this field over the last few years. Most approaches attempt to generate summaries by extracting sentence segments and adopt the supervised learning paradigm, which requires labeling documents at the text span level. This is a costly process, which puts strong limitations on the applicability of these methods. We investigate here the use of semi-supervised algorithms for summarization. These techniques make use of a small amount of labeled data together with a larger amount of unlabeled data. We propose new semi-supervised algorithms for training classification models for text summarization. We analyze their performance on two data sets: the Reuters news-wire corpus and the Computation and Language (cmp_lg) collection of TIPSTER SUMMAC. We perform comparisons with a non-learning baseline system and a reference trainable summarizer system.
Proceedings Article•
Pitch accent prediction using ensemble machine learning.

[...]

Xuejing Sun1•
Northwestern University1
1 Jan 2002
TL;DR: Improved performance was achieved by ensemble learning in all experiments and the best result was obtained in the third task, in which the overall correct rate increases from 84.26% to 87.17%.
Abstract: In this study, we applied ensemble machine learning to predict pitch accents. With a decision tree as the baseline algorithm, two popular ensemble learning methods, bagging and boosting, were evaluated across three experimental conditions: using acoustic features only, using text-based features only, and using both acoustic and text-based features. F0-related acoustic features are derived from underlying pitch targets. Models of four ToBI pitch accent types (High, Down-stepped high, Low, and Unaccented) are built at the syllable level. Results showed that in all experiments improved performance was achieved by ensemble learning. The best result was obtained in the third task, in which the overall correct rate increases from 84.26% to 87.17%.
Journal Article•10.1109/TNN.2002.1021888•
Unsupervised speaker recognition based on competition between self-organizing maps

[...]

Itshak Lapidot, Hugo Guterman1, Arnon D. Cohen1•
Ben-Gurion University of the Negev1
01 Jul 2002-IEEE Transactions on Neural Networks
TL;DR: A method is presented for clustering the speakers from unlabeled and unsegmented conversation (with known number of speakers), when no a priori knowledge about the identity of the participants is given.
Abstract: We present a method for clustering the speakers from unlabeled and unsegmented conversation (with known number of speakers), when no a priori knowledge about the identity of the participants is given. Each speaker was modeled by a self-organizing map (SOM). The SOMs were randomly initialized. An iterative algorithm allows the data to move from one model to another and adjusts the SOMs. The restriction that the data can move only in small groups, rather than one feature vector at a time, forces the SOMs to adjust to speakers (instead of phonemes or other vocal events). This method was applied to high-quality conversations with two to five participants and to two-speaker telephone-quality conversations. The results for two (both high- and telephone-quality) and three speakers were over 80% correct segmentation. The problem becomes even harder when the number of participants is also unknown. Based on the iterative clustering algorithm, a validity criterion was also developed to estimate the number of speakers. In 16 out of 17 high-quality conversations between two and three participants, the estimated number of participants was correct. For telephone-quality conversations the results were poorer.
Patent•
Method for image region classification using unsupervised and supervised learning

[...]

Alexander C. Loui1, Sanjiv Kumar1•
Eastman Kodak Company1
7 Feb 2002
TL;DR: A method for classifying image regions extracts features from the image pixels, obtains a cluster probability map by unsupervised learning and a class probability map by supervised learning, and probabilistically merges the two maps to determine the semantic class of each region.
Abstract: A method for classification of image regions by probabilistic merging of a class probability map and a cluster probability map includes the steps of a) extracting one or more features from an input image composed of image pixels; b) performing unsupervised learning based on the extracted features to obtain a cluster probability map of the image pixels; c) performing supervised learning based on the extracted features to obtain a class probability map of the image pixels; and d) combining the cluster probability map from unsupervised learning and the class probability map from supervised learning to generate a modified class probability map to determine the semantic class of the image regions. In one embodiment the extracted features include color and textual features.
Book Chapter•10.1007/3-540-46146-9_53•
Efficient Data Mining Based on Formal Concept Analysis

[...]

Gerd Stumme1•
Karlsruhe Institute of Technology1
2 Sep 2002
TL;DR: The notion of iceberg concept lattices is introduced and applied to knowledge discovery in databases; designed for analyzing very large databases, they serve as a condensed representation of frequent patterns as known from association rule mining.
Abstract: Formal Concept Analysis is an unsupervised learning technique for conceptual clustering. We introduce the notion of iceberg concept lattices and show their use in Knowledge Discovery in Databases (KDD). Iceberg lattices are designed for analyzing very large databases. In particular they serve as a condensed representation of frequent patterns as known from association rule mining. In order to show the interplay between Formal Concept Analysis and association rule mining, we discuss the algorithm TITANIC. We show that iceberg concept lattices are a starting point for computing condensed sets of association rules without loss of information, and are a visualization method for the resulting rules.
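The link between concept lattices and frequent patterns can be illustrated with a deliberately naive sketch. It enumerates every attribute subset of a tiny formal context and keeps those that are both closed (equal to the set of attributes shared by all objects that contain them) and frequent; these are the intents of an iceberg concept lattice. This brute-force enumeration is not TITANIC, which avoids exhaustive search, and the shopping-basket context and function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set

Context = Dict[str, FrozenSet[str]]  # object -> set of attributes it has


def extent(context: Context, items: FrozenSet[str]) -> Set[str]:
    """All objects that have every attribute in `items`."""
    return {obj for obj, attrs in context.items() if items <= attrs}


def closure(context: Context, items: FrozenSet[str],
            all_attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Attributes shared by every object in the extent of `items`."""
    objs = extent(context, items)
    if not objs:
        return all_attrs
    return frozenset.intersection(*(context[o] for o in objs))


def iceberg_intents(context: Context, min_support: float) -> Dict[FrozenSet[str], float]:
    """Closed attribute sets whose relative support is at least min_support."""
    all_attrs = frozenset().union(*context.values())
    n = len(context)
    result: Dict[FrozenSet[str], float] = {}
    for size in range(len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            items = frozenset(combo)
            support = len(extent(context, items)) / n
            if support >= min_support and closure(context, items, all_attrs) == items:
                result[items] = support
    return result


# Hypothetical context: four shopping baskets.
baskets: Context = {
    "t1": frozenset({"bread", "milk"}),
    "t2": frozenset({"bread", "milk", "eggs"}),
    "t3": frozenset({"bread", "eggs"}),
    "t4": frozenset({"milk"}),
}

intents = iceberg_intents(baskets, min_support=0.5)
for intent, support in sorted(intents.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(intent), support)
```

In this example `{eggs}` is frequent but not closed, because every basket with eggs also has bread; it is represented by the closed set `{bread, eggs}`, which is the sense in which the lattice is a condensed representation.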