Top 701 papers published in the topic of Unsupervised learning in 2008

Showing papers on "Unsupervised learning" published in 2008

Proceedings Article•10.1145/1390156.1390294•

Extracting and composing robust features with denoising autoencoders

Pascal Vincent¹, Hugo Larochelle¹, Yoshua Bengio¹, Pierre-Antoine Manzagol¹•Institutions (1)

5 Jul 2008

TL;DR: This work introduces and motivates a new training principle for unsupervised learning of a representation, based on the idea of making the learned representations robust to partial corruption of the input pattern.

Abstract: Previous work has shown that the difficulties in learning deep generative or discriminative models can be overcome by an initial unsupervised learning step that maps inputs to useful intermediate representations. We introduce and motivate a new training principle for unsupervised learning of a representation based on the idea of making the learned representations robust to partial corruption of the input pattern. This approach can be used to train autoencoders, and these denoising autoencoders can be stacked to initialize deep architectures. The algorithm can be motivated from a manifold learning and information theoretic perspective or from a generative model perspective. Comparative experiments clearly show the surprising advantage of corrupting the input of autoencoders on a pattern classification benchmark suite.
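
The training principle described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the masking-noise corruption, tied weights, cross-entropy loss, full-batch gradient descent, and all names and hyperparameters (`train_denoising_autoencoder`, `corruption`, `lr`, and so on) are assumptions chosen for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_denoising_autoencoder(X, n_hidden=16, corruption=0.3,
                                lr=0.5, epochs=200, seed=0):
    """Train a one-hidden-layer autoencoder to rebuild clean inputs
    from corrupted ones. X has shape (n_samples, n_features), values in [0, 1].
    Tied weights (decoder = encoder transpose) are an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(d)
    for _ in range(epochs):
        # Corrupt the input: zero out a random fraction of the components.
        mask = rng.random(X.shape) >= corruption
        X_tilde = X * mask
        H = sigmoid(X_tilde @ W + b_h)   # encode the corrupted input
        Z = sigmoid(H @ W.T + b_v)       # decode
        # Cross-entropy loss is measured against the CLEAN input X,
        # so its gradient at the output pre-activation is (Z - X).
        dZ = (Z - X) / n
        dH = (dZ @ W) * H * (1.0 - H)
        W -= lr * (X_tilde.T @ dH + dZ.T @ H)
        b_h -= lr * dH.sum(axis=0)
        b_v -= lr * dZ.sum(axis=0)
    return W, b_h, b_v

def reconstruct(X, W, b_h, b_v):
    """Encode and decode uncorrupted inputs with the trained parameters."""
    return sigmoid(sigmoid(X @ W + b_h) @ W.T + b_v)
```

The only difference from a plain autoencoder in this sketch is that the encoder sees `X_tilde` while the loss compares the output with the uncorrupted `X`.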

9,004 citations

Book Chapter•10.1007/978-3-540-87479-9_3•

Data Clustering: 50 Years Beyond K-means

Anil K. Jain¹•Institutions (1)

Michigan State University¹

15 Sep 2008

TL;DR: Cluster analysis is the formal study of algorithms and methods for grouping objects according to measured or perceived intrinsic characteristics; organizing data into sensible groupings is one of the most fundamental modes of understanding and learning.

Abstract: The practice of classifying objects according to perceived similarities is the basis for much of science. Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks (domain, kingdom, phylum, class, etc.). Cluster analysis is the formal study of algorithms and methods for grouping objects according to measured or perceived intrinsic characteristics. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes cluster analysis (unsupervised learning) from discriminant analysis (supervised learning). The objective of cluster analysis is simply to find a convenient and valid organization of the data, not to establish rules for separating future data into categories.

6,706 citations

Journal Article•10.1007/S11263-007-0122-4•

Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words

Juan Carlos Niebles¹, Hongcheng Wang, Li Fei-Fei²•Institutions (2)

Universidad del Norte, Colombia¹, Princeton University²

01 Sep 2008-International Journal of Computer Vision

TL;DR: A novel unsupervised learning method for human action categories that can recognize and localize multiple actions in long and complex video sequences containing multiple motions.

Abstract: We present a novel unsupervised learning method for human action categories. A video sequence is represented as a collection of spatial-temporal words by extracting space-time interest points. The algorithm automatically learns the probability distributions of the spatial-temporal words and the intermediate topics corresponding to human action categories. This is achieved by using latent topic models such as the probabilistic Latent Semantic Analysis (pLSA) model and Latent Dirichlet Allocation (LDA). Because it relies on these probabilistic models, our approach can handle noisy feature points arising from dynamic backgrounds and moving cameras. Given a novel video sequence, the algorithm can categorize and localize the human action(s) contained in the video. We test our algorithm on three challenging datasets: the KTH human motion dataset, the Weizmann human action dataset, and a recent dataset of figure skating actions. Our results reflect the promise of such a simple approach. In addition, our algorithm can recognize and localize multiple actions in long and complex video sequences containing multiple motions.

1,843 citations

Journal Article•10.1007/S00500-008-0323-Y•

KEEL: a software tool to assess evolutionary algorithms for data mining problems

Jesús Alcalá-Fdez¹, Luciano Sánchez², Salvador García¹, M. J. del Jesus³, Sebastián Ventura⁴, Josep Maria Garrell⁵, José Otero², Cristóbal Romero⁴, Jaume Bacardit⁶, Víctor M. Rivas³, Juan Carlos Fernández⁴, Francisco Herrera¹•Institutions (6)

University of Granada¹, University of Oviedo², University of Jaén³, University of Córdoba (Spain)⁴, Ramon Llull University⁵, University of Nottingham⁶

15 Oct 2008

TL;DR: KEEL is a software tool for assessing evolutionary algorithms on data mining problems of various kinds, including regression, classification, and unsupervised learning; it includes evolutionary learning algorithms based on different approaches: Pittsburgh, Michigan and IRL.

Abstract: This paper introduces KEEL, a software tool to assess evolutionary algorithms for Data Mining problems of various kinds, including regression, classification, unsupervised learning, etc. It includes evolutionary learning algorithms based on different approaches: Pittsburgh, Michigan and IRL, as well as the integration of evolutionary learning techniques with different pre-processing techniques, allowing it to perform a complete analysis of any learning model in comparison to existing software tools. Moreover, KEEL has been designed with a double goal: research and education.

1,522 citations

Journal Article•10.1016/J.NEUNET.2008.02.003•

2008 Special Issue: Reinforcement learning of motor skills with policy gradients

Jan Peters¹, Stefan Schaal¹•Institutions (1)

University of Southern California¹

01 May 2008-Neural Networks

TL;DR: This paper examines learning of complex motor skills with human-like limbs, and combines the idea of modular motor control by means of motor primitives as a suitable way to generate parameterized control policies for reinforcement learning with the theory of stochastic policy gradient learning.

1,151 citations

Journal Article•10.1198/JASA.2008.S236•

Pattern Recognition and Machine Learning

Thomas Burr¹•Institutions (1)

Los Alamos National Laboratory¹

01 Jun 2008-Journal of the American Statistical Association

TL;DR: A book review of Pattern Recognition and Machine Learning, published in the Journal of the American Statistical Association, Vol. 103, No. 482, pp. 886-887.

Abstract: (2008). Pattern Recognition and Machine Learning. Journal of the American Statistical Association: Vol. 103, No. 482, pp. 886-887.

1,095 citations

Proceedings Article•

Natural Image Denoising with Convolutional Networks

Viren Jain¹, Sebastian Seung¹•Institutions (1)

Massachusetts Institute of Technology¹

8 Dec 2008

TL;DR: An approach to low-level vision is presented that combines the use of convolutional networks as an image processing architecture and an unsupervised learning procedure that synthesizes training samples from specific noise models to avoid computational difficulties in MRF approaches that arise from probabilistic learning and inference.

Abstract: We present an approach to low-level vision that combines two main ideas: the use of convolutional networks as an image processing architecture and an unsupervised learning procedure that synthesizes training samples from specific noise models. We demonstrate this approach on the challenging problem of natural image denoising. Using a test set with a hundred natural images, we find that convolutional networks provide comparable and in some cases superior performance to state of the art wavelet and Markov random field (MRF) methods. Moreover, we find that a convolutional network offers similar performance in the blind de-noising setting as compared to other techniques in the non-blind setting. We also show how convolutional networks are mathematically related to MRF approaches by presenting a mean field theory for an MRF specially designed for image denoising. Although these approaches are related, convolutional networks avoid computational difficulties in MRF approaches that arise from probabilistic learning and inference. This makes it possible to learn image processing architectures that have a high degree of representational power (we train models with over 15,000 parameters), but whose computational expense is significantly less than that associated with inference in MRF approaches with even hundreds of parameters.

1,034 citations

Journal Article•10.1162/NECO.2008.04-07-510•

Representational power of restricted boltzmann machines and deep belief networks

Nicolas Le Roux¹, Yoshua Bengio¹•Institutions (1)

Université de Montréal¹

01 Jun 2008-Neural Computation

TL;DR: This work proves that adding hidden units yields strictly improved modeling power, while a second theorem shows that RBMs are universal approximators of discrete distributions and suggests a new and less greedy criterion for training RBMs within DBNs.

Abstract: Deep belief networks (DBN) are generative neural network models with many layers of hidden explanatory factors, recently introduced by Hinton, Osindero, and Teh (2006) along with a greedy layer-wise unsupervised learning algorithm. The building block of a DBN is a probabilistic model called a restricted Boltzmann machine (RBM), used to represent one layer of the model. Restricted Boltzmann machines are interesting because inference is easy in them and because they have been successfully used as building blocks for training deeper models. We first prove that adding hidden units yields strictly improved modeling power, while a second theorem shows that RBMs are universal approximators of discrete distributions. We then study the question of whether DBNs with more layers are strictly more powerful in terms of representational power. This suggests a new and less greedy criterion for training RBMs within DBNs.

996 citations

Journal Article•10.1109/TKDE.2007.190672•

Label Propagation through Linear Neighborhoods

Fei Wang¹, Changshui Zhang¹•Institutions (1)

Tsinghua University¹

01 Jan 2008-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A novel graph-based semi-supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood, and which propagates the labels from the labeled points to the whole data set using these linear neighborhoods with sufficient smoothness.

Abstract: In many practical data mining applications such as text classification, unlabeled training examples are readily available, but labeled ones are fairly expensive to obtain. Therefore, semi-supervised learning algorithms have attracted considerable interest from the data mining and machine learning fields. In recent years, graph-based semi-supervised learning has become one of the most active research areas in the semi-supervised learning community. In this paper, a novel graph-based semi-supervised learning approach is proposed based on a linear neighborhood model, which assumes that each data point can be linearly reconstructed from its neighborhood. Our algorithm, named linear neighborhood propagation (LNP), can propagate the labels from the labeled points to the whole data set using these linear neighborhoods with sufficient smoothness. A theoretical analysis of the properties of LNP is presented in this paper. Furthermore, we also derive an easy way to extend LNP to out-of-sample data. Promising experimental results are presented for synthetic data, digit, and text classification tasks.
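
The two ingredients named in this abstract, linear reconstruction from neighbors and label propagation over the resulting graph, can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's algorithm: the reconstruction weights are obtained from a regularized linear solve followed by clipping and renormalizing (a stand-in for a properly constrained optimization), and the function names, `k`, `reg`, `alpha`, and `n_iter` are hypothetical.

```python
import numpy as np

def neighborhood_weights(X, k=3, reg=1e-3):
    """Row i holds weights that approximately reconstruct X[i] from its
    k nearest neighbors. Weights are non-negative and sum to one."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]
        diffs = X[nbrs] - X[i]
        G = diffs @ diffs.T               # local Gram matrix
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)
        w = np.linalg.solve(G, np.ones(k))
        # Simplification: clip negatives and renormalize instead of
        # solving a constrained problem.
        w = np.clip(w, 0.0, None)
        W[i, nbrs] = w / w.sum()
    return W

def propagate_labels(W, Y, alpha=0.99, n_iter=500):
    """Y is (n, n_classes): one-hot rows for labeled points, zero rows
    for unlabeled points. Returns a predicted class index per point."""
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        # Each point absorbs label mass from its neighbors while
        # retaining a fraction (1 - alpha) of its initial label.
        F = alpha * (W @ F) + (1.0 - alpha) * Y
    return F.argmax(axis=1)
```

With one labeled point in each of two well-separated groups, the labels in this sketch spread along the neighborhood graph until every point in a group shares the label of its seed.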

980 citations

Proceedings Article•

Unsupervised Learning of Narrative Event Chains

Nathanael Chambers¹, Dan Jurafsky¹•Institutions (1)

Stanford University¹

1 Jun 2008

TL;DR: A three-step process for learning narrative event chains, which uses unsupervised distributional methods to learn narrative relations between events sharing coreferring arguments; the paper also introduces two evaluations: the narrative cloze to evaluate event relatedness, and an order coherence task to evaluate narrative order.

Abstract: Hand-coded scripts were used in the 1970s and 1980s as knowledge backbones that enabled inference and other NLP tasks requiring deep semantic knowledge. We propose unsupervised induction of similar schemata called narrative event chains from raw newswire text. A narrative event chain is a partially ordered set of events related by a common protagonist. We describe a three-step process for learning narrative event chains. The first uses unsupervised distributional methods to learn narrative relations between events sharing coreferring arguments. The second applies a temporal classifier to partially order the connected events. Finally, the third prunes and clusters self-contained chains from the space of events. We introduce two evaluations: the narrative cloze to evaluate event relatedness, and an order coherence task to evaluate narrative order. We show a 36% improvement over baseline for narrative prediction and 25% for temporal coherence.

724 citations

A Tutorial on Learning with Bayesian Networks.

David Heckerman

1 Jan 2008

TL;DR: In this paper, the authors discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models, including techniques for learning with incomplete data.

Abstract: A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both a causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning. We illustrate the graphical-modeling approach using a real-world case study.

Proceedings Article•10.1145/1390156.1390169•

An empirical evaluation of supervised learning in high dimensions

Rich Caruana¹, Nikos Karampatziakis¹, Ainur Yessenalina¹•Institutions (1)

Cornell University¹

5 Jul 2008

TL;DR: An empirical evaluation of supervised learning on high-dimensional data finds that the relative performance of learning algorithms changes as dimensionality increases; the method that performs consistently well across all dimensions is random forests, followed by neural nets, boosted trees, and SVMs.

Abstract: In this paper we perform an empirical evaluation of supervised learning on high-dimensional data. We evaluate performance on three metrics: accuracy, AUC, and squared loss and study the effect of increasing dimensionality on the performance of the learning algorithms. Our findings are consistent with previous studies for problems of relatively low dimension, but suggest that as dimensionality increases the relative performance of the learning algorithms changes. To our surprise, the method that performs consistently well across all dimensions is random forests, followed by neural nets, boosted trees, and SVMs.

Book Chapter•10.1007/978-3-319-03194-1_4•

Policy Search for Motor Primitives in Robotics

Jens Kober¹, Jan Peters¹•Institutions (1)

Max Planck Society¹

8 Dec 2008

TL;DR: This paper extends previous work on policy learning from the immediate reward case to episodic reinforcement learning, resulting in a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning that is particularly well-suited for dynamic motor primitives.

Abstract: Many motor skills in humanoid robotics can be learned using parametrized motor primitives as done in imitation learning. However, most interesting motor learning problems are high-dimensional reinforcement learning problems often beyond the reach of current methods. In this paper, we extend previous work on policy learning from the immediate reward case to episodic reinforcement learning. We show that this results in a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning that is particularly well-suited for dynamic motor primitives. The resulting algorithm is an EM-inspired algorithm applicable to complex motor learning tasks. We compare this algorithm to several well-known parametrized policy search methods and show that it outperforms them. We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task using a real Barrett WAM™ robot arm.

Journal Article•10.1016/J.ENVSOFT.2007.10.001•

Review of the self-organizing map (SOM) approach in water resources: Analysis, modelling and application

Aman Mohammad Kalteh¹, Peder Hjorth¹, Ronny Berndtsson¹•Institutions (1)

Lund University¹

01 Jul 2008-Environmental Modelling and Software

TL;DR: It is concluded that SOM is a promising technique suitable to investigate, model, and control many types of water resources processes and systems.

Abstract: The use of artificial neural networks (ANNs) in problems related to water resources has received steadily increasing interest over the last decade or so. The related method of the self-organizing map (SOM) is an unsupervised learning method to analyze, cluster, and model various types of large databases. There is, however, still a notable lack of comprehensive literature review for SOM along with training and data handling procedures, and potential applicability. Consequently, the present paper aims firstly to explain the algorithm and secondly, to review published applications with main emphasis on water resources problems in order to assess how well SOM can be used to solve a particular problem. It is concluded that SOM is a promising technique suitable to investigate, model, and control many types of water resources processes and systems. Unsupervised learning methods have not yet been tested fully in a comprehensive way within, for example water resources engineering. However, over the years, SOM has displayed a steady increase in the number of applications in water resources due to the robustness of the method.
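
For readers unfamiliar with the method this review covers, a basic self-organizing map can be sketched as below. This is a generic textbook-style formulation offered as an assumption, not code from the paper: the rectangular grid, Gaussian neighborhood, linear decay schedules, and the names `train_som` and `best_matching_unit` are all illustrative choices.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Grid index (row, col) of the node whose weight vector is closest to x."""
    d2 = ((weights - x) ** 2).sum(axis=2)
    return np.unravel_index(d2.argmin(), d2.shape)

def train_som(X, grid_shape=(5, 5), n_iter=1000, lr0=0.5, sigma0=2.0, seed=0):
    """Fit a 2-D grid of weight vectors to the rows of X (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    weights = rng.random((rows, cols, X.shape[1]))
    gy, gx = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # learning rate decays to zero
        sigma = sigma0 * (1.0 - frac) + 1e-2     # neighborhood shrinks over time
        x = X[rng.integers(len(X))]              # pick one sample at random
        by, bx = best_matching_unit(weights, x)
        # Nodes near the winner on the grid move more than distant ones.
        h = np.exp(-((gy - by) ** 2 + (gx - bx) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[:, :, None] * (x - weights)
    return weights
```

After training, calling `best_matching_unit(weights, x)` for each sample assigns it to a grid cell, which is how the map is typically used for clustering and visualization.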

Journal Article•10.1198/JASA.2008.S219•

Gaussian Processes for Machine Learning

Songthip T. Ounpraseuth¹•Institutions (1)

University of Arkansas for Medical Sciences¹

01 Mar 2008-Journal of the American Statistical Association

TL;DR: A book review of Gaussian Processes for Machine Learning, published in the Journal of the American Statistical Association, Vol. 103, No. 481, p. 429.

Abstract: (2008). Gaussian Processes for Machine Learning. Journal of the American Statistical Association: Vol. 103, No. 481, p. 429.

Journal Article•10.1109/TASL.2007.909282•

Unsupervised Pattern Discovery in Speech

Alex Park¹, James Glass¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2008-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: It is shown how pattern discovery can be used to automatically acquire lexical entities directly from an untranscribed audio stream by exploiting the structure of repeating patterns within the speech signal.

Abstract: We present a novel approach to speech processing based on the principle of pattern discovery. Our work represents a departure from traditional models of speech recognition, where the end goal is to classify speech into categories defined by a prespecified inventory of lexical units (i.e., phones or words). Instead, we attempt to discover such an inventory in an unsupervised manner by exploiting the structure of repeating patterns within the speech signal. We show how pattern discovery can be used to automatically acquire lexical entities directly from an untranscribed audio stream. Our approach to unsupervised word acquisition utilizes a segmental variant of a widely used dynamic programming technique, which allows us to find matching acoustic patterns between spoken utterances. By aggregating information about these matching patterns across audio streams, we demonstrate how to group similar acoustic sequences together to form clusters corresponding to lexical entities such as words and short multiword phrases. On a corpus of academic lecture material, we demonstrate that clusters found using this technique exhibit high purity and that many of the corresponding lexical identities are relevant to the underlying audio stream.

Proceedings Article•10.1145/1390156.1390311•

Improved Nyström low-rank approximation and error analysis

Kai Zhang¹, Ivor W. Tsang¹, James T. Kwok¹•Institutions (1)

Hong Kong University of Science and Technology¹

5 Jul 2008

TL;DR: An error analysis that directly relates the Nyström approximation quality with the encoding powers of the landmark points in summarizing the data is presented, and the resultant error bound suggests a simple and efficient sampling scheme, the k-means clustering algorithm, for Nyström low-rank approximation.

Abstract: Low-rank matrix approximation is an effective tool in alleviating the memory and computational burdens of kernel methods, and sampling, as the mainstream of such algorithms, has drawn considerable attention in both theory and practice. This paper presents detailed studies on the Nyström sampling scheme and, in particular, an error analysis that directly relates the Nyström approximation quality with the encoding powers of the landmark points in summarizing the data. The resultant error bound suggests a simple and efficient sampling scheme, the k-means clustering algorithm, for Nyström low-rank approximation. We compare it with state-of-the-art approaches that range from greedy schemes to probabilistic sampling. Our algorithm achieves significant performance gains in a number of supervised/unsupervised learning tasks including kernel PCA and least squares SVM.
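
The idea of using k-means centers as Nyström landmarks can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the RBF kernel, the plain Lloyd-style k-means loop, the pseudo-inverse, and the names `rbf_kernel`, `kmeans`, `nystrom`, and `gamma` are choices made for this example.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def kmeans(X, m, n_iter=20, seed=0):
    """Plain k-means; returns m cluster centers to serve as landmarks."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(m):
            members = X[labels == j]
            if len(members) > 0:          # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def nystrom(X, landmarks, gamma=1.0):
    """Low-rank approximation K ~ E pinv(W) E^T, where E is the kernel
    between data and landmarks and W is the kernel among landmarks."""
    E = rbf_kernel(X, landmarks, gamma)
    W = rbf_kernel(landmarks, landmarks, gamma)
    return E @ np.linalg.pinv(W) @ E.T
```

A typical call would be `K_approx = nystrom(X, kmeans(X, m))`, which only evaluates the kernel against `m` landmarks instead of forming the full `n x n` matrix from all pairs.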

Posted Content•

Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

Francis Bach¹•Institutions (1)

French Institute for Research in Computer Science and Automation¹

09 Sep 2008-arXiv: Learning

TL;DR: The extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

Abstract: For supervised and unsupervised learning, positive definite kernels allow the use of large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the l1-norm or the block l1-norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to nonlinear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.

Journal Article•10.1016/J.YMSSP.2007.07.003•

ARMA modelled time-series classification for structural health monitoring of civil infrastructure

E. Peter Carden¹, James M. W. Brownjohn¹•Institutions (1)

University of Sheffield¹

01 Feb 2008-Mechanical Systems and Signal Processing

TL;DR: In this paper, a statistical classification algorithm is presented based on analysis of a structure's response in the time domain, which is fitted with Autoregressive Moving Average (ARMA) models and the ARMA coefficients are fed to the classifier.

Proceedings Article•10.1109/WMVC.2008.4544068•

Spatial-Temporal correlatons for unsupervised action classification

Silvio Savarese¹, Andrey DelPozo¹, Juan Carlos Niebles², Li Fei-Fei³•Institutions (3)

University of Illinois at Urbana–Champaign¹, Universidad del Norte, Colombia², Princeton University³

8 Jan 2008

TL;DR: This paper proposes the use of spatial-temporal correlograms to encode flexible long range temporal information into the spatial-temporal motion features, and applies an unsupervised generative model to learn different classes of human actions from these ST-correlograms.

Abstract: Spatial-temporal local motion features have shown promising results in complex human action classification. Most of the previous works [6],[16],[21] treat these spatial-temporal features as a bag of video words, omitting any long range, global information in either the spatial or temporal domain. Other ways of learning the temporal signature of motion tend to impose a fixed trajectory of the features or parts of the human body returned by tracking algorithms. This leaves little flexibility for the algorithm to learn the optimal temporal pattern describing these motions. In this paper, we propose the use of spatial-temporal correlograms to encode flexible long range temporal information into the spatial-temporal motion features. This results in a much richer description of human actions. We then apply an unsupervised generative model to learn different classes of human actions from these ST-correlograms. The KTH dataset, one of the most challenging and popular human action datasets, is used for experimental evaluation. Our algorithm achieves the highest classification accuracy reported for this dataset under an unsupervised learning scheme.

Journal Article•10.1016/J.PATCOG.2007.10.012•

Feature extraction for classification problems and its application to face recognition

Nojun Kwak¹•Institutions (1)

Ajou University¹

01 May 2008-Pattern Recognition

TL;DR: The experimental results show that the proposed method performs well for face recognition problems, compared with conventional methods such as the principal component analysis (PCA), Fisher's linear discriminant (FLD), etc.

Book Chapter•10.1007/978-3-540-75171-7_3•

Unsupervised Learning and Clustering

Derek Greene¹, Pádraig Cunningham¹, Rudolf Mayer²•Institutions (2)

University College Dublin¹, Vienna University of Technology²

1 Jan 2008

TL;DR: This chapter begins with a review of the classic clustering techniques of k-means clustering and hierarchical clustering, and covers modern advances in clustering with an analysis of kernel-based clustering and spectral clustering.

Abstract: Unsupervised learning is very important in the processing of multimedia content as clustering or partitioning of data in the absence of class labels is often a requirement. This chapter begins with a review of the classic clustering techniques of k-means clustering and hierarchical clustering. Modern advances in clustering are covered with an analysis of kernel-based clustering and spectral clustering. One of the most popular unsupervised learning techniques for processing multimedia content is the self-organizing map, so a review of self-organizing maps and variants is presented in this chapter. The absence of class labels in unsupervised learning makes the question of evaluation and cluster quality assessment more complicated than in supervised learning. So this chapter also includes a comprehensive analysis of cluster validity assessment techniques.

Proceedings Article•

Cluster Ensemble Selection.

Xiaoli Z. Fern¹, Wei Lin¹•Institutions (1)

Oregon State University¹

1 Jan 2008

TL;DR: This paper designs three ensemble selection methods based on quality and diversity, the two factors that have been shown to influence cluster ensemble performance, and empirically evaluates their performance against full ensembles and a random selection strategy.

Abstract: This paper studies the ensemble selection problem for unsupervised learning. Given a large library of different clustering solutions, our goal is to select a subset of solutions to form a smaller yet better-performing cluster ensemble than using all available solutions. We design our ensemble selection methods based on quality and diversity, the two factors that have been shown to influence cluster ensemble performance. Our investigation revealed that using quality or diversity alone may not consistently achieve improved performance. Based on our observations, we designed three different selection approaches that jointly consider these two factors. We empirically evaluated their performance in comparison with both full ensembles and a random selection strategy. Our results indicate that by explicitly considering both quality and diversity in ensemble selection, we can achieve statistically significant performance improvement over full ensembles. Copyright © 2008 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 1: 000-000, 2008

Journal Article•10.1162/COLI.2008.07-028-R2-05-82•

Hybrid reinforcement/supervised learning of dialogue policies from fixed data sets

James Henderson¹ ², Oliver Lemon¹ ², Kallirroi Georgila¹ ²•Institutions (2)

University of Edinburgh¹, Battelle Memorial Institute²

01 Dec 2008-Computational Linguistics

TL;DR: This work proposes a hybrid model that combines reinforcement learning with supervised learning for dialogue management policies from a fixed data set, which outperforms a pure supervised learning model and a pure reinforcement learning model.

Abstract: We propose a method for learning dialogue management policies from a fixed data set. The method addresses the challenges posed by Information State Update (ISU)-based dialogue systems, which represent the state of a dialogue as a large set of features, resulting in a very large state space and a huge policy space. To address the problem that any fixed data set will only provide information about small portions of these state and policy spaces, we propose a hybrid model that combines reinforcement learning with supervised learning. The reinforcement learning is used to optimize a measure of dialogue reward, while the supervised learning is used to restrict the learned policy to the portions of these spaces for which we have data. We also use linear function approximation to address the need to generalize from a fixed amount of data to large state spaces. To demonstrate the effectiveness of this method on this challenging task, we trained this model on the COMMUNICATOR corpus, to which we have added annotations for user actions and Information States. When tested with a user simulation trained on a different part of the same data set, our hybrid model outperforms a pure supervised learning model and a pure reinforcement learning model. It also outperforms the hand-crafted systems on the COMMUNICATOR data, according to automatic evaluation measures, improving over the average COMMUNICATOR system policy by 10%. The proposed method will improve techniques for bootstrapping and automatic optimization of dialogue management policies from limited initial data sets.

Proceedings Article•10.3115/1613715.1613841•

Learning to Predict Code-Switching Points

Thamar Solorio¹, Yang Liu¹•Institutions (1)

University of Texas at Dallas¹

25 Oct 2008

TL;DR: Exploratory results on learning to predict potential code-switching points in Spanish-English are presented, with classifiers trained on a transcription of code-switched discourse.

Abstract: Predicting possible code-switching points can help develop more accurate methods for automatically processing mixed-language text, such as multilingual language models for speech recognition systems and syntactic analyzers. We present in this paper exploratory results on learning to predict potential code-switching points in Spanish-English. We trained different learning algorithms using a transcription of code-switched discourse. To evaluate the performance of the classifiers, we used two different criteria: 1) measuring precision, recall, and F-measure of the predictions against the reference in the transcription, and 2) rating the naturalness of artificially generated code-switched sentences. Average scores for the code-switched sentences generated by our machine learning approach were close to the scores of those generated by humans.

Journal Article•10.3389/NEURO.11.008.2008•

Modular toolkit for Data Processing (MDP): a Python data processing framework

Tiziano Zito, Niko Wilbert¹, Laurenz Wiskott¹, Pietro Berkes²•Institutions (2)

Humboldt University of Berlin¹, Brandeis University²

01 Jan 2008-Frontiers in Neuroinformatics

TL;DR: The modular toolkit for Data Processing is a collection of supervised and unsupervised learning algorithms and other data processing units that can be combined into data processing sequences and more complex feed-forward network architectures.

Abstract: Modular toolkit for Data Processing (MDP) is a data processing framework written in Python. From the user's perspective, MDP is a collection of supervised and unsupervised learning algorithms and other data processing units that can be combined into data processing sequences and more complex feed-forward network architectures. Computations are performed efficiently in terms of speed and memory requirements. From the scientific developer's perspective, MDP is a modular framework, which can easily be expanded. The implementation of new algorithms is easy and intuitive. Newly implemented units are then automatically integrated with the rest of the library. MDP has been written in the context of theoretical research in neuroscience, but it has been designed to be helpful in any context where trainable data processing algorithms are used. Its simplicity on the user's side, the variety of readily available algorithms, and the reusability of the implemented units also make it a useful educational tool.

Journal Article•10.1016/J.NEUCOM.2007.12.027•

Delay learning and polychronization for reservoir computing

Hélène Paugam-Moisy¹, Regis Martinez¹, Samy Bengio²•Institutions (2)

Centre national de la recherche scientifique¹, Google²

01 Mar 2008-Neurocomputing

TL;DR: This work proposes a multi-timescale learning rule for spiking neuron networks, in line with the recently emerging field of reservoir computing, and emphasizes that polychronization can be used as a tool for exploiting the computational power of synaptic delays and for monitoring the topology and activity of a spiking neuron network.

Journal Article•10.1109/TNN.2007.905839•

Integrating Temporal Difference Methods and Self-Organizing Neural Networks for Reinforcement Learning With Delayed Evaluative Feedback

Ah-Hwee Tan¹, Ning Lu¹, Dan Xiao¹•Institutions (1)

Nanyang Technological University¹

01 Feb 2008-IEEE Transactions on Neural Networks

TL;DR: The proposed neural model, called TD fusion architecture for learning, cognition, and navigation (TD-FALCON), enables an autonomous agent to adapt and function in a dynamic environment with immediate as well as delayed evaluative feedback (reinforcement) signals.

Abstract: This paper presents a neural architecture for learning category nodes encoding mappings across multimodal patterns involving sensory inputs, actions, and rewards. By integrating adaptive resonance theory (ART) and temporal difference (TD) methods, the proposed neural model, called TD fusion architecture for learning, cognition, and navigation (TD-FALCON), enables an autonomous agent to adapt and function in a dynamic environment with immediate as well as delayed evaluative feedback (reinforcement) signals. TD-FALCON learns the value functions of the state-action space estimated through on-policy and off-policy TD learning methods, specifically state-action-reward-state-action (SARSA) and Q-learning. The learned value functions are then used to determine the optimal actions based on an action selection policy. We have developed TD-FALCON systems using various TD learning strategies and compared their performance in terms of task completion, learning speed, as well as time and space efficiency. Experiments based on a minefield navigation task have shown that TD-FALCON systems are able to learn effectively with both immediate and delayed reinforcement and achieve stable performance at a pace much faster than that of standard gradient-descent-based reinforcement learning systems.
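
The difference between the on-policy (SARSA) and off-policy (Q-learning) TD methods named in the abstract comes down to the bootstrap target. The sketch below shows the two standard update rules on a dictionary-backed value table. The tabular representation, the state and action names, and the values of `alpha` and `gamma` are assumptions for illustration; TD-FALCON itself estimates values with an ART-based network rather than a table.

```python
from collections import defaultdict


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap from the action actually taken in s_next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap from the best action available in s_next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


actions = ["left", "right"]
Q_sarsa = defaultdict(float)
Q_qlearn = defaultdict(float)
for Q in (Q_sarsa, Q_qlearn):
    Q[("s1", "left")] = 0.0
    Q[("s1", "right")] = 1.0

# Same transition (s0, right, reward 0, s1); the agent's next action is "left".
sarsa_update(Q_sarsa, "s0", "right", 0.0, "s1", "left")
q_learning_update(Q_qlearn, "s0", "right", 0.0, "s1", actions)
```

On this transition SARSA bootstraps from the value of the exploratory action `left`, while Q-learning bootstraps from the greedy action `right`, so the two tables diverge after a single update.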

Proceedings Article•10.1145/1390156.1390284•

Discriminative parameter learning for Bayesian networks

Jiang Su¹, Harry Zhang², Charles X. Ling³, Stan Matwin¹•Institutions (3)

University of Ottawa¹, University of New Brunswick², University of Western Ontario³

5 Jul 2008

TL;DR: This paper proposes a simple, efficient, and effective discriminative parameter learning method, called Discriminative Frequency Estimate (DFE), which learns parameters by discriminatively computing frequencies from data.

Abstract: Bayesian network classifiers have been widely used for classification problems. Given a fixed Bayesian network structure, parameter learning can take two different approaches: generative and discriminative learning. While generative parameter learning is more efficient, discriminative parameter learning is more effective. In this paper, we propose a simple, efficient, and effective discriminative parameter learning method, called Discriminative Frequency Estimate (DFE), which learns parameters by discriminatively computing frequencies from data. Empirical studies show that the DFE algorithm integrates the advantages of both generative and discriminative learning: it performs as well as the state-of-the-art discriminative parameter learning method ELR in accuracy, but is significantly more efficient.

Proceedings Article•10.1109/CIS.2008.204•

A Survey of Semi-Supervised Learning Methods

Nitin Pise¹, Parag Kulkarni•Institutions (1)

Maharashtra Institute of Technology¹

13 Dec 2008

TL;DR: This survey discusses important approaches to semi-supervised learning such as self-training, co-training (CO), expectation maximization (EM), and CO-EM, explains how graph-based methods are useful, and reports experimental results showing that the hybrid algorithm gives better classification accuracy.

Abstract: In traditional machine learning approaches to classification, one uses only a labelled set to train the classifier. Labelled instances, however, are often difficult, expensive, or time consuming to obtain, as they require the efforts of experienced human annotators. Meanwhile, unlabeled data may be relatively easy to collect, but there have been few ways to use them. Semi-supervised learning addresses this problem by using large amounts of unlabeled data, together with the labelled data, to build better classifiers. Because semi-supervised learning requires less human effort and gives higher accuracy, it is of great interest both in theory and in practice. The paper discusses various important approaches to semi-supervised learning such as self-training, co-training (CO), expectation maximization (EM), and CO-EM. It then explains how graph-based methods are useful. All semi-supervised learning methods are classified into generative and discriminative methods, but experimental results show that the hybrid algorithm gives better classification accuracy.
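
Of the approaches listed in the abstract, self-training is the simplest to sketch: a classifier trained on the labelled set labels the unlabeled points it is confident about, adds them to the training set, and repeats. The example below is only an illustration under stated assumptions and is not taken from the survey: it uses 1-D features, a nearest-centroid classifier, at least two classes, and a distance gap (`margin`) as the confidence proxy.

```python
def centroids(points, labels):
    """Mean of the points in each class (1-D features for brevity)."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def self_train(labeled_x, labeled_y, unlabeled_x, margin=1.0, max_rounds=10):
    """Repeatedly pseudo-label the unlabeled points the classifier is confident about."""
    x, y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        cents = centroids(x, y)
        confident, rest = [], []
        for u in pool:
            dists = sorted((abs(u - c), label) for label, c in cents.items())
            # Confidence proxy (an assumption): gap between the two nearest centroids.
            if dists[1][0] - dists[0][0] >= margin:
                confident.append((u, dists[0][1]))
            else:
                rest.append(u)
        if not confident:
            break  # nothing left that the classifier is confident about
        for u, label in confident:
            x.append(u)
            y.append(label)
        pool = rest
    return centroids(x, y), pool


# Two labelled points, five unlabeled ones; 5.0 is equidistant from both classes.
model, leftover = self_train([0.0, 10.0], ["a", "b"], [1.0, 2.0, 9.0, 8.0, 5.0])
```

The ambiguous point at 5.0 never clears the margin, so it stays in `leftover` instead of receiving a pseudo-label; this thresholding is the usual guard against reinforcing early mistakes in self-training.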
