TL;DR: A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points.
Abstract: We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.
TL;DR: An approach to semi-supervised learning is proposed that is based on a Gaussian random field model, and methods to incorporate class priors and the predictions of classifiers obtained by supervised learning are discussed.
Abstract: An approach to semi-supervised learning is proposed that is based on a Gaussian random field model. Labeled and unlabeled data are represented as vertices in a weighted graph, with edge weights encoding the similarity between instances. The learning problem is then formulated in terms of a Gaussian random field on this graph, where the mean of the field is characterized in terms of harmonic functions, and is efficiently obtained using matrix methods or belief propagation. The resulting learning algorithms have intimate connections with random walks, electric networks, and spectral graph theory. We discuss methods to incorporate class priors and the predictions of classifiers obtained by supervised learning. We also propose a method of parameter learning by entropy minimization, and show the algorithm's ability to perform feature selection. Promising experimental results are presented for synthetic data, digit classification, and text classification tasks.
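As a rough illustration of the harmonic-function idea in the abstract above: on a weighted graph, a harmonic function takes, at every unlabeled vertex, the weighted average of its neighbors' values, while labeled vertices stay fixed. The sketch below finds such a function by repeated averaging rather than by the matrix methods or belief propagation the abstract mentions. The function name `harmonic_labels`, the dense weight-matrix input, and the fixed iteration count are assumptions made for this example.

```python
def harmonic_labels(weights, labeled, n_iter=500):
    """Approximate the harmonic function on a weighted graph.

    weights: symmetric list-of-lists of non-negative edge weights
             (zero diagonal assumed).
    labeled: dict mapping vertex index -> fixed label value in [0, 1].
    Returns a list with one value per vertex.
    """
    n = len(weights)
    # Labeled vertices keep their value; unlabeled ones start at 0.5.
    f = [labeled.get(i, 0.5) for i in range(n)]
    for _ in range(n_iter):
        for i in range(n):
            if i in labeled:
                continue
            degree = sum(weights[i])
            if degree > 0:
                # Harmonic property: value equals weighted neighbor average.
                f[i] = sum(weights[i][j] * f[j] for j in range(n)) / degree
    return f


# Toy chain graph 0 - 1 - 2 - 3 - 4 with unit weights; endpoints are labeled.
chain = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(5)] for i in range(5)]
values = harmonic_labels(chain, {0: 0.0, 4: 1.0})
```

On this chain the values interpolate linearly between the two labeled endpoints, so thresholding at 0.5 would assign each unlabeled vertex to the nearer label.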
TL;DR: Locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data, is described and several extensions that enhance its performance are discussed.
Abstract: The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation. Here we describe locally linear embedding (LLE), an unsupervised learning algorithm that computes low dimensional, neighborhood preserving embeddings of high dimensional data. The data, assumed to be sampled from an underlying manifold, are mapped into a single global coordinate system of lower dimensionality. The mapping is derived from the symmetries of locally linear reconstructions, and the actual computation of the embedding reduces to a sparse eigenvalue problem. Notably, the optimizations in LLE---though capable of generating highly nonlinear embeddings---are simple to implement, and they do not involve local minima. In this paper, we describe the implementation of the algorithm in detail and discuss several extensions that enhance its performance. We present results of the algorithm applied to data sampled from known manifolds, as well as to collections of images of faces, lips, and handwritten digits. These examples are used to provide extensive illustrations of the algorithm's performance---both successes and failures---and to relate the algorithm to previous and ongoing work in nonlinear dimensionality reduction.
TL;DR: A unified framework for extending Local Linear Embedding, Isomap, Laplacian Eigenmaps, Multi-Dimensional Scaling as well as for Spectral Clustering is provided.
Abstract: Several unsupervised learning algorithms based on an eigendecomposition provide either an embedding or a clustering only for given training points, with no straightforward extension for out-of-sample examples short of recomputing eigenvectors. This paper provides a unified framework for extending Local Linear Embedding (LLE), Isomap, Laplacian Eigenmaps, Multi-Dimensional Scaling (for dimensionality reduction) as well as for Spectral Clustering. This framework is based on seeing these algorithms as learning eigenfunctions of a data-dependent kernel. Numerical experiments show that the generalizations performed have a level of error comparable to the variability of the embedding algorithms due to the choice of training data.
TL;DR: A generalized nonlinear TAH learning rule is proposed that allows a balance between stability and sensitivity of learning, and it is used to study the capacity of the system to learn patterns of correlations between afferent spike trains.
Abstract: Triggered by recent experimental results, temporally asymmetric Hebbian (TAH) plasticity is considered as a candidate model for the biological implementation of competitive synaptic learning, a key concept for the experience-based development of cortical circuitry. However, because of the well known positive feedback instability of correlation-based plasticity, the stability of the resulting learning process has remained a central problem. Plagued by either a runaway of the synaptic efficacies or a greatly reduced sensitivity to input correlations, the learning performance of current models is limited. Here we introduce a novel generalized nonlinear TAH learning rule that allows a balance between stability and sensitivity of learning. Using this rule, we study the capacity of the system to learn patterns of correlations between afferent spike trains. Specifically, we address the question of under which conditions learning induces spontaneous symmetry breaking and leads to inhomogeneous synaptic distributions that capture the structure of the input correlations. To study the efficiency of learning temporal relationships between afferent spike trains through TAH plasticity, we introduce a novel sensitivity measure that quantifies the amount of information about the correlation structure in the input that a learning rule is capable of storing in the synaptic weights. We demonstrate that by adjusting the weight dependence of the synaptic changes in TAH plasticity, it is possible to enhance the synaptic representation of temporal input correlations while maintaining the system in a stable learning regime. Indeed, for a given distribution of inputs, the learning efficiency can be optimized.
TL;DR: This paper proposes to use adaptive hidden Markov models (HMM) to perform video-based face recognition and shows that the proposed algorithm results in better performance than using majority voting of image-based recognition results.
Abstract: While traditional face recognition is typically based on still images, face recognition from video sequences has become popular. In this paper, we propose to use adaptive hidden Markov models (HMM) to perform video-based face recognition. During the training process, the statistics of training video sequences of each subject, and the temporal dynamics, are learned by an HMM. During the recognition process, the temporal characteristics of the test video sequence are analyzed over time by the HMM corresponding to each subject. The likelihood scores provided by the HMMs are compared, and the highest score provides the identity of the test video sequence. Furthermore, with unsupervised learning, each HMM is adapted with the test video sequence, which results in better modeling over time. Based on extensive experiments with various databases, we show that the proposed algorithm results in better performance than using majority voting of image-based recognition results.
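The recognition step described above (score the test sequence under each subject's HMM and pick the highest likelihood) can be sketched as follows. This is a minimal illustration that assumes discrete observation symbols and a scaled forward algorithm; the abstract does not say which features or emission model are used, and the adaptation step is not shown. The names `log_likelihood` and `identify` and the toy models are hypothetical.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under one HMM,
    computed with the scaled forward algorithm.

    start[s]    : initial probability of state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    total = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        total += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return total


def identify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


# Two toy subject models (hypothetical numbers): (start, trans, emit).
models = {
    "subject_a": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "subject_b": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
}
best = identify([0, 0, 0, 1, 0], models)
```

Working in log space with per-step rescaling keeps long sequences from underflowing, which matters once the observation sequence covers many video frames.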
TL;DR: This chapter extends the stability-based validation of cluster structure, and proposes stability as a figure of merit that is useful for comparing clustering solutions, thus helping in making these choices.
Abstract: Clustering is one of the most commonly used tools in the analysis of gene expression data (1, 2). The usage in grouping genes is based on the premise that co-expression is a result of co-regulation. It is thus a preliminary step in extracting gene networks and inference of gene function (3, 4). Clustering of experiments can be used to discover novel phenotypic aspects of cells and tissues (3, 5, 6), including sensitivity to drugs (7), and can also detect artifacts of experimental conditions (8). Clustering and its applications in biology are presented in greater detail in the chapter by Zhao and Karypis (see also (9)). While we focus on gene expression data in this chapter, the methodology presented here is applicable for other types of data as well. Clustering is a form of unsupervised learning, i.e. no information on the class variable is assumed, and the objective is to find the “natural” groups in the data. However, most clustering algorithms generate a clustering even if the data has no inherent cluster structure, so external validation tools are required. Given a set of partitions of the data into an increasing number of clusters (e.g. by a hierarchical clustering algorithm, or k-means), such a validation tool will tell the user the number of clusters in the data (if any). Many methods have been proposed in the literature to address this problem (10–15). Recent studies have shown the advantages of sampling-based methods (12, 14). These methods are based on the idea that when a partition has captured the structure in the data, this partition should be stable with respect to perturbation of the data. Bittner et al. (16) used a similar approach to validate clusters representing gene expression of melanoma patients. The emergence of cluster structure depends on several choices: data representation and normalization, the choice of a similarity measure and clustering algorithm.
In this chapter we extend the stability-based validation of cluster structure, and propose stability as a figure of merit that is useful for comparing clustering solutions, thus helping in making these choices. We use this framework to demonstrate the ability of Principal Component Analysis (PCA) to extract features relevant to the cluster structure. We use stability as a tool for simultaneously choosing the number of principal components and the number of clusters; we compare the performance of different similarity measures and normalization schemes. The approach is demonstrated through a case study of yeast gene expression data from Eisen et al. (1). For yeast, a functional classification of a large number of genes is known, and we use this classification for validating the results produced by clustering. A method for comparing clustering solutions specifically applicable to gene expression data was introduced in (17). However, it cannot be used to choose the number of clusters, and is not directly applicable in choosing the number of principal components. The results of clustering are easily corrupted by the addition of noise: even a few
TL;DR: It is suggested that the phasic and tonic components of dopamine neuron firing can encode the signal required for meta-learning of reinforcement learning.
TL;DR: This research work proposes applying the unsupervised Hebbian algorithm to nonlinear units for training FCMs; with the proposed learning procedure, the FCM modifies its fuzzy causal web as causal patterns change and as experts update their causal knowledge.
Abstract: Fuzzy Cognitive Map (FCM) is a soft computing technique for modeling systems. It combines synergistically the theories of neural networks and fuzzy logic. The methodology of developing FCMs is easily adaptable but relies on human experience and knowledge, and thus FCMs exhibit weaknesses and dependence on human experts. The critical dependence on the expert’s opinion and knowledge, and the potential convergence to undesired steady states are deficiencies of FCMs. In order to overcome these deficiencies and improve the efficiency and robustness of FCM a possible solution is the utilization of learning methods. This research work proposes the utilization of the unsupervised Hebbian algorithm to nonlinear units for training FCMs. Using the proposed learning procedure, the FCM modifies its fuzzy causal web as causal patterns change and as experts update their causal knowledge.
TL;DR: A new technique for training visual detectors which requires only a small quantity of labeled data, and then uses unlabeled data to improve performance over time is described, based on the cotraining framework of Blum and Mitchell.
Abstract: One significant challenge in the construction of visual detection systems is the acquisition of sufficient labeled data. We describe a new technique for training visual detectors which requires only a small quantity of labeled data, and then uses unlabeled data to improve performance over time. Unsupervised improvement is based on the cotraining framework of Blum and Mitchell, in which two disparate classifiers are trained simultaneously. Unlabeled examples which are confidently labeled by one classifier are added, with labels, to the training set of the other classifier. Experiments are presented on the realistic task of automobile detection in roadway surveillance video. In this application, cotraining reduces the false positive rate by a factor of 2 to 11 from the classifier trained with labeled data alone.
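The cotraining loop described above can be sketched in a few lines. This is only an illustration of the general scheme (two classifiers on two views, each passing its most confident unlabeled example, with a label, to the other's training set). The nearest-class-mean classifier on scalar features, the margin used as confidence, and all function names are assumptions for the example and are not the detectors used in the paper.

```python
def fit_means(examples):
    """examples: list of (value, label) with labels 0/1 -> per-class mean.
    Assumes both classes are present."""
    means = {}
    for label in (0, 1):
        vals = [v for v, y in examples if y == label]
        means[label] = sum(vals) / len(vals)
    return means


def predict(means, value):
    """Return (label, confidence); confidence is the distance margin."""
    d0, d1 = abs(value - means[0]), abs(value - means[1])
    return (0 if d0 <= d1 else 1), abs(d0 - d1)


def cotrain(labeled, unlabeled, rounds=10):
    """labeled: list of ((view0, view1), label); unlabeled: list of (view0, view1).
    Returns the two per-view models after cotraining."""
    train = [[(x[v], y) for x, y in labeled] for v in (0, 1)]
    pool = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            means = fit_means(train[view])
            # Example this view's classifier is most confident about.
            best = max(pool, key=lambda x: predict(means, x[view])[1])
            label, _ = predict(means, best[view])
            pool.remove(best)
            # Hand it, with its predicted label, to the OTHER classifier.
            train[1 - view].append((best[1 - view], label))
    return [fit_means(t) for t in train]


labeled = [((0.0, 0.1), 0), ((1.0, 0.9), 1)]
unlabeled = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.0), (0.8, 1.0)]
model_view0, model_view1 = cotrain(labeled, unlabeled)
```

A practical system would normally add a confidence threshold before accepting a label, since a confidently wrong label is propagated to the other classifier.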
TL;DR: This work proposes a feedforward model for recognition that shares components like weight sharing, pooling stages, and competitive nonlinearities with earlier approaches but focuses on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network.
Abstract: There is an ongoing debate over the capabilities of hierarchical neural feedforward architectures for performing real-world invariant object recognition. Although a variety of hierarchical models exists, appropriate supervised and unsupervised learning methods are still an issue of intense research. We propose a feedforward model for recognition that shares components like weight sharing, pooling stages, and competitive nonlinearities with earlier approaches but focuses on new methods for learning optimal feature-detecting cells in intermediate stages of the hierarchical network. We show that principles of sparse coding, which were previously mostly applied to the initial feature detection stages, can also be employed to obtain optimized intermediate complex features. We suggest a new approach to optimize the learning of sparse features under the constraints of a weight-sharing or convolutional architecture that uses pooling operations to achieve gradual invariance in the feature hierarchy. The approach explicitly enforces symmetry constraints like translation invariance on the feature set. This leads to a dimension reduction in the search space of optimal features and allows determining more efficiently the basis representatives, which achieve a sparse decomposition of the input. We analyze the quality of the learned feature representation by investigating the recognition performance of the resulting hierarchical network on object and face databases. We show that a hierarchy with features learned on a single object data set can also be applied to face recognition without parameter changes and is competitive with other recent machine learning recognition approaches. To investigate the effect of the interplay between sparse coding and processing nonlinearities, we also consider alternative feedforward pooling nonlinearities such as presynaptic maximum selection and sum-of-squares integration. 
The comparison shows that a combination of strong competitive nonlinearities with sparse coding offers the best recognition performance in the difficult scenario of segmentation-free recognition in cluttered surround. We demonstrate that for both learning and recognition, a precise segmentation of the objects is not necessary.
TL;DR: The results indicate the power of the method to determine a meaningful user context model while only requiring data from a comfortable physiological sensor device.
Abstract: Context-aware computing describes the situation where a wearable / mobile computer is aware of its user's state and surroundings and modifies its behavior based on this information. We designed, implemented and evaluated a wearable system which can determine typical user context and context transition probabilities online and without external supervision. The system relies on techniques from machine learning, statistical analysis and graph algorithms. It can be used for online classification and prediction. Our results indicate the power of our method to determine a meaningful user context model while only requiring data from a comfortable physiological sensor device.
TL;DR: It is argued that the use of SVMs, particularly in combination with the kernel trick, can make it easier to apply reinforcement learning as an "out-of-the-box" technique, without extensive feature engineering.
Abstract: The basic tools of machine learning appear in the inner loop of most reinforcement learning algorithms, typically in the form of Monte Carlo methods or function approximation techniques. To a large extent, however, current reinforcement learning algorithms draw upon machine learning techniques that are at least ten years old and, with a few exceptions, very little has been done to exploit recent advances in classification learning for the purposes of reinforcement learning. We use a variant of approximate policy iteration based on rollouts that allows us to use a pure classification learner, such as a support vector machine (SVM), in the inner loop of the algorithm. We argue that the use of SVMs, particularly in combination with the kernel trick, can make it easier to apply reinforcement learning as an "out-of-the-box" technique, without extensive feature engineering. Our approach opens the door to modern classification methods, but does not preclude the use of classical methods. We present experimental results in the pendulum balancing and bicycle riding domains using both SVMs and neural networks for classifiers.
TL;DR: A competition learning approach to coreference resolution is proposed, based on a twin-candidate model with a candidate filter that reduces computational cost and data noise during training and resolution; experiments show that it can outperform approaches based on the single-candidate model.
Abstract: In this paper we propose a competition learning approach to coreference resolution. Traditionally, supervised machine learning approaches adopt the single-candidate model. Nevertheless, the preference relationship between the antecedent candidates cannot be determined accurately in this model. By contrast, our approach adopts a twin-candidate learning model. Such a model can present the competition criterion for antecedent candidates reliably, and ensure that the most preferred candidate is selected. Furthermore, our approach applies a candidate filter to reduce the computational cost and data noise during training and resolution. The experimental results on the MUC-6 and MUC-7 data sets show that our approach can outperform those based on the single-candidate model.
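One way to picture a twin-candidate scheme is as a tournament: a pairwise classifier compares two antecedent candidates at a time, and the candidate that wins the most comparisons is selected. The sketch below assumes a round-robin tournament and a caller-supplied `prefer` function standing in for the learned pairwise classifier; both are assumptions for illustration, since the abstract does not specify how the comparisons are combined.

```python
def resolve(candidates, prefer):
    """Pick an antecedent by round-robin pairwise competition.

    candidates : list of candidate antecedents (any objects)
    prefer(a, b): hypothetical pairwise classifier; True if a is the
                  better antecedent than b for the anaphor in question.
    Returns the candidate with the most pairwise wins, or None if empty.
    """
    if not candidates:
        return None
    wins = [0] * len(candidates)
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            if prefer(candidates[i], candidates[j]):
                wins[i] += 1
            else:
                wins[j] += 1
    best_index = max(range(len(candidates)), key=lambda k: wins[k])
    return candidates[best_index]


# Toy stand-in for the classifier: prefer the candidate with the smaller
# distance (second tuple field) to the anaphor.
toy_candidates = [("the company", 12), ("Mr. Smith", 3), ("a report", 7)]
chosen = resolve(toy_candidates, lambda a, b: a[1] < b[1])
```

Round-robin comparison costs a quadratic number of classifier calls per anaphor, which is one reason a candidate filter of the kind the abstract mentions is useful.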
TL;DR: This paper proposes three theoretical methods for taking into account this distribution P(x) for regularization and provides links to existing graph-based semi-supervised learning algorithms.
Abstract: We address in this paper the question of how the knowledge of the marginal distribution P(x) can be incorporated in a learning algorithm. We suggest three theoretical methods for taking into account this distribution for regularization and provide links to existing graph-based semi-supervised learning algorithms. We also propose practical implementations.
TL;DR: The use of a multiobjective EA (NSGA-II) has enabled a smaller gene subset size to correctly classify 100% or near 100% samples for three cancer samples and introduced a prediction strength threshold for determining a sample's belonging to one class or the other.
Abstract: In the area of bioinformatics, the identification of gene subsets responsible for classifying available disease samples to two or more of its variants is an important task. Such problems have been solved in the past by means of unsupervised learning methods (hierarchical clustering, self-organizing maps, k-means clustering, etc.) and supervised learning methods (weighted voting approach, k-nearest neighbor method, support vector machine method, etc.). Such problems can also be posed as optimization problems of minimizing gene subset size to achieve reliable and accurate classification. The main difficulties in solving the resulting optimization problem are the availability of only a few samples compared to the number of genes in the samples and the exorbitantly large search space of solutions. Although there exist a few applications of evolutionary algorithms (EAs) for this task, here we treat the problem as a multiobjective optimization problem of minimizing the gene subset size and minimizing the number of misclassified samples. Moreover, for a more reliable classification, we consider multiple training sets in evaluating a classifier. Contrary to the past studies, the use of a multiobjective EA (NSGA-II) has enabled us to discover a smaller gene subset size (such as four or five) to correctly classify 100% or near 100% samples for three cancer samples (Leukemia, Lymphoma, and Colon). We have also extended the NSGA-II to obtain multiple non-dominated solutions, discovering as many as 352 different three-gene combinations providing a 100% correct classification to the Leukemia data. In order to have further confidence in the identification task, we have also introduced a prediction strength threshold for determining a sample's belonging to one class or the other.
All simulation results show consistent gene subset identifications on three disease samples and exhibit the flexibilities and efficacies in using a multiobjective EA for the gene subset identification task.
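The two objectives named in the abstract above (gene subset size and number of misclassified samples, both minimized) give rise to a set of non-dominated trade-offs. The sketch below shows only the dominance test and a naive non-dominated filter; it is not NSGA-II itself, which adds ranking into successive fronts, crowding distance, and evolutionary operators. The tuple layout `(subset_size, misclassified)` and the sample numbers are assumptions for illustration.

```python
def dominates(a, b):
    """True if solution a is no worse than b in every objective and
    strictly better in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(solutions):
    """Return the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Hypothetical candidates: (gene subset size, misclassified samples).
candidates = [(4, 0), (3, 1), (5, 0), (3, 2), (10, 0)]
front = non_dominated(candidates)
```

Here `(5, 0)` and `(10, 0)` are dominated by `(4, 0)`, and `(3, 2)` is dominated by `(3, 1)`, leaving a two-point front that trades one extra gene against one misclassification.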
TL;DR: A methodology for feature selection in unsupervised learning makes use of a multi-objective genetic algorithm where the minimization of the number of features and a validity index that measures the quality of clusters have been used to guide the search toward more discriminant features and the best number of clusters.
Abstract: In this paper a methodology for feature selection in unsupervised learning is proposed. It makes use of a multi-objective genetic algorithm where the minimization of the number of features and a validity index that measures the quality of clusters have been used to guide the search towards the more discriminant features and the best number of clusters. The proposed strategy is evaluated using two synthetic data sets and then it is applied to handwritten month word recognition. Comprehensive experiments demonstrate the feasibility and efficiency of the proposed methodology.
TL;DR: Although the robot has never seen human arm movement, nor been programmed to interpret it, and the details of the visual stimuli are very different, the robot identifies some of the patterns as similar to those in self learning and responds by generating the previously learned arm movement.
Abstract: Behavior imitation ability will be a key technology for future human friendly robots. In order to understand the principles and mechanisms of imitation, we take a synthetic cognitive developmental approach, starting with minimum components and creating a system that can learn to imitate others. We developed a visuo-motor neural learning system which consists of orientation selective visual movement representation, distributed arm movement representation, and a high-dimensional temporal sequence learning mechanism. The vision and the movement representations model findings in the primate brain, i.e. macaque area MT (or human area V5) and the primary motor area. The learning mechanism is inspired by the finding that there are excessive connections in the neonate brain. As our robot explores the visuo-motor self movement patterns, it learns coherent patterns as high-dimensional trajectory attractors. After the learning, a human comes in front of the robot showing arm movements which are similar to the ones in self learning. Although the robot has never seen human arm movement, nor been programmed to interpret it, and the details of the visual stimuli are very different, the robot identifies some of the patterns as similar to those in self learning and responds by generating the previously learned arm movement. In other words, the robot exhibits early imitation ability based on self exploratory learning.
TL;DR: A new AIS based clustering approach (TECNO-STREAMS) is proposed that addresses the weaknesses of current AIS models and exhibits superior learning abilities, while at the same time, requiring low memory and computational costs.
Abstract: Artificial immune system (AIS) models hold many promises in the field of unsupervised learning. However, existing models are not scalable, which makes them of limited use in data mining. We propose a new AIS based clustering approach (TECNO-STREAMS) that addresses the weaknesses of current AIS models. Compared to existing AIS based techniques, our approach exhibits superior learning abilities, while at the same time, requiring low memory and computational costs. Like the natural immune system, the strongest advantage of immune based learning compared to other approaches is expected to be its ease of adaptation to the dynamic environment that characterizes several applications, particularly in mining data streams. We illustrate the ability of the proposed approach in detecting clusters in noisy data sets, and in mining evolving user profiles from Web clickstream data in a single pass. TECNO-STREAMS adheres to all the requirements of clustering data streams: compactness of representation, fast incremental processing of new data points, and clear and fast identification of outliers.
TL;DR: Experiments show that the learned tracker performs much better than existing trackers on the tracking of complex non-rigid motions such as fish twisting with self-occlusion and large inter-frame lip motion.
Abstract: In this paper, a novel method to learn the intrinsic object structure for robust visual tracking is proposed. The basic assumption is that the parameterized object state lies on a low dimensional manifold and can be learned from training data. Based on this assumption, we first derive a dimensionality reduction and density estimation algorithm for unsupervised learning of the object's intrinsic representation; the resulting non-rigid part of the object state can be reduced to as few as 2 dimensions. Second, the dynamical model is derived and trained based on this intrinsic representation. Third, the learned intrinsic object structure is integrated into a particle-filter style tracker. We show that this intrinsic object representation has some interesting properties, and that the dynamical model derived from it makes the particle-filter style tracker more robust and reliable. Experiments show that the learned tracker performs much better than existing trackers on the tracking of complex non-rigid motions such as fish twisting with self-occlusion and large inter-frame lip motion. The proposed method also has the potential to solve other types of tracking problems.
TL;DR: An algorithm for unsupervised learning and semantic classification of names and terms, given a small number of seed examples and an unlabeled training corpus, that learns patterns that identify more examples, in a bootstrapping cycle.
Abstract: We present an algorithm for unsupervised learning and semantic classification of names and terms. Given a small number of seed examples and an unlabeled training corpus, the algorithm learns patterns that identify more examples, in a bootstrapping cycle. Multiple classes are learned simultaneously, including negative classes that serve to provide negative examples for the target classes. We apply the algorithm to texts from several domains, in English and Chinese.
TL;DR: This work presents an unsupervised link discovery method aimed at discovering unusual, interestingly linked entities in multi-relational datasets.
Abstract: A significant portion of knowledge discovery and data mining research focuses on finding patterns of interest in data. Once a pattern is found, it can be used to recognize satisfying instances. The new area of link discovery requires a complementary approach, since patterns of interest might not yet be known or might have too few examples to be learnable. We present an unsupervised link discovery method aimed at discovering unusual, interestingly linked entities in multi-relational datasets. Various notions of rarity are introduced to measure the "interestingness" of sets of paths and entities. These measurements have been implemented and applied to a real-world bibliographic dataset where they give very promising results.
TL;DR: In this article, the authors examined the use of machine learning to improve a rooftop detection process, one step in a vision system that recognizes buildings in overhead imagery, using ROC analysis to evaluate the methods under varying error costs.
Abstract: In this paper, we examine the use of machine learning to improve a rooftop detection process, one step in a vision system that recognizes buildings in overhead imagery. We review the problem of analyzing aerial images and describe an existing system that detects buildings in such images. We briefly review four algorithms that we selected to improve rooftop detection. The data sets were highly skewed and the cost of mistakes differed between the classes, so we used ROC analysis to evaluate the methods under varying error costs. We report three experiments designed to illuminate facets of applying machine learning to the image analysis task. One investigated learning with all available images to determine the best performing method. Another focused on within-image learning, in which we derived training and testing data from the same image. A final experiment addressed between-image learning, in which training and testing sets came from different images. Results suggest that useful generalization occurred when training and testing on data derived from images differing in location and in aspect. They demonstrate that under most conditions, naive Bayes exceeded the accuracy of other methods and a handcrafted classifier, the solution currently used in the building detection system.
TL;DR: This article widens the application domain of the taxonomy for supervised STCNs recently proposed by Kremer (2001) to the unsupervised case, and argues that the proposed approach is simple and more powerful than the previous attempts from a descriptive and predictive viewpoint.
Abstract: Spatiotemporal connectionist networks (STCNs) comprise an important class of neural models that can deal with patterns distributed in both time and space. In this article, we widen the application domain of the taxonomy for supervised STCNs recently proposed by Kremer (2001) to the unsupervised case. This is possible through a reinterpretation of the state vector as a vector of latent (hidden) variables, as proposed by Meinicke (2000). The goal of this generalized taxonomy is then to provide a non-linear generative framework for describing unsupervised spatiotemporal networks, making it easier to compare and contrast their representational and operational characteristics. Computational properties, representational issues, and learning are also discussed, and a number of references to the relevant source publications are provided. It is argued that the proposed approach is simple and more powerful than the previous attempts from a descriptive and predictive viewpoint. We also discuss the relation of this taxonomy with automata theory and state-space modeling and suggest directions for further work.
TL;DR: A method to combine active and unsupervised learning for automatic speech recognition (ASR) to minimize the human supervision for training acoustic and language models and to maximize the performance given the transcribed and untranscribed data.
Abstract: State-of-the-art speech recognition systems are trained using human transcriptions of speech utterances. In this paper, we describe a method to combine active and unsupervised learning for automatic speech recognition (ASR). The goal is to minimize the human supervision for training acoustic and language models and to maximize the performance given the transcribed and untranscribed data. Active learning aims at reducing the number of training examples to be labeled by automatically processing the unlabeled examples, and then selecting the most informative ones with respect to a given cost function. For unsupervised learning, we utilize the remaining untranscribed data by using their ASR output and word confidence scores. Our experiments show that the amount of labeled data needed for a given word accuracy can be reduced by 75% by combining active and unsupervised learning.
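The combination of the two strategies can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's method: it assumes one confidence score per utterance, uses "least confident first" as the selection criterion, and keeps automatic transcripts only above a hypothetical cutoff `min_conf`. The function name and the data are invented for the example.

```python
def plan_training_data(utterances, budget, min_conf):
    """Split untranscribed utterances into a human-labeling batch and an auto-labeled set."""
    # Active learning step (assumed criterion): least confident utterances first.
    ranked = sorted(utterances, key=lambda u: u["conf"])
    to_transcribe = ranked[:budget]
    # Unsupervised step (assumed rule): keep ASR output only when confidence is high enough.
    auto_labeled = [u for u in ranked[budget:] if u["conf"] >= min_conf]
    return to_transcribe, auto_labeled


# Hypothetical pool of ASR hypotheses with utterance-level confidence scores.
pool = [
    {"id": "u1", "hyp": "call my office", "conf": 0.95},
    {"id": "u2", "hyp": "pay the bill", "conf": 0.40},
    {"id": "u3", "hyp": "check balance", "conf": 0.75},
    {"id": "u4", "hyp": "cancel order", "conf": 0.55},
]
to_transcribe, auto_labeled = plan_training_data(pool, budget=1, min_conf=0.7)
```

Here `u2` goes to a human transcriber, `u3` and `u1` are reused with their ASR output, and `u4` is left out because its confidence falls below the assumed cutoff.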
TL;DR: The central idea is to introduce an environmental class, BV-MDPs, defined by a distribution of MDPs, and to exploit past learning experiences through statistics (mean and deviation) of the agent's value tables.
Abstract: In this paper we address a new problem in reinforcement learning. Here we consider an agent that faces multiple learning tasks within its lifetime. The agent's objective is to maximize its total reward over its lifetime as well as the conventional return in each task. To realize this, it has to be endowed with the important ability to keep its past learning experiences and utilize them to improve future learning performance. Here we try to state this problem formally. The central idea is to introduce an environmental class, BV-MDPs, that is defined by a distribution of MDPs. As an approach to exploiting past learning experiences, we focus on statistics (mean and deviation) of the agent's value tables. The mean can be used as the initial values of the table when a new task is presented. The deviation can be viewed as measuring the reliability of the mean, and we utilize it in calculating the priority of simulated backups. We conduct experiments in computer simulation to evaluate the effectiveness of this approach.
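A small sketch of the statistics described above, under assumptions: value tables are represented as dictionaries keyed by state, the deviation is taken to be the population standard deviation, and states are simply ordered by deviation to stand in for a backup priority. The abstract does not specify the actual priority formula, so that last step is purely illustrative.

```python
from statistics import mean, pstdev


def summarize_value_tables(tables):
    """Per-state mean and deviation across value tables learned on past tasks."""
    keys = list(tables[0].keys())
    means = {k: mean([t[k] for t in tables]) for k in keys}
    devs = {k: pstdev([t[k] for t in tables]) for k in keys}
    return means, devs


# Hypothetical value tables from two earlier tasks.
past_tables = [
    {"s0": 1.0, "s1": 4.0},
    {"s0": 3.0, "s1": 4.0},
]
means, devs = summarize_value_tables(past_tables)

# The mean initializes the table for a new task.
new_table = dict(means)

# Assumed rule: states whose mean is least reliable are backed up first.
backup_order = sorted(devs, key=devs.get, reverse=True)
```

In this toy case the value of `s1` agreed across tasks, so its mean is treated as reliable, while `s0` varied and is placed first in the backup order.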
TL;DR: An empirical study of 7 individual learning systems and 9 different combined methods on 4 different biological data sets is performed, and issues are suggested to consider when choosing an algorithm for a data set, deciding whether combined methods are better than a single approach, and comparing the effectiveness of one algorithm with others.
Abstract: Research in bioinformatics is driven by experimental data. Current biological databases are populated by vast amounts of experimental data. Machine learning has been widely applied to bioinformatics and has had considerable success in this research area. At present, with various learning algorithms available in the literature, researchers face difficulties in choosing the best method to apply to their data. We performed an empirical study on 7 individual learning systems and 9 different combined methods on 4 different biological data sets, and provide some suggested issues to be considered when answering the following questions: (i) How does one choose which algorithm is best suited to their data set? (ii) Are combined methods better than a single approach? (iii) How does one compare the effectiveness of a particular algorithm to the others?
TL;DR: This paper proposes a new approach to feature selection based on a modified fuzzy C-means algorithm with supervision (MFCMS), and shows that feature selection performed by MFCMS achieved an improvement in generalization on all data sets.
TL;DR: This thesis categorizes current MI methods into a new framework, and proposes a new assumption for MI learning, called the “collective assumption”, in contrast to the standard MI assumption, which states that “An example is positive if at least one of its instances is positive and negative otherwise”.
Abstract: Multiple instance (MI) learning is a relatively new topic in machine learning. It is concerned with supervised learning but differs from normal supervised learning in two points: (1) it has multiple instances in an example (and there is only one instance in an example in standard supervised learning), and (2) only one class label is observable for all the instances in an example (whereas each instance has its own class label in normal supervised learning). In MI learning there is a common assumption regarding the relationship between the class label of an example and the “unobservable” class labels of the instances inside it. This assumption, which is called the “MI assumption” in this thesis, states that “An example is positive if at least one of its instances is positive and negative otherwise”. In this thesis, we first categorize current MI methods into a new framework. According to our analysis, there are two main categories of MI methods, instance-based and metadata-based approaches. Then we propose a new assumption for MI learning, called the “collective assumption”. Although this assumption has been used in some previous MI methods, it has never been explicitly stated (as a matter of fact, for some of these methods, it is actually claimed that they use the standard MI assumption stated above), and this is the first time that it is formally specified. Using this new assumption we develop new algorithms, more specifically two instance-based and one metadata-based methods. All of these methods build probabilistic models and thus implement statistical learning algorithms. The exact generative models underlying these methods are explicitly stated and illustrated so that one may clearly understand the situations
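The standard MI assumption quoted above can be written down directly. The sketch below covers only that standard assumption (the collective assumption is not defined in the abstract, so it is not modeled here); the function name `bag_label` and the 0/1 label encoding are choices made for illustration.

```python
def bag_label(instance_labels):
    """Standard MI assumption: a bag is positive iff at least one instance is positive."""
    return 1 if any(instance_labels) else 0


# Hypothetical bags of hidden instance labels (1 = positive, 0 = negative).
mixed_bag = [0, 0, 1]
negative_bag = [0, 0, 0]

mixed_result = bag_label(mixed_bag)
negative_result = bag_label(negative_bag)
```

In practice the instance labels are unobservable, so a learner sees only the bag-level result and must reason about which instances could have produced it.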
TL;DR: In this article, a task decomposition, in the form of a decision tree, is provided for simulated robotic soccer keepaway, and two different methods of learning the resulting subtasks are investigated.
Abstract: In some complex control tasks, learning a direct mapping from an agent's sensors to its actuators is very difficult. For such tasks, decomposing the problem into more manageable components can make learning feasible. In this paper, we provide a task decomposition, in the form of a decision tree, for one such task. We investigate two different methods of learning the resulting subtasks. The first approach, layered learning, trains each component sequentially in its own training environment, aggressively constraining the search. The second approach, coevolution, learns all the subtasks simultaneously from the same experiences and puts few restrictions on the learning algorithm. We empirically compare these two training methodologies using neuro-evolution, a machine learning algorithm that evolves neural networks. Our experiments, conducted in the domain of simulated robotic soccer keepaway, indicate that neuro-evolution can learn effective behaviors and that the less constrained coevolutionary approach outperforms the sequential approach. These results provide new evidence of coevolution's utility and suggest that solution spaces should not be over-constrained when supplementing the learning of complex tasks with human knowledge.