TL;DR: This paper presents an unsupervised learning algorithm for recognizing synonyms, called PMI-IR, which uses statistical data acquired by querying a Web search engine and combines Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words.
Abstract: This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing).
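A minimal sketch of the PMI-IR scoring idea follows. It assumes the search engine is abstracted as two hypothetical callables, `hits(word)` and `hits_and(a, b)`, that return hit counts. The counts in the usage lines are invented toy numbers, not real query results. Because p(problem) is the same for every candidate, ranking candidates by hits(problem AND choice) / hits(choice) gives the same order as ranking them by PMI with the problem word.

```python
import math
from typing import Callable, Sequence


def pmi(hits_ab: int, hits_a: int, hits_b: int, total: int) -> float:
    """Pointwise mutual information log2(p(a, b) / (p(a) * p(b))) from counts (hits_ab > 0)."""
    p_ab = hits_ab / total
    return math.log2(p_ab / ((hits_a / total) * (hits_b / total)))


def choose_synonym(problem: str, choices: Sequence[str],
                   hits: Callable[[str], int],
                   hits_and: Callable[[str, str], int]) -> str:
    """Return the choice with the highest hits(problem AND choice) / hits(choice).

    p(problem) is constant across choices, so this ratio ranks the choices
    in the same order as PMI(problem, choice).
    """
    def score(choice: str) -> float:
        return hits_and(problem, choice) / max(hits(choice), 1)  # guard against zero hits

    return max(choices, key=score)


# Invented hit counts, purely for illustration (hypothetical, not search-engine data).
single = {"huge": 500, "blue": 2000, "quickly": 800}
pair = {("large", "huge"): 100, ("large", "blue"): 40, ("large", "quickly"): 8}
best = choose_synonym("large", ["huge", "blue", "quickly"],
                      single.__getitem__, lambda a, b: pair[(a, b)])
print(best)  # huge
```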
TL;DR: This study reports the results of using minimum description length (MDL) analysis to model unsupervised learning of the morphological segmentation of European languages, using corpora ranging in size from 5,000 words to 500,000 words.
Abstract: This study reports the results of using minimum description length (MDL) analysis to model unsupervised learning of the morphological segmentation of European languages, using corpora ranging in size from 5,000 words to 500,000 words. We develop a set of heuristics that rapidly develop a probabilistic morphological grammar, and use MDL as our primary tool to determine whether the modifications proposed by the heuristics will be adopted or not. The resulting grammar closely matches the analysis that would be developed by a human morphologist. In the final section, we discuss the relationship of this style of MDL grammatical analysis to the notion of evaluation metric in early generative grammar.
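The sketch below makes the MDL trade-off concrete. It is deliberately crude and is not the paper's probabilistic model. Description length is approximated by one unit per letter, plus one end marker per listed item, plus one pointer per stem and per suffix in a single hypothetical "signature". A segmentation is adopted only if it regenerates exactly the observed words and is cheaper to describe than the unsegmented word list.

```python
from itertools import product


def list_cost(items):
    """Crude description length of a list: one unit per letter plus one end marker per item."""
    return sum(len(item) + 1 for item in items)


def segmented_cost(stems, suffixes):
    """Stem list + suffix list + a (hypothetical) signature pointing once at each stem and suffix."""
    return list_cost(stems) + list_cost(suffixes) + len(stems) + len(suffixes)


def accept_segmentation(words, stems, suffixes):
    """MDL-style decision: accept only if the analysis generates exactly `words` and is shorter."""
    generated = {stem + suffix for stem, suffix in product(stems, suffixes)}
    if generated != set(words):
        return False
    return segmented_cost(stems, suffixes) < list_cost(words)


words = ["jump", "jumps", "jumped", "walk", "walks", "walked"]
print(list_cost(words), segmented_cost(["jump", "walk"], ["", "s", "ed"]))  # 36 21
print(accept_segmentation(words, ["jump", "walk"], ["", "s", "ed"]))  # True
```

The null suffix `""` still costs one unit for its end marker. Real MDL analyses replace these unit counts with code lengths derived from probabilities.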
TL;DR: Unsupervised learning of higher-order statistics provides support for Barlow's theory of visual recognition, which posits that detecting “suspicious coincidences” of elements during recognition is a necessary prerequisite for efficient learning of new visual features.
Abstract: Three experiments investigated the ability of human observers to extract the joint and conditional probabilities of shape co-occurrences during passive viewing of complex visual scenes. Results indicated that statistical learning of shape conjunctions was both rapid and automatic, as subjects were not instructed to attend to any particular features of the displays. Moreover, in addition to single-shape frequency, subjects acquired in parallel several different higher-order aspects of the statistical structure of the displays, including absolute shape-position relations in an array, shape-pair arrangements independent of position, and conditional probabilities of shape co-occurrences. Unsupervised learning of these higher-order statistics provides support for Barlow's theory of visual recognition, which posits that detecting "suspicious coincidences" of elements during recognition is a necessary prerequisite for efficient learning of new visual features.
TL;DR: An approach for incremental learning with support vector machines is presented that improves on the existing approach of Syed et al. (1999), and insight into the interpretability of support vectors is given.
Abstract: Support vector machines (SVMs) have become a popular tool for machine learning with large amounts of high-dimensional data. In this paper, an approach for incremental learning with support vector machines is presented that improves on the existing approach of Syed et al. (1999). An insight into the interpretability of support vectors is also given.
TL;DR: A statistical model for organizing image collections is presented that integrates semantic information provided by associated text with visual information provided by image features; because the model learns relationships between text and image features, it can also be used for associating words with pictures and for unsupervised learning for object recognition.
Abstract: We present a statistical model for organizing image collections which integrates semantic information provided by associated text and visual information provided by image features. The model is very promising for information retrieval tasks such as database browsing and searching for images based on text and/or image features. Furthermore, since the model learns relationships between text and image features, it can be used for novel applications such as associating words with pictures, and unsupervised learning for object recognition.
TL;DR: This paper introduces evolving fuzzy neural networks (EFuNNs) as a means for the implementation of the evolving connectionist systems (ECOS) paradigm that is aimed at building online, adaptive intelligent systems that have both their structure and functionality evolving in time.
Abstract: This paper introduces evolving fuzzy neural networks (EFuNNs) as a means for the implementation of the evolving connectionist systems (ECOS) paradigm that is aimed at building online, adaptive intelligent systems that have both their structure and functionality evolving in time. EFuNNs evolve their structure and parameter values through incremental, hybrid supervised/unsupervised, online learning. They can accommodate new input data, including new features, new classes, etc., through local element tuning. New connections and new neurons are created during the operation of the system. EFuNNs can learn spatial-temporal sequences in an adaptive way through one-pass learning and automatically adapt their parameter values as they operate. Fuzzy or crisp rules can be inserted and extracted at any time of the EFuNN operation. The characteristics of EFuNNs are illustrated on several case study data sets for time series prediction and spoken word classification. Their performance is compared with traditional connectionist methods and systems. The applicability of EFuNNs as general-purpose online learning machines, including systems that learn from large databases, life-long learning systems, and online adaptive systems in different areas of engineering, is discussed.
TL;DR: The present work describes how the self-organizing map (SOM) can be used for the study of ecological communities, and how it can complement classical techniques for exploring data and for achieving community ordination.
TL;DR: This paper proposes a hierarchical reinforcement learning architecture that realizes practical learning speed in real hardware control tasks and applies it to a three-link, two-joint robot for the task of learning to stand up by trial and error.
TL;DR: A novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis is presented.
Abstract: This paper presents a novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter meth...
TL;DR: Experimental results show that active learning can substantially reduce the number of observations required to determine the structure of a domain.
Abstract: The task of causal structure discovery from empirical data is a fundamental problem in many areas. Experimental data is crucial for accomplishing this task. However, experiments are typically expensive, and must be selected with great care. This paper uses active learning to determine the experiments that are most informative towards uncovering the underlying structure. We formalize the causal learning task as that of learning the structure of a causal Bayesian network. We consider an active learner that is allowed to conduct experiments, where it intervenes in the domain by setting the values of certain variables. We provide a theoretical framework for the active learning problem, and an algorithm that actively chooses the experiments to perform based on the model learned so far. Experimental results show that active learning can substantially reduce the number of observations required to determine the structure of a domain.
TL;DR: This study examines the learning behavior of co-training on natural language processing tasks that typically require large numbers of training instances to achieve usable performance levels and proposes a moderately supervised variant of co-training in which a human corrects the mistakes made during automatic labeling.
Abstract: Co-training is a weakly supervised learning paradigm in which the redundancy of the learning task is captured by training two classifiers using separate views of the same data. This enables bootstrapping from a small set of labeled training data via a large set of unlabeled data. This study examines the learning behavior of co-training on natural language processing tasks that typically require large numbers of training instances to achieve usable performance levels. Using base noun phrase bracketing as a case study, we find that co-training reduces by 36% the difference in error between co-trained classifiers and fully supervised classifiers trained on a labeled version of all available data. However, degradation in the quality of the bootstrapped data arises as an obstacle to further improvement. To address this, we propose a moderately supervised variant of co-training in which a human corrects the mistakes made during automatic labeling. Our analysis suggests that corrected co-training and similar moderately supervised methods may help co-training scale to large natural language learning tasks.
TL;DR: In this paper, a customer self-service system and method for performing resource search and selection is presented, which includes steps of providing an interface enabling entry of a query for a resource and specification of one or more user context elements, each element representing a context associated with the current user state and having context attributes and attribute values associated therewith.
Abstract: A customer self service system and method for performing resource search and selection. The method includes steps of providing an interface enabling entry of a query for a resource and specification of one or more user context elements, each element representing a context associated with the current user state and having context attributes and attribute values associated therewith; enabling user specification of relevant resource selection criteria for enabling expression of relevance of resource results in terms of user context; searching a resource database and generating a resource response set having resources that best match a user's query, user context attributes and user defined relevant resource selection criteria; presenting said resource response set to the user in a manner whereby a relevance of each of the resources being expressed in terms of user context in a manner optimized to facilitate resource selection; and, enabling continued user selection and modification of context attribute values to enable increased specificity and accuracy of a user's query to thereby result in improved selection logic and attainment of resource response sets best fitted to the query. More particularly, adaptive algorithms and supervised and unsupervised learning sub-processes are implemented to enable the self service resource search and selection system to learn from each and all users and make that learning operationally benefit all users over time.
TL;DR: This work derives expectation-maximization algorithms for self-organizing maps with and without missing values from the link between vector quantization and mixture modeling and compares them with the elastic-net approach.
Abstract: Self-organizing maps are popular algorithms for unsupervised learning and data visualization. Exploiting the link between vector quantization and mixture modeling, we derive expectation-maximization (EM) algorithms for self-organizing maps with and without missing values. We compare self-organizing maps with the elastic-net approach and explain why the former is better suited for the visualization of high-dimensional data. Several extensions and improvements are discussed. As an illustration we apply a self-organizing map based on a multinomial distribution to market basket analysis.
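For orientation, the sketch below is the classic online Kohonen update for a one-dimensional map over scalar data. It is not the EM algorithm derived in the paper. The Gaussian neighborhood, the linear decay schedules, and all parameter values are illustrative assumptions. Each input pulls its best-matching unit toward itself, and it pulls that unit's neighbors on the map grid by a smaller amount.

```python
import math
import random


def train_som_1d(data, n_units=5, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Online Kohonen training of a 1-D chain of units on scalar data (illustrative schedules)."""
    rng = random.Random(seed)
    lo, hi = min(data), max(data)
    weights = [rng.uniform(lo, hi) for _ in range(n_units)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.1)   # neighborhood width shrinks over time
            bmu = min(range(n_units), key=lambda i: abs(weights[i] - x))
            for i in range(n_units):
                # Weight the update by distance on the map grid, not in data space.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                weights[i] += lr * h * (x - weights[i])
            step += 1
    return weights


weights = train_som_1d([0.0] * 10 + [10.0] * 10, n_units=4, epochs=20)
print([round(w, 2) for w in weights])
```

Each update moves a weight at most half-way toward an input, so every unit stays inside the range of the data.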
TL;DR: In this paper, a method of order-ranking document clusters using entropy data and Bayesian self-organizing feature maps (SOM) is provided in which an accuracy of information retrieval is improved by adopting Bayesian SOM for performing a real-time document clustering for relevant documents.
Abstract: A method of order-ranking document clusters using entropy data and Bayesian self-organizing feature maps (SOM) is provided in which the accuracy of information retrieval is improved by adopting Bayesian SOM for performing a real-time document clustering for relevant documents in accordance with a degree of semantic similarity between entropy data extracted using entropy value and user profiles and query words given by a user, wherein the Bayesian SOM is a combination of a Bayesian statistical technique and a Kohonen network, which is a type of unsupervised learning network.
TL;DR: Numerical results show that the TNN is very effective in finding the optimal solutions of thresholding methods in an MSE sense and usually outperforms other noise reduction methods.
Abstract: In this paper, a type of thresholding neural network (TNN) is developed for adaptive noise reduction. New types of soft and hard thresholding functions are created to serve as the activation function of the TNN. Unlike the standard thresholding functions, the new thresholding functions are infinitely differentiable. By using the new thresholding functions, some gradient-based learning algorithms become possible or more effective. The optimal solution of the TNN in a mean square error (MSE) sense is discussed. It is proved that there is at most one optimal solution for the soft-thresholding TNN. General optimal performances of both soft and hard thresholding TNNs are analyzed and compared to the linear noise reduction method. Gradient-based adaptive learning algorithms are presented to seek the optimal solution for noise reduction. The algorithms include supervised and unsupervised batch learning as well as supervised and unsupervised stochastic learning. It is indicated that the TNN with the stochastic learning algorithms can be used as a novel nonlinear adaptive filter. It is proved that the stochastic learning algorithm is convergent in a certain statistical sense under ideal conditions. Numerical results show that the TNN is very effective in finding the optimal solutions of thresholding methods in an MSE sense and usually outperforms other noise reduction methods. In particular, it is shown that the TNN-based nonlinear adaptive filtering outperforms the conventional linear adaptive filtering in both optimal solution and learning performance.
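The sketch below contrasts the standard soft and hard thresholding functions with one smooth soft-thresholding function, η(x; t, λ) = x + ½(√((x − t)² + λ) − √((x + t)² + λ)). This smooth form is assumed here for illustration and may not match the paper's exact functions. For λ > 0 it is infinitely differentiable, and it approaches ordinary soft thresholding as λ → 0. Its derivative with respect to the threshold t can therefore drive gradient-based learning. The gradient-step helper is hypothetical.

```python
import math


def soft_threshold(x: float, t: float) -> float:
    """Standard soft thresholding: shrink |x| by t; not differentiable at |x| = t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def hard_threshold(x: float, t: float) -> float:
    """Standard hard thresholding: keep x if |x| > t, otherwise set it to zero."""
    return x if abs(x) > t else 0.0


def smooth_soft_threshold(x: float, t: float, lam: float) -> float:
    """Assumed smooth soft threshold; tends to soft_threshold(x, t) as lam -> 0."""
    return x + 0.5 * (math.sqrt((x - t) ** 2 + lam) - math.sqrt((x + t) ** 2 + lam))


def d_smooth_dt(x: float, t: float, lam: float) -> float:
    """Derivative of smooth_soft_threshold with respect to the threshold t."""
    return 0.5 * (-(x - t) / math.sqrt((x - t) ** 2 + lam)
                  - (x + t) / math.sqrt((x + t) ** 2 + lam))


def threshold_gradient_step(noisy, clean, t, lam, lr):
    """Hypothetical helper: one supervised gradient step on the MSE with respect to t."""
    grad = sum(2.0 * (smooth_soft_threshold(y, t, lam) - s) * d_smooth_dt(y, t, lam)
               for y, s in zip(noisy, clean)) / len(noisy)
    return t - lr * grad


print(soft_threshold(3.0, 1.0), hard_threshold(0.5, 1.0))  # 2.0 0.0
print(threshold_gradient_step([3.0], [2.0], t=0.5, lam=1e-6, lr=0.1))
```

In the last line, the thresholded sample (about 2.5) exceeds its clean value (2.0), so the step raises t.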
TL;DR: The paper illustrates how the learning rate affects training speed and generalization accuracy, and thus gives guidelines on how to efficiently select a learning rate that maximizes generalization accuracy.
Abstract: In gradient descent learning algorithms such as error backpropagation, the learning rate parameter can have a significant effect on generalization accuracy. In particular, decreasing the learning rate below that which yields the fastest convergence can significantly improve generalization accuracy, especially on large, complex problems. The learning rate also directly affects training speed, but not necessarily in the way that many people expect. Many neural network practitioners currently attempt to use the largest learning rate that still allows for convergence, in order to improve training speed. However, a learning rate that is too large can be as slow as a learning rate that is too small, and a learning rate that is too large or too small can require orders of magnitude more training time than one that is in an appropriate range. The paper illustrates how the learning rate affects training speed and generalization accuracy, and thus gives guidelines on how to efficiently select a learning rate that maximizes generalization accuracy.
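The claim that a learning rate that is too large can be as slow as one that is too small can be seen on a toy one-dimensional quadratic, f(w) = ½·c·w². On this function each gradient-descent step multiplies w by (1 − η·c). The example shows only the training-speed half of the argument, because a quadratic has no notion of generalization. The function and its defaults are hypothetical.

```python
def steps_to_converge(lr, w0=1.0, curvature=1.0, tol=1e-6, max_steps=10_000):
    """Count gradient-descent steps on f(w) = 0.5 * curvature * w**2 until |w| < tol.

    Returns None if the iterates blow up or max_steps is exhausted.
    """
    w = w0
    for step in range(max_steps):
        if abs(w) < tol:
            return step
        if abs(w) > 1e12:  # diverging: the step size exceeds 2 / curvature
            return None
        w -= lr * curvature * w  # the gradient of f at w is curvature * w
    return None


for lr in (0.1, 1.0, 1.9, 2.1):
    print(lr, steps_to_converge(lr))
```

With these defaults, η = 1.0 reaches the minimum in a single step. η = 0.1 and η = 1.9 need the same number of steps, because |1 − η| = 0.9 in both cases. η = 2.1 diverges.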
TL;DR: A neural-fuzzy technology-based classifier for the recognition of power quality disturbances is presented that adopts neural networks in the architecture of frequency-sensitive competitive learning and learning vector quantization.
Abstract: This paper presents a neural-fuzzy technology-based classifier for the recognition of power quality disturbances. The classifier adopts neural networks in the architecture of frequency-sensitive competitive learning and learning vector quantization (LVQ). For a given number of codewords, the neural networks are trained to determine the optimal decision boundaries separating different categories of disturbances. To cope with the uncertainties in the involved pattern recognition, the neural network outputs, instead of being taken as the final classification, are used to activate fuzzy-associative-memory (FAM) recall to identify the most probable type to which the input waveform belongs. Furthermore, the input waveforms are preprocessed by the wavelet transform for feature extraction so as to improve the classifier's recognition accuracy and simplify the scheme. Each subband of the transform coefficients is then utilized to recognize the associated disturbances.
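For reference, here is a sketch of Kohonen's basic LVQ1 update rule, one member of the LVQ family the classifier builds on. The frequency-sensitive competitive learning stage, the wavelet features, and the fuzzy-associative-memory stage are omitted. The codebook layout (a vector plus a class label) and the class names are hypothetical.

```python
import math


def nearest(codebook, x):
    """Index of the codeword closest to x in Euclidean distance."""
    return min(range(len(codebook)), key=lambda i: math.dist(codebook[i][0], x))


def lvq1_step(codebook, x, label, lr):
    """LVQ1: pull the winning codeword toward x if its class matches, push it away otherwise."""
    i = nearest(codebook, x)
    vec, cls = codebook[i]
    sign = 1.0 if cls == label else -1.0
    codebook[i] = ([v + sign * lr * (xi - v) for v, xi in zip(vec, x)], cls)
    return i


def classify(codebook, x):
    """Label of the nearest codeword."""
    return codebook[nearest(codebook, x)][1]


# Hypothetical two-class codebook over 2-D feature vectors.
codebook = [([0.0, 0.0], "sag"), ([1.0, 1.0], "swell")]
lvq1_step(codebook, [0.2, 0.0], "sag", lr=0.5)  # correct winner: attracted toward the sample
lvq1_step(codebook, [0.9, 1.0], "sag", lr=0.5)  # wrong winner: repelled from the sample
print(codebook, classify(codebook, [0.0, 0.1]))
```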
TL;DR: In this paper, the authors present extensions of k-nearest neighbors (k-NN), Citation-kNN, and the diverse density algorithm for the real-valued setting and study their performance on Boolean and real-valued data.
Abstract: The multiple-instance learning model has received much attention recently with a primary application area being that of drug activity prediction. Most prior work on multiple-instance learning has been for concept learning, yet for drug activity prediction, the label is a real-valued affinity measurement giving the binding strength. We present extensions of k-nearest neighbors (k-NN), Citation-kNN, and the diverse density algorithm for the real-valued setting and study their performance on Boolean and real-valued data. We also provide a method for generating chemically realistic artificial data.
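The sketch below shows k-NN for real-valued multiple-instance data. It assumes that bags are compared with the minimal Hausdorff distance, meaning the smallest distance between any instance of one bag and any instance of the other. It also assumes that a query bag's affinity is predicted as the mean label of its k nearest training bags. Citation-kNN and diverse density are not shown, and the toy bags and affinities are invented.

```python
import math


def bag_distance(bag_a, bag_b):
    """Minimal Hausdorff distance: the closest pair of instances across the two bags."""
    return min(math.dist(a, b) for a in bag_a for b in bag_b)


def knn_affinity(train_bags, train_labels, query_bag, k=3):
    """Predict a real-valued label as the mean label of the k nearest training bags."""
    order = sorted(range(len(train_bags)),
                   key=lambda i: bag_distance(train_bags[i], query_bag))
    nearest = order[:k]
    return sum(train_labels[i] for i in nearest) / len(nearest)


# Invented toy bags: each bag is a list of 2-D instance vectors.
bags = [[(0.0, 0.0), (5.0, 5.0)], [(10.0, 10.0)], [(0.5, 0.0), (20.0, 20.0)]]
affinities = [1.0, 3.0, 2.0]
print(knn_affinity(bags, affinities, [(0.0, 1.0)], k=2))  # 1.5
```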
TL;DR: It is argued that cerebellar motor learning is enhanced by a sparse code that simultaneously maximizes information transfer between mossy fibers and granule cells, minimizes redundancies between granule cell discharges, and re-codes the mossy fiber inputs with an adaptive resolution such that inputs corresponding to large errors are finely encoded.
TL;DR: A method of applying reinforcement learning, suitable for modeling and learning various kinds of interactions in real situations, to the problem of stock price prediction is proposed and evaluated on the Korean stock market.
Abstract: Recently, numerous investigations of stock price prediction and portfolio management using machine learning have tried to develop efficient mechanical trading systems. However, these systems are limited in that they are mainly based on supervised learning, which is not well suited to learning problems with long-term goals and delayed rewards. This paper proposes a method of applying reinforcement learning, suitable for modeling and learning various kinds of interactions in real situations, to the problem of stock price prediction. The stock price prediction problem is formulated as a Markov process that can be optimized by a reinforcement-learning-based algorithm. TD(0), a reinforcement learning algorithm that learns only from experience, is adopted, and function approximation by an artificial neural network is performed to learn the values of states, each of which corresponds to a stock price trend at a given time. An experimental result based on the Korean stock market is presented to evaluate the performance of the proposed method.
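The TD(0) rule the abstract refers to moves a state's value toward a bootstrapped target: V(s) ← V(s) + α[r + γV(s′) − V(s)]. The sketch is tabular for brevity, whereas the paper uses neural-network function approximation. The state names and rewards are hypothetical placeholders for discretized price-trend states.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Tabular TD(0) update in place; next_state=None marks a terminal transition."""
    v_next = 0.0 if next_state is None else values.get(next_state, 0.0)
    td_error = reward + gamma * v_next - values.get(state, 0.0)
    values[state] = values.get(state, 0.0) + alpha * td_error
    return td_error


# Hypothetical two-step episode: an "uptrend" state followed by a terminal payoff.
values = {}
for _ in range(500):
    td0_update(values, "uptrend", 0.0, "breakout", alpha=0.5, gamma=1.0)
    td0_update(values, "breakout", 1.0, None, alpha=0.5, gamma=1.0)
print(values)
```

The agent learns purely from the observed transitions, with no model of the dynamics. This matches the abstract's description of TD(0) as learning "only from experience".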
TL;DR: An algorithm is established to calculate the Bayesian stochastic complexity based on blowing-up technology in algebraic geometry, and it is proved that the Bayesian generalization error of a hierarchical learning machine is smaller than that of a regular statistical model, even if the true distribution is not contained in the parametric model.
TL;DR: It is shown that a simple mathematical model following these rules organizes its activity so as to maximize the difference between its responses and can adapt to changing environmental conditions in unsupervised fashion, in agreement with current neurophysiological data.
Abstract: Adult neurogenesis has long been documented in the vertebrate brain, and recently even in humans. Although it has been conjectured for many years that its functional role is related to the renewing of memories, no clear mechanism as to how this can be achieved has been proposed. We present a scheme in which incorporation of new neurons proceeds at a constant rate, while their survival is activity-dependent and thus contingent upon new neurons establishing suitable connections. We show that a simple mathematical model following these rules organizes its activity so as to maximize the difference between its responses, and can adapt to changing environmental conditions in unsupervised fashion.
TL;DR: A suite of unsupervised learning algorithms is developed that induces the structure of lists by exploiting regularities both in the format of the pages and in the data they contain.
Abstract: We describe a technique for extracting data from lists and tables and grouping it by rows and columns. This is done completely automatically, using only some very general assumptions about the structure of the list. We have developed a suite of unsupervised learning algorithms that induce the structure of lists by exploiting the regularities both in the format of the pages and the data contained in them. Among the tools used are AutoClass for automatic classification of data and grammar induction of regular languages. The approach was tested on 14 Web sources providing diverse data types, and we found that for 10 of these sources we were able to correctly find lists and partition the data into columns and rows.
TL;DR: This thesis introduces a new unsupervised learning framework, called Alignment-Based Learning, which is based on the alignment of sentences and Harris's (1951) notion of substitutability, and can be applied to an untagged, unstructured corpus of natural language sentences, resulting in a labelled, bracketed version of that corpus.
Abstract: "...refined and abstract meanings largely grow out of more concrete meanings." (Bloomfield, 1933)
This thesis introduces a new unsupervised learning framework, called Alignment-Based Learning (ABL), which is based on the alignment of sentences and Harris's (1951) notion of substitutability. Instances of the framework can be applied to an untagged, unstructured corpus of natural language sentences, resulting in a labelled, bracketed version of that corpus. First, in the alignment learning phase, the framework aligns all sentences in the corpus in pairs, resulting in a partition of the sentences into parts that are equal in both sentences and parts that are unequal. Unequal parts of sentences can be seen as being substitutable for each other, since substituting one unequal part for the other results in another valid sentence. The unequal parts of the sentences are thus considered to be possible (possibly overlapping) constituents, called hypotheses.
Second, the selection learning phase considers all hypotheses found by the alignment learning phase and selects the best of these. The hypotheses are selected based on the order in which they were found, or based on a probabilistic function. The framework can be extended with a grammar extraction phase; this extended framework is called parseABL. In addition to returning a structured version of the unstructured input corpus, as the ABL system does, this system also returns a stochastic context-free or tree substitution grammar.
Different instances of the framework have been tested on the English ATIS corpus, the Dutch OVIS corpus and the Wall Street Journal corpus. One of the interesting results, apart from the encouraging numerical results, is that all instances can (and do) learn recursive structures.
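The alignment learning step can be approximated with Python's standard `difflib.SequenceMatcher`. This is a swapped-in matcher (Ratcliff/Obershelp-style), not the edit-distance alignment used in ABL itself. The unequal spans it reports for a sentence pair play the role of the hypotheses, i.e. mutually substitutable candidate constituents. An empty string on one side means the span occurs in only one of the two sentences.

```python
from difflib import SequenceMatcher


def align_hypotheses(sentence_a, sentence_b):
    """Return the pairs of unequal word spans between two sentences."""
    a, b = sentence_a.split(), sentence_b.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    hypotheses = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete" or "insert": a substitutable pair
            hypotheses.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return hypotheses


print(align_hypotheses("show me flights from Boston to Denver",
                       "show me flights from Atlanta to Denver"))
# [('Boston', 'Atlanta')]
```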
TL;DR: An investigation has been made into the use of stochastic arithmetic to implement an artificial neural network solution to a typical pattern recognition application, with results indicating an order of magnitude improvement over the floating-point implementation assuming clock frequency parity.
Abstract: For pt. I see ibid., p.891-905. An investigation has been made into the use of stochastic arithmetic to implement an artificial neural network solution to a typical pattern recognition application. Optical character recognition is performed on very noisy characters in the E-13B MICR font. The artificial neural network is composed of two layers, the first layer being a set of soft competitive learning subnetworks and the second a set of fully connected linear output neurons. The observed number of clock cycles in the stochastic case represents an order of magnitude improvement over the floating-point implementation assuming clock frequency parity. Network generalization capabilities were also compared based on the network squared error as a function of the amount of noise added to the input patterns. The stochastic network maintains a squared error within 10 percent of that of the floating-point implementation for a wide range of noise levels.
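In stochastic arithmetic, a value in [0, 1] is carried as the fraction of 1s in a random bitstream, so multiplying two independent streams needs only one AND gate per bit. The sketch assumes unipolar encoding; the paper's exact encoding is not stated in this abstract. It also shows the usual trade-off: precision improves only as the stream gets longer.

```python
import random


def encode(p, n_bits, rng):
    """Unipolar stochastic encoding (assumed): each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n_bits)]


def decode(bits):
    """Recover the value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)


def stochastic_multiply(p, q, n_bits=10_000, seed=0):
    """Multiply p and q with a single AND per bit; the two streams must be independent."""
    rng = random.Random(seed)
    a = encode(p, n_bits, rng)
    b = encode(q, n_bits, rng)
    return decode([x & y for x, y in zip(a, b)])


print(stochastic_multiply(0.5, 0.4, n_bits=100_000))  # close to 0.2
```

Using the same stream for both operands would return p rather than p², which is why the two streams must be generated independently.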
TL;DR: This paper describes how genetic algorithms can be used to develop rough sets and the proposed rough set theoretic genetic encoding will be especially useful in unsupervised learning.
Abstract: The rough set is a useful notion for the classification of objects when the available information is not adequate to represent classes using precise sets. Rough sets have been successfully used in information systems for learning rules from an expert. This paper describes how genetic algorithms can be used to develop rough sets. The proposed rough set theoretic genetic encoding will be especially useful in unsupervised learning. A rough set genome consists of upper and lower bounds for sets in a partition. The partition may be as simple as the conventional expert class and its complement, or a more general classification scheme. The paper provides a complete description of the design and implementation of rough set genomes. The proposed design and implementation are used to provide an unsupervised rough set classification of highway sections.
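The lower and upper bounds in a rough set genome correspond to the standard lower and upper approximations of a set. The sketch below computes them from an indiscernibility relation. The highway-section data are invented for illustration, and the genetic-algorithm encoding itself is not shown.

```python
from collections import defaultdict


def equivalence_classes(objects, describe):
    """Partition objects into classes that are indiscernible under describe(obj)."""
    classes = defaultdict(set)
    for obj in objects:
        classes[describe(obj)].add(obj)
    return list(classes.values())


def approximations(classes, target):
    """Return the (lower, upper) approximations of the target set."""
    lower, upper = set(), set()
    for cls in classes:
        if cls <= target:  # class lies entirely inside the target: certain members
            lower |= cls
        if cls & target:   # class overlaps the target: possible members
            upper |= cls
    return lower, upper


# Invented example: six highway sections described only by a coarse traffic level.
traffic = {1: "low", 2: "low", 3: "mid", 4: "mid", 5: "high", 6: "high"}
classes = equivalence_classes(traffic, traffic.get)
lower, upper = approximations(classes, {1, 2, 3})
print(sorted(lower), sorted(upper), sorted(upper - lower))  # [1, 2] [1, 2, 3, 4] [3, 4]
```

The difference between the two bounds, here sections 3 and 4, is the boundary region. Objects in it cannot be classified with certainty from the available attributes.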
TL;DR: A TD algorithm for estimating the variance of return in Markov decision process (MDP) environments and a gradient-based reinforcement learning algorithm on the variance-penalized criterion, which is a typical criterion in risk-avoiding control, are presented.
Abstract: Estimating probability distributions on returns provides various sophisticated decision-making schemes for control problems in Markov environments, including risk-sensitive control, efficient exploration of environments, and so on. Many reinforcement learning algorithms, however, have simply relied on the expected return. This paper provides a scheme of decision making using the mean and variance of return distributions. This paper presents a TD algorithm for estimating the variance of return in Markov decision process (MDP) environments and a gradient-based reinforcement learning algorithm on the variance-penalized criterion, which is a typical criterion in risk-avoiding control. Empirical results demonstrate the behavior of the algorithms and the validity of the criterion for risk-avoiding sequential decision tasks.
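One standard way to estimate the variance of return with TD-style updates is sketched below. It may differ in detail from the paper's algorithm. The method learns the mean V(s) and the second moment M(s) of the return using R² = r² + 2γrR′ + γ²R′², and then reads off Var = M − V². This assumes that, given s′, the future return is independent of the immediate reward. The sketch uses 1/n step sizes and a toy two-step chain whose return is 1 + (0 or 2), so the true mean is 2 and the true variance is 1. All names are hypothetical.

```python
from collections import defaultdict


class MeanVarianceTD:
    """Tabular TD estimates of the mean V(s) and second moment M(s) of the return."""

    def __init__(self, gamma=1.0):
        self.gamma = gamma
        self.V = defaultdict(float)
        self.M = defaultdict(float)
        self.visits = defaultdict(int)

    def update(self, s, r, s_next):
        """One transition s -> s_next with reward r; s_next=None marks termination."""
        self.visits[s] += 1
        alpha = 1.0 / self.visits[s]  # 1/n step size: a running average of the targets
        g = self.gamma
        v_next = 0.0 if s_next is None else self.V[s_next]
        m_next = 0.0 if s_next is None else self.M[s_next]
        # Targets follow from R = r + g * R': E[R] and E[R^2] = r^2 + 2 g r E[R'] + g^2 E[R'^2].
        self.V[s] += alpha * (r + g * v_next - self.V[s])
        self.M[s] += alpha * (r * r + 2 * g * r * v_next + g * g * m_next - self.M[s])

    def variance(self, s):
        return max(self.M[s] - self.V[s] ** 2, 0.0)


td = MeanVarianceTD(gamma=1.0)
for episode in range(10_000):
    td.update("s0", 1.0, "s1")                               # deterministic first step
    td.update("s1", 0.0 if episode % 2 == 0 else 2.0, None)  # then reward 0 or 2
print(round(td.V["s0"], 2), round(td.variance("s0"), 2))
```

A variance-penalized criterion such as V(s) − κ·Var(s) (κ hypothetical) can then be computed directly from these two tables.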
TL;DR: The results show that the RBF NN, which models the rainfall-runoff relation as a linear combination of nonlinear RBFs, can be considered a suitable technique for predicting flood flow.
Abstract: A radial basis function (RBF) neural network (NN) is proposed to develop a rainfall-runoff model for three-hour-ahead flood forecasting. For faster training, the RBF NN employs a hybrid two-stage learning scheme. In the first (unsupervised) stage, fuzzy min-max clustering is used to determine the characteristics of the nonlinear RBFs. In the second (supervised) stage, multivariate linear regression is used to determine the weights between the hidden and output layers. The rainfall-runoff relation can be considered as a linear combination of some nonlinear RBFs. Rainfall and runoff events of the Lanyoung River collected during typhoons are used to train, validate, and test the network. The results show that the RBF NN can be considered a suitable technique for predicting flood flow.
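The hybrid two-stage scheme can be sketched in two parts. An unsupervised stage places the RBF centers, and a supervised linear least-squares fit then sets the output weights. This sketch swaps plain 1-D k-means in for the fuzzy min-max clustering used in the paper. It also fixes a single shared Gaussian width and adds a tiny ridge term for numerical stability. All three choices are assumptions, and the data in the usage lines are synthetic.

```python
import math


def kmeans_1d(xs, k, iters=50):
    """Stage 1 (unsupervised): plain 1-D k-means, standing in for fuzzy min-max clustering."""
    srt = sorted(xs)
    centers = [srt[round(i * (len(srt) - 1) / max(k - 1, 1))] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda j: abs(x - centers[j]))].append(x)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def features(x, centers, width):
    """Bias term followed by one Gaussian RBF activation per center."""
    return [1.0] + [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]


def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_rbf(xs, ys, k=4, width=1.0, ridge=1e-8):
    centers = kmeans_1d(xs, k)
    phi = [features(x, centers, width) for x in xs]
    p = len(phi[0])
    # Stage 2 (supervised): ridge-stabilized normal equations for the output weights.
    ata = [[sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    aty = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(p)]
    return centers, solve(ata, aty), width


def predict(model, x):
    centers, weights, width = model
    return sum(w * f for w, f in zip(weights, features(x, centers, width)))


xs = [i * 0.5 for i in range(20)]
ys = [math.sin(x) for x in xs]  # synthetic stand-in for a rainfall-runoff series
model = fit_rbf(xs, ys)
print([round(predict(model, x), 2) for x in xs[:5]])
```

Only the second stage uses the targets, and it reduces to a linear least-squares problem. This is what makes the hybrid scheme faster to train than full gradient-based training of every parameter.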