TL;DR: A conceptual framework is presented that helps to identify potential combined approaches and employ it to give a structured overview of different types of combinations using exemplary approaches of simulation-assisted machine learning and machine-learning assisted simulation.
Abstract: In this paper, we describe the combination of machine learning and simulation towards a hybrid modelling approach. Such a combination of data-based and knowledge-based modelling is motivated by applications that are partly based on causal relationships, while other effects result from hidden dependencies that are represented in huge amounts of data. Our aim is to bridge the knowledge gap between the two individual communities from machine learning and simulation to promote the development of hybrid systems. We present a conceptual framework that helps to identify potential combined approaches and employ it to give a structured overview of different types of combinations using exemplary approaches of simulation-assisted machine learning and machine-learning assisted simulation. We also discuss an advanced pairing in the context of Industry 4.0 where we see particular further potential for hybrid systems.
TL;DR: In this paper, two general approaches for measuring the learner's aleatoric and epistemic uncertainty in a prediction can be instantiated with decision trees and random forests as learning algorithms in a classification setting.
Abstract: Due to the steadily increasing relevance of machine learning for practical applications, many of which are coming with safety requirements, the notion of uncertainty has received increasing attention in machine learning research in the last couple of years. In particular, the idea of distinguishing between two important types of uncertainty, often refereed to as aleatoric and epistemic, has recently been studied in the setting of supervised learning. In this paper, we propose to quantify these uncertainties, referring, respectively, to inherent randomness and a lack of knowledge, with random forests. More specifically, we show how two general approaches for measuring the learner’s aleatoric and epistemic uncertainty in a prediction can be instantiated with decision trees and random forests as learning algorithms in a classification setting. In this regard, we also compare random forests with deep neural networks, which have been used for a similar purpose.
TL;DR: In this paper, a comparative investigation of different features, including low-level and high-level features, for content-based image retrieval (CBIR) has been presented, and numerous methods have been competing to extract the most discriminative features for improved representation of the image content.
Abstract: Research on content-based image retrieval (CBIR) has been under development for decades, and numerous methods have been competing to extract the most discriminative features for improved representation of the image content. Recently, deep learning methods have gained attention in computer vision, including CBIR. In this paper, we present a comparative investigation of different features, including low-level and high-level features, for CBIR. We compare the performance of CBIR systems using different deep features with state-of-the-art low-level features such as SIFT, SURF, HOG, LBP, and LTP, using different dictionaries and coefficient learning techniques. Furthermore, we conduct comparisons with a set of primitive and popular features that have been used in this field, including colour histograms and Gabor features. We also investigate the discriminative power of deep features using certain similarity measures under different validation approaches. Furthermore, we investigate the effects of the dimensionality reduction of deep features on the performance of CBIR systems using principal component analysis, discrete wavelet transform, and discrete cosine transform. Unprecedentedly, the experimental results demonstrate high (95\% and 93\%) mean average precisions when using the VGG-16 FC7 deep features of Corel-1000 and Coil-20 datasets with 10-D and 20-D K-SVD, respectively.
TL;DR: All neural networks, both Fourier and the standard one, empirically demonstrate lower approximation error than the truncated Fourier series when it comes to an approximation of a known function of multiple variables.
Abstract: We review neural network architectures which were motivated by Fourier series and integrals and which are referred to as Fourier neural networks. These networks are empirically evaluated in synthetic and real-world tasks. Neither of them outperforms the standard neural network with sigmoid activation function in the real-world tasks. All neural networks, both Fourier and the standard one, empirically demonstrate lower approximation error than the truncated Fourier series when it comes to an approximation of a known function of multiple variables.
TL;DR: In this paper, a cascaded clustering tree is constructed, in which all layers interact with each other in the network-like manner, and the clustering result of each layer is dynamically improved in accordance with the global hierarchical clustering objective function.
Abstract: Clustering algorithm is the foundation and important technology in data mining. In fact, in the real world, the data itself often has a hierarchical structure. Hierarchical clustering aims at constructing a cluster tree, which reveals the underlying modal structure of a complex density. Due to its inherent complexity, most existing hierarchical clustering algorithms are usually designed heuristically without an explicit objective function, which limits its utilization and analysis. K-means clustering, the well-known simple yet effective algorithm which can be expressed from the view of probability distribution, has inherent connection to Mixture of Gaussians (MoG). At this point, we consider combining Bayesian theory analysis with K-means algorithm. This motivates us to develop a hierarchical clustering based on K-means under the probability distribution framework, which is different from existing hierarchical K-means algorithms processing data in a single-pass manner along with heuristic strategies. For this goal, we propose an explicit objective function for hierarchical clustering, termed as Bayesian hierarchical K-means (BHK-means). In our method, a cascaded clustering tree is constructed, in which all layers interact with each other in the network-like manner. In this cluster tree, the clustering results of each layer are influenced by the parent and child nodes. Therefore, the clustering result of each layer is dynamically improved in accordance with the global hierarchical clustering objective function. The objective function is solved using the same algorithm as K-means, the Expectation-maximization algorithm. The experimental results on both synthetic data and benchmark datasets demonstrate the effectiveness of our algorithm over the existing related ones.
TL;DR: In this paper, a conditional convolutional autoencoder (CCAE) was used to generate handwritten digit proposals to improve the efficiency of input generation for the human while keeping the original input as similar as possible to the original inputs.
Abstract: Humans increasingly interact with Artificial intelligence (AI) systems. AI systems are optimized for objectives such as minimum computation or minimum error rate in recognizing and interpreting inputs from humans. In contrast, inputs created by humans are often treated as a given. We investigate how inputs of humans can be altered to reduce misinterpretation by the AI system and to improve efficiency of input generation for the human while altered inputs should remain as similar as possible to the original inputs. These objectives result in trade-offs that are analyzed for a deep learning system classifying handwritten digits. To create examples that serve as demonstrations for humans to improve, we develop a model based on a conditional convolutional autoencoder (CCAE). Our quantitative and qualitative evaluation shows that in many occasions the generated proposals lead to lower error rates, require less effort to create and differ only modestly from the original samples.
TL;DR: An MDL-based approach for selecting a characteristic subset of patterns on labeled graphs with the introduction of ports to encode connections between pattern occurrences without any loss of information is proposed.
Abstract: Many graph pattern mining algorithms have been designed to identify recurring structures in graphs. The main drawback of these approaches is that they often extract too many patterns for human analysis. Recently, pattern mining methods using the Minimum Description Length (MDL) principle have been proposed to select a characteristic subset of patterns from transactional, sequential and relational data. In this paper, we propose an MDL-based approach for selecting a characteristic subset of patterns on labeled graphs. A key notion in this paper is the introduction of ports to encode connections between pattern occurrences without any loss of information. Experiments show that the number of patterns is drastically reduced. The selected patterns have complex shapes and are representative of the data.
TL;DR: A new hybrid deep learning model is proposed to predict and target vaccination rates in the less immunized regions in India using the data collected from the recently updated District Level Household Survey-4.
Abstract: There has been a major and rising interest in India for increasing vaccination rate among peoples to make the nation healthier and safer. In this paper, a new hybrid deep learning model is proposed to predict and target vaccination rates in the less immunized regions. The Rank-Based Multi-Layer Perceptron (R-MLP) hybrid deep learning framework uses the data collected from the recently updated District Level Household Survey-4 (DLHS). R-MLP model predicts and categorizes the percentage of partly immunized vaccination rates as extreme, low and medium ranges. This predicted findings are cross-verified by Deep Soft Cosine Semantic and Ranking SVM based model (DSS-RSM). DSS-RSM model uses the data obtained from the medical practitioners through a location-based social network. The proposed model predicts and extracts patterns with high similarity frequency for identifying vulnerable low immunization regions. It classifies the predicted patterns into two classes such as Class 1 is denoted as high ranked regions and Class 2 is denoted as low ranked regions based on the percentage of pattern matches. Finally, the results from R-MLP and DSS-RSM models are cross-linked together using ensemble model. This model finds the loss values to identify the target regions were health care program need to be conducted for increasing the level of immunization among children’s. The proposed hybrid deep learning models trains and validates using python-based Keras and TensorFlow deep learning libraries. The performance of the proposed hybrid deep learning model is compared with other variant machine learning techniques such as Decision Tree C5.0, Naive Bayes and Linear Regression. This comparative results are evaluated using evaluation measures such as Precision, Recall, Accuracy and F1-Measure. Our results show that the hybrid deep learning system is clearly superior to any other alternative approach.
TL;DR: The computational approach can be applied to practical gender classification systems and generalized to other face analysis tasks, such as race classification and age prediction, and faster processing speeds and similar recognition rates on MORPH-II.
Abstract: Because of high dimensionality, correlation among covariates, and noise contained in data, dimension reduction (DR) techniques are often employed to the application of machine learning algorithms. Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and their kernel variants (KPCA, KLDA) are among the most popular DR methods. Recently, Supervised Kernel Principal Component Analysis (SKPCA) has been shown as another successful alternative. In this paper, brief reviews of these popular techniques are presented first. We then conduct a comparative performance study based on three simulated datasets, after which the performance of the techniques are evaluated through application to a pattern recognition problem in face image analysis. The gender classification problem is considered on MORPH-II and FG-NET, two popular longitudinal face aging databases. Several feature extraction methods are used, including biologically-inspired features (BIF), local binary patterns (LBP), histogram of oriented gradients (HOG), and the Active Appearance Model (AAM). After applications of DR methods, a linear support vector machine (SVM) is deployed with gender classification accuracy rates exceeding 95% on MORPH-II, competitive with benchmark results. A parallel computational approach is also proposed, attaining faster processing speeds and similar recognition rates on MORPH-II. Our computational approach can be applied to practical gender classification systems and generalized to other face analysis tasks, such as race classification and age prediction.
TL;DR: An efficient hybrid algorithm with the strategy of two-stage searches for Bayesian network structure learning can obtain better performance of BN structure learning.
Abstract: It is important for Bayesian network (BN) structure learning, a NP-problem, to improve the accuracy and hybrid algorithms are a kind of effective structure learning algorithms at present. Most hybrid algorithms adopt the strategy of one heuristic search and can be divided into two groups: one heuristic search based on initial BN skeleton and one heuristic search based on initial solutions. The former often fails to guarantee globality of the optimal structure and the latter fails to get the optimal solution because of large search space. In this paper, an efficient hybrid algorithm is proposed with the strategy of two-stage searches. For first-stage search, it firstly determines the local search space based on Maximal Information Coefficient by introducing penalty factors p1, p2, then searches the local space by Binary Particle Swarm Optimization. For second-stage search, an efficient ADR (the abbreviation of Add, Delete, Reverse) algorithm based on three basic operators is designed to extend the local space to the whole space. Experiment results show that the proposed algorithm can obtain better performance of BN structure learning.
TL;DR: In this article, the authors proposed a method to evaluate the validity of ML pipelines using a surrogate model (AVATAR), which enables to accelerate automatic ML pipeline composition and optimisation by quickly ignoring invalid pipelines.
Abstract: The evaluation of machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. The previous methods such as Bayesian-based and genetic-based optimisation, which are implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them. Therefore, the pipeline composition and optimisation of these methods requires a tremendous amount of time that prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid, and it is unnecessary to execute them to find out whether they are good pipelines. To address this issue, we propose a novel method to evaluate the validity of ML pipelines using a surrogate model (AVATAR). The AVATAR enables to accelerate automatic ML pipeline composition and optimisation by quickly ignoring invalid pipelines. Our experiments show that the AVATAR is more efficient in evaluating complex pipelines in comparison with the traditional evaluation approaches requiring their execution.
TL;DR: A technique is composed that allows to hide adversarial attacks in regions of high complexity, such that they are imperceptible even to an astute observer with regards to human visual perception.
Abstract: Convolutional neural networks have been used to achieve a string of successes during recent years, but their lack of interpretability remains a serious issue. Adversarial examples are designed to deliberately fool neural networks into making any desired incorrect classification, potentially with very high certainty. Several defensive approaches increase robustness against adversarial attacks, demanding attacks of greater magnitude, which lead to visible artifacts. By considering human visual perception, we compose a technique that allows to hide such adversarial attacks in regions of high complexity, such that they are imperceptible even to an astute observer. We carry out a user study on classifying adversarially modified images to validate the perceptual quality of our approach and find significant evidence for its concealment with regards to human visual perception.
TL;DR: This paper proposes an empirical approach designed to decrease the computational cost of computing the data complexity measures, while still keeping their descriptive ability, and consists of a novel Meta-Learning system able to predict the values of theData complexity measures for a dataset by using simpler meta-features as input.
Abstract: Meta-Learning has been largely used over the last years to support the recommendation of the most suitable machine learning algorithm(s) and hyperparameters for new datasets. Traditionally, a meta-base is created containing meta-features extracted from several datasets along with the performance of a pool of machine learning algorithms when applied to these datasets. The meta-features must describe essential aspects of the dataset and distinguish different problems and solutions. However, if one wants the use of Meta-Learning to be computationally efficient, the extraction of the meta-feature values should also show a low computational cost, considering a trade-off between the time spent to run all the algorithms and the time required to extract the meta-features. One class of measures with successful results in the characterization of classification datasets is concerned with estimating the underlying complexity of the classification problem. These data complexity measures take into account the overlap between classes imposed by the feature values, the separability of the classes and distribution of the instances within the classes. However, the extraction of these measures from datasets usually presents a high computational cost. In this paper, we propose an empirical approach designed to decrease the computational cost of computing the data complexity measures, while still keeping their descriptive ability. The proposal consists of a novel Meta-Learning system able to predict the values of the data complexity measures for a dataset by using simpler meta-features as input. In an extensive set of experiments, we show that the predictive performance achieved by Meta-Learning systems which use the predicted data complexity measures is similar to the performance obtained using the original data complexity measures, but the computational cost involved in their computation is significantly reduced.
TL;DR: A batch-incremental approach that pre-processes data streams using UMAP, by producing successive embeddings on a stream of disjoint batches in order to support an incremental kNN classification.
Abstract: Learning from potentially infinite and high-dimensional data streams poses significant challenges in the classification task. For instance, k-Nearest Neighbors (kNN) is one of the most often used algorithms in the data stream mining area that proved to be very resource-intensive when dealing with high-dimensional spaces. Uniform Manifold Approximation and Projection (UMAP) is a novel manifold technique and one of the most promising dimension reduction and visualization techniques in the non-streaming setting because of its high performance in comparison with competitors. However, there is no version of UMAP that copes with the challenging context of streams. To overcome these restrictions, we propose a batch-incremental approach that pre-processes data streams using UMAP, by producing successive embeddings on a stream of disjoint batches in order to support an incremental kNN classification. Experiments conducted on publicly available synthetic and real-world datasets demonstrate the substantial gains that can be achieved with our proposal compared to state-of-the-art techniques.
TL;DR: This work proposes three algorithms that take a supervised learning model and make it perform more monotone, and proves consistency and monotonicity with high probability, and evaluates the algorithms on scenarios where non-monotone behaviour occurs.
Abstract: Learning performance can show non-monotonic behavior. That is, more data does not necessarily lead to better models, even on average. We propose three algorithms that take a supervised learning model and make it perform more monotone. We prove consistency and monotonicity with high probability, and evaluate the algorithms on scenarios where non-monotone behaviour occurs. Our proposed algorithm \(\text {MT}_{\text {HT}}\) makes less than \(1\%\) non-monotone decisions on MNIST while staying competitive in terms of error rate compared to several baselines. Our code is available at https://github.com/tomviering/monotone.
TL;DR: A new method is proposed that allows clusters to overlap until a strong cluster attraction is reached, based on a density criterion, and the resulting hierarchical structure is represented as a directed acyclic graph and combines the advantages of hierarchies with the precision of a less arbitrary clustering.
Abstract: Agglomerative clustering methods have been widely used by many research communities to cluster their data into hierarchical structures. These structures ease data exploration and are understandable even for non-specialists. But these methods necessarily result in a tree, since, at each agglomeration step, two clusters have to be merged. This may bias the data analysis process if, for example, a cluster is almost equally attracted by two others. In this paper we propose a new method that allows clusters to overlap until a strong cluster attraction is reached, based on a density criterion. The resulting hierarchical structure, called a quasi-dendrogram, is represented as a directed acyclic graph and combines the advantages of hierarchies with the precision of a less arbitrary clustering. We validate our work with extensive experiments on real data sets and compare it with existing tree-based methods, using a new measure of similarity between heterogeneous hierarchical structures.
TL;DR: In this article, the authors propose a formal multidimensional model for graph analysis, that considers the basic graph data, and also background information in the form of dimension hierarchies.
Abstract: Online Analytical Processing (OLAP) comprises tools and algorithms that allow querying multidimensional databases. It is based on the multidimensional model, where data can be seen as a cube such that each cell contains one or more measures that can be aggregated along dimensions. In a Big Data scenario, traditional data warehousing and OLAP operations are clearly not sufficient to address current data analysis requirements, for example, social network analysis. Furthermore, OLAP operations and models can expand the possibilities of graph analysis beyond the traditional graph-based computation. Nevertheless, there is not much work on the problem of taking OLAP analysis to the graph data model. This paper proposes a formal multidimensional model for graph analysis, that considers the basic graph data, and also background information in the form of dimension hierarchies. The graphs in this model are node- and edge-labelled directed multi-hypergraphs, called graphoids, which can be defined at several different levels of granularity using the dimensions associated with them. Operations analogous to the ones used in typical OLAP over cubes are defined over graphoids. The paper presents a formal definition of the graphoid model for OLAP, proves that the typical OLAP operations on cubes can be expressed over the graphoid model, and shows that the classic data cube model is a particular case of the graphoid data model. Finally, a case study supports the claim that, for many kinds of OLAP-like analysis on graphs, the graphoid model works better than the typical relational OLAP alternative, and for the classic OLAP queries, it remains competitive.