Top 146 papers presented at Intelligent Data Analysis in 2020

Book Chapter•10.1007/978-3-030-44584-3_43•

Combining Machine Learning and Simulation to a Hybrid Modelling Approach: Current and Future Directions.

Laura von Rueden, Sebastian Mayer, Rafet Sifa, Christian Bauckhage, Jochen Garcke

27 Apr 2020

TL;DR: A conceptual framework is presented that helps identify potential combined approaches; it is used to give a structured overview of different types of combinations, with exemplary approaches to simulation-assisted machine learning and machine-learning assisted simulation.

Abstract: In this paper, we describe the combination of machine learning and simulation towards a hybrid modelling approach. Such a combination of data-based and knowledge-based modelling is motivated by applications that are partly based on causal relationships, while other effects result from hidden dependencies that are represented in huge amounts of data. Our aim is to bridge the knowledge gap between the two individual communities from machine learning and simulation to promote the development of hybrid systems. We present a conceptual framework that helps to identify potential combined approaches and employ it to give a structured overview of different types of combinations using exemplary approaches of simulation-assisted machine learning and machine-learning assisted simulation. We also discuss an advanced pairing in the context of Industry 4.0 where we see particular further potential for hybrid systems.

137 citations

Book Chapter•10.1007/978-3-030-44584-3_35•

Aleatoric and Epistemic Uncertainty with Random Forests

Mohammad Hossein Shaker¹, Eyke Hüllermeier¹•Institutions (1)

University of Paderborn¹

27 Apr 2020

TL;DR: This paper shows how two general approaches for measuring the learner's aleatoric and epistemic uncertainty in a prediction can be instantiated with decision trees and random forests as learning algorithms in a classification setting.

Abstract: Due to the steadily increasing relevance of machine learning for practical applications, many of which come with safety requirements, the notion of uncertainty has received increasing attention in machine learning research in the last couple of years. In particular, the idea of distinguishing between two important types of uncertainty, often referred to as aleatoric and epistemic, has recently been studied in the setting of supervised learning. In this paper, we propose to quantify these uncertainties, referring, respectively, to inherent randomness and a lack of knowledge, with random forests. More specifically, we show how two general approaches for measuring the learner’s aleatoric and epistemic uncertainty in a prediction can be instantiated with decision trees and random forests as learning algorithms in a classification setting. In this regard, we also compare random forests with deep neural networks, which have been used for a similar purpose.

78 citations
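
To make the aleatoric/epistemic distinction concrete, the following is a minimal sketch of one common entropy-based decomposition over an ensemble. It is an illustration only: the abstract does not specify the formulas, and the function names and the toy per-tree probabilities are assumptions, not the authors' implementation. Total uncertainty is taken as the entropy of the averaged prediction, aleatoric uncertainty as the average entropy of the individual trees, and epistemic uncertainty as the difference.

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def decompose_uncertainty(tree_probs):
    """Split ensemble uncertainty for one query point.

    tree_probs: list of class-probability vectors, one per tree
    (hypothetical input; a real forest would supply these).
    Returns (total, aleatoric, epistemic).
    """
    n_trees = len(tree_probs)
    n_classes = len(tree_probs[0])
    # Average the per-tree distributions class by class.
    mean = [sum(p[k] for p in tree_probs) / n_trees for k in range(n_classes)]
    total = entropy(mean)
    # Average entropy of the members: randomness every tree agrees on.
    aleatoric = sum(entropy(p) for p in tree_probs) / n_trees
    # What remains is disagreement between the trees.
    return total, aleatoric, total - aleatoric


# All trees agree on a 50/50 split: the uncertainty is purely aleatoric.
agree = [[0.5, 0.5], [0.5, 0.5]]
# Trees are individually certain but contradict each other: purely epistemic.
disagree = [[1.0, 0.0], [0.0, 1.0]]
```

In this sketch, high epistemic uncertainty signals that more training data could help, whereas high aleatoric uncertainty would persist even with more data.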

Journal Article•10.3233/IDA-194641•

An efficient and robust bat algorithm with fusion of opposition-based learning and whale optimization algorithm

Jinkun Luo, Fazhi He, Jiashi Yong

1 Jan 2020

60 citations

Journal Article•10.3233/IDA-194485•

An improved opposition based learning firefly algorithm with dragonfly algorithm for solving continuous optimization problems

Mehdi Abedi, Farhad Soleimanian Gharehchopogh¹•Institutions (1)

Islamic Azad University¹

1 Jan 2020

45 citations

Journal Article•10.3233/IDA-184411•

Detailed Investigation of Deep Features with Sparse Representation and Dimensionality Reduction in CBIR: A Comparative Study

Ahmad S. Tarawneh¹, Ceyhun Celik², Ahmad B. A. Hassanat³,⁴, Dmitry Chetverikov¹•Institutions (4)

Eötvös Loránd University¹, Gazi University², University of Tabuk³, Mutah University⁴

1 Jan 2020

TL;DR: A comparative investigation of different features, including low-level and high-level features, for content-based image retrieval (CBIR) is presented.

Abstract: Research on content-based image retrieval (CBIR) has been under development for decades, and numerous methods have been competing to extract the most discriminative features for improved representation of the image content. Recently, deep learning methods have gained attention in computer vision, including CBIR. In this paper, we present a comparative investigation of different features, including low-level and high-level features, for CBIR. We compare the performance of CBIR systems using different deep features with state-of-the-art low-level features such as SIFT, SURF, HOG, LBP, and LTP, using different dictionaries and coefficient learning techniques. Furthermore, we conduct comparisons with a set of primitive and popular features that have been used in this field, including colour histograms and Gabor features. We also investigate the discriminative power of deep features using certain similarity measures under different validation approaches. Furthermore, we investigate the effects of the dimensionality reduction of deep features on the performance of CBIR systems using principal component analysis, discrete wavelet transform, and discrete cosine transform. Unprecedentedly, the experimental results demonstrate high (95% and 93%) mean average precisions when using the VGG-16 FC7 deep features of Corel-1000 and Coil-20 datasets with 10-D and 20-D K-SVD, respectively.

35 citations

Journal Article•10.3233/IDA-194487•

An intrusion detection method based on active transfer learning

Jingmei Li¹, Weifei Wu¹, Di Xue¹•Institutions (1)

Harbin Engineering University¹

1 Jan 2020

24 citations

Journal Article•10.3233/IDA-194509•

Efficient heuristics for learning Bayesian network from labeled and unlabeled data

Zhiyi Duan¹, Limin Wang¹, Minghui Sun¹•Institutions (1)

Jilin University¹

1 Jan 2020

23 citations

Other•10.1002/9781119544487.CH17•

Bruxism Detection Using Single‐Channel C4‐A1 on Human Sleep S2 Stage Recording

Belal Bin Heyat¹, Dakun Lai¹, Faijan Akhtar¹, Mohd Ammar Bin Hayat, Shafan Azad, Shadab Azad, Shajan Azad•Institutions (1)

University of Electronic Science and Technology of China¹

29 Jun 2020

19 citations

Journal Article•10.3233/IDA-195050•

Fourier neural networks: A comparative study

Abylay Zhumekenov¹, Malika Uteuliyeva¹, Olzhas Kabdolov², Rustem Takhanov¹, Zhenisbek Assylbekov¹, Alejandro J. Castro¹•Institutions (2)

Nazarbayev University¹, Huawei²

1 Jan 2020

TL;DR: All neural networks, both Fourier and the standard one, empirically demonstrate lower approximation error than the truncated Fourier series when it comes to an approximation of a known function of multiple variables.

Abstract: We review neural network architectures which were motivated by Fourier series and integrals and which are referred to as Fourier neural networks. These networks are empirically evaluated in synthetic and real-world tasks. None of them outperforms the standard neural network with sigmoid activation function in the real-world tasks. All neural networks, both Fourier and the standard one, empirically demonstrate lower approximation error than the truncated Fourier series when it comes to an approximation of a known function of multiple variables.

18 citations

Journal Article•10.3233/IDA-194807•

Bayesian hierarchical K-means clustering

Yue Liu¹, Bufang Li•Institutions (1)

Hebei University¹

1 Jan 2020

TL;DR: In this paper, a cascaded clustering tree is constructed in which all layers interact with each other in a network-like manner, and the clustering result of each layer is dynamically improved in accordance with the global hierarchical clustering objective function.

Abstract: Clustering is a foundational and important technique in data mining. In the real world, data often has a hierarchical structure. Hierarchical clustering aims at constructing a cluster tree, which reveals the underlying modal structure of a complex density. Due to their inherent complexity, most existing hierarchical clustering algorithms are designed heuristically without an explicit objective function, which limits their utilization and analysis. K-means clustering, a well-known, simple yet effective algorithm that can be expressed from the view of probability distributions, has an inherent connection to the Mixture of Gaussians (MoG). We therefore consider combining Bayesian analysis with the K-means algorithm. This motivates us to develop a hierarchical clustering method based on K-means under a probabilistic framework, which differs from existing hierarchical K-means algorithms that process data in a single pass with heuristic strategies. To this end, we propose an explicit objective function for hierarchical clustering, termed Bayesian hierarchical K-means (BHK-means). In our method, a cascaded clustering tree is constructed, in which all layers interact with each other in a network-like manner. In this cluster tree, the clustering results of each layer are influenced by the parent and child nodes. Therefore, the clustering result of each layer is dynamically improved in accordance with the global hierarchical clustering objective function. The objective function is optimized using the same algorithm as K-means, the Expectation-Maximization algorithm. The experimental results on both synthetic data and benchmark datasets demonstrate the effectiveness of our algorithm over existing related ones.

17 citations

Journal Article•10.3233/IDA-194653•

Multi-fuzzy-constrained graph pattern matching with big graph data.

Guliu Liu, Lei Li¹, Xindong Wu¹•Institutions (1)

Hefei University of Technology¹

1 Jan 2020

Book Chapter•10.1007/978-3-030-44584-3_34•

Human-to-AI Coach: Improving Human Inputs to AI Systems.

Johannes Schneider¹•Institutions (1)

University of Liechtenstein¹

27 Apr 2020

TL;DR: In this paper, a conditional convolutional autoencoder (CCAE) is used to generate handwritten digit proposals that reduce misinterpretation by the AI system and improve the efficiency of input generation for the human, while keeping the altered inputs as similar as possible to the original inputs.

Abstract: Humans increasingly interact with artificial intelligence (AI) systems. AI systems are optimized for objectives such as minimum computation or minimum error rate in recognizing and interpreting inputs from humans. In contrast, inputs created by humans are often treated as a given. We investigate how human inputs can be altered to reduce misinterpretation by the AI system and to improve the efficiency of input generation for the human, while the altered inputs remain as similar as possible to the original inputs. These objectives result in trade-offs that are analyzed for a deep learning system classifying handwritten digits. To create examples that serve as demonstrations for humans to improve, we develop a model based on a conditional convolutional autoencoder (CCAE). Our quantitative and qualitative evaluation shows that on many occasions the generated proposals lead to lower error rates, require less effort to create and differ only modestly from the original samples.

Book Chapter•10.1007/978-3-030-44584-3_5•

GraphMDL: Graph Pattern Selection based on Minimum Description Length

Francesco Bariatti¹, Peggy Cellier¹, Sébastien Ferré¹•Institutions (1)

University of Rennes¹

27 Apr 2020

TL;DR: An MDL-based approach for selecting a characteristic subset of patterns on labeled graphs with the introduction of ports to encode connections between pattern occurrences without any loss of information is proposed.

Abstract: Many graph pattern mining algorithms have been designed to identify recurring structures in graphs. The main drawback of these approaches is that they often extract too many patterns for human analysis. Recently, pattern mining methods using the Minimum Description Length (MDL) principle have been proposed to select a characteristic subset of patterns from transactional, sequential and relational data. In this paper, we propose an MDL-based approach for selecting a characteristic subset of patterns on labeled graphs. A key notion in this paper is the introduction of ports to encode connections between pattern occurrences without any loss of information. Experiments show that the number of patterns is drastically reduced. The selected patterns have complex shapes and are representative of the data.

Journal Article•10.3233/IDA-194477•

Balanced training/test set sampling for proper evaluation of classification models

Donghoon Kang¹, Sejong Oh¹•Institutions (1)

Dankook University¹

1 Jan 2020

Journal Article•10.3233/IDA-194820•

A hybrid deep learning model for predicting and targeting the less immunized area to improve childrens vaccination rate

G. Mohanraj¹, V. Mohanraj¹, J. Senthilkumar¹, Y. Suresh¹•Institutions (1)

Sona College of Technology¹

1 Jan 2020

TL;DR: A new hybrid deep learning model is proposed to predict and target vaccination rates in the less immunized regions in India using the data collected from the recently updated District Level Household Survey-4.

Abstract: There has been a major and rising interest in India in increasing the vaccination rate among people to make the nation healthier and safer. In this paper, a new hybrid deep learning model is proposed to predict and target vaccination rates in the less immunized regions. The Rank-Based Multi-Layer Perceptron (R-MLP) hybrid deep learning framework uses the data collected from the recently updated District Level Household Survey-4 (DLHS). The R-MLP model predicts and categorizes the percentage of partly immunized vaccination rates as extreme, low and medium ranges. These predicted findings are cross-verified by a Deep Soft Cosine Semantic and Ranking SVM based model (DSS-RSM). The DSS-RSM model uses data obtained from medical practitioners through a location-based social network. The proposed model predicts and extracts patterns with high similarity frequency for identifying vulnerable low immunization regions. It classifies the predicted patterns into two classes based on the percentage of pattern matches: Class 1 denotes high ranked regions and Class 2 denotes low ranked regions. Finally, the results from the R-MLP and DSS-RSM models are cross-linked using an ensemble model. This model computes loss values to identify the target regions where health care programs need to be conducted to increase the level of immunization among children. The proposed hybrid deep learning models are trained and validated using the Python-based Keras and TensorFlow deep learning libraries. The performance of the proposed hybrid deep learning model is compared with other machine learning techniques such as Decision Tree C5.0, Naive Bayes and Linear Regression. The comparative results are evaluated using measures such as Precision, Recall, Accuracy and F1-Measure. Our results show that the hybrid deep learning system is clearly superior to any other alternative approach.

Journal Article•10.3233/IDA-194486•

A comparison study on nonlinear dimension reduction methods with kernel variations: Visualization, optimization and classification

Katherine C. Kempfert¹, Yishi Wang², Cuixian Chen², Samuel W. K. Wong³•Institutions (3)

University of Florida¹, University of North Carolina at Wilmington², University of Waterloo³

1 Jan 2020

TL;DR: A parallel computational approach attains faster processing speeds and similar recognition rates on MORPH-II; it can be applied to practical gender classification systems and generalized to other face analysis tasks, such as race classification and age prediction.

Abstract: Because of high dimensionality, correlation among covariates, and noise contained in data, dimension reduction (DR) techniques are often employed before applying machine learning algorithms. Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and their kernel variants (KPCA, KLDA) are among the most popular DR methods. Recently, Supervised Kernel Principal Component Analysis (SKPCA) has been shown to be another successful alternative. In this paper, brief reviews of these popular techniques are presented first. We then conduct a comparative performance study based on three simulated datasets, after which the performance of the techniques is evaluated through application to a pattern recognition problem in face image analysis. The gender classification problem is considered on MORPH-II and FG-NET, two popular longitudinal face aging databases. Several feature extraction methods are used, including biologically-inspired features (BIF), local binary patterns (LBP), histogram of oriented gradients (HOG), and the Active Appearance Model (AAM). After applications of DR methods, a linear support vector machine (SVM) is deployed with gender classification accuracy rates exceeding 95% on MORPH-II, competitive with benchmark results. A parallel computational approach is also proposed, attaining faster processing speeds and similar recognition rates on MORPH-II. Our computational approach can be applied to practical gender classification systems and generalized to other face analysis tasks, such as race classification and age prediction.

Journal Article•10.3233/IDA-194844•

An efficient Bayesian network structure learning algorithm using the strategy of two-stage searches

Huiping Guo, Hongru Li

1 Jan 2020

TL;DR: An efficient hybrid algorithm with a two-stage search strategy is proposed for Bayesian network (BN) structure learning; experiments show that it improves structure learning performance.

Abstract: Improving accuracy is important for Bayesian network (BN) structure learning, an NP-hard problem, and hybrid algorithms are currently among the effective structure learning algorithms. Most hybrid algorithms adopt the strategy of one heuristic search and can be divided into two groups: one heuristic search based on an initial BN skeleton and one heuristic search based on initial solutions. The former often fails to guarantee globality of the optimal structure and the latter fails to find the optimal solution because of the large search space. In this paper, an efficient hybrid algorithm is proposed with the strategy of two-stage searches. In the first-stage search, it first determines the local search space based on the Maximal Information Coefficient by introducing penalty factors p1, p2, then searches the local space by Binary Particle Swarm Optimization. In the second-stage search, an efficient ADR (Add, Delete, Reverse) algorithm based on three basic operators is designed to extend the local space to the whole space. Experimental results show that the proposed algorithm can obtain better performance in BN structure learning.
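
The three basic operators named in the abstract (Add, Delete, Reverse) are standard local moves on a directed acyclic graph. The sketch below enumerates every acyclic graph that is one such move away from a given structure. It is a generic illustration under stated assumptions: the function names are hypothetical, no scoring function is included, and it does not reproduce the paper's ADR algorithm or its first-stage search.

```python
def has_cycle(nodes, edges):
    """Return True if the directed graph contains a cycle (depth-first search)."""
    children = {n: [] for n in nodes}
    for u, v in edges:
        children[u].append(v)
    state = {n: 0 for n in nodes}  # 0 = unseen, 1 = on current path, 2 = finished

    def visit(n):
        state[n] = 1
        for c in children[n]:
            if state[c] == 1 or (state[c] == 0 and visit(c)):
                return True
        state[n] = 2
        return False

    return any(state[n] == 0 and visit(n) for n in nodes)


def adr_neighbours(nodes, edges):
    """List (operator, edge_set) pairs one Add, Delete or Reverse step away."""
    edges = set(edges)
    out = []
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            if (u, v) in edges:
                # Deleting an edge can never create a cycle.
                out.append(("delete", edges - {(u, v)}))
                reversed_graph = (edges - {(u, v)}) | {(v, u)}
                if not has_cycle(nodes, reversed_graph):
                    out.append(("reverse", reversed_graph))
            elif (v, u) not in edges:
                added = edges | {(u, v)}
                if not has_cycle(nodes, added):
                    out.append(("add", added))
    return out


# Toy chain A -> B -> C (hypothetical example structure).
neighbours = adr_neighbours(["A", "B", "C"], {("A", "B"), ("B", "C")})
```

A search procedure would score each neighbour against the data and move to the best one; the acyclicity check is what keeps every candidate a valid Bayesian network structure.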

Book Chapter•10.1007/978-3-030-44584-3_28•

AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model

Tien-Dung Nguyen¹, Tomasz Maszczyk¹, Katarzyna Musial¹, Marc-André Zöller², Bogdan Gabrys¹•Institutions (2)

University of Technology, Sydney¹, Software AG²

27 Apr 2020

TL;DR: In this article, the authors propose a method to evaluate the validity of ML pipelines using a surrogate model (AVATAR), which accelerates automatic ML pipeline composition and optimisation by quickly ignoring invalid pipelines.

Abstract: The evaluation of machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. Previous methods such as Bayesian-based and genetic-based optimisation, which are implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them. Therefore, the pipeline composition and optimisation of these methods require a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid, and it is unnecessary to execute them to find out whether they are good pipelines. To address this issue, we propose a novel method to evaluate the validity of ML pipelines using a surrogate model (AVATAR). AVATAR accelerates automatic ML pipeline composition and optimisation by quickly ignoring invalid pipelines. Our experiments show that AVATAR is more efficient in evaluating complex pipelines in comparison with the traditional evaluation approaches requiring their execution.

Journal Article•10.3233/IDA-184415•

A study on rare fraud predictions with big Medicare claims fraud data

Richard A. Bauder¹, Taghi M. Khoshgoftaar¹•Institutions (1)

Florida Atlantic University¹

1 Jan 2020

Book Chapter•10.1007/978-3-030-44584-3_19•

Adversarial Attacks Hidden in Plain Sight

Jan Philip Göpfert¹, André Artelt¹, Heiko Wersing², Barbara Hammer¹•Institutions (2)

Bielefeld University¹, Honda²

27 Apr 2020

TL;DR: A technique is proposed that hides adversarial attacks in regions of high image complexity, such that they are imperceptible even to an astute observer.

Abstract: Convolutional neural networks have been used to achieve a string of successes during recent years, but their lack of interpretability remains a serious issue. Adversarial examples are designed to deliberately fool neural networks into making any desired incorrect classification, potentially with very high certainty. Several defensive approaches increase robustness against adversarial attacks, demanding attacks of greater magnitude, which lead to visible artifacts. By considering human visual perception, we compose a technique that allows such adversarial attacks to be hidden in regions of high complexity, such that they are imperceptible even to an astute observer. We carry out a user study on classifying adversarially modified images to validate the perceptual quality of our approach and find significant evidence for its concealment with regard to human visual perception.

Journal Article•10.3233/IDA-194656•

An outlier ensemble for unsupervised anomaly detection in honeypots data

Lynda Boukela¹, Gongxuan Zhang¹, Samia Bouzefrane, Junlong Zhou¹•Institutions (1)

Nanjing University of Science and Technology¹

1 Jan 2020

Journal Article•10.3233/IDA-184403•

DNN models based on dimensionality reduction for stock trading

Dongdong Lv¹, Dong Wang², Meizi Li³, Yang Xiang¹•Institutions (3)

Tongji University¹, Shanghai Institute of Technology², Shanghai Normal University³

1 Jan 2020

Other•10.1002/9781119544487.CH1•

Intelligent Data Analysis

Sarthak Gupta¹, Siddhant Bagga¹, Deepak Kumar Sharma¹•Institutions (1)

Netaji Subhas Institute of Technology¹

2 Jun 2020

Journal Article•10.3233/IDA-194574•

An improvement of SAX representation for time series by using complexity invariance

Xuan-May Thi Le, Tuan Minh Tran, Hien T. Nguyen¹•Institutions (1)

Banking University of Ho Chi Minh City¹

1 Jan 2020

Journal Article•10.3233/IDA-194803•

Boosting meta-learning with simulated data complexity measures

Luís P. F. Garcia¹, Adriano Rivolli, Edesio Alcoba², Ana Carolina Lorena, André C. P. L. F. de Carvalho²•Institutions (2)

University of Brasília¹, University of São Paulo²

1 Jan 2020

TL;DR: This paper proposes an empirical approach designed to decrease the computational cost of computing the data complexity measures while still keeping their descriptive ability. It consists of a novel Meta-Learning system able to predict the values of the data complexity measures for a dataset by using simpler meta-features as input.

Abstract: Meta-Learning has been largely used over the last years to support the recommendation of the most suitable machine learning algorithm(s) and hyperparameters for new datasets. Traditionally, a meta-base is created containing meta-features extracted from several datasets along with the performance of a pool of machine learning algorithms when applied to these datasets. The meta-features must describe essential aspects of the dataset and distinguish different problems and solutions. However, if one wants the use of Meta-Learning to be computationally efficient, the extraction of the meta-feature values should also show a low computational cost, considering a trade-off between the time spent to run all the algorithms and the time required to extract the meta-features. One class of measures with successful results in the characterization of classification datasets is concerned with estimating the underlying complexity of the classification problem. These data complexity measures take into account the overlap between classes imposed by the feature values, the separability of the classes and distribution of the instances within the classes. However, the extraction of these measures from datasets usually presents a high computational cost. In this paper, we propose an empirical approach designed to decrease the computational cost of computing the data complexity measures, while still keeping their descriptive ability. The proposal consists of a novel Meta-Learning system able to predict the values of the data complexity measures for a dataset by using simpler meta-features as input. In an extensive set of experiments, we show that the predictive performance achieved by Meta-Learning systems which use the predicted data complexity measures is similar to the performance obtained using the original data complexity measures, but the computational cost involved in their computation is significantly reduced.

Journal Article•10.3233/IDA-194961•

Hybrid recommendation model based on deep learning and Stacking integration strategy

Xiaolan Xie, Shantian Pang, Jili Chen

1 Jan 2020

Book Chapter•10.1007/978-3-030-44584-3_4•

Efficient Batch-Incremental Classification Using UMAP for Evolving Data Streams

Maroua Bahri¹,², Bernhard Pfahringer³, Albert Bifet¹,³, Silviu Maniu²,⁴•Institutions (4)

Télécom ParisTech¹, Université Paris-Saclay², University of Waikato³, École Normale Supérieure⁴

27 Apr 2020

TL;DR: A batch-incremental approach that pre-processes data streams using UMAP, by producing successive embeddings on a stream of disjoint batches in order to support an incremental kNN classification.

Abstract: Learning from potentially infinite and high-dimensional data streams poses significant challenges in the classification task. For instance, k-Nearest Neighbors (kNN) is one of the most often used algorithms in the data stream mining area that proved to be very resource-intensive when dealing with high-dimensional spaces. Uniform Manifold Approximation and Projection (UMAP) is a novel manifold technique and one of the most promising dimension reduction and visualization techniques in the non-streaming setting because of its high performance in comparison with competitors. However, there is no version of UMAP that copes with the challenging context of streams. To overcome these restrictions, we propose a batch-incremental approach that pre-processes data streams using UMAP, by producing successive embeddings on a stream of disjoint batches in order to support an incremental kNN classification. Experiments conducted on publicly available synthetic and real-world datasets demonstrate the substantial gains that can be achieved with our proposal compared to state-of-the-art techniques.

Book Chapter•10.1007/978-3-030-44584-3_42•

Making Learners (More) Monotone

Tom J. Viering¹, Alexander Mey¹, Marco Loog¹,²•Institutions (2)

Delft University of Technology¹, University of Copenhagen²

27 Apr 2020

TL;DR: This work proposes three algorithms that take a supervised learning model and make it perform more monotone, and proves consistency and monotonicity with high probability, and evaluates the algorithms on scenarios where non-monotone behaviour occurs.

Abstract: Learning performance can show non-monotonic behavior. That is, more data does not necessarily lead to better models, even on average. We propose three algorithms that take a supervised learning model and make it perform more monotone. We prove consistency and monotonicity with high probability, and evaluate the algorithms on scenarios where non-monotone behaviour occurs. Our proposed algorithm \(\text{MT}_{\text{HT}}\) makes less than 1% non-monotone decisions on MNIST while staying competitive in terms of error rate compared to several baselines. Our code is available at https://github.com/tomviering/monotone.
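
The idea of wrapping a learner so that its performance does not get worse as data grows can be illustrated with a very simple acceptance rule. The sketch below assumes the simplest possible rule, which the abstract does not spell out: a newly trained model replaces the current one only if its validation error is strictly lower. The function name and the error values are hypothetical, and this is not a reproduction of the paper's three algorithms.

```python
def monotone_sequence(candidate_errors):
    """Apply a keep-the-best rule to a stream of retrained models.

    candidate_errors: validation error of the model trained in each round
    (hypothetical numbers; each round is assumed to use more data).
    Returns the validation error of the model actually kept after each round.
    """
    kept = []
    best = None
    for err in candidate_errors:
        # Accept the new model only when it improves on the one we hold.
        if best is None or err < best:
            best = err
        kept.append(best)
    return kept


# Round 2 and round 4 produce worse models, so the wrapper ignores them.
kept_errors = monotone_sequence([0.30, 0.35, 0.25, 0.28])
```

Because validation error is itself a noisy estimate, a rule like this is monotone only with respect to the validation set; making the guarantee hold for the true error with high probability requires a more careful decision rule than this sketch uses.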

Book Chapter•10.1007/978-3-030-44584-3_21•

Overlapping Hierarchical Clustering (OHC)

Ian Jeantet¹, Zoltán Miklós¹, David Gross-Amblard¹•Institutions (1)

University of Rennes¹

27 Apr 2020

TL;DR: A new method is proposed that allows clusters to overlap until a strong cluster attraction is reached, based on a density criterion, and the resulting hierarchical structure is represented as a directed acyclic graph and combines the advantages of hierarchies with the precision of a less arbitrary clustering.

Abstract: Agglomerative clustering methods have been widely used by many research communities to cluster their data into hierarchical structures. These structures ease data exploration and are understandable even for non-specialists. But these methods necessarily result in a tree, since, at each agglomeration step, two clusters have to be merged. This may bias the data analysis process if, for example, a cluster is almost equally attracted by two others. In this paper we propose a new method that allows clusters to overlap until a strong cluster attraction is reached, based on a density criterion. The resulting hierarchical structure, called a quasi-dendrogram, is represented as a directed acyclic graph and combines the advantages of hierarchies with the precision of a less arbitrary clustering. We validate our work with extensive experiments on real data sets and compare it with existing tree-based methods, using a new measure of similarity between heterogeneous hierarchical structures.

Journal Article•10.3233/IDA-194576•

Online Analytical Processing on Graph Data

Leticia I. Gómez¹, Bart Kuijpers², Alejandro A. Vaisman¹•Institutions (2)

Instituto Tecnológico de Buenos Aires¹, University of Hasselt²

1 Jan 2020

TL;DR: In this article, the authors propose a formal multidimensional model for graph analysis that considers the basic graph data as well as background information in the form of dimension hierarchies.

Abstract: Online Analytical Processing (OLAP) comprises tools and algorithms that allow querying multidimensional databases. It is based on the multidimensional model, where data can be seen as a cube such that each cell contains one or more measures that can be aggregated along dimensions. In a Big Data scenario, traditional data warehousing and OLAP operations are clearly not sufficient to address current data analysis requirements, for example, social network analysis. Furthermore, OLAP operations and models can expand the possibilities of graph analysis beyond the traditional graph-based computation. Nevertheless, there is not much work on the problem of taking OLAP analysis to the graph data model. This paper proposes a formal multidimensional model for graph analysis, that considers the basic graph data, and also background information in the form of dimension hierarchies. The graphs in this model are node- and edge-labelled directed multi-hypergraphs, called graphoids, which can be defined at several different levels of granularity using the dimensions associated with them. Operations analogous to the ones used in typical OLAP over cubes are defined over graphoids. The paper presents a formal definition of the graphoid model for OLAP, proves that the typical OLAP operations on cubes can be expressed over the graphoid model, and shows that the classic data cube model is a particular case of the graphoid data model. Finally, a case study supports the claim that, for many kinds of OLAP-like analysis on graphs, the graphoid model works better than the typical relational OLAP alternative, and for the classic OLAP queries, it remains competitive.
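
For readers unfamiliar with the cube model the abstract builds on, the following minimal sketch shows a classic roll-up: measures stored per cell are aggregated along a dimension hierarchy. It illustrates only the traditional cube operation, not the graphoid model; the function name, the city-to-country hierarchy and the figures are hypothetical.

```python
from collections import defaultdict


def roll_up(cells, hierarchy, dim_index):
    """Aggregate a summed measure by lifting one dimension to its parent level.

    cells: dict mapping coordinate tuples to a numeric measure.
    hierarchy: dict mapping each member of the chosen dimension to its parent.
    dim_index: position of that dimension inside the coordinate tuple.
    """
    out = defaultdict(float)
    for coords, measure in cells.items():
        lifted = list(coords)
        lifted[dim_index] = hierarchy[lifted[dim_index]]
        out[tuple(lifted)] += measure
    return dict(out)


# Toy cube with dimensions (city, year) and one measure per cell.
cells = {
    ("Antwerp", "2019"): 10.0,
    ("Hasselt", "2019"): 5.0,
    ("Buenos Aires", "2019"): 7.0,
}
city_to_country = {
    "Antwerp": "Belgium",
    "Hasselt": "Belgium",
    "Buenos Aires": "Argentina",
}
by_country = roll_up(cells, city_to_country, 0)
```

The same hierarchy-driven aggregation is what the paper generalizes from cube cells to nodes and edges of a graph.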
