Papers presented at "Intelligent Data Analysis" in 2020
Book Chapter•10.1007/978-3-030-44584-3_43•
Combining Machine Learning and Simulation to a Hybrid Modelling Approach: Current and Future Directions.
Laura von Rueden, Sebastian Mayer, Rafet Sifa, Christian Bauckhage, Jochen Garcke 
27 Apr 2020
TL;DR: A conceptual framework is presented that helps to identify potential combined approaches; it is employed to give a structured overview of different types of combinations, using exemplary approaches of simulation-assisted machine learning and machine-learning-assisted simulation.
Abstract: In this paper, we describe the combination of machine learning and simulation towards a hybrid modelling approach. Such a combination of data-based and knowledge-based modelling is motivated by applications that are partly based on causal relationships, while other effects result from hidden dependencies that are represented in huge amounts of data. Our aim is to bridge the knowledge gap between the machine learning and simulation communities to promote the development of hybrid systems. We present a conceptual framework that helps to identify potential combined approaches and employ it to give a structured overview of different types of combinations using exemplary approaches of simulation-assisted machine learning and machine-learning-assisted simulation. We also discuss an advanced pairing in the context of Industry 4.0, where we see particular further potential for hybrid systems.

137 citations

Book Chapter•10.1007/978-3-030-44584-3_35•
Aleatoric and Epistemic Uncertainty with Random Forests
Mohammad Hossein Shaker1, Eyke Hüllermeier1•
University of Paderborn1
27 Apr 2020
TL;DR: This paper shows how two general approaches for measuring the learner's aleatoric and epistemic uncertainty in a prediction can be instantiated with decision trees and random forests as learning algorithms in a classification setting.
Abstract: Due to the steadily increasing relevance of machine learning for practical applications, many of which come with safety requirements, the notion of uncertainty has received increasing attention in machine learning research in the last couple of years. In particular, the idea of distinguishing between two important types of uncertainty, often referred to as aleatoric and epistemic, has recently been studied in the setting of supervised learning. In this paper, we propose to quantify these uncertainties, which refer, respectively, to inherent randomness and a lack of knowledge, with random forests. More specifically, we show how two general approaches for measuring the learner’s aleatoric and epistemic uncertainty in a prediction can be instantiated with decision trees and random forests as learning algorithms in a classification setting. In this regard, we also compare random forests with deep neural networks, which have been used for a similar purpose.

78 citations
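
A common entropy-based way to split an ensemble's predictive uncertainty can be sketched as follows, assuming each tree in the forest outputs a class-probability vector. Total uncertainty is the entropy of the averaged prediction, the aleatoric part is the average entropy of the individual trees, and the epistemic part is their difference (the mutual information). This is a minimal illustration with hypothetical function names; the chapter's own instantiations may differ in detail.

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def decompose_uncertainty(tree_probs):
    """Split ensemble uncertainty into (total, aleatoric, epistemic).

    tree_probs: one class-probability vector per ensemble member (e.g. per tree).
    """
    n_members = len(tree_probs)
    n_classes = len(tree_probs[0])
    # Ensemble prediction: average of the members' probability vectors.
    mean_probs = [sum(p[k] for p in tree_probs) / n_members for k in range(n_classes)]
    total = entropy(mean_probs)
    # Expected entropy of the members: randomness that remains even if we knew which member is right.
    aleatoric = sum(entropy(p) for p in tree_probs) / n_members
    # The gap (mutual information) reflects disagreement between members, i.e. lack of knowledge.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic

# Trees agree that the outcome is a coin flip: the uncertainty is aleatoric.
print(decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]]))
# Trees are individually certain but contradict each other: the uncertainty is epistemic.
print(decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]]))
```

With a real random forest, the per-tree vectors would come from the individual trees' leaf class frequencies; note that very pure leaves make the aleatoric term collapse towards zero, which is one reason smoothed or likelihood-based variants are also used.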

Journal Article•10.3233/IDA-194641•
An efficient and robust bat algorithm with fusion of opposition-based learning and whale optimization algorithm
Jinkun Luo, Fazhi He, Jiashi Yong
1 Jan 2020

60 citations

Journal Article•10.3233/IDA-194485•
An improved opposition based learning firefly algorithm with dragonfly algorithm for solving continuous optimization problems
Mehdi Abedi, Farhad Soleimanian Gharehchopogh1•
Islamic Azad University1
1 Jan 2020

45 citations

Journal Article•10.3233/IDA-184411•
Detailed Investigation of Deep Features with Sparse Representation and Dimensionality Reduction in CBIR: A Comparative Study
Ahmad S. Tarawneh1, Ceyhun Celik2, Ahmad B. A. Hassanat3,4, Dmitry Chetverikov1 •
Eötvös Loránd University1, Gazi University2, University of Tabuk3, Mutah University4
1 Jan 2020
TL;DR: This paper presents a comparative investigation of different features, including low-level and high-level features, for content-based image retrieval (CBIR), a field in which numerous methods have competed to extract the most discriminative features for improved representation of image content.
Abstract: Research on content-based image retrieval (CBIR) has been under development for decades, and numerous methods have been competing to extract the most discriminative features for improved representation of the image content. Recently, deep learning methods have gained attention in computer vision, including CBIR. In this paper, we present a comparative investigation of different features, including low-level and high-level features, for CBIR. We compare the performance of CBIR systems using different deep features with state-of-the-art low-level features such as SIFT, SURF, HOG, LBP, and LTP, using different dictionaries and coefficient learning techniques. Furthermore, we conduct comparisons with a set of primitive and popular features that have been used in this field, including colour histograms and Gabor features. We also investigate the discriminative power of deep features using certain similarity measures under different validation approaches. In addition, we investigate the effects of the dimensionality reduction of deep features on the performance of CBIR systems using principal component analysis, discrete wavelet transform, and discrete cosine transform. The experimental results show unprecedentedly high mean average precisions (95% and 93%) when using VGG-16 FC7 deep features on the Corel-1000 and Coil-20 datasets with 10-D and 20-D K-SVD, respectively.

35 citations
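
Mean average precision (mAP), the metric behind the 95% and 93% figures above, averages per-query average precision (AP) over all queries. A minimal sketch, assuming each query returns a ranked list of relevance flags and that every relevant item appears somewhere in that list (the function names are illustrative, not from the paper):

```python
def average_precision(relevance):
    """Average of precision@k over the ranks k at which a relevant item occurs.

    relevance: booleans for a ranked result list (True = relevant).
    Assumes every relevant item for the query appears somewhere in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(rankings):
    """Mean of the per-query average precisions."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

queries = [
    [True, False, True, False],  # AP = (1/1 + 2/3) / 2
    [False, True, True, True],   # AP = (1/2 + 2/3 + 3/4) / 3
]
print(round(mean_average_precision(queries), 4))
```

If the ranking is truncated (for example, only the top 10 images are returned), AP is usually divided by the total number of relevant items for the query instead of the number retrieved; which convention a given CBIR study uses should be checked before comparing figures.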

Journal Article•10.3233/IDA-194487•
An intrusion detection method based on active transfer learning
Jingmei Li1, Weifei Wu1, Di Xue1•
Harbin Engineering University1
1 Jan 2020

24 citations

Journal Article•10.3233/IDA-194509•
Efficient heuristics for learning Bayesian network from labeled and unlabeled data
Zhiyi Duan1, Limin Wang1, Minghui Sun1•
Jilin University1
1 Jan 2020

23 citations

Other•10.1002/9781119544487.CH17•
Bruxism Detection Using Single‐Channel C4‐A1 on Human Sleep S2 Stage Recording
Belal Bin Heyat1, Dakun Lai1, Faijan Akhtar1, Mohd Ammar Bin Hayat, Shafan Azad, Shadab Azad, Shajan Azad •
University of Electronic Science and Technology of China1
29 Jun 2020

19 citations

Journal Article•10.3233/IDA-195050•
Fourier neural networks: A comparative study
Abylay Zhumekenov1, Malika Uteuliyeva1, Olzhas Kabdolov2, Rustem Takhanov1, Zhenisbek Assylbekov1, Alejandro J. Castro1 •
Nazarbayev University1, Huawei2
1 Jan 2020
TL;DR: All neural networks, both Fourier and the standard one, empirically demonstrate lower approximation error than the truncated Fourier series when it comes to an approximation of a known function of multiple variables.
Abstract: We review neural network architectures which were motivated by Fourier series and integrals and which are referred to as Fourier neural networks. These networks are empirically evaluated in synthetic and real-world tasks. None of them outperforms the standard neural network with a sigmoid activation function in the real-world tasks. All neural networks, both Fourier and the standard one, empirically demonstrate lower approximation error than the truncated Fourier series when it comes to an approximation of a known function of multiple variables.

18 citations
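
The truncated Fourier series that serves as the baseline above can be computed directly. A minimal one-variable sketch, assuming the target function is sampled on [-π, π] and its coefficients are approximated with the midpoint rule (the paper's comparison uses functions of multiple variables, and the function names here are illustrative):

```python
import math

def fourier_coefficients(f, n_terms, n_samples=2000):
    """Approximate a_k and b_k (k = 0..n_terms) of f on [-pi, pi] with the midpoint rule."""
    h = 2 * math.pi / n_samples
    xs = [-math.pi + (i + 0.5) * h for i in range(n_samples)]
    ys = [f(x) for x in xs]
    a = [h / math.pi * sum(y * math.cos(k * x) for x, y in zip(xs, ys)) for k in range(n_terms + 1)]
    b = [h / math.pi * sum(y * math.sin(k * x) for x, y in zip(xs, ys)) for k in range(n_terms + 1)]
    return a, b

def truncated_series(a, b, x):
    """Evaluate a_0 / 2 + sum_{k=1..N} (a_k cos(kx) + b_k sin(kx))."""
    return a[0] / 2 + sum(a[k] * math.cos(k * x) + b[k] * math.sin(k * x) for k in range(1, len(a)))

# For a target with a kink such as |x|, keeping more terms reduces the error at the sample points.
for n in (1, 5, 25):
    a, b = fourier_coefficients(abs, n)
    err = max(abs(truncated_series(a, b, x) - abs(x)) for x in (-2.0, -0.5, 0.0, 0.5, 2.0))
    print(n, round(err, 4))
```

The number of coefficients of a full tensor-product series grows exponentially with the number of input variables, which is one reason learned networks can be competitive on multivariate targets.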

Journal Article•10.3233/IDA-194807•
Bayesian hierarchical K-means clustering
Yue Liu1, Bufang Li•
Hebei University1
1 Jan 2020
TL;DR: In this paper, a cascaded clustering tree is constructed, in which all layers interact with each other in the network-like manner, and the clustering result of each layer is dynamically improved in accordance with the global hierarchical clustering objective function.
Abstract: Clustering is a fundamental and important technique in data mining. In the real world, data often have a hierarchical structure. Hierarchical clustering aims at constructing a cluster tree, which reveals the underlying modal structure of a complex density. Due to its inherent complexity, most existing hierarchical clustering algorithms are designed heuristically without an explicit objective function, which limits their use and analysis. K-means clustering, the well-known simple yet effective algorithm, can be expressed from the viewpoint of probability distributions and has an inherent connection to the Mixture of Gaussians (MoG). We therefore consider combining Bayesian analysis with the K-means algorithm. This motivates us to develop a hierarchical clustering method based on K-means under a probabilistic framework, which differs from existing hierarchical K-means algorithms that process data in a single pass using heuristic strategies. To this end, we propose an explicit objective function for hierarchical clustering, termed Bayesian hierarchical K-means (BHK-means). In our method, a cascaded clustering tree is constructed in which all layers interact with each other in a network-like manner. In this cluster tree, the clustering results of each layer are influenced by the parent and child nodes. Therefore, the clustering result of each layer is dynamically improved in accordance with the global hierarchical clustering objective function. The objective function is optimized using the same algorithm as K-means, the Expectation-Maximization algorithm. The experimental results on both synthetic data and benchmark datasets demonstrate the effectiveness of our algorithm over existing related ones.

17 citations
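
The abstract notes that the objective is optimized with the same alternating scheme as K-means. As a sketch of that flat building block only (the hierarchical coupling between layers in BHK-means is not reproduced here), Lloyd's K-means can be written as hard-assignment EM; the function name and the explicit initial centroids are assumptions for illustration:

```python
def kmeans(points, centroids, max_iter=100):
    """Lloyd's K-means viewed as hard-assignment EM.

    points, centroids: lists of equal-length tuples of floats.
    Returns the final centroids and the cluster index of each point.
    """
    centroids = [tuple(c) for c in centroids]
    assignments = None
    for _ in range(max_iter):
        # E-step (hard): assign every point to its nearest centroid (squared Euclidean distance).
        new_assignments = [
            min(range(len(centroids)),
                key=lambda j: sum((p_i - c_i) ** 2 for p_i, c_i in zip(p, centroids[j])))
            for p in points
        ]
        if new_assignments == assignments:
            break  # assignments are stable: converged
        assignments = new_assignments
        # M-step: move each centroid to the mean of the points assigned to it.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centroids, assignments

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans(points, [(0.0, 0.0), (10.0, 10.0)]))
```

Replacing the hard assignment with posterior responsibilities under equal-variance Gaussians gives the soft EM updates of a Mixture of Gaussians, which is the connection the abstract refers to.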

Journal Article•10.3233/IDA-194653•
Multi-fuzzy-constrained graph pattern matching with big graph data.
Guliu Liu, Lei Li1, Xindong Wu1•
Hefei University of Technology1
1 Jan 2020
Book Chapter•10.1007/978-3-030-44584-3_34•
Human-to-AI Coach: Improving Human Inputs to AI Systems.
Johannes Schneider1•
University of Liechtenstein1
27 Apr 2020
TL;DR: In this paper, a conditional convolutional autoencoder (CCAE) is used to generate proposals for handwritten digits that reduce misinterpretation by the AI system and improve the efficiency of input generation for the human, while keeping the altered inputs as similar as possible to the original inputs.
Abstract: Humans increasingly interact with artificial intelligence (AI) systems. AI systems are optimized for objectives such as minimum computation or minimum error rate in recognizing and interpreting inputs from humans. In contrast, inputs created by humans are often treated as a given. We investigate how human inputs can be altered to reduce misinterpretation by the AI system and to improve the efficiency of input generation for the human, while the altered inputs remain as similar as possible to the original inputs. These objectives result in trade-offs that are analyzed for a deep learning system classifying handwritten digits. To create examples that serve as demonstrations from which humans can improve, we develop a model based on a conditional convolutional autoencoder (CCAE). Our quantitative and qualitative evaluation shows that on many occasions the generated proposals lead to lower error rates, require less effort to create and differ only modestly from the original samples.
Book Chapter•10.1007/978-3-030-44584-3_5•
GraphMDL: Graph Pattern Selection based on Minimum Description Length
Francesco Bariatti1, Peggy Cellier1, Sébastien Ferré1•
University of Rennes1
27 Apr 2020
TL;DR: An MDL-based approach for selecting a characteristic subset of patterns on labeled graphs with the introduction of ports to encode connections between pattern occurrences without any loss of information is proposed.
Abstract: Many graph pattern mining algorithms have been designed to identify recurring structures in graphs. The main drawback of these approaches is that they often extract too many patterns for human analysis. Recently, pattern mining methods using the Minimum Description Length (MDL) principle have been proposed to select a characteristic subset of patterns from transactional, sequential and relational data. In this paper, we propose an MDL-based approach for selecting a characteristic subset of patterns on labeled graphs. A key notion in this paper is the introduction of ports to encode connections between pattern occurrences without any loss of information. Experiments show that the number of patterns is drastically reduced. The selected patterns have complex shapes and are representative of the data.
Journal Article•10.3233/IDA-194477•
Balanced training/test set sampling for proper evaluation of classification models
Donghoon Kang1, Sejong Oh1•
Dankook University1
1 Jan 2020
Journal Article•10.3233/IDA-194820•
A hybrid deep learning model for predicting and targeting the less immunized area to improve children's vaccination rate
G. Mohanraj1, V. Mohanraj1, J. Senthilkumar1, Y. Suresh1•
Sona College of Technology1
1 Jan 2020
TL;DR: A new hybrid deep learning model is proposed to predict and target vaccination rates in the less immunized regions in India using the data collected from the recently updated District Level Household Survey-4.
Abstract: There has been a major and rising interest in India in increasing the vaccination rate of the population to make the nation healthier and safer. In this paper, a new hybrid deep learning model is proposed to predict and target vaccination rates in the less immunized regions. The Rank-Based Multi-Layer Perceptron (R-MLP) hybrid deep learning framework uses data collected from the recently updated District Level Household Survey-4 (DLHS). The R-MLP model predicts the percentage of partly immunized vaccination rates and categorizes it into extreme, low and medium ranges. These predictions are cross-verified by a Deep Soft Cosine Semantic and Ranking SVM based model (DSS-RSM), which uses data obtained from medical practitioners through a location-based social network. The proposed model predicts and extracts patterns with high similarity frequency to identify vulnerable low-immunization regions. It classifies the predicted patterns into two classes based on the percentage of pattern matches: Class 1 denotes high-ranked regions and Class 2 denotes low-ranked regions. Finally, the results from the R-MLP and DSS-RSM models are cross-linked using an ensemble model. This model uses the loss values to identify the target regions where health care programs need to be conducted to increase the level of immunization among children. The proposed hybrid deep learning models are trained and validated using the Python-based Keras and TensorFlow deep learning libraries. The performance of the proposed hybrid model is compared with other machine learning techniques such as Decision Tree C5.0, Naive Bayes and Linear Regression, using Precision, Recall, Accuracy and F1-Measure as evaluation measures. Our results show that the hybrid deep learning system is clearly superior to the alternative approaches evaluated.
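
The four evaluation measures named above have standard definitions. A minimal sketch for a two-class setting such as Class 1 vs Class 2, treating Class 1 as the positive class (an assumption for illustration; the abstract does not say how results are averaged over classes):

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, accuracy and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}

# Class 1 = high-ranked regions, Class 2 = low-ranked regions (toy labels).
print(classification_metrics([1, 1, 2, 2, 1], [1, 2, 2, 1, 1]))
```
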
Journal Article•10.3233/IDA-194486•
A comparison study on nonlinear dimension reduction methods with kernel variations: Visualization, optimization and classification
Katherine C. Kempfert1, Yishi Wang2, Cuixian Chen2, Samuel W. K. Wong3•
University of Florida1, University of North Carolina at Wilmington2, University of Waterloo3
1 Jan 2020
TL;DR: A parallel computational approach attains faster processing speeds with similar recognition rates on MORPH-II; it can be applied to practical gender classification systems and generalized to other face analysis tasks, such as race classification and age prediction.
Abstract: Because of high dimensionality, correlation among covariates, and noise contained in data, dimension reduction (DR) techniques are often employed before applying machine learning algorithms. Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and their kernel variants (KPCA, KLDA) are among the most popular DR methods. Recently, Supervised Kernel Principal Component Analysis (SKPCA) has been shown to be another successful alternative. In this paper, brief reviews of these popular techniques are presented first. We then conduct a comparative performance study based on three simulated datasets, after which the performance of the techniques is evaluated through application to a pattern recognition problem in face image analysis. The gender classification problem is considered on MORPH-II and FG-NET, two popular longitudinal face aging databases. Several feature extraction methods are used, including biologically-inspired features (BIF), local binary patterns (LBP), histogram of oriented gradients (HOG), and the Active Appearance Model (AAM). After applying the DR methods, a linear support vector machine (SVM) is deployed, with gender classification accuracy rates exceeding 95% on MORPH-II, competitive with benchmark results. A parallel computational approach is also proposed, attaining faster processing speeds and similar recognition rates on MORPH-II. Our computational approach can be applied to practical gender classification systems and generalized to other face analysis tasks, such as race classification and age prediction.
Journal Article•10.3233/IDA-194844•
An efficient Bayesian network structure learning algorithm using the strategy of two-stage searches
Huiping Guo, Hongru Li
1 Jan 2020
TL;DR: An efficient hybrid algorithm with a two-stage search strategy is proposed for Bayesian network structure learning and achieves better structure learning performance.
Abstract: Improving accuracy is important for Bayesian network (BN) structure learning, an NP-hard problem, and hybrid algorithms are currently an effective class of structure learning algorithms. Most hybrid algorithms adopt a single heuristic search and can be divided into two groups: a heuristic search based on an initial BN skeleton, and a heuristic search based on initial solutions. The former often fails to guarantee the global optimality of the structure, and the latter fails to find the optimal solution because of the large search space. In this paper, an efficient hybrid algorithm is proposed with a two-stage search strategy. In the first-stage search, it determines the local search space based on the Maximal Information Coefficient by introducing penalty factors p1 and p2, and then searches the local space with Binary Particle Swarm Optimization. In the second-stage search, an efficient ADR (Add, Delete, Reverse) algorithm based on three basic operators is designed to extend the local space to the whole space. Experimental results show that the proposed algorithm achieves better BN structure learning performance.
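
The second-stage ADR search relies on three basic operators on the network structure. A minimal sketch of the neighborhood they generate, assuming the structure is stored as a set of directed edges and that any operation creating a directed cycle is discarded (the function names and the exhaustive enumeration are illustrative; the paper's scoring function and search order are not reproduced):

```python
def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed graph (nodes, edges) has no cycle."""
    indegree = {n: 0 for n in nodes}
    for _, v in edges:
        indegree[v] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        u = queue.pop()
        visited += 1
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    queue.append(b)
    return visited == len(nodes)

def adr_neighbors(nodes, edges):
    """All DAGs reachable from `edges` by one Add, Delete or Reverse operation."""
    edges = frozenset(edges)
    neighbors = []
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            if (u, v) in edges:
                # Delete: removing an edge can never create a cycle.
                neighbors.append(("delete", (u, v), edges - {(u, v)}))
                # Reverse: keep only if the result is still a DAG.
                reversed_edges = (edges - {(u, v)}) | {(v, u)}
                if is_acyclic(nodes, reversed_edges):
                    neighbors.append(("reverse", (u, v), reversed_edges))
            elif (v, u) not in edges:
                # Add: keep only if the result is still a DAG.
                added = edges | {(u, v)}
                if is_acyclic(nodes, added):
                    neighbors.append(("add", (u, v), added))
    return neighbors

for op, edge, _ in adr_neighbors(["A", "B", "C"], {("A", "B"), ("B", "C")}):
    print(op, edge)
```

A hill-climbing step would score each neighbor (for example with a decomposable score such as BIC) and move to the best one; that scoring step is left out here.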
Book Chapter•10.1007/978-3-030-44584-3_28•
AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model
Tien-Dung Nguyen1, Tomasz Maszczyk1, Katarzyna Musial1, Marc-André Zöller2, Bogdan Gabrys1 •
University of Technology, Sydney1, Software AG2
27 Apr 2020
TL;DR: In this article, the authors propose a method to evaluate the validity of ML pipelines using a surrogate model (AVATAR), which accelerates automatic ML pipeline composition and optimisation by quickly discarding invalid pipelines.
Abstract: The evaluation of machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. Previous methods, such as the Bayesian-based and genetic-based optimisation implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them. Therefore, pipeline composition and optimisation with these methods requires a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid, and that it is unnecessary to execute them to find out whether they are good pipelines. To address this issue, we propose a novel method to evaluate the validity of ML pipelines using a surrogate model (AVATAR). AVATAR accelerates automatic ML pipeline composition and optimisation by quickly ignoring invalid pipelines. Our experiments show that AVATAR is more efficient in evaluating complex pipelines than traditional evaluation approaches that require their execution.
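
The core idea, deciding that a pipeline is invalid without executing it, can be illustrated with a much-simplified check that only propagates declared data properties through the pipeline. Everything below is a hypothetical toy: the component names, the property labels and the `first_invalid_step` function are assumptions, and this is not the AVATAR surrogate model itself.

```python
# Hypothetical component catalogue (not taken from AVATAR): for each step, the data
# properties its input must NOT have, and the properties it removes from the data.
COMPONENTS = {
    "MeanImputer":   {"forbids": {"categorical"}, "removes": {"missing_values"}},
    "OneHotEncoder": {"forbids": set(), "removes": {"categorical"}},
    "Scaler":        {"forbids": {"categorical", "missing_values"}, "removes": set()},
    "LinearSVM":     {"forbids": {"categorical", "missing_values"}, "removes": set()},
}

def first_invalid_step(pipeline, data_properties):
    """Return the first step whose input requirements are violated, or None if valid.

    Only the declared properties are propagated; no data is loaded and no model is fitted.
    """
    properties = set(data_properties)
    for step in pipeline:
        spec = COMPONENTS[step]
        if properties & spec["forbids"]:
            return step
        properties -= spec["removes"]
    return None

data = {"missing_values", "categorical"}
print(first_invalid_step(["OneHotEncoder", "MeanImputer", "LinearSVM"], data))
print(first_invalid_step(["MeanImputer", "OneHotEncoder", "LinearSVM"], data))
```

Such a check costs a few set operations per step, so an optimiser can reject invalid candidates before paying for training and cross-validation.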
Journal Article•10.3233/IDA-184415•
A study on rare fraud predictions with big Medicare claims fraud data
Richard A. Bauder1, Taghi M. Khoshgoftaar1•
Florida Atlantic University1
1 Jan 2020
Book Chapter•10.1007/978-3-030-44584-3_19•
Adversarial Attacks Hidden in Plain Sight
Jan Philip Göpfert1, André Artelt1, Heiko Wersing2, Barbara Hammer1•
Bielefeld University1, Honda2
27 Apr 2020
TL;DR: Taking human visual perception into account, a technique is proposed that hides adversarial attacks in regions of high complexity, so that they are imperceptible even to an astute observer.
Abstract: Convolutional neural networks have been used to achieve a string of successes during recent years, but their lack of interpretability remains a serious issue. Adversarial examples are designed to deliberately fool neural networks into making any desired incorrect classification, potentially with very high certainty. Several defensive approaches increase robustness against adversarial attacks, demanding attacks of greater magnitude, which lead to visible artifacts. By considering human visual perception, we compose a technique that hides such adversarial attacks in regions of high complexity, such that they are imperceptible even to an astute observer. We carry out a user study on classifying adversarially modified images to validate the perceptual quality of our approach and find significant evidence for its concealment with regard to human visual perception.
Journal Article•10.3233/IDA-194656•
An outlier ensemble for unsupervised anomaly detection in honeypots data
Lynda Boukela1, Gongxuan Zhang1, Samia Bouzefrane, Junlong Zhou1•
Nanjing University of Science and Technology1
1 Jan 2020
Journal Article•10.3233/IDA-184403•
DNN models based on dimensionality reduction for stock trading
Dongdong Lv1, Dong Wang2, Meizi Li3, Yang Xiang1•
Tongji University1, Shanghai Institute of Technology2, Shanghai Normal University3
1 Jan 2020
Other•10.1002/9781119544487.CH1•
Intelligent Data Analysis
Sarthak Gupta1, Siddhant Bagga1, Deepak Kumar Sharma1•
Netaji Subhas Institute of Technology1
2 Jun 2020
Journal Article•10.3233/IDA-194574•
An improvement of SAX representation for time series by using complexity invariance
Xuan-May Thi Le, Tuan Minh Tran, Hien T. Nguyen1•
Banking University of Ho Chi Minh City1
1 Jan 2020
Journal Article•10.3233/IDA-194803•
Boosting meta-learning with simulated data complexity measures
Luís P. F. Garcia1, Adriano Rivolli, Edesio Alcoba2, Ana Carolina Lorena, André C. P. L. F. de Carvalho2 •
University of Brasília1, University of São Paulo2
1 Jan 2020
TL;DR: This paper proposes an empirical approach designed to decrease the cost of computing data complexity measures while preserving their descriptive ability; it consists of a novel Meta-Learning system that predicts the values of the data complexity measures for a dataset using simpler meta-features as input.
Abstract: Meta-Learning has been widely used in recent years to support the recommendation of the most suitable machine learning algorithm(s) and hyperparameters for new datasets. Traditionally, a meta-base is created containing meta-features extracted from several datasets along with the performance of a pool of machine learning algorithms when applied to these datasets. The meta-features must describe essential aspects of the dataset and distinguish different problems and solutions. However, for Meta-Learning to be computationally efficient, the extraction of the meta-feature values should also have a low computational cost, considering the trade-off between the time spent to run all the algorithms and the time required to extract the meta-features. One class of measures with successful results in the characterization of classification datasets is concerned with estimating the underlying complexity of the classification problem. These data complexity measures take into account the overlap between classes imposed by the feature values, the separability of the classes and the distribution of the instances within the classes. However, extracting these measures from datasets usually has a high computational cost. In this paper, we propose an empirical approach designed to decrease the computational cost of computing the data complexity measures while still keeping their descriptive ability. The proposal consists of a novel Meta-Learning system able to predict the values of the data complexity measures for a dataset by using simpler meta-features as input. In an extensive set of experiments, we show that the predictive performance achieved by Meta-Learning systems which use the predicted data complexity measures is similar to the performance obtained using the original data complexity measures, while the computational cost involved in their computation is significantly reduced.
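
One classic data complexity measure of the feature-overlap kind described above is the maximum Fisher's discriminant ratio, often called F1. A sketch of the original two-class formulation; later surveys use normalized variants, and which variant the paper computes is an assumption not settled here:

```python
from statistics import mean, pvariance

def max_fisher_ratio(X, y):
    """Maximum Fisher's discriminant ratio over features for a two-class problem.

    X: list of feature vectors; y: class labels with exactly two distinct values.
    For each feature the ratio is (mu_1 - mu_2)^2 / (var_1 + var_2); a larger maximum
    means at least one feature separates the classes well, i.e. less class overlap.
    """
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")
    best = 0.0
    for f in range(len(X[0])):
        first = [x[f] for x, c in zip(X, y) if c == labels[0]]
        second = [x[f] for x, c in zip(X, y) if c == labels[1]]
        gap = (mean(first) - mean(second)) ** 2
        spread = pvariance(first) + pvariance(second)
        if spread > 0:
            ratio = gap / spread
        else:
            # Feature is constant within each class: perfectly separating unless the means coincide.
            ratio = float("inf") if gap > 0 else 0.0
        best = max(best, ratio)
    return best

X = [[0.0, 1.0], [2.0, 3.0], [10.0, 1.0], [12.0, 3.0]]
y = ["a", "a", "b", "b"]
print(max_fisher_ratio(X, y))
```

This measure is cheap (one pass per feature); measures based on neighborhoods or spanning trees scale much worse with dataset size, which is where predicting them from simpler meta-features pays off.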
Journal Article•10.3233/IDA-194961•
Hybrid recommendation model based on deep learning and Stacking integration strategy
Xiaolan Xie, Shantian Pang, Jili Chen
1 Jan 2020
Book Chapter•10.1007/978-3-030-44584-3_4•
Efficient Batch-Incremental Classification Using UMAP for Evolving Data Streams
Maroua Bahri1,2, Bernhard Pfahringer3, Albert Bifet1,3, Silviu Maniu2,4 •
Télécom ParisTech1, Université Paris-Saclay2, University of Waikato3, École Normale Supérieure4
27 Apr 2020
TL;DR: A batch-incremental approach that pre-processes data streams using UMAP, by producing successive embeddings on a stream of disjoint batches in order to support an incremental kNN classification.
Abstract: Learning from potentially infinite and high-dimensional data streams poses significant challenges in the classification task. For instance, k-Nearest Neighbors (kNN) is one of the most frequently used algorithms in data stream mining, but it has proved to be very resource-intensive when dealing with high-dimensional spaces. Uniform Manifold Approximation and Projection (UMAP) is a novel manifold learning technique and one of the most promising dimension reduction and visualization techniques in the non-streaming setting because of its high performance compared with its competitors. However, no version of UMAP copes with the challenging context of streams. To overcome these restrictions, we propose a batch-incremental approach that pre-processes data streams using UMAP, by producing successive embeddings on a stream of disjoint batches in order to support an incremental kNN classification. Experiments conducted on publicly available synthetic and real-world datasets demonstrate the substantial gains that can be achieved with our proposal compared to state-of-the-art techniques.
Book Chapter•10.1007/978-3-030-44584-3_42•
Making Learners (More) Monotone
Tom J. Viering1, Alexander Mey1, Marco Loog1,2•
Delft University of Technology1, University of Copenhagen2
27 Apr 2020
TL;DR: This work proposes three algorithms that take a supervised learning model and make it more monotone, proves consistency and monotonicity with high probability, and evaluates the algorithms on scenarios where non-monotone behaviour occurs.
Abstract: Learning performance can show non-monotonic behavior. That is, more data does not necessarily lead to better models, even on average. We propose three algorithms that take a supervised learning model and make it perform more monotonically. We prove consistency and monotonicity with high probability, and evaluate the algorithms on scenarios where non-monotone behaviour occurs. Our proposed algorithm MT_HT makes fewer than 1% non-monotone decisions on MNIST while staying competitive in terms of error rate compared to several baselines. Our code is available at https://github.com/tomviering/monotone.
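
The idea behind such wrappers can be illustrated with a deliberately simplified rule: after each new batch of data a candidate model is trained, and the wrapper switches to it only when its validation error is lower than that of the currently kept model by more than a fixed margin. MT_HT replaces this margin with a statistical hypothesis test; the margin rule and the function name below are assumptions for illustration.

```python
def monotone_selection(validation_errors, margin=0.01):
    """Return, for each training step, the index of the model that is kept.

    validation_errors[i] is the validation error of the candidate trained at step i.
    A candidate replaces the kept model only if it beats it by more than `margin`
    (a stand-in for the hypothesis test used by MT_HT).
    """
    kept = []
    best = None
    for i, err in enumerate(validation_errors):
        if best is None or err < validation_errors[best] - margin:
            best = i
        kept.append(best)
    return kept

errors = [0.30, 0.25, 0.26, 0.245, 0.20]
print(monotone_selection(errors))
```

By construction the kept model's validation error never increases; guaranteeing (with high probability) that its true error does not increase is what requires the statistical test and a validation set that is not reused carelessly.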
Book Chapter•10.1007/978-3-030-44584-3_21•
Overlapping Hierarchical Clustering (OHC)
Ian Jeantet1, Zoltán Miklós1, David Gross-Amblard1•
University of Rennes1
27 Apr 2020
TL;DR: A new method is proposed that allows clusters to overlap until a strong cluster attraction is reached, based on a density criterion, and the resulting hierarchical structure is represented as a directed acyclic graph and combines the advantages of hierarchies with the precision of a less arbitrary clustering.
Abstract: Agglomerative clustering methods have been widely used by many research communities to cluster their data into hierarchical structures. These structures ease data exploration and are understandable even for non-specialists. But these methods necessarily result in a tree, since, at each agglomeration step, two clusters have to be merged. This may bias the data analysis process if, for example, a cluster is almost equally attracted by two others. In this paper we propose a new method that allows clusters to overlap until a strong cluster attraction is reached, based on a density criterion. The resulting hierarchical structure, called a quasi-dendrogram, is represented as a directed acyclic graph and combines the advantages of hierarchies with the precision of a less arbitrary clustering. We validate our work with extensive experiments on real data sets and compare it with existing tree-based methods, using a new measure of similarity between heterogeneous hierarchical structures.
Journal Article•10.3233/IDA-194576•
Online Analytical Processing on Graph Data
Leticia I. Gómez1, Bart Kuijpers2, Alejandro A. Vaisman1•
Instituto Tecnológico de Buenos Aires1, University of Hasselt2
1 Jan 2020
TL;DR: In this article, the authors propose a formal multidimensional model for graph analysis that considers both the basic graph data and background information in the form of dimension hierarchies.
Abstract: Online Analytical Processing (OLAP) comprises tools and algorithms that allow querying multidimensional databases. It is based on the multidimensional model, where data can be seen as a cube such that each cell contains one or more measures that can be aggregated along dimensions. In a Big Data scenario, traditional data warehousing and OLAP operations are clearly not sufficient to address current data analysis requirements, for example, social network analysis. Furthermore, OLAP operations and models can expand the possibilities of graph analysis beyond the traditional graph-based computation. Nevertheless, there is not much work on the problem of taking OLAP analysis to the graph data model. This paper proposes a formal multidimensional model for graph analysis, that considers the basic graph data, and also background information in the form of dimension hierarchies. The graphs in this model are node- and edge-labelled directed multi-hypergraphs, called graphoids, which can be defined at several different levels of granularity using the dimensions associated with them. Operations analogous to the ones used in typical OLAP over cubes are defined over graphoids. The paper presents a formal definition of the graphoid model for OLAP, proves that the typical OLAP operations on cubes can be expressed over the graphoid model, and shows that the classic data cube model is a particular case of the graphoid data model. Finally, a case study supports the claim that, for many kinds of OLAP-like analysis on graphs, the graphoid model works better than the typical relational OLAP alternative, and for the classic OLAP queries, it remains competitive.
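
The roll-up operation mentioned above can be illustrated on a much simpler structure than the paper's graphoids. A minimal sketch, assuming a plain weighted directed graph whose nodes map to a coarser level of one dimension hierarchy (for example, users to countries) and whose edge measures are aggregated by summing; the names and data are hypothetical, and node-level measures and hyperedges are ignored:

```python
from collections import defaultdict

def roll_up(edges, level_map):
    """Aggregate a weighted directed graph to a coarser level of a dimension hierarchy.

    edges: dict mapping (source, target) -> measure (e.g. number of calls).
    level_map: dict mapping each node to its parent at the coarser level.
    Edges whose endpoints map to the same pair of parents are merged and their measures summed.
    """
    rolled = defaultdict(float)
    for (u, v), measure in edges.items():
        rolled[(level_map[u], level_map[v])] += measure
    return dict(rolled)

# Hypothetical call graph between users, rolled up from the User level to the Country level.
calls = {("alice", "bob"): 3, ("alice", "carol"): 2, ("bob", "carol"): 1}
user_to_country = {"alice": "Argentina", "bob": "Belgium", "carol": "Belgium"}
print(roll_up(calls, user_to_country))
```

Drill-down is the inverse navigation back to the finer level; it needs the original graph, since summed measures cannot be split again.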
...
