Top 161 papers presented at Intelligent Data Analysis in 2017

Journal Article•10.3233/IDA-163129•

Davies Bouldin Index based hierarchical initialization K-means

Junwei Xiao, Jianfeng Lu, Xiangyu Li

1 Jan 2017

122 citations

Journal Article•10.3233/IDA-163209•

Hybrid recommender systems: A systematic literature review

Erion Çano¹, Maurizio Morisio¹•Institutions (1)

Polytechnic University of Turin¹

1 Jan 2017

TL;DR: This systematic literature review presents the state of the art in hybrid recommender systems of the last decade, addresses the most relevant problems considered, and presents the associated data mining and recommendation techniques used to overcome them.

Abstract: Recommender systems are software tools used to generate and provide suggestions for items and other entities to the users by exploiting various strategies. Hybrid recommender systems combine two or more recommendation strategies in different ways to benefit from their complementary advantages. This systematic literature review presents the state of the art in hybrid recommender systems of the last decade. It is the first quantitative review work completely focused on hybrid recommenders. We address the most relevant problems considered and present the associated data mining and recommendation techniques used to overcome them. We also explore the hybridization classes each hybrid recommender belongs to, the application domains, the evaluation process and proposed future research directions. Based on our findings, most of the studies combine collaborative filtering with another technique, often in a weighted way. Cold-start and data sparsity are the two traditional and top problems, addressed in 23 and 22 studies respectively, while movies and movie datasets are still widely used by most of the authors. As most of the studies are evaluated by comparisons with similar methods using accuracy metrics, providing more credible and user-oriented evaluations remains a typical challenge. Besides this, newer challenges were also identified, such as responding to the variation of user context, evolving user tastes or providing cross-domain recommendations. Being a hot topic, hybrid recommenders represent a good basis with which to respond accordingly by exploring newer opportunities such as contextualizing recommendations, involving parallel hybrid algorithms, processing larger datasets, etc.

94 citations

Book Chapter•10.1007/978-3-319-68527-4_2•

Face Recognition Based on HOG and Fast PCA Algorithm

Xiang-Yu Li, Zhen-Xian Lin

9 Oct 2017

TL;DR: A new method of face recognition based on histogram of oriented gradients (HOG) feature extraction and a fast principal component analysis (PCA) algorithm is proposed to solve the problem of low accuracy of face recognition under non-restrictive conditions.

Abstract: A new method of face recognition based on histogram of oriented gradients (HOG) feature extraction and a fast principal component analysis (PCA) algorithm is proposed to solve the problem of low accuracy of face recognition under non-restrictive conditions. In this method, the Haar feature classifier is used to extract the original data, then HOG features are extracted from the image data and reduced in dimension with PCA, and finally the Support Vector Machine (SVM) algorithm is used to recognize the face. The experimental results of the classification recognition on the LFW face database verify the effectiveness of the method.

45 citations

Journal Article•10.3233/IDA-163180•

Efficiently mining of skyline frequent-utility patterns

Jeng-Shyang Pan¹, Jerry Chun-Wei Lin², Lu Yang², Philippe Fournier-Viger², Tzung-Pei Hong³,⁴•Institutions (4)

Fuzhou University¹, Harbin Institute of Technology Shenzhen Graduate School², National Sun Yat-sen University³, National University of Kaohsiung⁴

1 Jan 2017

24 citations

Journal Article•10.3233/IDA-150390•

Extracting domain-specific stopwords for text classifiers

Masoud Makrehchi¹, Mohamed S. Kamel²•Institutions (2)

University of Ontario Institute of Technology¹, University of Waterloo²

1 Jan 2017

23 citations

Journal Article•10.3233/IDA-150479•

A social influence based trust model for recommender systems

Jian-Ping Mei¹, Han Yu², Zhiqi Shen², Chunyan Miao²•Institutions (2)

Zhejiang University of Technology¹, Nanyang Technological University²

1 Jan 2017

TL;DR: A trustee-influence based trust model, where a trustee’s activeness or trustworthiness is used to determine trust relationships, is incorporated into memory-based and matrix factorization recommender systems to support online purchasing decision-making.

Abstract: Trustworthy computing has recently attracted significant interest from researchers in several fields including multi-agent systems, social network analysis, and recommender systems. As an additional dimension of information to past rating history, trust has been shown to be helpful for improving the accuracy of recommendations. Studies on the relationship between trust and rating behaviors may provide insights into the formation of trust in the context of online community, and lead to possible indicators for the effective use of trust in recommendations. In this paper, we study people’s trust and rating behavior with the Epinions dataset. Epinions.com is a popular product review website allowing users to rate various categories of products, and establish a list of trustworthy users. We perform correlation analysis of activeness and trustworthiness, defined by the number of ratings and the number of trustors, to derive findings that can help the design of new decision support mechanisms in trust-based recommender systems. We then propose a trustee-influence based trust model where a trustee’s activeness or trustworthiness is used to determine trust relationships. This trust model is incorporated into memory-based and matrix factorization recommender systems to support online purchasing decision-making. Experimental results demonstrate the effectiveness of the proposed trust model for recommendation.

20 citations

Journal Article•10.3233/IDA-160044•

A framework for detecting deviations in complex event logs

Guangming Li, Wil M. P. van der Aalst

19 Aug 2017

TL;DR: This paper proposes a novel approach that is faster than cluster-based approaches because it creates a so-called profile, which is less time-consuming than creating clusters, and more accurate than model-based approaches because it uses an iterative approach to improve the result.

Abstract: Deviating behavior within an organization can lead to unexpected results. The effects of deviations are often negative, but sometimes also positive. Therefore, it is useful to detect deviations from event logs which record all the behavior of the organization. However, existing model-based and cluster-based approaches are inaccurate or slow when dealing with complex event logs, i.e. logs of less structured processes having many activities and many possible paths. This paper proposes a novel approach that is faster than cluster-based approaches because it creates a so-called profile, which is less time-consuming than creating clusters. Furthermore, the approach is also more accurate than model-based approaches because we use an iterative approach to improve the result. Our experiments show that the approach outperforms existing techniques in a variety of circumstances.

20 citations

Book Chapter•10.1007/978-3-319-68765-0_24•

Computational Topology Techniques for Characterizing Time-Series Data

Nicole F. Sanderson¹, Elliott Shugerman¹, Samantha Molnar¹, James D. Meiss¹, Elizabeth Bradley¹•Institutions (1)

University of Colorado Boulder¹

26 Oct 2017

TL;DR: Topological data analysis (TDA), while abstract, allows a characterization of time-series data obtained from nonlinear and complex dynamical systems and gives rise to the concept of persistent homology: how shape changes with scale.

Abstract: Topological data analysis (TDA), while abstract, allows a characterization of time-series data obtained from nonlinear and complex dynamical systems. Though it is surprising that such an abstract measure of structure—counting pieces and holes—could be useful for real-world data, TDA lets us compare different systems, and even do membership testing or change-point detection. However, TDA is computationally expensive and involves a number of free parameters. This complexity can be obviated by coarse-graining, using a construct called the witness complex. The parametric dependence gives rise to the concept of persistent homology: how shape changes with scale. Its results allow us to distinguish time-series data from different systems—e.g., the same note played on different musical instruments.

20 citations

Book Chapter•10.1007/978-3-319-68765-0_17•

Learning DTW-Preserving Shapelets

Arnaud Lods, Simon Malinowski, Romain Tavenard¹, Laurent Amsaleg•Institutions (1)

University of Rennes¹

26 Oct 2017

TL;DR: This work focuses on learning, without class label information, shapelets such that Euclidean distances in the ST-space approximate well the true DTW, which leads to a ubiquitous representation of time series in a metric space, where any machine learning method (supervised or unsupervised) and indexing system can operate efficiently.

Abstract: Dynamic Time Warping (DTW) is one of the best similarity measures for time series, and it has extensively been used in retrieval, classification or mining applications. It is a costly measure, and applying it to numerous and/or very long time series is difficult in practice. Recently, Shapelet Transform (ST) proved to enable accurate supervised classification of time series. ST learns small subsequences that well discriminate classes, and transforms the time series into vectors lying in a metric space. In this paper, we adopt the ST framework in a novel way: we focus on learning, without class label information, shapelets such that Euclidean distances in the ST-space approximate well the true DTW. Our approach leads to a ubiquitous representation of time series in a metric space, where any machine learning method (supervised or unsupervised) and indexing system can operate efficiently.

19 citations
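
To make concrete why the abstract above calls DTW costly, the classic dynamic-programming form can be sketched as follows. This is a minimal textbook version with absolute difference as the local cost; the name `dtw_distance` is an illustrative choice, and the sketch is not the shapelet-learning method of the paper.

```python
def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences, O(len(a) * len(b)) time."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: insertion, deletion, or match.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]
```

The quadratic table is what makes pairwise DTW expensive on many or long series, which is the cost an embedding with plain Euclidean distances is meant to avoid.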

Journal Article•10.3233/IDA-170872•

Word co-occurrence augmented topic model in short text

Guan Bin Chen¹, Hung-Yu Kao•Institutions (1)

National Cheng Kung University¹

1 Jan 2017

TL;DR: The authors propose an improved word co-occurrence method to enhance topic models and apply the word co-occurrence information to the BTM; the method needs no external data and, because it does not modify the original topic model, can easily be applied to other existing BTM-based models.

Abstract: The large amount of text on the Internet makes it hard for people to grasp its meaning in a short time. Topic models (e.g. LDA and PLSA) have been proposed to summarize long texts into several topic terms. In recent years, short-text media such as tweets have become very popular. However, directly applying a traditional topic model to a short-text corpus usually yields incoherent topics, because a short document does not contain enough words to reveal word co-occurrence patterns. The bi-term topic model (BTM) has been proposed to address this problem. However, BTM considers only simple bi-term frequency, which causes the generated topics to be dominated by common words. In this paper, we address the problem of frequent bi-terms in BTM. We propose an improved word co-occurrence method to enhance topic models and apply the word co-occurrence information to the BTM. The experimental results show that our PMI-β-BTM performs well on both regular short news titles and noisy tweets. Moreover, our method has two advantages: it does not need any external data, and because it is based on the original topic model without modifying the model itself, it can easily be applied to other existing BTM-based models.

19 citations
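
The name PMI-β-BTM suggests pointwise mutual information over bi-terms. As a hedged illustration only, the sketch below computes plain document-level PMI for every word pair in a small corpus; the function name `biterm_pmi` and the document-frequency estimate of the probabilities are assumptions, and the paper's actual β-weighted prior is not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations


def biterm_pmi(docs):
    """Map each unordered word pair to log(p(w1, w2) / (p(w1) * p(w2))).

    Probabilities are estimated as the fraction of documents that contain
    the word (or both words). `docs` is a list of token lists.
    """
    n_docs = len(docs)
    word_df = Counter()   # number of documents containing each word
    pair_df = Counter()   # number of documents containing each pair
    for tokens in docs:
        vocab = sorted(set(tokens))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return {
        pair: math.log(count * n_docs / (word_df[pair[0]] * word_df[pair[1]]))
        for pair, count in pair_df.items()
    }
```

Pairs made of two very common words get a PMI near or below zero even when they co-occur often, which is the intuition behind using PMI rather than raw bi-term frequency.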

Book Chapter•10.1007/978-3-319-68765-0_6•

Seasonal Variation in Collective Mood via Twitter Content and Medical Purchases

Fabon Dzogang¹, James Goulding², Stafford L. Lightman¹, Nello Cristianini¹•Institutions (2)

University of Bristol¹, University of Nottingham²

26 Oct 2017

TL;DR: This study compares Twitter signals relative to anxiety, sadness, anger, and fatigue with purchase of items related to anxiety, stress and fatigue at a major UK Health and Beauty retailer, and finds that all of these signals are highly correlated and strongly seasonal.

Abstract: The analysis of sentiment contained in vast amounts of Twitter messages has reliably shown seasonal patterns of variation in multiple studies, a finding that can have great importance in the understanding of seasonal affective disorders, particularly if related with known seasonal variations in certain hormones. An important question, however, is that of directly linking the signals coming from Twitter with other sources of evidence about average mood changes. Specifically, we compare Twitter signals relative to anxiety, sadness, anger, and fatigue with purchase of items related to anxiety, stress and fatigue at a major UK Health and Beauty retailer. Results show that all of these signals are highly correlated and strongly seasonal, being under-expressed in the summer and over-expressed in the other seasons, with interesting differences and similarities across them. Anxiety signals, extracted from both Twitter and Health product purchases, peak in spring and autumn, and correlate also with the purchase of stress remedies, while Twitter sadness has a peak in the winter, along with Twitter anger and remedies for fatigue. Surprisingly, purchase of remedies for fatigue does not match the Twitter fatigue, suggesting that perhaps the names we give to these indicators are only approximate indications of what they actually measure. This study contributes both to the clarification of the mood signals contained in social media, and more generally to our understanding of seasonal cycles in collective mood.

Journal Article•10.3233/IDA-163131•

Possibilistic interest discovery from uncertain information in social networks

Mondher Sendi¹, Mohamed Nazih Omri¹, Mourad Abed²•Institutions (2)

University of Sousse¹, University of Valenciennes and Hainaut-Cambresis²

1 Jan 2017

TL;DR: A new approach for users’ interest discovery from uncertain information that augments traditional methods using possibilistic logic is proposed, and the comparison with the best-known methods shows the significance of this approach.

Abstract: User-generated content on the microblogging social network Twitter continues to grow, carrying a significant amount of information. Semantic analysis offers the opportunity to discover and model latent interests in the users’ publications. This article focuses on the problem of uncertainty in the users’ publications, which has not been previously treated. It proposes a new approach for users’ interest discovery from uncertain information that augments traditional methods using possibilistic logic. Possibility theory provides a solid theoretical base for treating incomplete and imprecise information and for inferring reliable expressions from a knowledge base. More precisely, this approach uses the product-based possibilistic network to model the knowledge base and discover possibilistic interests. The DBpedia ontology is integrated into the interest discovery process for selecting the significant topics. The empirical analysis and the comparison with the best-known methods show the significance of this approach.

Journal Article•10.3233/IDA-170878•

Effective social content-based collaborative filtering for music recommendation

Ja-Hwung Su¹, Wei-Yi Chang², Vincent S. Tseng³•Institutions (3)

Cheng Shiu University¹, Center for Information Technology², National Chiao Tung University³

1 Jan 2017

Journal Article•10.3233/IDA-163141•

Exploiting statistically significant dependent rules for associative classification

Jundong Li¹, Osmar R. Zaïane²•Institutions (2)

Arizona State University¹, University of Alberta²

10 Oct 2017

TL;DR: This paper proposes a novel associative classifier, SigDirect, which uses Fisher’s exact test as a significance measure to directly mine classification association rules by some effective pruning strategies, and achieves better performance in terms of classification accuracy when measured with state-of-the-art rule based and associative classifiers.

Abstract: Established associative classification algorithms have been shown to be very effective in handling categorical data such as text data. The learned model is a set of rules that are easy to understand and can be edited. However, they still suffer from the following limitations: first, they mostly use the support-confidence framework to mine classification association rules, which requires the setting of some confounding parameters; second, the lack of statistical dependency in the used framework may lead to the omission of many interesting rules and the detection of meaningless rules; third, the rule generation process usually generates a sheer number of rules, which puts in question the interpretability and readability of the learned associative classification model. In this paper, we propose a novel associative classifier, SigDirect, to address the above problems. In particular, we use Fisher’s exact test as a significance measure to directly mine classification association rules by some effective pruning strategies. Without any threshold settings like minimum support and minimum confidence, SigDirect is able to find non-redundant classification association rules which express a statistically significant dependency between a set of antecedent items and a consequent class label. To further reduce the number of noisy rules, we present an instance-centric rule pruning strategy to find a subset of rules of high quality. Finally, we propose and investigate various rule classification strategies to achieve a more accurate classification model. Experimental results on real-world datasets show that SigDirect achieves better performance in terms of classification accuracy when measured with state-of-the-art rule based and associative classifiers. Furthermore, the number of rules generated by SigDirect is orders of magnitude smaller than the number of rules found by other associative classifiers, which is very appealing in practice.
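
As a rough illustration of how Fisher’s exact test can score a single rule "antecedent → class", the sketch below computes the standard one-sided (right-tail) p-value from a 2×2 contingency table. The function name `fisher_right_tail` and the table layout are assumptions made for this example; SigDirect's own formulation and pruning strategies are not reproduced.

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact p-value for a 2x2 table.

    a: antecedent present, class present    b: antecedent present, class absent
    c: antecedent absent,  class present    d: antecedent absent,  class absent
    Returns P(co-occurrence count >= a) under independence with fixed margins.
    """
    row1 = a + b              # transactions containing the antecedent
    col1 = a + c              # transactions carrying the class label
    n = a + b + c + d
    k_max = min(row1, col1)   # largest co-occurrence count the margins allow
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k)
        for k in range(a, k_max + 1)
    )
    return tail / comb(n, row1)
```

A small p-value indicates that the antecedent and the class label co-occur more often than independence would explain, so no minimum-support or minimum-confidence threshold is needed to rank the rule.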

Book Chapter•10.1007/978-3-319-68765-0_28•

A Structural Benchmark for Logical Argumentation Frameworks

Bruno Yun¹, Srdjan Vesic², Madalina Croitoru¹, Pierre Bisquert³, Rallou Thomopoulos³•Institutions (3)

University of Montpellier¹, Artois University², French Institute for Research in Computer Science and Automation³

26 Oct 2017

TL;DR: A practically-oriented benchmark suite for computational argumentation that instantiates abstract argumentation frameworks with existential rules, a language widely used in Semantic Web applications and provides a generator of such instantiated graphs is proposed.

Abstract: This paper proposes a practically-oriented benchmark suite for computational argumentation. We instantiate abstract argumentation frameworks with existential rules, a language widely used in Semantic Web applications and provide a generator of such instantiated graphs. We analyse performance of argumentation solvers on these benchmarks.

Journal Article•10.3233/IDA-170875•

Membrane computing inspired feature selection model for microarray cancer data

Naeimeh Elkhani¹, Ravie Chandren Muniyandi•Institutions (1)

National University of Malaysia¹

1 Jan 2017

Journal Article•10.3233/IDA-170874•

Efficiently mining high utility sequential patterns in static and streaming data

Morteza Zihayat¹, Cheng-Wei Wu², Aijun An¹, Vincent S. Tseng², Chien Lin³•Institutions (3)

York University¹, National Chiao Tung University², Institute for Information Industry³

1 Jan 2017

TL;DR: HUSP-Stream is the first method to find HUSPs over data streams and a novel utility model called SequenceSuffix Utility is proposed for effectively pruning the search space in HUSP mining.

Abstract: High utility sequential pattern (HUSP) mining has emerged as a novel topic in data mining. Although some preliminary works have been conducted on this topic, they incur the problem of producing a large search space for high utility sequential patterns. In addition, they mainly focus on mining HUSPs in static databases and do not take streaming data into account, where unbounded data come continuously and often at a high speed. To efficiently deal with both problems, we propose a novel framework for mining high utility sequential patterns over static and streaming databases. In this regard, two efficient data structures named ItemUtilLists (Item Utility Lists) and HUSP-Tree (High Utility Sequential Pattern Tree) are proposed to maintain essential information for mining HUSPs in both offline and online fashions. In addition, a novel utility model called SequenceSuffix Utility is proposed for effectively pruning the search space in HUSP mining. We propose an algorithm named HUSP-Miner (High Utility Sequential Pattern Miner) to find HUSPs in static databases efficiently. Then, a one-pass algorithm named HUSP-Stream (High Utility Sequential Pattern mining over Data Streams) is proposed to incrementally update ItemUtilLists and HUSP-Tree online and find HUSPs over data streams. To the best of our knowledge, HUSP-Stream is the first method to find HUSPs over data streams. Experimental results on both real and synthetic datasets show that HUSP-Miner outperforms the compared algorithms substantially in terms of execution time, memory usage and number of generated candidates. The experiments also demonstrate the impressive performance of HUSP-Stream in updating the data structures and discovering HUSPs over data streams.

Journal Article•10.3233/IDA-160069•

Apriori and GUHA – Comparing two approaches to data mining with association rules

Jan Rauch¹, Milan Šimůnek¹•Institutions (1)

University of Economics, Prague¹

1 Jan 2017

Journal Article•10.3233/IDA-160034•

Financial distress prediction using SVM ensemble based on earnings manipulation and fuzzy integral

Chao Huang, Qingyu Yang, Mingwei Du, Donghui Yang

1 Jan 2017

Book Chapter•10.1007/978-3-319-68765-0_9•

Interactive Pattern Sampling for Characterizing Unlabeled Data

Arnaud Giacometti¹, Arnaud Soulet¹•Institutions (1)

François Rabelais University¹

26 Oct 2017

TL;DR: A new interactive pattern mining method that learns which part of the dataset is really interesting for the user by integrating user feedback about patterns, and aims at sampling patterns with a probability proportional to their frequency in the interesting transactions.

Abstract: Many data exploration tasks require a target class. Unfortunately, the data is not always labeled with respect to this desired class. Rather than using unsupervised methods or a labeling pre-processing step, this paper proposes an interactive system that discovers this target class and characterizes it at the same time. More precisely, we introduce a new interactive pattern mining method that learns which part of the dataset is really interesting for the user. By integrating user feedback about patterns, our method aims at sampling patterns with a probability proportional to their frequency in the interesting transactions. We demonstrate that it accurately identifies the target class if user feedback is consistent. Experiments also show that this method has good true and false positive rates, which makes it possible to present relevant patterns to the user.

Journal Article•10.3233/IDA-150499•

Active seed selection for constrained clustering

Viet-Vu Vu¹, Nicolas Labroche²•Institutions (2)

Vietnam National University, Hanoi¹, François Rabelais University²

1 Jan 2017

Journal Article•10.3233/IDA-160021•

Incorporating Wikipedia concepts and categories as prior knowledge into topic models

Kang Xu¹, Guilin Qi¹, Junheng Huang¹, Tianxing Wu¹•Institutions (1)

Southeast University¹

1 Jan 2017

Journal Article•10.3233/IDA-150489•

Fuzzy c-Least Medians clustering for discovery of web access patterns from web user sessions data

Zahid Ansari¹, Ahmed Rimaz Faizabadi¹, Asif Afzal¹•Institutions (1)

P A College of Engineering¹

1 Jan 2017

Book Chapter•10.1007/978-3-319-68527-4_26•

Adaptive Signal Processing of Fetal PCG Recorded by Interferometric Sensor

Radek Martinek¹, Radana Kahankova¹, Jan Nedoma¹, Marcel Fajkus¹, Homer Nazeran², Jana Nowaková¹•Institutions (2)

Technical University of Ostrava¹, University of Texas at El Paso²

9 Oct 2017

TL;DR: Adaptive methods based on Least Mean Square and Recursive Least Square algorithms are used for the elimination of the maternal component of the fetal phonocardiogram.

Abstract: This paper is focused on the design, implementation, and verification of an adaptive system for processing of the fetal phonocardiogram (fPCG) recorded by the novel interferometric sensor. The main interference to be suppressed in the abdominal signal is the maternal phonocardiogram (mPCG). In this article, adaptive methods based on Least Mean Square and Recursive Least Square algorithms are used for the elimination of the maternal component. Evaluation of the filtration quality is provided using objective parameters (Signal-to-Noise Ratio, Sensitivity, and Positive Predictive Value).
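
The Least Mean Square idea mentioned above can be sketched as a generic adaptive interference canceller. This is a minimal sketch under stated assumptions: `primary` stands in for the abdominal recording and `reference` for a maternal-only signal, while the function name `lms_cancel`, the tap count, and the step size `mu` are illustrative values, not the paper's configuration.

```python
def lms_cancel(primary, reference, n_taps=4, mu=0.01):
    """Subtract the LMS estimate of the reference-correlated part of `primary`.

    Returns the error signal, i.e. what remains of `primary` after the
    adaptive FIR filter has removed the component predictable from
    `reference`.
    """
    weights = [0.0] * n_taps
    output = []
    for n in range(len(primary)):
        # Most recent n_taps reference samples, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        estimate = sum(w * xi for w, xi in zip(weights, x))
        error = primary[n] - estimate   # interference-suppressed sample
        # Gradient-descent step on the instantaneous squared error.
        weights = [w + 2 * mu * error * xi for w, xi in zip(weights, x)]
        output.append(error)
    return output
```

The step size trades convergence speed against stability; a Recursive Least Square filter typically converges faster at a higher per-sample cost.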

Journal Article•10.3233/IDA-160031•

Instance-based classification with Ant Colony Optimization

Khalid M. Salama¹, Ashraf M. Abdelbar², Ayah Helal¹, Alex A. Freitas¹•Institutions (2)

University of Kent¹, Brandon University²

1 Jan 2017

TL;DR: This paper introduces a novel class-based feature weighting technique, in the context of instance-based distance methods, using the Ant Colony Optimization meta-heuristic, and proposes an ensemble of classifiers approach that makes use of the archived populations of the ACOℝ algorithm.

Abstract: Instance-based learning (IBL) methods predict the class label of a new instance based directly on the distance between the new unlabeled instance and each labeled instance in the training set, without constructing a classification model in the training phase. In this paper, we introduce a novel class-based feature weighting technique, in the context of instance-based distance methods, using the Ant Colony Optimization meta-heuristic. We address three different approaches of instance-based classification: k-Nearest Neighbours, distance-based Nearest Neighbours, and Gaussian Kernel Estimator. We present a multi-archive adaptation of the ACOℝ algorithm and apply it to the optimization of the key parameter in each IBL algorithm and of the class-based feature weights. We also propose an ensemble of classifiers approach that makes use of the archived populations of the ACOℝ algorithm. We empirically evaluate the performance of our proposed algorithms on 36 benchmark datasets, and compare them with conventional instance-based classification algorithms, using various parameter settings, as well as with a state-of-the-art coevolutionary algorithm for instance selection and feature weighting for Nearest Neighbours classifiers.

Journal Article•10.3233/IDA-170882•

Deceptive text detection using continuous semantic space models

Ángel Hernández-Castañeda¹, Hiram Calvo¹•Institutions (1)

Instituto Politécnico Nacional¹

1 Jan 2017

Journal Article•10.3233/IDA-150316•

Scalable and practical One-Pass clustering algorithm for recommender system

Asra Khalid¹, Mustansar Ali Ghazanfar¹, Muhammad Awais Azam¹, Yasmeen Fahad Aldhafiri, Sobia Zahra¹•Institutions (1)

University of Engineering and Technology¹

1 Jan 2017

TL;DR: A new clustering algorithm called One-Pass is proposed, which is a simple real-time algorithm that maintains a good level of accuracy, scales well with data, and builds the training model incrementally with the arrival of new data.

Abstract: Recommender systems apply artificial intelligence techniques to filter unseen information and predict whether a user would like or dislike a given item. K-Means clustering-based recommendation algorithms have been proposed with the claim of increasing the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate incremental updates with the arrival of new data, making them unsuitable for dynamic environments. From this line of research, a new clustering algorithm called One-Pass is proposed, which is a simple real-time algorithm that maintains a good level of accuracy, scales well with data, and builds the training model incrementally with the arrival of new data. We run the One-Pass algorithm on four different datasets (MovieLens, Film Trust, Book Crossing, and Last-FM) and empirically show that the proposed algorithm outperforms K-Means in terms of recommendation and training time. Moreover, the One-Pass algorithm is comparable to K-Means in terms of accuracy and cluster quality.

Journal Article•10.3233/IDA-163075•

Unsupervised active learning techniques for labeling training sets: An experimental evaluation on sequential data

Vinicius M. A. Souza¹, Rafael G. Rossi¹,², Gustavo E. A. P. A. Batista¹, Solange Oliveira Rezende¹•Institutions (2)

Spanish National Research Council¹, Federal University of Mato Grosso do Sul²

1 Jan 2017

Journal Article•10.3233/IDA-163098•

Cluster-Indistinguishability: A practical differential privacy mechanism for trajectory clustering

Hao Wang¹, Zhengquan Xu¹, Shan Jia¹•Institutions (1)

Wuhan University¹

1 Jan 2017

TL;DR: This paper proposes a differential privacy preserving mechanism, Cluster-Indistinguishability, and derives the probability density function of two-dimensional Laplace noise, which satisfies the above definition.

Abstract: An important method of spatial-temporal data mining, trajectory clustering can mine valuable information in trajectories. However, cluster results without special sanitization pose serious threats to individual location privacy. Existing privacy preserving mechanisms for trajectory clustering still contend with the problems of narrow applicability, low-level utility, and difficulty in being applied to real scenarios. In this paper, we therefore propose a differential privacy preserving mechanism, Cluster-Indistinguishability, to support trajectory clustering. Firstly, a general model of typical trajectory clustering algorithms is given, and the definition of differential privacy is introduced according to the model. Then, we derive the probability density function of two-dimensional Laplace noise, which satisfies the above definition. Finally, we transform the noise from a Cartesian coordinate system to a Polar coordinate system to efficiently apply it in real scenarios. Experimental results show that Cluster-Indistinguishability has general applicability and better performance compared to existing methods.
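
The move from Cartesian to polar coordinates can be illustrated with the standard planar Laplace distribution, whose density is proportional to exp(-ε·r) at distance r from the true point. In polar form the angle is uniform and the radius follows a Gamma distribution with shape 2 and scale 1/ε, so sampling needs no two-dimensional inversion. The sketch below assumes this standard form; whether it coincides exactly with the density derived in the paper is an assumption, and `planar_laplace_noise` is a hypothetical name.

```python
import math
import random


def planar_laplace_noise(epsilon, rng=random):
    """Draw one (dx, dy) offset from a planar Laplace distribution.

    The angle is uniform on [0, 2*pi); the radius has density
    epsilon**2 * r * exp(-epsilon * r), i.e. Gamma(shape=2, scale=1/epsilon).
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    radius = rng.gammavariate(2.0, 1.0 / epsilon)
    return radius * math.cos(theta), radius * math.sin(theta)
```

A noisy location is then obtained by adding the returned offset to the true coordinates; a smaller `epsilon` gives a larger expected displacement (2/ε on average) and therefore stronger privacy at lower utility.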

Journal Article•10.3233/IDA-163020•

PGNBC: Pearson Gaussian Naïve Bayes classifier for data stream classification with recurring concept drift

D. Kishore Babu, Y. Ramadevi¹, K.V. Ramana²•Institutions (2)

Chaitanya Bharathi Institute of Technology¹, Jawaharlal Nehru Technological University, Kakinada²

10 Oct 2017

TL;DR: The proposed PGNBC method advances the existing Gaussian Naïve Bayes classifier (GNBC) by additionally incorporating the correlation among the attributes to improve performance.

Abstract: In data stream classification, selecting the classifier for the dynamic feature space while accounting for concept drift is a challenging task. This paper addresses the major challenges in data stream classification with recurring concept drift. We developed a novel classification method known as Pearson Gaussian Naïve Bayes classification (PGNBC). The proposed PGNBC method advances the existing Gaussian Naïve Bayes classifier (GNBC) by additionally incorporating the correlation among the attributes. For data stream classification, the proposed PGNBC is frequently updated based on the concept drift. The newly developed method is evaluated by comparing its results with those of existing methods such as RGNBC and MReC-DFS, using sensitivity, specificity and accuracy as performance metrics. On the skin data, PGNBC improves on RGNBC by 4%, 1% and 1% in sensitivity, specificity and accuracy, respectively. On the localization data, the improvements over RGNBC in specificity and accuracy are 6% and 2%, respectively.
