Scispace (Formerly Typeset)
Showing papers presented at "Intelligent Data Analysis in 2017"
Journal Article•10.3233/IDA-163129•
Davies Bouldin Index based hierarchical initialization K-means

Junwei Xiao, Jianfeng Lu, Xiangyu Li
1 Jan 2017

122 citations

Journal Article•10.3233/IDA-163209•
Hybrid recommender systems: A systematic literature review

Erion Çano1, Maurizio Morisio1•
Polytechnic University of Turin1
1 Jan 2017
TL;DR: This systematic literature review presents the state of the art in hybrid recommender systems of the last decade, addresses the most relevant problems considered, and presents the associated data mining and recommendation techniques used to overcome them.
Abstract: Recommender systems are software tools used to generate and provide suggestions for items and other entities to the users by exploiting various strategies. Hybrid recommender systems combine two or more recommendation strategies in different ways to benefit from their complementary advantages. This systematic literature review presents the state of the art in hybrid recommender systems of the last decade. It is the first quantitative review work focused entirely on hybrid recommenders. We address the most relevant problems considered and present the associated data mining and recommendation techniques used to overcome them. We also explore the hybridization classes each hybrid recommender belongs to, the application domains, the evaluation process and proposed future research directions. Based on our findings, most of the studies combine collaborative filtering with another technique, often in a weighted way. Cold-start and data sparsity are the two traditional and most frequently addressed problems, appearing in 23 and 22 studies respectively, while movies and movie datasets are still widely used by most of the authors. As most of the studies are evaluated by comparisons with similar methods using accuracy metrics, providing more credible and user-oriented evaluations remains a typical challenge. Besides this, newer challenges were also identified, such as responding to the variation of user context, evolving user tastes or providing cross-domain recommendations. Being a hot topic, hybrid recommenders represent a good basis with which to respond accordingly by exploring newer opportunities such as contextualizing recommendations, involving parallel hybrid algorithms, processing larger datasets, etc.

94 citations

Book Chapter•10.1007/978-3-319-68527-4_2•
Face Recognition Based on HOG and Fast PCA Algorithm

Xiang-Yu Li, Zhen-Xian Lin
9 Oct 2017
TL;DR: A new face recognition method based on histogram of oriented gradients (HOG) feature extraction and a fast principal component analysis (PCA) algorithm is proposed to address the low accuracy of face recognition under unconstrained conditions.
Abstract: A new face recognition method based on histogram of oriented gradients (HOG) feature extraction and a fast principal component analysis (PCA) algorithm is proposed to address the low accuracy of face recognition under unconstrained conditions. In this method, a Haar feature classifier is used to extract the original data, HOG features are then extracted from the image data and reduced in dimension with PCA, and a Support Vector Machine (SVM) is used to recognize the face. Experimental classification results on the LFW face database verify the effectiveness of the method.

45 citations

Journal Article•10.3233/IDA-163180•
Efficiently mining of skyline frequent-utility patterns

Jeng-Shyang Pan1, Jerry Chun-Wei Lin2, Lu Yang2, Philippe Fournier-Viger2, Tzung-Pei Hong3,4 •
Fuzhou University1, Harbin Institute of Technology Shenzhen Graduate School2, National Sun Yat-sen University3, National University of Kaohsiung4
1 Jan 2017

24 citations

Journal Article•10.3233/IDA-150390•
Extracting domain-specific stopwords for text classifiers

Masoud Makrehchi1, Mohamed S. Kamel2•
University of Ontario Institute of Technology1, University of Waterloo2
1 Jan 2017

23 citations

Journal Article•10.3233/IDA-150479•
A social influence based trust model for recommender systems

Jian-Ping Mei1, Han Yu2, Zhiqi Shen2, Chunyan Miao2•
Zhejiang University of Technology1, Nanyang Technological University2
1 Jan 2017
TL;DR: A trustee-influence based trust model, in which a trustee’s activeness or trustworthiness is used to determine trust relationships, is incorporated into memory-based and matrix factorization recommender systems to support online purchasing decision-making.
Abstract: Trustworthy computing has recently attracted significant interest from researchers in several fields including multi-agent systems, social network analysis, and recommender systems. As an additional dimension of information to past rating history, trust has been shown to be helpful for improving the accuracy of recommendations. Studies on the relationship between trust and rating behaviors may provide insights into the formation of trust in the context of online communities, and lead to possible indicators for the effective use of trust in recommendations. In this paper, we study people’s trust and rating behavior with the Epinions dataset. Epinions.com is a popular product review website allowing users to rate various categories of products and establish a list of trustworthy users. We perform correlation analysis of activeness and trustworthiness, defined by the number of ratings and the number of trustors, to derive findings that can help the design of new decision support mechanisms in trust-based recommender systems. We then propose a trustee-influence based trust model where a trustee’s activeness or trustworthiness is used to determine trust relationships. This trust model is incorporated into memory-based and matrix factorization recommender systems to support online purchasing decision-making. Experimental results demonstrate the effectiveness of the proposed trust model for recommendation.

20 citations

Journal Article•10.3233/IDA-160044•
A framework for detecting deviations in complex event logs

Guangming Li, Wil M. P. van der Aalst
19 Aug 2017
TL;DR: This paper proposes a novel approach that is faster than cluster-based approaches because it creates a so-called profile, which is less time-consuming than creating clusters, and more accurate than model-based approaches because it uses an iterative procedure to improve the result.
Abstract: Deviating behavior within an organization can lead to unexpected results. The effects of deviations are often negative, but sometimes also positive. Therefore, it is useful to detect deviations from event logs, which record all the behavior of the organization. However, existing model-based and cluster-based approaches are inaccurate or slow when dealing with complex event logs, i.e. logs of less structured processes having many activities and many possible paths. This paper proposes a novel approach that is faster than cluster-based approaches because it creates a so-called profile, which is less time-consuming than creating clusters. Furthermore, the approach is also more accurate than model-based approaches because we use an iterative procedure to improve the result. Our experiments show that our approach outperforms existing techniques in a variety of circumstances.

20 citations

Book Chapter•10.1007/978-3-319-68765-0_24•
Computational Topology Techniques for Characterizing Time-Series Data

Nicole F. Sanderson1, Elliott Shugerman1, Samantha Molnar1, James D. Meiss1, Elizabeth Bradley1 •
University of Colorado Boulder1
26 Oct 2017
TL;DR: Topological data analysis (TDA), while abstract, allows a characterization of time-series data obtained from nonlinear and complex dynamical systems and gives rise to the concept of persistent homology: how shape changes with scale.
Abstract: Topological data analysis (TDA), while abstract, allows a characterization of time-series data obtained from nonlinear and complex dynamical systems. Though it is surprising that such an abstract measure of structure—counting pieces and holes—could be useful for real-world data, TDA lets us compare different systems, and even do membership testing or change-point detection. However, TDA is computationally expensive and involves a number of free parameters. This complexity can be obviated by coarse-graining, using a construct called the witness complex. The parametric dependence gives rise to the concept of persistent homology: how shape changes with scale. Its results allow us to distinguish time-series data from different systems—e.g., the same note played on different musical instruments.

20 citations

Book Chapter•10.1007/978-3-319-68765-0_17•
Learning DTW-Preserving Shapelets

Arnaud Lods, Simon Malinowski, Romain Tavenard1, Laurent Amsaleg•
University of Rennes1
26 Oct 2017
TL;DR: This work focuses on learning, without class label information, shapelets such that Euclidean distances in the ST-space approximate the true DTW well, which leads to a ubiquitous representation of time series in a metric space, where any machine learning method (supervised or unsupervised) and indexing system can operate efficiently.
Abstract: Dynamic Time Warping (DTW) is one of the best similarity measures for time series, and it has been used extensively in retrieval, classification and mining applications. It is a costly measure, and applying it to numerous and/or very long time series is difficult in practice. Recently, the Shapelet Transform (ST) has been shown to enable accurate supervised classification of time series. ST learns small subsequences that discriminate classes well, and transforms the time series into vectors lying in a metric space. In this paper, we adopt the ST framework in a novel way: we focus on learning, without class label information, shapelets such that Euclidean distances in the ST-space approximate the true DTW well. Our approach leads to a ubiquitous representation of time series in a metric space, where any machine learning method (supervised or unsupervised) and indexing system can operate efficiently.
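For readers unfamiliar with the measure this abstract calls costly, the following is a minimal sketch of the textbook dynamic-programming DTW. It is not the authors' shapelet-learning method; the function name `dtw_distance` and the absolute-difference local cost are assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences (illustrative sketch).

    Uses |a_i - b_j| as the local cost, an assumption for this example.
    Runs in O(len(a) * len(b)) time, which is why DTW is expensive on
    long or numerous series.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

The quadratic table is the cost that an embedding into Euclidean space, as described in the abstract, would aim to avoid at query time.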

19 citations

Journal Article•10.3233/IDA-170872•
Word co-occurrence augmented topic model in short text

Guan Bin Chen1, Hung-Yu Kao•
National Cheng Kung University1
1 Jan 2017
TL;DR: The authors propose an improved word co-occurrence method that applies co-occurrence information to the bi-term topic model (BTM); the method needs no external data and leaves the original topic model unmodified, so it can easily be applied to other BTM-based models.
Abstract: The large amount of text on the Internet makes it hard for people to grasp its meaning in a limited time. Topic models (e.g. LDA and PLSA) have been proposed to summarize long texts into several topic terms. In recent years, short-text media such as tweets have become very popular. However, directly applying a traditional topic model to a short-text corpus usually yields incoherent topics, because there are not enough words to discover word co-occurrence patterns in a short document. The bi-term topic model (BTM) has been proposed to address this problem. However, BTM considers only simple bi-term frequency, which causes the generated topics to be dominated by common words. In this paper, we address the problem of frequent bi-terms in BTM by proposing an improved word co-occurrence method that enhances topic models, and we apply the word co-occurrence information to BTM. The experimental results show that our PMI-β-BTM performs well on both regular short news titles and noisy tweets. Moreover, our method has two advantages: it does not need any external data, and it is based on the original topic model without modifying the model itself, so it can easily be applied to other existing BTM-based models.
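The model name PMI-β-BTM suggests pointwise mutual information (PMI) as the co-occurrence score. The sketch below shows plain document-level PMI for word pairs, which illustrates why a pair containing a very common word scores low. It is not the authors' PMI-β-BTM; the function name `biterm_pmi` and the document-frequency estimates are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def biterm_pmi(docs):
    """Return {(w1, w2): PMI} from document-level co-occurrence (sketch).

    docs is a list of token lists. Probabilities are estimated as the
    fraction of documents containing a word or an unordered word pair.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each word pair
    for doc in docs:
        vocab = sorted(set(doc))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    pmi = {}
    for (w1, w2), joint in pair_df.items():
        p_joint = joint / n_docs
        p1 = word_df[w1] / n_docs
        p2 = word_df[w2] / n_docs
        # PMI = log( P(w1, w2) / (P(w1) * P(w2)) )
        pmi[(w1, w2)] = math.log(p_joint / (p1 * p2))
    return pmi
```

A word that appears in every document has probability 1, so any pair containing it gets a PMI of at most 0, whereas raw bi-term frequency would rank such pairs highly.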

19 citations

Book Chapter•10.1007/978-3-319-68765-0_6•
Seasonal Variation in Collective Mood via Twitter Content and Medical Purchases

Fabon Dzogang1, James Goulding2, Stafford L. Lightman1, Nello Cristianini1•
University of Bristol1, University of Nottingham2
26 Oct 2017
TL;DR: This study compares Twitter signals relating to anxiety, sadness, anger, and fatigue with purchases of items related to anxiety, stress and fatigue at a major UK Health and Beauty retailer, and finds that all of these signals are highly correlated and strongly seasonal.
Abstract: The analysis of sentiment contained in vast amounts of Twitter messages has reliably shown seasonal patterns of variation in multiple studies, a finding that can have great importance in the understanding of seasonal affective disorders, particularly if related to known seasonal variations in certain hormones. An important question, however, is that of directly linking the signals coming from Twitter with other sources of evidence about average mood changes. Specifically, we compare Twitter signals relating to anxiety, sadness, anger, and fatigue with purchases of items related to anxiety, stress and fatigue at a major UK Health and Beauty retailer. Results show that all of these signals are highly correlated and strongly seasonal, being under-expressed in the summer and over-expressed in the other seasons, with interesting differences and similarities across them. Anxiety signals, extracted from both Twitter and Health product purchases, peak in spring and autumn, and also correlate with the purchase of stress remedies, while Twitter sadness peaks in the winter, along with Twitter anger and remedies for fatigue. Surprisingly, purchases of remedies for fatigue do not match the Twitter fatigue signal, suggesting that perhaps the names we give to these indicators are only approximate indications of what they actually measure. This study contributes both to the clarification of the mood signals contained in social media, and more generally to our understanding of seasonal cycles in collective mood.
Journal Article•10.3233/IDA-163131•
Possibilistic interest discovery from uncertain information in social networks

Mondher Sendi1, Mohamed Nazih Omri1, Mourad Abed2•
University of Sousse1, University of Valenciennes and Hainaut-Cambresis2
1 Jan 2017
TL;DR: A new approach for discovering users’ interests from uncertain information, which augments traditional methods using possibilistic logic, is proposed, and the comparison with the best-known methods demonstrates the significance of this approach.
Abstract: User-generated content on the microblogging social network Twitter continues to grow, carrying a significant amount of information. Semantic analysis offers the opportunity to discover and model latent interests in users’ publications. This article focuses on the problem of uncertainty in users’ publications, which has not been previously treated. It proposes a new approach for discovering users’ interests from uncertain information that augments traditional methods using possibilistic logic. Possibility theory provides a solid theoretical base for treating incomplete and imprecise information and for inferring reliable expressions from a knowledge base. More precisely, this approach uses a product-based possibilistic network to model the knowledge base and discover possibilistic interests. The DBpedia ontology is integrated into the interest discovery process for selecting the significant topics. The empirical analysis and the comparison with the best-known methods demonstrate the significance of this approach.
Journal Article•10.3233/IDA-170878•
Effective social content-based collaborative filtering for music recommendation

Ja-Hwung Su1, Wei-Yi Chang2, Vincent S. Tseng3•
Cheng Shiu University1, Center for Information Technology2, National Chiao Tung University3
1 Jan 2017
Journal Article•10.3233/IDA-163141•
Exploiting statistically significant dependent rules for associative classification

Jundong Li1, Osmar R. Zaïane2•
Arizona State University1, University of Alberta2
10 Oct 2017
TL;DR: This paper proposes a novel associative classifier, SigDirect, which uses Fisher’s exact test as a significance measure to directly mine classification association rules with effective pruning strategies, and achieves better classification accuracy than state-of-the-art rule-based and associative classifiers.
Abstract: Established associative classification algorithms have been shown to be very effective in handling categorical data such as text data. The learned model is a set of rules that are easy to understand and can be edited. However, they still suffer from the following limitations: first, they mostly use the support-confidence framework to mine classification association rules, which requires the setting of some confounding parameters; second, the lack of statistical dependency in the framework used may lead to the omission of many interesting rules and the detection of meaningless rules; third, the rule generation process usually generates a sheer number of rules, which puts in question the interpretability and readability of the learned associative classification model. In this paper, we propose a novel associative classifier, SigDirect, to address the above problems. In particular, we use Fisher’s exact test as a significance measure to directly mine classification association rules with some effective pruning strategies. Without any threshold settings like minimum support and minimum confidence, SigDirect is able to find non-redundant classification association rules which express a statistically significant dependency between a set of antecedent items and a consequent class label. To further reduce the number of noisy rules, we present an instance-centric rule pruning strategy to find a subset of rules of high quality. Finally, we propose and investigate various rule classification strategies to achieve a more accurate classification model. Experimental results on real-world datasets show that SigDirect achieves better classification accuracy than state-of-the-art rule-based and associative classifiers. Furthermore, the number of rules generated by SigDirect is orders of magnitude smaller than the number of rules found by other associative classifiers, which is very appealing in practice.
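To make the significance measure concrete, here is a minimal sketch of a one-sided Fisher's exact test on the 2x2 table for a candidate rule "antecedent implies class". It is not SigDirect itself and includes none of its pruning; the function name `fisher_right_tail` and the argument layout are assumptions for illustration (requires Python 3.8+ for `math.comb`).

```python
from math import comb


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test p-value for a 2x2 table (sketch).

    a: transactions with the antecedent and the class label
    b: transactions with the antecedent but another label
    c: transactions without the antecedent but with the class label
    d: transactions with neither
    Returns P(count >= a) under the hypergeometric null of independence.
    """
    row1 = a + b          # transactions containing the antecedent
    col1 = a + c          # transactions carrying the class label
    n = a + b + c + d     # all transactions
    upper = min(row1, col1)
    # Sum the hypergeometric probabilities of tables at least as extreme as observed.
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)
```

For example, `fisher_right_tail(3, 1, 1, 3)` equals 17/70, about 0.243, so a rule supported by that table would not be called significant at the usual 0.05 level.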
Book Chapter•10.1007/978-3-319-68765-0_28•
A Structural Benchmark for Logical Argumentation Frameworks

Bruno Yun1, Srdjan Vesic2, Madalina Croitoru1, Pierre Bisquert3, Rallou Thomopoulos3 •
University of Montpellier1, Artois University2, French Institute for Research in Computer Science and Automation3
26 Oct 2017
TL;DR: A practically-oriented benchmark suite for computational argumentation that instantiates abstract argumentation frameworks with existential rules, a language widely used in Semantic Web applications and provides a generator of such instantiated graphs is proposed.
Abstract: This paper proposes a practically-oriented benchmark suite for computational argumentation. We instantiate abstract argumentation frameworks with existential rules, a language widely used in Semantic Web applications and provide a generator of such instantiated graphs. We analyse performance of argumentation solvers on these benchmarks.
Journal Article•10.3233/IDA-170875•
Membrane computing inspired feature selection model for microarray cancer data

Naeimeh Elkhani1, Ravie Chandren Muniyandi•
National University of Malaysia1
1 Jan 2017
Journal Article•10.3233/IDA-170874•
Efficiently mining high utility sequential patterns in static and streaming data

Morteza Zihayat1, Cheng-Wei Wu2, Aijun An1, Vincent S. Tseng2, Chien Lin3 •
York University1, National Chiao Tung University2, Institute for Information Industry3
1 Jan 2017
TL;DR: HUSP-Stream is the first method to find HUSPs over data streams and a novel utility model called SequenceSuffix Utility is proposed for effectively pruning the search space in HUSP mining.
Abstract: High utility sequential pattern (HUSP) mining has emerged as a novel topic in data mining. Although some preliminary works have been conducted on this topic, they incur the problem of producing a large search space for high utility sequential patterns. In addition, they mainly focus on mining HUSPs in static databases and do not take streaming data into account, where unbounded data arrive continuously and often at high speed. To deal efficiently with both problems, we propose a novel framework for mining high utility sequential patterns over static and streaming databases. In this regard, two efficient data structures named ItemUtilLists (Item Utility Lists) and HUSP-Tree (High Utility Sequential Pattern Tree) are proposed to maintain essential information for mining HUSPs in both offline and online fashions. In addition, a novel utility model called SequenceSuffix Utility is proposed for effectively pruning the search space in HUSP mining. We propose an algorithm named HUSP-Miner (High Utility Sequential Pattern Miner) to find HUSPs in static databases efficiently. Then, a one-pass algorithm named HUSP-Stream (High Utility Sequential Pattern mining over Data Streams) is proposed to incrementally update ItemUtilLists and HUSP-Tree online and find HUSPs over data streams. To the best of our knowledge, HUSP-Stream is the first method to find HUSPs over data streams. Experimental results on both real and synthetic datasets show that HUSP-Miner outperforms the compared algorithms substantially in terms of execution time, memory usage and number of generated candidates. The experiments also demonstrate the impressive performance of HUSP-Stream in updating the data structures and discovering HUSPs over data streams.
Journal Article•10.3233/IDA-160069•
Apriori and GUHA – Comparing two approaches to data mining with association rules

Jan Rauch1, Milan Šimůnek1•
University of Economics, Prague1
1 Jan 2017
Journal Article•10.3233/IDA-160034•
Financial distress prediction using SVM ensemble based on earnings manipulation and fuzzy integral

Chao Huang, Qingyu Yang, Mingwei Du, Donghui Yang
1 Jan 2017
Book Chapter•10.1007/978-3-319-68765-0_9•
Interactive Pattern Sampling for Characterizing Unlabeled Data

Arnaud Giacometti1, Arnaud Soulet1•
François Rabelais University1
26 Oct 2017
TL;DR: A new interactive pattern mining method that learns which part of the dataset is really interesting for the user by integrating user feedback about patterns, and aims at sampling patterns with a probability proportional to their frequency in the interesting transactions.
Abstract: Many data exploration tasks require a target class. Unfortunately, the data is not always labeled with respect to this desired class. Rather than using unsupervised methods or a labeling pre-processing step, this paper proposes an interactive system that discovers this target class and characterizes it at the same time. More precisely, we introduce a new interactive pattern mining method that learns which part of the dataset is really interesting for the user. By integrating user feedback about patterns, our method aims at sampling patterns with a probability proportional to their frequency in the interesting transactions. We demonstrate that it accurately identifies the target class if user feedback is consistent. Experiments also show that this method achieves good true and false positive rates, enabling it to present relevant patterns to the user.
Journal Article•10.3233/IDA-150499•
Active seed selection for constrained clustering

Viet-Vu Vu1, Nicolas Labroche2•
Vietnam National University, Hanoi1, François Rabelais University2
1 Jan 2017
Journal Article•10.3233/IDA-160021•
Incorporating Wikipedia concepts and categories as prior knowledge into topic models

Kang Xu1, Guilin Qi1, Junheng Huang1, Tianxing Wu1•
Southeast University1
1 Jan 2017
Journal Article•10.3233/IDA-150489•
Fuzzy c-Least Medians clustering for discovery of web access patterns from web user sessions data

Zahid Ansari1, Ahmed Rimaz Faizabadi1, Asif Afzal1•
P A College of Engineering1
1 Jan 2017
Book Chapter•10.1007/978-3-319-68527-4_26•
Adaptive Signal Processing of Fetal PCG Recorded by Interferometric Sensor

Radek Martinek1, Radana Kahankova1, Jan Nedoma1, Marcel Fajkus1, Homer Nazeran2, Jana Nowaková1 •
Technical University of Ostrava1, University of Texas at El Paso2
9 Oct 2017
TL;DR: Adaptive methods based on Least Mean Square and Recursive Least Square algorithms are used for the elimination of the maternal component of the fetal phonocardiogram.
Abstract: This paper focuses on the design, implementation, and verification of an adaptive system for processing the fetal phonocardiogram (fPCG) recorded by a novel interferometric sensor. The main interference to be suppressed in the abdominal signal is the maternal phonocardiogram (mPCG). In this article, adaptive methods based on the Least Mean Square and Recursive Least Square algorithms are used to eliminate the maternal component. The filtration quality is evaluated using objective parameters (Signal-to-Noise Ratio, Sensitivity, and Positive Predictive Value).
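The Least Mean Square idea mentioned in this abstract can be sketched as a generic adaptive noise canceller: a reference recording of the interference is filtered and subtracted from the primary signal, and the filter weights follow the error. This is a textbook LMS loop, not the authors' system; the function name `lms_cancel`, the tap count and the step size `mu` are assumptions for illustration.

```python
def lms_cancel(primary, reference, n_taps=4, mu=0.01):
    """Generic LMS adaptive noise canceller (illustrative sketch).

    primary:   signal of interest plus interference (e.g. abdominal recording)
    reference: recording correlated with the interference only
    Returns the error signal e[n], i.e. primary minus the filtered reference.
    """
    weights = [0.0] * n_taps
    output = []
    for n in range(len(primary)):
        # Most recent n_taps reference samples, zero-padded before the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        estimate = sum(w * xi for w, xi in zip(weights, x))
        error = primary[n] - estimate
        # Standard LMS update w <- w + mu * e * x (some texts write 2 * mu).
        weights = [w + mu * error * xi for w, xi in zip(weights, x)]
        output.append(error)
    return output
```

The step size trades convergence speed against stability and residual noise, so it would need tuning for real recordings.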
Journal Article•10.3233/IDA-160031•
Instance-based classification with Ant Colony Optimization

Khalid M. Salama1, Ashraf M. Abdelbar2, Ayah Helal1, Alex A. Freitas1•
University of Kent1, Brandon University2
1 Jan 2017
TL;DR: This paper introduces a novel class-based feature weighting technique, in the context of instance-based distance methods, using the Ant Colony Optimization meta-heuristic, and proposes an ensemble of classifiers approach that makes use of the archived populations of the ACO_R algorithm.
Abstract: Instance-based learning (IBL) methods predict the class label of a new instance based directly on the distance between the new unlabeled instance and each labeled instance in the training set, without constructing a classification model in the training phase. In this paper, we introduce a novel class-based feature weighting technique, in the context of instance-based distance methods, using the Ant Colony Optimization meta-heuristic. We address three different approaches of instance-based classification: k-Nearest Neighbours, distance-based Nearest Neighbours, and Gaussian Kernel Estimator. We present a multi-archive adaptation of the ACO_R algorithm and apply it to the optimization of the key parameter in each IBL algorithm and of the class-based feature weights. We also propose an ensemble of classifiers approach that makes use of the archived populations of the ACO_R algorithm. We empirically evaluate the performance of our proposed algorithms on 36 benchmark datasets, and compare them with conventional instance-based classification algorithms, using various parameter settings, as well as with a state-of-the-art coevolutionary algorithm for instance selection and feature weighting for Nearest Neighbours classifiers.
Journal Article•10.3233/IDA-170882•
Deceptive text detection using continuous semantic space models

Ángel Hernández-Castañeda1, Hiram Calvo1•
Instituto Politécnico Nacional1
1 Jan 2017
Journal Article•10.3233/IDA-150316•
Scalable and practical One-Pass clustering algorithm for recommender system

Asra Khalid1, Mustansar Ali Ghazanfar1, Muhammad Awais Azam1, Yasmeen Fahad Aldhafiri, Sobia Zahra1 •
University of Engineering and Technology1
1 Jan 2017
TL;DR: A new clustering algorithm called One-Pass is proposed, a simple real-time algorithm that maintains a good level of accuracy, scales well with data, and builds the training model incrementally as new data arrive.
Abstract: Recommender systems apply artificial intelligence techniques to filter unseen information and predict whether a user would like or dislike a given item. K-Means clustering-based recommendation algorithms have been proposed, claiming to increase the scalability of recommender systems. One potential drawback of these algorithms is that they perform training offline and hence cannot accommodate incremental updates as new data arrive, making them unsuitable for dynamic environments. From this line of research, a new clustering algorithm called One-Pass is proposed, a simple real-time algorithm that maintains a good level of accuracy, scales well with data, and builds the training model incrementally as new data arrive. We run the One-Pass algorithm on four different datasets (MovieLens, Film Trust, Book Crossing, and Last-FM) and empirically show that the proposed algorithm outperforms K-Means in terms of recommendation and training time. Moreover, the One-Pass algorithm is comparable to K-Means in terms of accuracy and cluster quality.
Journal Article•10.3233/IDA-163075•
Unsupervised active learning techniques for labeling training sets: An experimental evaluation on sequential data

Vinicius M. A. Souza1, Rafael G. Rossi1,2, Gustavo E. A. P. A. Batista1, Solange Oliveira Rezende1 •
Spanish National Research Council1, Federal University of Mato Grosso do Sul2
1 Jan 2017
Journal Article•10.3233/IDA-163098•
Cluster-Indistinguishability: A practical differential privacy mechanism for trajectory clustering

Hao Wang1, Zhengquan Xu1, Shan Jia1•
Wuhan University1
1 Jan 2017
TL;DR: This paper proposes a differential privacy preserving mechanism, Cluster-Indistinguishability, and derives the probability density function of two-dimensional Laplace noise that satisfies its definition of differential privacy.
Abstract: An important method of spatial-temporal data mining, trajectory clustering can mine valuable information in trajectories. However, cluster results without special sanitization pose serious threats to individual location privacy. Existing privacy preserving mechanisms for trajectory clustering still contend with the problems of narrow applicability, low-level utility, and difficulty in being applied to real scenarios. In this paper, we therefore propose a differential privacy preserving mechanism, Cluster-Indistinguishability, to support trajectory clustering. Firstly, a general model of typical trajectory clustering algorithms is given, and the definition of differential privacy is introduced according to the model. Then, we derive the probability density function of two-dimensional Laplace noise, which satisfies the above definition. Finally, we transform the noise from a Cartesian coordinate system to a Polar coordinate system to efficiently apply it in real scenarios. Experimental results show that Cluster-Indistinguishability has general applicability and better performance compared to existing methods.
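The step of moving from Cartesian to polar coordinates can be illustrated with the planar Laplace distribution known from the geo-indistinguishability literature, whose density is proportional to exp(-epsilon * r). Whether the density derived in this paper has exactly that form is an assumption here; the function name `planar_laplace_noise` is also hypothetical. In polar form the angle is uniform and the radius follows a Gamma distribution with shape 2 and scale 1/epsilon, so sampling needs no rejection step.

```python
import math
import random


def planar_laplace_noise(epsilon, rng=random):
    """Sample (dx, dy) with density proportional to exp(-epsilon * r) (sketch).

    In polar coordinates the angle is uniform on [0, 2*pi) and the radius has
    density epsilon**2 * r * exp(-epsilon * r), i.e. Gamma(shape=2, scale=1/epsilon).
    rng may be the random module or a random.Random instance.
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    radius = rng.gammavariate(2.0, 1.0 / epsilon)
    # Convert the polar sample back to a Cartesian offset for a location point.
    return radius * math.cos(theta), radius * math.sin(theta)
```

The expected displacement is 2/epsilon, so a smaller epsilon (stronger privacy) moves points further on average.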
Journal Article•10.3233/IDA-163020•
PGNBC: Pearson Gaussian Naïve Bayes classifier for data stream classification with recurring concept drift

D. Kishore Babu, Y. Ramadevi1, K.V. Ramana2•
Chaitanya Bharathi Institute of Technology1, Jawaharlal Nehru Technological University, Kakinada2
10 Oct 2017
TL;DR: The proposed PGNBC method advances the existing Gaussian Naïve Bayes classifier (GNBC) by additionally incorporating the correlation among the attributes to improve performance.
Abstract: In data stream classification, selecting a classifier for the dynamic feature space while accounting for concept drift is a challenging task. This paper addresses the major challenges in data stream classification with recurring concept drift. We developed a novel classification method known as Pearson Gaussian Naïve Bayes classification (PGNBC). The proposed PGNBC method advances the existing Gaussian Naïve Bayes classifier (GNBC) by additionally incorporating the correlation among the attributes. For data stream classification, the proposed PGNBC is frequently updated based on the concept drift. The newly developed method is evaluated by comparing its results with existing methods such as RGNBC and MReC-DFS, using sensitivity, specificity and accuracy as performance metrics. On the skin data, PGNBC improves sensitivity, specificity and accuracy over RGNBC by 4%, 1% and 1% respectively. On the localization data, it improves specificity and accuracy over RGNBC by 6% and 2% respectively.