Computational Intelligence and Data Mining

Conference Tools

Papers published on a yearly basis

Papers

Proceedings Article • 10.1109/CIDM.2009.4938667

Diversity analysis on imbalanced data sets by using ensemble models

Shuo Wang¹, Xin Yao¹•Institutions (1)

University of Birmingham¹

15 May 2009

TL;DR: This paper explores the impact of diversity on each class and on overall performance, and improves SMOTE in a novel way for multi-class data sets within an ensemble model, SMOTEBagging.

Abstract: Many real-world applications, such as medical diagnosis, fraud detection, and text classification, have problems when learning from imbalanced data sets. Very few minority class instances cannot provide sufficient information, and performance degrades greatly as a result. As a good way to improve the classification performance of weak learners, some ensemble-based algorithms have been proposed to solve the class imbalance problem. However, although diversity is one influential factor of an ensemble, it is still not clear how it affects classification performance, especially on minority classes. This paper explores the impact of diversity on each class and on overall performance. As the other influential factor, accuracy is also discussed because of the trade-off between diversity and accuracy. Firstly, three popular re-sampling methods are combined into our ensemble model and evaluated for diversity analysis: under-sampling, over-sampling, and SMOTE [1], a data generation algorithm. Secondly, we experiment not only on two-class tasks but also on those with multiple classes. Thirdly, we improve SMOTE in a novel way for solving multi-class data sets in an ensemble model, SMOTEBagging.

597 citations

Proceedings Article • 10.1109/CIDM.2011.5949453

Flexible Heuristics Miner (FHM)

A.J.M.M. Weijters¹, Joel Ribeiro¹•Institutions (1)

Eindhoven University of Technology¹

11 Apr 2011

TL;DR: A new process representation language is presented in combination with an accompanying process mining algorithm that results in easy to understand process models even in the case of non-trivial constructs, low structured domains and the presence of noise.

Abstract: One of the aims of process mining is to retrieve a process model from a given event log. However, current techniques have problems when mining processes that contain non-trivial constructs, processes that are low structured, and/or event logs that contain noise. To overcome these problems, a new process representation language is presented in combination with an accompanying process mining algorithm. The most significant property of the new representation language is the way the semantics of splits and joins are represented: by using so-called split/join frequency tables. This results in easy-to-understand process models even in the case of non-trivial constructs, low structured domains and the presence of noise. This paper explains the new process representation language and how the mining algorithm works. The algorithm is implemented as a plug-in in the ProM framework. An illustrative example with noise and a real-life log of a complex and low structured process are used to explicate the presented approach.

552 citations

Proceedings Article • 10.1109/CIDM.2009.4938676

Regularized Extreme Learning Machine

Wan-Yu Deng¹, Qinghua Zheng¹, Lin Chen•Institutions (1)

Xi'an Jiaotong University¹

15 May 2009

TL;DR: A novel algorithm called Regularized Extreme Learning Machine is proposed, based on the structural risk minimization principle and weighted least squares; its generalization performance improved significantly in most cases without increasing training time.

Abstract: Extreme Learning Machine (ELM), proposed by Huang G-B, has attracted much attention for its extremely fast training speed and good generalization performance. However, it can still be regarded as an empirical risk minimization scheme and tends to generate over-fitted models. Additionally, since ELM does not consider heteroskedasticity in real applications, its performance is seriously affected when outliers exist in the dataset. To address these drawbacks, we propose a novel algorithm called Regularized Extreme Learning Machine, based on the structural risk minimization principle and weighted least squares. The generalization performance of the proposed algorithm was improved significantly in most cases without increasing training time.
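
Illustrative sketch (not taken from the paper): the abstract does not give the formulation, so the code below assumes a common ridge-style variant in which a random, untrained hidden layer is followed by output weights solved from regularized normal equations. The weighted least squares step mentioned in the abstract is omitted, and the names `solve`, `train_relm`, `n_hidden` and `reg` are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def train_relm(xs, ts, n_hidden=10, reg=1e-2, seed=0):
    """Fit a single-output, ridge-regularized ELM on scalar inputs (assumed form)."""
    rng = random.Random(seed)
    # Random hidden layer that is never trained: one (weight, bias) per unit.
    hidden = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_hidden)]

    def features(x):
        # Sigmoid activations of the hidden units for input x.
        return [1.0 / (1.0 + math.exp(-(w * x + b))) for w, b in hidden]

    h = [features(x) for x in xs]
    # Regularized normal equations: (H^T H + reg * I) beta = H^T t
    hth = [[sum(row[i] * row[j] for row in h) + (reg if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    htt = [sum(row[i] * t for row, t in zip(h, ts)) for i in range(n_hidden)]
    beta = solve(hth, htt)
    return lambda x: sum(bi * fi for bi, fi in zip(beta, features(x)))


# Toy regression target, chosen only for illustration.
xs = [i / 10 for i in range(-20, 21)]
ts = [x * x for x in xs]
predict = train_relm(xs, ts)
```

Only the output weights are fitted here; the `reg` term penalizes large weights, which is one way to read the structural risk minimization idea, though the paper's exact objective may differ.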

508 citations

Proceedings Article • 10.1109/CIDM.2007.368917

Incremental Local Outlier Detection for Data Streams

Dragoljub Pokrajac¹, Aleksandar Lazarevic, Longin Jan Latecki²•Institutions (2)

Delaware State University¹, Temple University²

4 Jun 2007

TL;DR: The paper provides theoretical evidence that insertion of a new data point, as well as deletion of an old data point, influences only a limited number of its closest neighbors, and thus the number of updates per insertion/deletion does not depend on the total number of points in the data set.

Abstract: Outlier detection has recently become an important problem in many industrial and financial applications. This problem is further complicated by the fact that in many cases, outliers have to be detected from data streams that arrive at an enormous pace. In this paper, an incremental LOF (local outlier factor) algorithm, appropriate for detecting outliers in data streams, is proposed. The proposed incremental LOF algorithm provides detection performance equivalent to that of the iterated static LOF algorithm (applied after insertion of each data record), while requiring significantly less computational time. In addition, the incremental LOF algorithm also dynamically updates the profiles of data points. This is a very important property, since data profiles may change over time. The paper provides theoretical evidence that insertion of a new data point, as well as deletion of an old data point, influences only a limited number of its closest neighbors, and thus the number of updates per insertion/deletion does not depend on the total number of points N in the data set. Our experiments performed on several simulated and real-life data sets have demonstrated that the proposed incremental LOF algorithm is computationally efficient, while at the same time very successful in detecting outliers and changes of distributional behavior in various data stream applications.
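
Illustrative sketch (not taken from the paper): the code below computes the static LOF score that the abstract uses as its baseline, not the incremental update itself. It assumes brute-force neighbour search, distinct points, and no special handling of ties at the k-th distance; the function name `lof_scores` and the toy data are hypothetical.

```python
import math


def lof_scores(points, k=2):
    """Static local outlier factor for every point (brute force, O(n^2))."""
    n = len(points)
    # For each point: its k nearest neighbours as (distance, index), nearest first.
    knn = []
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        knn.append(dists[:k])
    k_dist = [nb[-1][0] for nb in knn]  # distance to the k-th nearest neighbour

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], d) for d, j in knn[i]]
        return len(reach) / sum(reach)

    lrds = [lrd(i) for i in range(n)]
    # LOF: average density of the neighbours divided by the point's own density.
    return [sum(lrds[j] for _, j in knn[i]) / (len(knn[i]) * lrds[i]) for i in range(n)]


# Four points on a unit square plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points)
```

Scores near 1 indicate a point about as dense as its neighbours; the distant point receives the largest score. Re-running this function after every insertion is the "iterated static LOF" cost that the incremental algorithm is said to avoid.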

460 citations

Proceedings Article • 10.1109/CIDM.2011.5949434

Local neighbourhood extension of SMOTE for mining imbalanced data

Tomasz Maciejewski¹, Jerzy Stefanowski¹•Institutions (1)

Poznań University of Technology¹

11 Apr 2011

TL;DR: A new generalization of SMOTE is introduced, called LN-SMOTE, which exploits information about the local neighbourhood of the considered examples more precisely and improves evaluation measures for the minority class.

Abstract: In this paper we discuss the problems of inducing classifiers from imbalanced data and of improving recognition of the minority class using focused resampling techniques. We are particularly interested in the SMOTE over-sampling method, which generates new synthetic examples from the minority class between the closest neighbours from this class. However, SMOTE can also overgeneralize the minority class region, as it does not consider the distribution of other neighbours from the majority classes. Therefore, we introduce a new generalization of SMOTE, called LN-SMOTE, which exploits information about the local neighbourhood of the considered examples more precisely. In the experiments we compare this method with the original SMOTE and its two most closely related generalizations, Borderline and Safe-Level SMOTE. All these pre-processing methods are applied together with either decision tree or Naive Bayes classifiers. The results show that the new LN-SMOTE method improves evaluation measures for the minority class.
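
Illustrative sketch (not taken from the paper): the code below shows the plain SMOTE interpolation step described in the abstract, in which a synthetic example is placed on the segment between a minority example and one of its nearest minority-class neighbours. It is not LN-SMOTE; the function name `smote_sketch`, its parameters and the toy data are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    example and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority class with four two-dimensional examples.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=5)
```

Because the neighbour search looks only at the minority class, the majority-class examples never influence where synthetic points land, which is the overgeneralization risk the abstract attributes to SMOTE.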

303 citations

Year	Papers
2019	1
2017	1
2015	6
2014	69
2013	44
2011	48
