Conference
Computational Intelligence and Data Mining
About: Computational Intelligence and Data Mining is an academic conference. The conference publishes mainly in the areas of cluster analysis and computer science. Over its lifetime, the conference has published 348 publications, which have received 6990 citations.
Topics: Cluster analysis, Computer science, Support vector machine, Process mining, Statistical classification
Papers
15 May 2009
TL;DR: This paper explores the impact of diversity on each class and on overall performance, and improves SMOTE in a novel way for solving multi-class data sets in an ensemble model, SMOTEBagging.
Abstract: Many real-world applications, such as medical diagnosis, fraud detection, and text classification, have problems when learning from imbalanced data sets. Very few minority class instances cannot provide sufficient information, and performance degrades greatly as a result. As a good way to improve the classification performance of weak learners, some ensemble-based algorithms have been proposed to solve the class imbalance problem. However, it is still not clear how diversity affects classification performance, especially on minority classes, since diversity is one influential factor of an ensemble. This paper explores the impact of diversity on each class and on overall performance. As the other influential factor, accuracy is also discussed because of the trade-off between diversity and accuracy. Firstly, three popular re-sampling methods are combined into our ensemble model and evaluated for diversity analysis: under-sampling, over-sampling, and SMOTE [1], a data generation algorithm. Secondly, we experiment not only on two-class tasks but also on those with multiple classes. Thirdly, we improve SMOTE in a novel way for solving multi-class data sets in an ensemble model, SMOTEBagging.
597 citations
11 Apr 2011
TL;DR: A new process representation language is presented in combination with an accompanying process mining algorithm that results in easy to understand process models even in the case of non-trivial constructs, low structured domains and the presence of noise.
Abstract: One of the aims of process mining is to retrieve a process model from a given event log. However, current techniques have problems when mining processes that contain non-trivial constructs, processes that are low structured, and/or event logs that contain noise. To overcome these problems, a new process representation language is presented in combination with an accompanying process mining algorithm. The most significant property of the new representation language is the way the semantics of splits and joins are represented: by using so-called split/join frequency tables. This results in easy-to-understand process models even in the case of non-trivial constructs, low structured domains and the presence of noise. This paper explains the new process representation language and how the mining algorithm works. The algorithm is implemented as a plug-in in the ProM framework. An illustrative example with noise and a real-life log of a complex and low structured process are used to explicate the presented approach.
552 citations
15 May 2009
TL;DR: A novel algorithm called Regularized Extreme Learning Machine is proposed, based on the structural risk minimization principle and weighted least squares; its generalization performance improved significantly in most cases without increasing training time.
Abstract: Extreme Learning Machine (ELM), proposed by Huang G-B, has attracted much attention for its extremely fast training speed and good generalization performance. However, it can still be regarded as an empirical risk minimization scheme and tends to generate over-fitted models. Additionally, since ELM does not consider heteroskedasticity in real applications, its performance is seriously affected when outliers exist in the dataset. To address these drawbacks, we propose a novel algorithm called Regularized Extreme Learning Machine, based on the structural risk minimization principle and weighted least squares. The generalization performance of the proposed algorithm was improved significantly in most cases without increasing training time.
508 citations
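To make the combination of structural risk minimization and weighted least squares in the abstract above more concrete, here is one common way to write a regularized, weighted objective for the ELM output weights. It is a sketch under assumed notation and not necessarily the paper's exact formulation: $H$ (hidden-layer output matrix), $T$ (targets), $\beta$ (output weights), $C$ (regularization trade-off) and $W = \mathrm{diag}(w_1, \dots, w_N)$ (per-sample weights) are all symbols introduced here for illustration.

```latex
\min_{\beta}\; J(\beta) = \tfrac{1}{2}\,\lVert \beta \rVert^{2}
  + \tfrac{C}{2}\,(T - H\beta)^{\top} W \,(T - H\beta)

% Setting the gradient to zero:
%   \beta - C\,H^{\top} W (T - H\beta) = 0
\beta = \left( \tfrac{1}{C}\, I + H^{\top} W H \right)^{-1} H^{\top} W\, T
```

With $W = I$ this reduces to an ordinary ridge-regularized solution; down-weighting samples with large residuals is one way such a scheme can limit the influence of outliers.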
4 Jun 2007
TL;DR: The paper provides theoretical evidence that insertion of a new data point, as well as deletion of an old data point, influences only a limited number of its closest neighbors, and thus the number of updates per insertion/deletion does not depend on the total number of points in the data set.
Abstract: Outlier detection has recently become an important problem in many industrial and financial applications. This problem is further complicated by the fact that in many cases, outliers have to be detected from data streams that arrive at an enormous pace. In this paper, an incremental LOF (local outlier factor) algorithm, appropriate for detecting outliers in data streams, is proposed. The proposed incremental LOF algorithm provides detection performance equivalent to that of the iterated static LOF algorithm (applied after insertion of each data record), while requiring significantly less computational time. In addition, the incremental LOF algorithm also dynamically updates the profiles of data points. This is a very important property, since data profiles may change over time. The paper provides theoretical evidence that insertion of a new data point, as well as deletion of an old data point, influences only a limited number of its closest neighbors, and thus the number of updates per insertion/deletion does not depend on the total number of points N in the data set. Our experiments performed on several simulated and real-life data sets have demonstrated that the proposed incremental LOF algorithm is computationally efficient, while at the same time very successful in detecting outliers and changes of distributional behavior in various data stream applications.
460 citations
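For orientation, the static LOF computation that the incremental algorithm above is said to match can be sketched in a few lines. This is a minimal pure-Python illustration of the standard LOF definition, not the incremental update from the paper; the function name `lof_scores` and the parameter `k` are assumptions, and the sketch assumes no duplicate points and uses exactly `k` neighbours (ties are broken by index).

```python
import math


def lof_scores(points, k=3):
    """Return the static local outlier factor of every point (brute force)."""
    n = len(points)
    neighbours = []   # indices of the k nearest neighbours of each point
    k_distance = []   # distance from each point to its k-th nearest neighbour
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        knn = ranked[:k]
        neighbours.append(knn)
        k_distance.append(math.dist(points[i], points[knn[-1]]))

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbour j.
        return max(k_distance[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbours[i]) / k
        lrd.append(1.0 / mean_reach)

    # LOF: average ratio of the neighbours' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (k * lrd[i])
        for i in range(n)
    ]


# Five clustered points plus one distant point (index 5).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
scores = lof_scores(points, k=3)
print(scores.index(max(scores)))
```

Recomputing everything like this after each insertion touches every point; the abstract's claim is that an incremental version only needs to update a limited number of neighbours of the inserted or deleted point.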
11 Apr 2011
TL;DR: A new generalization of SMOTE is introduced, called LN-SMOTE, which exploits more precisely information about the local neighbourhood of the considered examples of the majority class, and improves evaluation measures for the minority class.
Abstract: In this paper we discuss problems of inducing classifiers from imbalanced data and improving recognition of the minority class using focused resampling techniques. We are particularly interested in the SMOTE over-sampling method, which generates new synthetic examples from the minority class between the closest neighbours from this class. However, SMOTE could also overgeneralize the minority class region, as it does not consider the distribution of other neighbours from the majority classes. Therefore, we introduce a new generalization of SMOTE, called LN-SMOTE, which exploits more precisely information about the local neighbourhood of the considered examples. In the experiments we compare this method with the original SMOTE and its two most closely related generalizations, Borderline and Safe-Level SMOTE. All these pre-processing methods are applied together with either decision tree or Naive Bayes classifiers. The results show that the new LN-SMOTE method improves evaluation measures for the minority class.
303 citations
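The basic SMOTE step described in the abstract above, generating a synthetic example between a minority point and one of its closest minority neighbours, can be sketched as follows. This is a minimal pure-Python illustration of the interpolation idea only, not LN-SMOTE and not the authors' code; the function name `smote_oversample` and the parameters `k` and `seed` are assumptions made for this example.

```python
import math
import random


def smote_oversample(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority points by interpolating towards neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority example as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority examples by distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        # Choose one of the k nearest minority neighbours at random.
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the connecting segment.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_oversample(minority, n_synthetic=6, k=2)
print(len(new_points))
```

Because only minority neighbours are consulted, this step ignores nearby majority examples, which is the overgeneralization risk the abstract raises.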
Performance Metrics
| Year | Papers |
|---|---|
| 2019 | 1 |
| 2017 | 1 |
| 2015 | 6 |
| 2014 | 69 |
| 2013 | 44 |
| 2011 | 48 |