Boosting classifiers for drifting concepts
Martin Scholz, Ralf Klinkenberg
- 01 Jan 2007
- Vol. 11, Iss: 1, pp 3-28
TL;DR: In this paper, a boosting-like method is proposed to train a classifier ensemble from data streams that naturally adapts to concept drift by continuously re-weighting the ensemble members based on their performance on the most recent examples.
Abstract: In many real-world classification tasks, data arrives over time and the target concept to be learned from the data stream may change over time. Boosting methods are well-suited for learning from data streams, but do not address this concept drift problem. This paper proposes a boosting-like method to train a classifier ensemble from data streams that naturally adapts to concept drift. Moreover, it allows the drift to be quantified in terms of its base learners. As in regular boosting, examples are re-weighted to induce a diverse ensemble of base models. In order to handle drift, the proposed method continuously re-weights the ensemble members based on their performance on the most recent examples only. The proposed strategy adapts quickly to different kinds of concept drift. The algorithm is empirically shown to outperform learning algorithms that ignore concept drift. It performs no worse than advanced adaptive time window and example selection strategies that store all the data and are thus not suited for mining massive streams. The proposed algorithm has low computational costs.
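The abstract describes two kinds of re-weighting: examples are re-weighted to induce diverse base models, and ensemble members are re-weighted by their performance on the most recent examples only. The Python sketch below illustrates that general idea under our own assumptions; it is not the authors' algorithm. Decision stumps as base learners, AdaBoost-style vote weights, batch-wise updates, and the names `Stump`, `fit_stump`, `vote_weight` and `DriftAdaptiveEnsemble` are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Stump:
    """Hypothetical base learner: a one-feature threshold rule predicting -1 or +1."""
    feature: int
    threshold: float
    polarity: int  # +1: predict +1 when x[feature] > threshold; -1: the reverse

    def predict(self, x):
        return self.polarity if x[self.feature] > self.threshold else -self.polarity


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for p in (1, -1):
                s = Stump(f, t, p)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if s.predict(xi) != yi)
                if err < best_err:
                    best, best_err = s, err
    return best


def vote_weight(err, eps=1e-6):
    """AdaBoost-style weight 0.5 * ln((1 - err) / err); err is clipped to avoid infinities."""
    err = min(max(err, eps), 1 - eps)
    return 0.5 * math.log((1 - err) / err)


class DriftAdaptiveEnsemble:
    """Hypothetical illustration only, not the method proposed in the paper."""

    def __init__(self):
        self.members, self.alphas = [], []

    def score(self, x):
        return sum(a * m.predict(x) for m, a in zip(self.members, self.alphas))

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, X, y):
        """Process one batch of the most recent examples (labels in {-1, +1}).

        Returns the weights of the pre-existing members before and after re-weighting.
        """
        n = len(X)
        before = list(self.alphas)
        # 1) Re-weight existing members using their error on this batch only;
        #    a member worse than chance gets a negative weight (its vote is inverted).
        self.alphas = [vote_weight(sum(m.predict(xi) != yi for xi, yi in zip(X, y)) / n)
                       for m in self.members]
        after = list(self.alphas)
        # 2) Boosting-style example weights: emphasise examples the ensemble misclassifies.
        w = [math.exp(-yi * self.score(xi)) for xi, yi in zip(X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
        # 3) Train a new member on the re-weighted batch and add it to the ensemble.
        stump = fit_stump(X, y, w)
        err = sum(wi for xi, yi, wi in zip(X, y, w) if stump.predict(xi) != yi)
        self.members.append(stump)
        self.alphas.append(vote_weight(err))
        return before, after
```

Comparing each member's weight before and after a batch gives a crude, hypothetical drift indicator in terms of the base learners: if the concept flips, a previously strong member's weight turns negative while a newly trained member captures the new concept. This sketch adds one member per batch; a practical stream learner would also need to bound the ensemble size.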
Citations
A survey on concept drift adaptation
TL;DR: The survey covers the different facets of concept drift in an integrated way to reflect on the existing scattered state of the art and aims at providing a comprehensive introduction to the concept drift adaptation for researchers, industry analysts, and practitioners.
YALE: rapid prototyping for complex data mining tasks
Ingo Mierswa, Michael Wurst, Ralf Klinkenberg, Martin Scholz, Timm Euler
- 20 Aug 2006
TL;DR: YALE, a free open-source environment for KDD and machine learning, is described. It provides a rich variety of methods, allows rapid prototyping for new applications, makes costly re-implementations unnecessary, and offers extensive functionality for process evaluation and optimization.
A survey of methods for time series change point detection
TL;DR: This survey article enumerates, categorizes, and compares many of the methods that have been proposed to detect change points in time series, and presents some grand challenges for the community to consider.
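As one concrete example of a change point detector (a classic method chosen here for illustration, not taken from the survey), a one-sided CUSUM accumulates deviations above a target mean and raises an alarm once the running sum exceeds a threshold. Applied to, say, a stream classifier's per-example error indicators, an alarm would suggest drift. The function name and the parameters `target_mean`, `slack` and `threshold` are our own.

```python
def cusum_alarm(xs, target_mean, slack, threshold):
    """One-sided (upward) CUSUM: index of the first alarm, or None if none is raised."""
    s = 0.0
    for i, x in enumerate(xs):
        # Accumulate only deviations exceeding target_mean + slack; never drop below zero.
        s = max(0.0, s + x - target_mean - slack)
        if s > threshold:
            return i
    return None
```

For example, an error-indicator stream that jumps from 0 to 1 at position 20 triggers an alarm at position 22 with `target_mean=0.0`, `slack=0.25` and `threshold=2.0`; `slack` suppresses small fluctuations, while `threshold` trades detection delay against false alarms.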
Ensemble learning for data stream analysis
TL;DR: This paper surveys research on ensembles for data stream classification as well as regression tasks and discusses advanced learning concepts such as imbalanced data streams, novelty detection, active and semi-supervised learning, complex data representations and structured outputs.
Incremental Learning of Concept Drift in Nonstationary Environments
Ryan Elwell, Robi Polikar
TL;DR: An ensemble-of-classifiers approach to incremental learning of concept drift in nonstationary environments (NSEs), where the underlying data distributions change over time; the results indicate that Learn++.NSE can track changing environments very closely, regardless of the type of concept drift.
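Learn++.NSE weights each ensemble member by a time-discounted average of its errors on recent batches. The sketch below follows our reading of the published description: each error ε is capped at 0.5, converted to β = ε / (1 - ε), averaged with sigmoid weights that favour recent batches, and turned into a voting weight log(1 / β̄). The sigmoid parameters `a` and `b` default to illustrative values, the small floor on ε is our addition, the function name is hypothetical, and the original's step of retraining a newest member whose error exceeds 0.5 is omitted; treat it as an approximation, not a reference implementation.

```python
import math


def nse_voting_weight(errors, a=0.5, b=10.0):
    """Voting weight of one ensemble member, Learn++.NSE style (sketch).

    errors[j] is the member's weighted error on the j-th batch after it was
    created (errors[0]: the batch it was trained on; errors[-1]: the newest).
    a, b: slope and midpoint of the sigmoid; the defaults are illustrative.
    """
    betas = []
    for e in errors:
        e = min(e, 0.5)    # errors above 0.5 are capped, giving beta = 1 (weight 0)
        e = max(e, 1e-6)   # our addition: avoids division by zero below
        betas.append(e / (1 - e))
    # Sigmoid weights over the member's age: more recent batches count more.
    omegas = [1 / (1 + math.exp(-a * (j - b))) for j in range(len(errors))]
    total = sum(omegas)
    beta_bar = sum(w / total * beta for w, beta in zip(omegas, betas))
    return math.log(1 / beta_bar)
```

A member that was poor long ago but accurate on recent batches thus receives a larger voting weight than one with the reverse history, which is what lets the ensemble follow a changing environment.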
References
Random Forests
Leo Breiman
- 01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the forest, and the approach is also applicable to regression.
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
Yoav Freund, Robert E. Schapire
- 01 Aug 1997
TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone–Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
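The multiplicative weight-update rule mentioned here is easiest to see in the Hedge(β) algorithm from this reference: each expert's weight is multiplied by β raised to the loss it suffers, and the learner allocates its bet in proportion to the normalized weights. A minimal Python sketch, assuming losses in [0, 1]; the function name and return values are our own choices.

```python
def hedge(loss_rounds, beta=0.5):
    """Hedge(beta) over N experts.

    loss_rounds: one list of per-expert losses in [0, 1] per round.
    Returns the final distribution over experts and the learner's total expected loss.
    """
    w = [1.0] * len(loss_rounds[0])  # uniform initial weights
    expected_loss = 0.0
    for losses in loss_rounds:
        total = sum(w)
        p = [wi / total for wi in w]  # the learner's allocation this round
        expected_loss += sum(pi * li for pi, li in zip(p, losses))
        # Multiplicative update: an expert with loss l keeps a beta**l fraction of its weight.
        w = [wi * beta ** li for wi, li in zip(w, losses)]
    total = sum(w)
    return [wi / total for wi in w], expected_loss
```

With β = 0.5, one expert that is always right and one that is always wrong, the wrong expert's weight halves every round, so after three rounds the distribution is [8/9, 1/9].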
Bagging predictors
Leo Breiman
- 01 Aug 1996
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.
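Bagging trains each base model on a bootstrap sample (drawn with replacement, the same size as the training set) and combines the models by voting for classification. The sketch below uses a toy nearest-class-mean learner on a single numeric feature; that learner, the function names and the defaults (`n_models=25`, `seed=0`) are illustrative assumptions, not taken from the paper, which uses classification and regression trees and subset selection.

```python
import random
from collections import Counter


def bootstrap(data, rng):
    """Draw len(data) examples uniformly with replacement."""
    return [rng.choice(data) for _ in data]


def fit_nearest_mean(sample):
    """Toy base learner (our assumption): predict the label whose mean feature value is closest."""
    by_label = {}
    for x, label in sample:
        by_label.setdefault(label, []).append(x)
    means = {label: sum(xs) / len(xs) for label, xs in by_label.items()}
    return lambda x: min(means, key=lambda label: abs(x - means[label]))


def bagging(data, n_models=25, seed=0):
    """Train n_models base learners on bootstrap samples; predict by plurality vote."""
    rng = random.Random(seed)
    models = [fit_nearest_mean(bootstrap(data, rng)) for _ in range(n_models)]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict
```

The vote smooths out models trained on unlucky bootstrap samples (for instance, one containing only a single class); bagging helps most when the base learner is unstable, which this simple learner largely is not.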
Term Weighting Approaches in Automatic Text Retrieval
Gerard Salton, Chris Buckley
TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
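One basic single-term weighting scheme in this line of work is term frequency times inverse document frequency, tf · log(N / n_t), where N is the number of documents and n_t the number of documents containing term t. The sketch below computes only this plain variant, without the length normalization and other refinements the paper compares; the function name is our own.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight terms as tf * log(N / n_t).

    docs: list of token lists. Returns one {term: weight} dict per document.
    tf is the raw count in the document, N the number of documents and n_t the
    number of documents containing the term (no length normalization here).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return [{term: tf * math.log(n_docs / doc_freq[term]) for term, tf in Counter(doc).items()}
            for doc in docs]
```

A term that occurs in every document gets weight 0, while a term concentrated in few documents is boosted in proportion to how often it occurs there.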