Journal Article
DOI: 10.1007/S11390-007-9084-9
A semi-random multiple decision-tree algorithm for mining data streams
16
TL;DR: This paper proposes SRMTDS (Semi-Random Multiple decision Trees for Data Streams), an incremental algorithm for mining data streams based on random decision trees, and reports that it improves on VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams, in time, space, accuracy, and noise tolerance.
Abstract: Mining streaming data is a hot topic in data mining. When classifying data streams, traditional decision-tree algorithms such as ID3 and C4.5 are relatively inefficient in both time and space because of the characteristics of streaming data. Random decision trees offer advantages in both time and space. This paper proposes SRMTDS (Semi-Random Multiple decision Trees for Data Streams), an incremental algorithm for mining data streams that is based on random decision trees. SRMTDS uses the Hoeffding bound inequality to choose the minimum number of split examples, a heuristic method to compute the information gain for obtaining the split thresholds of numerical attributes, and a Naive Bayes classifier to estimate the class labels of tree leaves. Our extensive experimental study shows that SRMTDS improves on VFDTc, a state-of-the-art decision-tree algorithm for classifying data streams, in time, space, accuracy, and noise tolerance.
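As a rough illustration of the Hoeffding bound mentioned in the abstract, the sketch below uses the standard form of the inequality as it is commonly applied in stream decision trees: after n observations of a quantity with range R, the observed mean is within epsilon of the true mean with probability at least 1 - delta. This is not the SRMTDS implementation; the function names, parameter names, and example values are hypothetical.

```python
import math


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Error margin epsilon after n observations (standard Hoeffding bound).

    value_range: range R of the observed quantity (assumed known in advance).
    delta: allowed probability that the true mean lies outside the margin.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def min_examples(value_range: float, delta: float, epsilon: float) -> int:
    """Smallest n for which the Hoeffding margin is at most epsilon.

    Obtained by solving epsilon = sqrt(R^2 * ln(1/delta) / (2n)) for n.
    """
    return math.ceil(
        value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2)
    )


if __name__ == "__main__":
    # Hypothetical settings: range 1.0, confidence 1 - 1e-7, margin 0.05.
    n = min_examples(value_range=1.0, delta=1e-7, epsilon=0.05)
    print(n, hoeffding_epsilon(1.0, 1e-7, n))
```

The margin shrinks as more examples arrive, which is why a bound of this kind can be used to decide how many examples a node should see before it is split.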
Citations
Learning concept-drifting data streams with random ensemble decision trees
TL;DR: An incremental algorithm based on Ensemble Decision Trees for Concept-drifting data streams (EDTC) is proposed. It performs very well compared to several known online algorithms based on single models and ensemble models, and the study concludes that it provides multiple solutions for learning from concept-drifting data streams under noise.
62
Stream mining: a novel architecture for ensemble-based classification
Valerio Grossi,Franco Turini +1 more
TL;DR: The paper outlines novel data structures and algorithms to tackle the above problem when the model mined from the data is a classifier, and presents the introduced model and the overall ensemble architecture in detail.
26
Research on time series data mining algorithm based on Bayesian node incremental decision tree
TL;DR: The experimental results show that, in both incremental and non-incremental time series data mining, the incremental decision tree algorithm based on Bayesian node optimization can improve classification accuracy.
23
A random decision tree ensemble for mining concept drifts from noisy data streams
TL;DR: This article presents a new lightweight inductive algorithm for concept drift detection, based on an ensemble of random decision trees (named CDRDT), that distinguishes various types of concept drift in noisy data streams.
22
Mining Concept-Drifting Data Streams with Multiple Semi-Random Decision Trees
Peipei Li,Xuegang Hu,Xindong Wu +2 more
- 08 Oct 2008
TL;DR: An incremental algorithm with Multiple Semi-Random decision Trees (MSRT) for concept-drifting data streams is presented, which takes two sliding windows for training and testing, uses the inequality of Hoeffding Bounds to determine the thresholds for distinguishing the true drift from noise, and chooses the classification function to estimate the error rate for periodic concept-drift detection.
19
References
Random Forests
Leo Breiman
- 01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
•Book
Software Engineering: A Practitioner's Approach
Roger S. Pressman
- 01 Jan 1982
TL;DR: Software Engineering A Practitioner's Approach recognizes the dramatic growth in the field of software engineering and emphasizes new and important methods and tools used in the industry.
10.4K
Probability Inequalities for sums of Bounded Random Variables
TL;DR: In this article, upper bounds are derived for the probability that the sum S of n independent random variables exceeds its mean ES by a positive number nt, and analogous inequalities are obtained for certain sums of dependent random variables such as U statistics.
Ensemble Methods in Machine Learning
Thomas G. Dietterich
- 21 Jun 2000
TL;DR: Some previous studies comparing ensemble methods are reviewed, and some new experiments are presented to uncover the reasons that Adaboost does not overfit rapidly.
Related Papers (5)
W. Nick Street,Yong Seog Kim +1 more
- 26 Aug 2001
Geoff Hulten,Laurie Spencer,Pedro Domingos +2 more
- 26 Aug 2001