A k-norm pruning algorithm for decision tree classifiers based on error rate estimation
TL;DR: This work applies Lidstone’s Law of Succession for the estimation of the class probabilities and error rates of decision tree classifiers, and proposes an efficient pruning algorithm, called k-norm pruning, that has a clear theoretical interpretation, is easily implemented, and does not require a validation set.
Abstract: Decision trees are well-known and established models for classification and regression. In this paper, we focus on the estimation and the minimization of the misclassification rate of decision tree classifiers. We apply Lidstone's Law of Succession for the estimation of the class probabilities and error rates. In our work, we take into account not only the expected values of the error rate, which has been the norm in existing research, but also the corresponding reliability (measured by standard deviations) of the error rate. Based on this estimation, we propose an efficient pruning algorithm, called k-norm pruning, that has a clear theoretical interpretation, is easily implemented, and does not require a validation set. Our experiments show that our proposed pruning algorithm produces accurate trees quickly, and compares very favorably with two other well-known pruning algorithms, CCP of CART and EBP of C4.5.
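The estimation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' k-norm pruning algorithm. It only shows Lidstone's Law of Succession at a single leaf, plus the mean and standard deviation of that leaf's error rate under the assumption of a symmetric Dirichlet prior on the class probabilities. The function names, the parameter name `lam`, and its default value are illustrative choices, not taken from the paper.

```python
from math import sqrt


def lidstone_probabilities(counts, lam=0.5):
    """Lidstone estimate of class probabilities at one leaf.

    counts: per-class training counts at the leaf.
    lam: smoothing parameter (assumed value; lam=1 gives Laplace's rule).
    """
    total = sum(counts)
    num_classes = len(counts)
    return [(n + lam) / (total + num_classes * lam) for n in counts]


def leaf_error_stats(counts, lam=0.5):
    """Mean and standard deviation of the leaf's error rate.

    Assumption: class probabilities follow a Dirichlet posterior with
    parameters counts[j] + lam, so the probability of the predicted class
    is Beta(a, b) and the error rate is one minus that probability.
    """
    probs = lidstone_probabilities(counts, lam)
    predicted = max(range(len(counts)), key=lambda j: probs[j])
    a = counts[predicted] + lam                      # mass on the predicted class
    b = sum(counts) + len(counts) * lam - a          # mass on all other classes
    mean_error = b / (a + b)
    std_error = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return predicted, mean_error, std_error


if __name__ == "__main__":
    # A leaf holding 8 examples of class 0 and 2 examples of class 1.
    print(lidstone_probabilities([8, 2], lam=1.0))
    print(leaf_error_stats([8, 2], lam=1.0))
```

A pruning rule in this spirit would compare such an estimate for a subtree against the estimate obtained by collapsing it into a leaf. How the paper combines the expected value and the standard deviation is not specified in the abstract, so that step is left out here.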
Citations
Classification Trees for Ordinal Responses in R: The rpartScore Package
TL;DR: Galimberti, Soffritti, and Di Maso introduce rpartScore, a new R package for building classification trees for ordinal responses, which can be employed whenever a set of scores is assigned to the ordered categories of the response.
70
VPGB: A granular-ball based model for attribute reduction and classification with label noise
Xiaoli Peng, Pinghua Wang, Shuyin Xia, Cheng Wang, Weiqian Chen +4 more
TL;DR: A robust variable-parameter granular-ball model (VPGB) is proposed to achieve both attribute reduction and classification in a label-noise environment from a coarse-granularity perspective.
25
Integrated Fisher linear discriminants
Gao Daqi, Ding Jun, Zhu Changming +2 more
TL;DR: The extensive experimental results over real-world datasets have demonstrated that the integrated FLDs have obvious advantages over the conventional FLDs in the aspects of learning and generalization performances for the imbalanced datasets.
14
•Dissertation
Simple low cost causal discovery using mutual information and domain knowledge
Adrian Joseph
- 01 Jan 2011
TL;DR: This paper looks at the performance of an expert constructed BN compared with other machine learning (ML) techniques for predicting the outcome (win, lose, or draw) of matches played by Tottenham Hotspur Football Club.
13
References
•Book
C4.5: Programs for Machine Learning
J. Ross Quinlan
- 15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
27.2K
•Book
Classification and regression trees
Leo Breiman
- 01 Jan 1983
TL;DR: This monograph focuses on the methodology used to construct tree-structured rules, covering the use of trees as a data analysis method and, in a more mathematical framework, proving some of their fundamental properties.
22.7K