A k-norm pruning algorithm for decision tree classifiers based on error rate estimation

doi:10.1007/S10994-007-5044-4

Open Access•Journal Article•10.1007/S10994-007-5044-4

Mingyu Zhong, +2 more

- 01 Apr 2008

- Machine Learning

- Vol. 71, Iss: 1, pp 55-88

21

TL;DR: This work applies Lidstone’s Law of Succession for the estimation of the class probabilities and error rates of decision tree classifiers, and proposes an efficient pruning algorithm, called k-norm pruning, that has a clear theoretical interpretation, is easily implemented, and does not require a validation set.

Abstract: Decision trees are well-known and established models for classification and regression. In this paper, we focus on the estimation and the minimization of the misclassification rate of decision tree classifiers. We apply Lidstone's Law of Succession for the estimation of the class probabilities and error rates. In our work, we take into account not only the expected values of the error rate, which has been the norm in existing research, but also the corresponding reliability (measured by standard deviations) of the error rate. Based on this estimation, we propose an efficient pruning algorithm, called k-norm pruning, that has a clear theoretical interpretation, is easily implemented, and does not require a validation set. Our experiments show that our proposed pruning algorithm produces accurate trees quickly, and compares very favorably with two other well-known pruning algorithms, CCP of CART and EBP of C4.5.
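The estimation step described in the abstract can be illustrated with a minimal sketch. It applies Lidstone's Law of Succession, P(class j) = (n_j + lam) / (N + J * lam), to the class counts at a single leaf, then reports a mean and standard deviation for that leaf's error rate. The function names, the parameter `lam`, and the use of a symmetric Dirichlet (Beta) posterior for the standard deviation are assumptions made for illustration; the page does not give the paper's exact k-norm formula, so this is not the authors' algorithm.

```python
from math import sqrt


def lidstone_probs(counts, lam=1.0):
    """Lidstone-smoothed class probabilities from per-class training counts.

    `lam` is the smoothing constant (lam=1 gives Laplace's rule).
    """
    total = sum(counts)
    denom = total + lam * len(counts)
    return [(c + lam) / denom for c in counts]


def leaf_error_stats(counts, lam=1.0):
    """Mean and standard deviation of the error rate at one leaf.

    Assumption (not taken from the paper): the leaf predicts its majority
    class, and the class probabilities follow a Dirichlet posterior with a
    symmetric prior `lam`, so the error rate is Beta(a, b) distributed.
    """
    n = sum(counts)
    j = len(counts)
    n_major = max(counts)
    b = n_major + lam                   # pseudo-count of the predicted class
    a = (n - n_major) + lam * (j - 1)   # pseudo-count of all other classes
    mean = a / (a + b)                  # equals 1 - Lidstone estimate of the majority class
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(var)


if __name__ == "__main__":
    leaf_counts = [8, 2]  # hypothetical leaf: 8 examples of class A, 2 of class B
    print(lidstone_probs(leaf_counts))
    print(leaf_error_stats(leaf_counts))
```

A pruning rule in this spirit would compare such an estimate for a subtree against the estimate for the leaf that would replace it; reporting the standard deviation alongside the mean reflects the abstract's point that small leaves yield less reliable error estimates.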

Citations

•Journal Article•10.18637/JSS.V047.I10

Classification Trees for Ordinal Responses in R: The rpartScore Package

Giuliano Galimberti, +2 more

- 17 May 2012

- Journal of Statistical Software

TL;DR: Galimberti, Soffritti, and Di Maso introduce rpartScore, a new R package for building classification trees for ordinal responses, which can be employed whenever a set of scores is assigned to the ordered categories of the response.

70

Journal Article•10.1016/j.ins.2022.08.066

VPGB: A granular-ball based model for attribute reduction and classification with label noise

Xiaoli Peng, +4 more

- 01 Sep 2022

- Information Sciences

TL;DR: In this paper, a robust variable parameter granular-ball model (VPGB) is proposed to achieve both attribute reduction and classification in a label noise environment from a coarse granularity perspective.

25

Journal Article•10.1080/00224065.2002.11980137

Basic Engineering Data Collection and Analysis

Lloyd S. Nelson

- 01 Jan 2002

- Journal of Quality Technology

19

Journal Article•10.1016/J.PATCOG.2013.07.021

Integrated Fisher linear discriminants

Gao Daqi, +2 more

- 01 Feb 2014

- Pattern Recognition

TL;DR: Extensive experimental results on real-world datasets demonstrate that the integrated FLDs have clear advantages over conventional FLDs in learning and generalization performance on imbalanced datasets.

14

•Dissertation

Simple low cost causal discovery using mutual information and domain knowledge

Adrian Joseph

- 01 Jan 2011

TL;DR: This work looks at the performance of an expert-constructed BN compared with other machine learning (ML) techniques for predicting the outcome (win, lose, or draw) of matches played by Tottenham Hotspur Football Club.

13

References

Lecture Notes in Computer Science 2382

Petrus Bollen

- 01 Jan 2002

36.7K

•Book

C4.5: Programs for Machine Learning

J. Ross Quinlan

- 15 Oct 1992

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.

27.2K

•Book

Classification and regression trees

Leo Breiman

- 01 Jan 1983

TL;DR: The methodology used to construct tree-structured rules is the focus of this monograph, which covers the use of trees as a data analysis method and, in a more mathematical framework, proves some of their fundamental properties.

22.7K

Journal Article•10.2307/2288003

Classification and Regression Trees.

John Van Ryzin, +4 more

- 01 Mar 1986

- Journal of the American Statistical Asso...

21.8K

UCI Repository of machine learning databases

Catherine Blake

- 01 Jan 1998

14.1K

Related Papers (5)

A comparative analysis of methods for pruning decision trees

Classification and regression trees

An efficient algorithm for optimal pruning of decision trees

A Backward Adjusting Strategy and Optimization of the C4.5 Parameters to Improve C4.5's Performance

Decision tree improvement algorithm and its application