Journal Article, DOI: 10.1002/WIDM.44
Evolutionary computation for training set selection
22
TL;DR: The main algorithms that have been developed for decision trees, artificial neural networks, and other classifiers are presented and the relevant issue of the scalability of these methods to very large datasets is discussed.
Abstract: Instance selection is becoming increasingly relevant because of the large amount of data that is constantly being produced in many fields of research. Two basic approaches exist for instance selection: instance selection as a method for prototype selection for instance-based methods (such as k-nearest neighbors) and instance selection for obtaining the training set for classifiers that require a learning process (such as decision trees or neural networks). In this paper, we review the methods that have been developed thus far for the latter approach within the field of evolutionary computation. Different groups of learning algorithms require different instance selectors to suit their learning/search biases. This requirement may render many instance selection algorithms useless if their philosophy of design is not suitable for the problem at hand. Evolutionary algorithms do not assume any structure of the data or any behavior of the classifier but instead adapt the instance selection to the performance of the classifier. They are therefore very suitable for training set selection. The main algorithms that have been developed for decision trees, artificial neural networks, and other classifiers are presented. We also discuss the relevant issue of the scalability of these methods to very large datasets. Although current algorithms are useful for fairly large datasets, scaling problems are found when the number of instances is in the hundreds of thousands or millions. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 512–523 DOI: 10.1002/widm.44
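The wrapper idea in the abstract, where the evolutionary algorithm adapts the selected instances to the observed performance of the classifier, can be illustrated with a small sketch. This is not one of the algorithms reviewed in the paper. It assumes a binary chromosome (one bit per training instance), a toy nearest-centroid classifier standing in for any learner with a training step, and a hypothetical fitness that weights validation accuracy against training set reduction through a parameter `alpha`. All function names and parameter values are illustrative.

```python
import random

def train_centroids(X, y):
    """Learning step of the toy classifier: one mean vector per class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi] = [0.0] * len(xi)
            counts[yi] = 0
        sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
        counts[yi] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))

def fitness(mask, X_train, y_train, X_val, y_val, alpha=0.9):
    """Assumed fitness: alpha * validation accuracy + (1 - alpha) * reduction."""
    X_sub = [x for x, m in zip(X_train, mask) if m]
    y_sub = [t for t, m in zip(y_train, mask) if m]
    if not X_sub:
        return 0.0  # an empty training set cannot train a classifier
    model = train_centroids(X_sub, y_sub)
    acc = sum(predict(model, x) == t for x, t in zip(X_val, y_val)) / len(y_val)
    reduction = 1.0 - len(X_sub) / len(X_train)
    return alpha * acc + (1 - alpha) * reduction

def select_training_set(X_train, y_train, X_val, y_val,
                        pop_size=20, generations=30, p_mut=0.02, seed=0):
    """Generational GA with binary tournaments, uniform crossover, elitism."""
    rng = random.Random(seed)
    n = len(X_train)

    def score(mask):
        return fitness(mask, X_train, y_train, X_val, y_val)

    # Each individual is a boolean mask over the training instances.
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        scored = [(score(m), m) for m in pop]

        def tournament():
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            # Uniform crossover followed by bit-flip mutation.
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=score)
    return best
```

Calling `select_training_set(X_train, y_train, X_val, y_val)` returns a boolean mask over the training instances. Each fitness evaluation retrains the classifier on the selected subset, so the cost grows with both the population size and the number of instances, which is the scalability concern the abstract raises for very large datasets.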
Citations
A balanced approach to the multi-class imbalance problem
Lawrence Mosley
- 01 Jan 2013
110
A survey on pre-processing techniques: Relevant issues in the context of environmental data mining
TL;DR: A survey of the most popular pre-processing steps required in environmental data analysis is presented, together with a proposal to systematize them and general guidance that lets non-expert users decide which technique is most suitable for their problem.
A scalable memetic algorithm for simultaneous instance and feature selection
TL;DR: A new memetic algorithm is proposed for dealing with many instances and many features simultaneously by performing joint instance and feature selection, and an extension of the stratification approach is developed for simultaneous instance and feature selection.
41
A study on the application of instance selection techniques in genetic fuzzy rule-based classification systems: Accuracy-complexity trade-off
TL;DR: It is shown that some of these methods can considerably help to reduce the computational time of the evolutionary process and to decrease the complexity of the fuzzy rule-based models with a very limited decrease of their accuracy with respect to the models generated by using the overall training set.
34
On the use of evolutionary feature selection for improving fuzzy rough set based prototype selection
Joaquín Derrac, Nele Verbiest, Salvador García, Chris Cornelis, Francisco Herrera
- 01 Feb 2013
TL;DR: A fuzzy rough set method for prototype selection, focused on optimizing the behavior of this classifier, is proposed. Results show that the new hybrid approach is very promising with respect to classification accuracy and reduction of the size of the training set.
32