Open Access Proceedings Article
Feature Selection for Improving Case-Based Classifiers on High-Dimensional Data Sets.
Niloofar Arshadi, Igor Jurisica
- 01 Jan 2005
pp 99-104
TL;DR: It is shown that using logistic regression as a filter FS method outperforms other FS techniques, such as Fisher and t-test, which have been widely used in analyzing biological data sets.
Abstract: Case-based reasoning (CBR) is a suitable paradigm for class discovery in molecular biology, where the rules that define the domain knowledge are difficult to obtain, and there is not sufficient knowledge for formal knowledge representation. To extend the capabilities of this paradigm, we propose logistic regression for CBR (LR4CBR), a method that uses logistic regression as a feature selection (FS) method for CBR systems. Our method not only improves the prediction accuracy of CBR classifiers in biomedical domains, but also selects a subset of features that have meaningful relationships with their class labels. In this paper, we introduce two methods to rank features for logistic regression. We show that using logistic regression as a filter FS method outperforms other FS techniques, such as Fisher and t-test, which have been widely used in analyzing biological data sets. The FS methods are combined with a computational framework for a CBR system called TA3. We also evaluate the method on two mass spectrometry data sets, and show that the prediction accuracy of TA3 improves from 90% to 98% and from 79.2% to 95.4%. Finally, we compare our list of discovered biomarkers with the lists of selected biomarkers from other studies for the mass spectrometry data sets, and show the overlapping biomarkers.
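The idea of a filter FS method built on logistic regression can be illustrated with a small sketch. The abstract does not describe the paper's two ranking methods, so everything below is an assumption made for illustration: each feature is scored by the mean log-likelihood of a one-feature logistic model fitted by gradient ascent, and a Fisher criterion score is included as a baseline. The function names (`logistic_score`, `fisher_score`, `rank_features`), the toy data, and the hyperparameters are hypothetical and are not taken from LR4CBR or TA3.

```python
import math

def fisher_score(values, labels):
    """Fisher criterion for one feature: (mean1 - mean0)^2 / (var0 + var1)."""
    g0 = [v for v, t in zip(values, labels) if t == 0]
    g1 = [v for v, t in zip(values, labels) if t == 1]
    m0, m1 = sum(g0) / len(g0), sum(g1) / len(g1)
    v0 = sum((v - m0) ** 2 for v in g0) / len(g0)
    v1 = sum((v - m1) ** 2 for v in g1) / len(g1)
    return (m1 - m0) ** 2 / (v0 + v1 + 1e-12)

def logistic_score(values, labels, steps=300, lr=0.5, l2=0.01):
    """Assumed ranking criterion (not the paper's): mean log-likelihood of a
    univariate logistic model p(y=1|x) = sigmoid(a + b*x)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n) or 1.0
    xs = [(v - mean) / std for v in values]  # standardize the feature
    a = b = 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, t in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += t - p
            grad_b += (t - p) * x
        a += lr * grad_a / n
        # A small L2 penalty keeps b finite when the classes are separable.
        b += lr * (grad_b / n - l2 * b)
    total = 0.0
    for x, t in zip(xs, labels):
        z = a + b * x
        # Numerically stable form of t*log(p) + (1 - t)*log(1 - p).
        total += t * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total / n

def rank_features(rows, labels, score_fn, k):
    """Filter selection: score each column independently, keep the top k."""
    columns = list(zip(*rows))
    scores = [score_fn(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:k]

# Toy data (hypothetical): feature 0 separates the classes, the others do not.
X = [
    [0.10, 5.0, 1.00], [0.20, 3.0, 1.20], [0.15, 4.0, 0.90], [0.30, 6.0, 1.10],
    [0.90, 5.5, 1.00], [1.10, 3.5, 1.15], [1.00, 4.5, 0.95], [0.80, 5.8, 1.05],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

top_by_logistic = rank_features(X, y, logistic_score, k=1)
top_by_fisher = rank_features(X, y, fisher_score, k=1)
```

Because the scores are computed without consulting the downstream classifier, this is a filter in the usual sense: the selected feature indices would be passed to the CBR system afterwards. On the toy data both criteria place feature 0 first.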
Citations
Machine learning
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Introduction to Case-Based Reasoning for Signals and Images
Petra Perner
- 01 Jan 2008
TL;DR: The basics of CBR are described and what has been done so far in the field of signal-interpreting systems is reviewed; new strategies are necessary to satisfy changing environmental conditions, user needs, and process requirements.
Combining Supervised Learning Techniques to Key-Phrase Extraction for Biomedical Full-Text
TL;DR: SVMreg-1 performs best in key-phrase extraction for full-text, whereas Naive Bayes performs best for abstracts, and these techniques should be considered for use in information system search functionality.
Flexible Feature Deletion: Compacting Case Bases by Selectively Compressing Case Contents
David B. Leake, Brian Schack
- 28 Sep 2015
TL;DR: Experimental results support that when cases have varying size and compressible contents, flexible feature deletion strategies may enable better system performance than case-oriented strategies for the same level of compression.
An ensemble of case-based classifiers for high-dimensional biological domains
Niloofar Arshadi, Igor Jurisica
- 23 Aug 2005
TL;DR: The mixture of experts for case-based reasoning (MOE4CBR), where clustering techniques are applied to cluster the case-base into k groups, and each cluster is used as a case-base for the authors' k CBR classifiers, improves the classification accuracy of TA3 and is evaluated on two publicly available data sets on mass-to-charge intensities.