A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification
TL;DR: On average and in the majority of microarray datasets, support vector machines outperform random forests, both when no gene selection is performed and when several popular gene selection methods are used.
Abstract: Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology, with several molecular signatures on their way toward clinical deployment. Using the most accurate classification algorithms available for microarray gene expression data is critical to developing the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain. In the present paper we identify methodological biases of prior work comparing random forests and support vector machines and conduct a new, rigorous evaluation of the two algorithms that corrects these limitations. Our experiments use 22 diagnostic and prognostic datasets and show that support vector machines outperform random forests, often by a large margin. Our data also underline the importance of sound research design in benchmarking and comparison of bioinformatics algorithms. We found that, both on average and in the majority of microarray datasets, support vector machines outperform random forests, both when no gene selection is performed and when several popular gene selection methods are used.
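The kind of comparison the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the authors' protocol: the use of scikit-learn, the synthetic data, the linear kernel, the univariate F-test filter, the AUC metric, and the helper name `compare_classifiers` are all assumptions made for this example. The one design point it tries to show is that gene selection sits inside the cross-validation pipeline, so genes are chosen from training folds only and both classifiers are scored on identical splits.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_classifiers(X, y, n_genes=None, n_splits=5, seed=0):
    """Return the mean cross-validated AUC for an SVM and a random forest.

    Hypothetical helper: n_genes=None means no gene selection; otherwise
    the top n_genes features by F-test are kept, refit in every fold.
    """
    # Same stratified folds for both models, so the comparison is paired.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)

    def build(classifier):
        steps = [StandardScaler()]
        if n_genes is not None:
            # Selection is a pipeline step, so it only sees training folds.
            steps.append(SelectKBest(f_classif, k=n_genes))
        steps.append(classifier)
        return make_pipeline(*steps)

    models = {
        "svm": build(SVC(kernel="linear", C=1.0)),
        "rf": build(RandomForestClassifier(n_estimators=200, random_state=seed)),
    }
    return {
        name: float(cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean())
        for name, model in models.items()
    }


# Synthetic stand-in for a microarray dataset: few samples, many features.
X_demo, y_demo = make_classification(
    n_samples=60, n_features=500, n_informative=10, random_state=0
)
print("no gene selection:", compare_classifiers(X_demo, y_demo))
print("top 50 genes:     ", compare_classifiers(X_demo, y_demo, n_genes=50))
```

Results on synthetic data say nothing about which algorithm is better on real microarray data; the sketch only shows how the two settings (with and without gene selection) can be evaluated under the same splits.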
Citations
FiGS: a filter-based gene selection workbench for microarray data
TL;DR: FiGS is a web-based application that automates an extensive search for the optimized gene selection analysis for a microarray dataset in a parallel computing environment, providing an efficient and comprehensive means of acquiring optimal gene sets that discriminate disease states from microarray datasets.
3D object classification in baggage computed tomography imagery using randomised clustering forests
Andre Mouton,Toby P. Breckon,Greg T. Flitton,Najla Megherbi +3 more
- 30 Oct 2014
TL;DR: An improvement over the state of the art in both accuracy and processing time is demonstrated using a codebook constructed via randomised clustering forests, a dense feature sampling strategy, and an SVM classifier.
Informatics for Metabolomics
TL;DR: Metabolomics studies from the pre- to the post-metabolomics era and their impact on society are introduced, and useful examples of techniques, tools, and databases for metabolomics data analysis are shown, from preprocessing through functional interpretation.
Identification of potential tissue-specific cancer biomarkers and development of cancer versus normal genomic classifiers.
TL;DR: The machine learning classifiers developed in this study identify potential cancer biomarkers with sensitivity and specificity that exceed those of existing biomarkers and point to pathways that are critical to tissue-specific tumor development.
Potential of Artificial Intelligence-Based Techniques for Rainfall Forecasting in Thailand: A Comprehensive Review
TL;DR: The investigation concludes that hybrid models combining ANNs with wavelet transformation and bootstrapping can improve the current accuracy of rainfall forecasting in Thailand.