A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification

doi:10.1186/1471-2105-9-319

Open Access • Journal Article • 10.1186/1471-2105-9-319

Alexander Statnikov, +2 more

- 22 Jul 2008

- BMC Bioinformatics

- Vol. 9, Iss: 1, pp 319-319

722

TL;DR: Random forests are outperformed by support vector machines, both on average and in the majority of microarray datasets, whether no gene selection is performed or several popular gene selection methods are used.

Abstract: Cancer diagnosis and clinical outcome prediction are among the most important emerging applications of gene expression microarray technology with several molecular signatures on their way toward clinical deployment. Use of the most accurate classification algorithms available for microarray gene expression data is a critical ingredient in order to develop the best possible molecular signatures for patient care. As suggested by a large body of literature to date, support vector machines can be considered "best of class" algorithms for classification of such data. Recent work, however, suggests that random forest classifiers may outperform support vector machines in this domain. In the present paper we identify methodological biases of prior work comparing random forests and support vector machines and conduct a new rigorous evaluation of the two algorithms that corrects these limitations. Our experiments use 22 diagnostic and prognostic datasets and show that support vector machines outperform random forests, often by a large margin. Our data also underlines the importance of sound research design in benchmarking and comparison of bioinformatics algorithms. We found that both on average and in the majority of microarray datasets, random forests are outperformed by support vector machines both in the settings when no gene selection is performed and when several popular gene selection methods are used.

Citations

•Journal Article•10.1186/1471-2105-11-50

FiGS: a filter-based gene selection workbench for microarray data

Taeho Hwang, +3 more

- 26 Jan 2010

- BMC Bioinformatics

TL;DR: FiGS is a web-based application that automates an extensive search for the optimal gene selection analysis of a microarray dataset in a parallel computing environment, providing an efficient and comprehensive means of acquiring optimal gene sets that discriminate disease states from microarray datasets.

31

•Proceedings Article•10.1109/ICIP.2014.7026053

3D object classification in baggage computed tomography imagery using randomised clustering forests

Andre Mouton, +3 more

- 30 Oct 2014

TL;DR: An improvement over the state-of-the-art both in terms of accuracy as well as processing time is demonstrated using a codebook constructed via randomised clustering forests, a dense feature sampling strategy and an SVM classifier.

31

•Book Chapter•10.1007/978-981-10-1503-8_5

Informatics for Metabolomics

Kanthida Kusonmano, +2 more

- 01 Jan 2016

- Advances in Experimental Medicine and Bi...

TL;DR: Overall metabolomics studies from pre- to post-metabolomics era and their impact on society are introduced and useful examples of techniques, tools, and databases for metabolomics data analysis are shown starting from preprocessing toward functional interpretation.

31

•Journal Article•10.18632/ONCOTARGET.21127

Identification of potential tissue-specific cancer biomarkers and development of cancer versus normal genomic classifiers.

Akram Mohammed, +3 more

- 21 Sep 2017

- Oncotarget

TL;DR: The machine learning classifiers developed in this study identify potential cancer biomarkers with sensitivity and specificity that exceed those of existing biomarkers and pointed to pathways that are critical to tissue-specific tumor development.

31

•Journal Article•10.3390/w15162979

Potential of Artificial Intelligence-Based Techniques for Rainfall Forecasting in Thailand: A Comprehensive Review

Muhammad Waqas, +4 more

- 18 Aug 2023

- Water

TL;DR: The investigation concludes that hybrid models combining ANNs with wavelet transformation and bootstrapping can improve the current accuracy of rainfall forecasting in Thailand.

31

References

•Journal Article•10.1023/A:1010933404324

Random Forests

Leo Breiman

- 01 Oct 2001

TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.

113.1K

Statistical learning theory

Vladimir Vapnik

- 01 Jan 1998

TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

30.4K

•Book

The Elements of Statistical Learning

Trevor Hastie, +2 more

- 01 Jan 2001

29.4K

•Book

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

Trevor Hastie, +2 more

- 28 Jul 2013

TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.

21.3K

Classification and Regression by randomForest

Andy Liaw, +1 more

- 01 Jan 2007

TL;DR: Random forests are proposed, which add an additional layer of randomness to bagging and are robust against overfitting; the randomForest package provides an R interface to the Fortran programs by Breiman and Cutler.

20.1K