Journal Article, DOI: 10.1198/106186006X94072
Unsupervised Learning With Random Forest Predictors
Tao Shi, Steve Horvath
TL;DR: The RF dissimilarity is useful for detecting tumor sample clusters on the basis of tumor marker expressions, and in this application the resulting clusters can be described with simple thresholding rules.
Abstract: A random forest (RF) predictor is an ensemble of individual tree predictors. As part of their construction, RF predictors naturally lead to a dissimilarity measure between the observations. One can also define an RF dissimilarity measure between unlabeled data: the idea is to construct an RF predictor that distinguishes the “observed” data from suitably generated synthetic data. The observed data are the original unlabeled data and the synthetic data are drawn from a reference distribution. Here we describe the properties of the RF dissimilarity and make recommendations on how to use it in practice. An RF dissimilarity can be attractive because it handles mixed variable types well, is invariant to monotonic transformations of the input variables, and is robust to outlying observations. The RF dissimilarity easily deals with a large number of variables due to its intrinsic variable selection; for example, the Addcl1 RF dissimilarity weighs the contribution of each variable according to how dependent it is ...
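The construction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes numeric inputs only, uses scikit-learn's `RandomForestClassifier` in place of the original software, draws synthetic data by sampling each variable independently from its empirical marginal (the Addcl1-style reference distribution), and converts the forest proximity to a dissimilarity as sqrt(1 - proximity). The function names `addcl1_synthetic` and `rf_dissimilarity` and the parameters `n_trees` and `seed` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def addcl1_synthetic(X, rng):
    """Draw synthetic rows by sampling every column independently from its
    empirical marginal, which destroys dependence between variables."""
    n, p = X.shape
    return np.column_stack(
        [rng.choice(X[:, j], size=n, replace=True) for j in range(p)]
    )


def rf_dissimilarity(X, n_trees=500, seed=0):
    """Sketch of an unsupervised RF dissimilarity (assumed recipe, see lead-in)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Label observed rows 1 and synthetic rows 0, then fit a classifier
    # that tries to tell the two apart.
    X_syn = addcl1_synthetic(X, rng)
    data = np.vstack([X, X_syn])
    labels = np.r_[np.ones(n, dtype=int), np.zeros(n, dtype=int)]
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(data, labels)

    # Leaf index of every observed row in every tree: shape (n, n_trees).
    leaves = forest.apply(X)

    # Proximity = fraction of trees in which two observed rows share a leaf.
    prox = np.zeros((n, n))
    for t in range(leaves.shape[1]):
        col = leaves[:, t]
        prox += col[:, None] == col[None, :]
    prox /= leaves.shape[1]

    return np.sqrt(1.0 - prox)
```

The resulting matrix can be passed to any clustering method that accepts precomputed dissimilarities. Because the synthetic sample is random, results vary with the seed, so averaging over several forests is a reasonable precaution.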
Citations
Isolation Forest
F.T. Liu, Kai Ming Ting, Zhi-Hua Zhou
- 15 Dec 2008
TL;DR: The use of isolation enables the proposed method, iForest, to exploit sub-sampling to an extent that is not feasible in existing methods, creating an algorithm which has a linear time complexity with a low constant and a low memory requirement.
5.4K
Isolation-Based Anomaly Detection
TL;DR: This article proposes a method called Isolation Forest (iForest), which detects anomalies purely based on the concept of isolation without employing any distance or density measure, making it fundamentally different from all existing methods.
1.9K
Random forests for genomic data analysis.
TL;DR: This article systematically reviews the applications and recent progress of RF for genomic data, including prediction and classification, variable selection, pathway analysis, genetic association and epistasis detection, and unsupervised learning.
879
Decision Forests for Computer Vision and Medical Image Analysis
Antonio Criminisi, Jamie Shotton
- 31 Jan 2013
TL;DR: This practical and easy-to-follow text explores the theoretical underpinnings of decision forests, organizing the vast existing literature on the field within a new, general-purpose forest model.
677
Anomaly Detection for IoT Time-Series Data: A Survey
TL;DR: A background on the challenges which may be encountered when applying anomaly detection techniques to IoT data is provided, with examples of applications for the IoT anomaly detection taken from the literature.
References
Random Forests
Leo Breiman
- 01 Oct 2001
TL;DR: Internal estimates monitor error, strength, and correlation and these are used to show the response to increasing the number of features used in the forest, and are also applicable to regression.
Nonparametric Estimation from Incomplete Observations
Edward L. Kaplan, Paul Meier
TL;DR: In this article, the product-limit (PL) estimator was proposed to estimate the proportion of items in the population whose lifetimes would exceed t (in the absence of such losses), without making any assumption about the form of the function P(t).
•Book
Classification and regression trees
Leo Breiman
- 01 Jan 1983
TL;DR: This monograph focuses on the methodology used to construct tree-structured rules, covering the use of trees as a data analysis method and, in a more mathematical framework, proving some of their fundamental properties.
22.7K
Modern Applied Statistics with S
W. N. Venables, Brian D. Ripley
- 01 Dec 2010
TL;DR: A guide to using S environments to perform statistical analyses providing both an introduction to the use of S and a course in modern statistical methods.
22.1K