Proceedings Article · DOI: 10.1109/FUZZ48607.2020.9177718
Fuzzy Set-Based Isolation Forest
Paweł Karczmarek, Adam Kiersztyn, Witold Pedrycz +2 more
- 19 Jul 2020
- pp 1-6
33
TL;DR: This paper introduces the Fuzzy Set-Based Isolation Forest, a modification of the well-known Isolation Forest method that improves it through efficient solutions based on fuzzy set technologies.
Abstract: One of the main challenges is the analysis of large data sets, in particular those containing various types of data, such as time, place, image, and those assuming categorical values. This type of data may contain numerous outliers. Despite the continuous development of data analysis, many methods can be effectively improved, in particular through the use of efficient solutions based on fuzzy set technologies. In this paper, we analyze the improvement of a well-known method, i.e. Isolation Forest, for which we introduce an innovative modification, referred to as the Fuzzy Set-Based Isolation Forest.
Citations
A Probabilistic Generalization of Isolation Forest
TL;DR: The Probabilistic Generalization of Isolation Forest (PGIF) is proposed to detect anomalies hidden between clusters more effectively; it is based on a nonlinear dependence of the segment-cumulated probability on the segment length.
47
Smart Strawberry Farming Using Edge Computing and IoT
Mateus Cruz, Samuel Baraldi Mafra, Eduardo Teixeira, Felipe Figueiredo +3 more
TL;DR: In this paper, the authors developed an edge technology capable of handling the collection, analysis, prediction, and detection of heterogeneous data in strawberry farming, integrating various monitoring services into one common platform for digital farming.
39
A new method for fault detection of aero-engine based on isolation forest
TL;DR: The proposed dynamic threshold method for aero-engine fault detection, based on Isolation Forest, is shown to achieve high detection accuracy with a short running time.
39
Enhanced anomaly scores for isolation forests
Antonella Mensi, Manuele Bicego +1 more
TL;DR: In this article, the authors propose enhanced anomaly scores for the Isolation Forest through two contributions: the first weights the path traversed by an object to obtain a more informative anomaly score; the second employs a different aggregation function to combine the tree scores.
32
A Review of Tree-Based Approaches for Anomaly Detection
Tommaso Barbariol, Filippo Dalla Chiara, Davide Marcato, Gian Antonio Susto +3 more
- 01 Jan 2022
TL;DR: This paper reviews the most popular and powerful tree-based approaches to anomaly detection in both batch and streaming data scenarios, and discusses several relevant aspects of the methods, such as computational cost and interpretability.
28
References
Anomaly detection: A survey
TL;DR: This survey tries to provide a structured and comprehensive overview of the research on anomaly detection by grouping existing techniques into different categories based on the underlying approach adopted by each technique.
Estimating the Support of a High-Dimensional Distribution
TL;DR: In this paper, the authors propose a method to estimate a function f that is positive on S and negative on the complement of S. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space.
Isolation Forest
F.T. Liu, Kai Ming Ting, Zhi-Hua Zhou +2 more
- 15 Dec 2008
TL;DR: The use of isolation enables the proposed method, iForest, to exploit sub-sampling to an extent that is not feasible in existing methods, creating an algorithm which has a linear time complexity with a low constant and a low memory requirement.
5.4K
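The isolation idea summarized in the Isolation Forest entry above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `path_length` and `mean_path_lengths` are hypothetical, every simulated tree uses the full data set (the sub-sampling the paper relies on is omitted for brevity), and only the branch containing the query point is followed instead of storing whole trees. A point that is separated from the rest after few random splits has a short average path and is a likely anomaly.

```python
import random


def path_length(x, sample, rng, depth=0):
    """Number of random axis-parallel splits needed to isolate x within sample.

    Assumes x is a member of sample. Only the side of each split that
    contains x is followed, which is enough to measure its isolation depth.
    """
    if len(sample) <= 1:
        return depth
    # Only dimensions in which the remaining points differ can be split.
    dims = [d for d in range(len(x))
            if min(p[d] for p in sample) < max(p[d] for p in sample)]
    if not dims:
        return depth  # remaining points are identical, cannot split further
    d = rng.choice(dims)
    lo = min(p[d] for p in sample)
    hi = max(p[d] for p in sample)
    split = rng.uniform(lo, hi)
    same_side = [p for p in sample if (p[d] < split) == (x[d] < split)]
    return path_length(x, same_side, rng, depth + 1)


def mean_path_lengths(points, n_trees=100, seed=0):
    """Average isolation depth of every point over n_trees random partitions."""
    rng = random.Random(seed)
    return [
        sum(path_length(p, points, rng) for _ in range(n_trees)) / n_trees
        for p in points
    ]


# Illustrative data (an assumption): a 5 x 4 grid cluster plus one distant point.
cluster = [(0.1 * (i % 5), 0.1 * (i // 5)) for i in range(20)]
data = cluster + [(10.0, 10.0)]

lengths = mean_path_lengths(data, n_trees=200, seed=0)
most_anomalous = min(range(len(data)), key=lambda i: lengths[i])
print(most_anomalous, lengths[most_anomalous])
```

In this toy data set the distant point is usually cut off by the very first split, so its mean path length stays close to 1, while every grid point needs at least two splits.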
Estimating the support of a high-dimensional distribution
Bernhard Schölkopf, John Platt, John Shawe-Taylor +2 more
- 01 Jan 2001
TL;DR: The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data by carrying out sequential optimization over pairs of input patterns and providing a theoretical analysis of the statistical performance of the algorithm.
5K
Efficient algorithms for mining outliers from large data sets
Sridhar Ramaswamy, Rajeev Rastogi, Kyuseok Shim +2 more
- 16 May 2000
TL;DR: A novel formulation for distance-based outliers that is based on the distance of a point from its kth nearest neighbor is proposed and the top n points in this ranking are declared to be outliers.
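The distance-based formulation described in the entry above (score each point by the distance to its kth nearest neighbor, then report the top n) can be sketched in a few lines. This is a brute-force illustration only, not the efficient algorithms of the cited paper; the function names `kth_neighbor_distance` and `top_n_outliers` and the sample points are assumptions made for the example.

```python
import math


def kth_neighbor_distance(points, k):
    """For each point, the Euclidean distance to its k-th nearest neighbor."""
    distances = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        distances.append(others[k - 1])
    return distances


def top_n_outliers(points, k, n):
    """Indices of the n points with the largest k-th nearest neighbor distance."""
    scores = kth_neighbor_distance(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Illustrative data (an assumption): four clustered points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(top_n_outliers(points, k=2, n=1))  # prints [4]
```

The brute-force version compares every pair of points, so it scales quadratically with the data set size; it is meant only to make the ranking criterion concrete.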