Conference
Machine Learning and Data Mining in Pattern Recognition
About: Machine Learning and Data Mining in Pattern Recognition is an academic conference. The conference publishes mainly in the areas of cluster analysis and computer science. Over its lifetime, the conference has published 685 papers, which have received 8,119 citations.
Topics: Cluster analysis, Computer science, Support vector machine, Feature selection, Canopy clustering algorithm
Papers
13 Jul 2012
TL;DR: An analysis of whether a Random Forest has an optimal number of trees. It finds an experimental relationship for the AUC gain when doubling the number of trees in any forest, and states that there is a threshold beyond which there is no significant gain unless a huge computational environment is available.
Abstract: Random Forest is a computationally efficient technique that can operate quickly over large datasets. It has been used in many recent research projects and real-world applications in diverse domains. However, the associated literature provides almost no directions about how many trees should be used to compose a Random Forest. The research reported here analyzes whether there is an optimal number of trees within a Random Forest, i.e., a threshold from which increasing the number of trees would bring no significant performance gain, and would only increase the computational cost. Our main conclusions are: as the number of trees grows, it does not always mean the performance of the forest is significantly better than previous forests (fewer trees), and doubling the number of trees is worthless. It is also possible to state there is a threshold beyond which there is no significant gain, unless a huge computational environment is available. In addition, an experimental relationship was found for the AUC gain when doubling the number of trees in any forest. Furthermore, as the number of trees grows, the full set of attributes tends to be used within a Random Forest, which may not be interesting in the biomedical domain. Additionally, the datasets' density-based metrics proposed here probably capture some aspects of the VC dimension on decision trees, and low-density datasets may require large capacity machines whilst the opposite also seems to be true.
1,017 citations
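The threshold idea in the Random Forest abstract above can be illustrated with a small sketch. It assumes AUC has already been measured for forests whose size doubles at each step, and it uses a fixed tolerance `min_gain` as a stand-in for the significance testing a real study would need. The function names and the AUC values are hypothetical, invented only for illustration, and are not taken from the paper.

```python
def doubling_schedule(start, stop):
    """Tree counts start, 2*start, 4*start, ... not exceeding stop."""
    counts = []
    n = start
    while n <= stop:
        counts.append(n)
        n *= 2
    return counts


def first_plateau(tree_counts, aucs, min_gain=0.001):
    """Return the first tree count whose doubling gains less than min_gain AUC.

    Returns None when every doubling still gains at least min_gain.
    """
    for i in range(len(tree_counts) - 1):
        gain = aucs[i + 1] - aucs[i]
        if gain < min_gain:
            return tree_counts[i]
    return None


# Hypothetical AUC values, invented purely for illustration.
counts = doubling_schedule(2, 128)
aucs = [0.80, 0.85, 0.88, 0.895, 0.8955, 0.8957, 0.8958]
threshold = first_plateau(counts, aucs)
print(counts, threshold)
```

With these made-up numbers, doubling from 16 to 32 trees is the first step whose gain falls under the tolerance, so 16 is reported as the plateau point.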
18 Jul 2007
TL;DR: A novel unsupervised algorithm for outlier detection with a solid statistical foundation is proposed, modifying a nonparametric density estimate with a variable kernel to yield a robust local density estimation.
Abstract: Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, a novel unsupervised algorithm for outlier detection with a solid statistical foundation is proposed. First we modify a nonparametric density estimate with a variable kernel to yield a robust local density estimation. Outliers are then detected by comparing the local density of each point to the local density of its neighbors. Our experiments performed on several simulated data sets have demonstrated that the proposed approach can outperform two widely used outlier detection algorithms (LOF and LOCI).
342 citations
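The two steps described in the outlier detection abstract above (a local density estimate with a variable kernel, then a comparison against the neighbors' densities) can be sketched roughly as follows. This is a simplified illustration and not the paper's exact estimator: the Gaussian kernel, the choice of each neighbor's k-th neighbor distance as bandwidth, and all function names are assumptions.

```python
import math


def knn(points, i, k):
    """(distance, index) pairs for the k nearest neighbors of points[i]."""
    dists = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return dists[:k]


def local_density(points, i, k):
    """Gaussian kernel density at points[i] over its k nearest neighbors.

    Each neighbor's kernel uses that neighbor's own k-th neighbor distance
    as bandwidth, so kernels widen in sparse regions. Assumes distinct points.
    """
    dim = len(points[i])
    total = 0.0
    for d, j in knn(points, i, k):
        h = knn(points, j, k)[-1][0]  # variable bandwidth for neighbor j
        norm = (2 * math.pi) ** (dim / 2) * h ** dim
        total += math.exp(-(d * d) / (2 * h * h)) / norm
    return total / k


def outlier_scores(points, k=3):
    """Mean neighbor density divided by own density; larger is more outlying."""
    dens = [local_density(points, i, k) for i in range(len(points))]
    scores = []
    for i in range(len(points)):
        neigh = [j for _, j in knn(points, i, k)]
        mean_neigh = sum(dens[j] for j in neigh) / k
        scores.append(mean_neigh / dens[i])
    return scores


# Five clustered points and one isolated point (index 5).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
scores = outlier_scores(points, k=3)
print(scores.index(max(scores)))
```

Points inside the cluster score close to 1 because their density resembles that of their neighbors, while the isolated point scores far higher.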
15 Jul 2017
TL;DR: In this article, the transferability of adversarial examples is verified across different DQN models, and a novel class of attacks based on this vulnerability is presented to enable policy manipulation and induction in the learning process of DQNs.
Abstract: Deep learning classifiers are known to be inherently vulnerable to manipulation by intentionally perturbed inputs, named adversarial examples. In this work, we establish that reinforcement learning techniques based on Deep Q-Networks (DQNs) are also vulnerable to adversarial input perturbations, and verify the transferability of adversarial examples across different DQN models. Furthermore, we present a novel class of attacks based on this vulnerability that enable policy manipulation and induction in the learning process of DQNs. We propose an attack mechanism that exploits the transferability of adversarial examples to implement policy induction attacks on DQNs, and demonstrate its efficacy and impact through experimental study of a game-learning scenario.
254 citations
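The transferability idea in the abstract above can be illustrated on a toy scale. The sketch below crafts a perturbation against a replica Q-function and checks whether it also changes the action chosen by a separate victim Q-function. Everything here is an assumption made for illustration: the linear Q-functions, the single sign-gradient step, the weights, and the function names. The abstract does not specify the crafting method, and real DQNs are deep networks rather than linear maps.

```python
def q_values(weights, state):
    """Q(s, a) = weights[a] . state for a toy linear Q-function."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def greedy_action(weights, state):
    """Index of the action with the highest Q-value."""
    q = q_values(weights, state)
    return max(range(len(q)), key=q.__getitem__)


def sign(x):
    return (x > 0) - (x < 0)


def perturb_towards(weights, state, target_action, epsilon):
    """One sign-gradient step raising Q(target) relative to Q(current best).

    For a linear Q-function, the gradient of Q(s, target) - Q(s, best)
    with respect to s is weights[target] - weights[best].
    """
    best = greedy_action(weights, state)
    if best == target_action:
        return list(state)
    grad = [wt - wb for wt, wb in zip(weights[target_action], weights[best])]
    return [s + epsilon * sign(g) for s, g in zip(state, grad)]


# Hypothetical replica and victim with similar, not identical, weights.
replica = [[1.0, 0.0], [0.0, 1.0]]
victim = [[0.9, 0.1], [0.1, 0.9]]
state = [0.55, 0.45]

# The perturbation is computed using only the replica.
adv_state = perturb_towards(replica, state, target_action=1, epsilon=0.1)
print(greedy_action(victim, state), greedy_action(victim, adv_state))
```

In this toy setup the victim picks action 0 on the clean state and action 1 on the perturbed state, even though the perturbation was computed without access to the victim's weights.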
18 Jul 2007
TL;DR: A data mining approach to the problem of predicting the location of a moving object, using the database of moving object locations to discover frequent trajectories and movement rules to build a probabilistic model of object location.
Abstract: Advances in wireless and mobile technology flood us with amounts of moving object data that preclude all means of manual data processing. The volume of data gathered from position sensors of mobile phones, PDAs, or vehicles, defies human ability to analyze the stream of input data. On the other hand, vast amounts of gathered data hide interesting and valuable knowledge patterns describing the behavior of moving objects. Thus, new algorithms for mining moving object data are required to unearth this knowledge. An important function of the mobile objects management system is the prediction of the unknown location of an object. In this paper we introduce a data mining approach to the problem of predicting the location of a moving object. We mine the database of moving object locations to discover frequent trajectories and movement rules. Then, we match the trajectory of a moving object with the database of movement rules to build a probabilistic model of object location. Experimental evaluation of the proposal reveals prediction accuracy close to 80%. Our original contribution includes the elaboration on the location prediction model, the design of an efficient mining algorithm, introduction of movement rule matching strategies, and a thorough experimental evaluation of the proposed model.
224 citations
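The pipeline in the abstract above (mine frequent trajectories and movement rules, then match an object's trajectory against the rules) can be sketched in a much reduced form. The sketch assumes trajectories are sequences of discrete cell identifiers and that a rule maps a fixed-length prefix to a next cell with a confidence. The function names, the parameters `prefix_len` and `min_support`, and the sample data are hypothetical, and the paper's actual mining algorithm and matching strategies are not reproduced here.

```python
from collections import Counter, defaultdict


def mine_rules(trajectories, prefix_len=2, min_support=2):
    """Count (prefix -> next cell) occurrences and keep the frequent ones.

    Returns {prefix: {next_cell: confidence}}, where confidence is the share
    of all occurrences of the prefix that continued to next_cell.
    """
    counts = defaultdict(Counter)
    for traj in trajectories:
        for i in range(len(traj) - prefix_len):
            prefix = tuple(traj[i:i + prefix_len])
            counts[prefix][traj[i + prefix_len]] += 1

    rules = {}
    for prefix, nexts in counts.items():
        total = sum(nexts.values())
        frequent = {c: n / total for c, n in nexts.items() if n >= min_support}
        if frequent:
            rules[prefix] = frequent
    return rules


def predict_next(rules, recent, prefix_len=2):
    """Match the tail of a trajectory against the mined rules."""
    return rules.get(tuple(recent[-prefix_len:]), {})


# Hypothetical trajectories over named grid cells.
trajectories = [
    ["A", "B", "C", "D"],
    ["A", "B", "C", "E"],
    ["A", "B", "C", "D"],
    ["X", "B", "C", "D"],
    ["A", "B", "F"],
]
rules = mine_rules(trajectories)
print(predict_next(rules, ["X", "B", "C"]))
```

Here the prefix (B, C) was followed by D in three of four occurrences, so an object last seen moving from B to C is predicted to reach D with confidence 0.75. Continuations seen fewer than `min_support` times produce no rule.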
5 Jul 2003
TL;DR: A supervised learning algorithm for multi-label decision procedures that extends Schapire and Singer's Adaboost.MH and produces sets of rules that can be viewed as trees, like Alternating Decision Trees (invented by Freund and Mason).
Abstract: Multi-label decision procedures are the target of the supervised learning algorithm we propose in this paper. Multi-label decision procedures map examples to a finite set of labels. Our learning algorithm extends Schapire and Singer's Adaboost.MH and produces sets of rules that can be viewed as trees like Alternating Decision Trees (invented by Freund and Mason). Experiments show that we take advantage of both performance and readability using boosting techniques as well as tree representations of large sets of rules. Moreover, a key feature of our algorithm is the ability to handle heterogeneous input data: discrete and continuous values and text data.
220 citations
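The boosting core that the abstract above builds on can be sketched as follows. This is a minimal version of discrete AdaBoost.MH, which keeps one weight per (example, label) pair and lets each rule cast a separate vote per label. It uses flat threshold stumps as the weak learner, which is an assumption made for brevity: the paper's algorithm arranges its rules into alternating-decision-tree-like structures and handles discrete and text inputs, neither of which is attempted here. All names and the toy data are hypothetical.

```python
import math


def stump(x, feature, threshold):
    """Base test: +1 if x[feature] > threshold, else -1."""
    return 1 if x[feature] > threshold else -1


def fit_adaboost_mh(X, Y, rounds=5):
    """Discrete AdaBoost.MH with threshold stumps.

    X: feature vectors. Y: label vectors with one +1/-1 entry per label.
    Returns a list of rules (alpha, feature, threshold, votes).
    """
    n, n_labels = len(X), len(Y[0])
    # One weight per (example, label) pair, initially uniform.
    D = [[1.0 / (n * n_labels)] * n_labels for _ in range(n)]
    rules = []
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):
            for t in sorted({x[f] for x in X}):
                phi = [stump(x, f, t) for x in X]
                # Per-label edge; its sign becomes that label's vote.
                edges = [sum(D[i][l] * Y[i][l] * phi[i] for i in range(n))
                         for l in range(n_labels)]
                r = sum(abs(e) for e in edges)
                if best is None or r > best[0]:
                    votes = [1 if e >= 0 else -1 for e in edges]
                    best = (r, f, t, votes)
        r, f, t, votes = best
        r = min(r, 1 - 1e-10)  # avoid an infinite alpha on a perfect rule
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        rules.append((alpha, f, t, votes))
        # Reweight: pairs the new rule gets wrong gain weight.
        Z = 0.0
        for i in range(n):
            for l in range(n_labels):
                h = votes[l] * stump(X[i], f, t)
                D[i][l] *= math.exp(-alpha * Y[i][l] * h)
                Z += D[i][l]
        D = [[w / Z for w in row] for row in D]
    return rules


def predict(rules, x):
    """Return the +1/-1 decision for every label."""
    n_labels = len(rules[0][3])
    scores = [sum(a * v[l] * stump(x, f, t) for a, f, t, v in rules)
              for l in range(n_labels)]
    return [1 if s > 0 else -1 for s in scores]


# Toy data: label 0 follows feature 0, label 1 follows feature 1.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[-1, -1], [-1, 1], [1, -1], [1, 1]]
rules = fit_adaboost_mh(X, Y, rounds=20)
print([predict(rules, x) for x in X])
```

Each returned rule reads as "if feature f exceeds t, vote for or against each label with strength alpha", which hints at why such rule sets stay readable.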
Performance Metrics
| Year | Papers |
|---|---|
| 2019 | 21 |
| 2018 | 69 |
| 2017 | 31 |
| 2016 | 60 |
| 2015 | 32 |
| 2014 | 40 |