Estimating the Support of a High-Dimensional Distribution
TL;DR: The authors propose a method for estimating a "simple" region S of input space such that a test point falls outside S with a prespecified probability. They do this by learning a function f that is positive on S and negative on its complement. f is a kernel expansion over a potentially small subset of the training data, regularized by controlling the length of the weight vector in an associated feature space.
Abstract: Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
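The abstract describes a decision function given by a kernel expansion, with coefficients found by a quadratic program that is solved through sequential optimization over pairs of points. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes the commonly stated dual form (minimize 1/2 * sum_ij alpha_i alpha_j k(x_i, x_j) subject to 0 <= alpha_i <= 1/(nu * n) and sum_i alpha_i = 1), a Gaussian kernel, and a plain sweep over all pairs with no pair-selection heuristics. The names gaussian_kernel, fit_one_class, and decision, and the parameters nu, gamma, and sweeps, are illustrative choices rather than anything taken from the paper.

```python
import math


def gaussian_kernel(x, y, gamma=0.5):
    """k(x, y) = exp(-gamma * ||x - y||^2); the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def fit_one_class(points, nu=0.2, gamma=0.5, sweeps=50):
    """Simplified pairwise optimization of the assumed dual problem.

    Returns (alpha, rho), where alpha are the expansion coefficients
    and rho is the offset of the decision function.
    """
    n = len(points)
    upper = 1.0 / (nu * n)  # box constraint on each coefficient
    K = [[gaussian_kernel(p, q, gamma) for q in points] for p in points]
    alpha = [1.0 / n] * n   # feasible start: sums to 1, each <= upper

    for _ in range(sweeps):
        for i in range(n):
            for j in range(i + 1, n):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= 1e-12:  # (near-)duplicate points: skip the pair
                    continue
                out_i = sum(alpha[k] * K[i][k] for k in range(n))
                out_j = sum(alpha[k] * K[j][k] for k in range(n))
                total = alpha[i] + alpha[j]  # kept fixed so the sum stays 1
                # Unconstrained minimizer along the pair direction, then clip.
                new_j = alpha[j] + (out_i - out_j) / eta
                low, high = max(0.0, total - upper), min(upper, total)
                new_j = min(max(new_j, low), high)
                alpha[i], alpha[j] = total - new_j, new_j

    outputs = [sum(alpha[k] * K[i][k] for k in range(n)) for i in range(n)]
    eps = 1e-8
    free = [i for i in range(n) if eps < alpha[i] < upper - eps]
    if free:
        # Coefficients strictly inside the box lie on the boundary f(x) = 0.
        rho = sum(outputs[i] for i in free) / len(free)
    else:
        # Fallback (a simplification): largest output among support vectors.
        rho = max(outputs[i] for i in range(n) if alpha[i] > eps)
    return alpha, rho


def decision(x, points, alpha, rho, gamma=0.5):
    """f(x) = sum_i alpha_i k(x_i, x) - rho; positive values suggest x is in S."""
    return sum(a * gaussian_kernel(p, x, gamma) for a, p in zip(alpha, points)) - rho


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (0.0, -0.1),
            (0.2, 0.2), (-0.2, 0.1), (0.1, -0.2), (3.0, 3.0)]
    alpha, rho = fit_one_class(data)
    for query in [(0.0, 0.0), (10.0, 10.0)]:
        print(query, decision(query, data, alpha, rho))
```

Only points with a nonzero coefficient contribute to the expansion, which is the sense in which f depends on a potentially small subset of the training data. In this sketch nu stands in for the a priori specified value mentioned in the abstract; a production solver would add pair-selection heuristics and a convergence check instead of a fixed number of sweeps.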
Citations
Safety assessment of automated vehicles: how to determine whether we have collected enough field data?
TL;DR: The proposed method will be used to evaluate the level of completeness of the data collection on Singaporean roads, aimed at defining relevant test cases for the autonomous vehicle road approval procedure that is being developed in Singapore.
Adaptive One-Class Support Vector Machine for Damage Detection in Structural Health Monitoring
Ali Anaissi, Nguyen Lu Dang Khoa, Samir Mustapha, Mehrisadat Makki Alamdari, Ali Braytee, Yang Wang, Fang Chen
- 23 May 2017
TL;DR: A new algorithm named Appropriate Distance to the Enclosing Surface (ADES) is proposed for tuning the Gaussian model parameter of the OCSVM, especially on high-dimensional datasets.
Similarity-based approach for positive and unlabelled learning
Yanshan Xiao, Bo Liu, Jie Yin, Longbing Cao, Chengqi Zhang, Zhifeng Hao
- 16 Jul 2011
TL;DR: This paper proposes a similarity-based PU learning (SPUL) method that associates each ambiguous example with two similarity weights, indicating its similarity to the positive class and to the negative class, respectively.
•Proceedings Article
Data centering in feature space.
Marina Meila
- 03 Jan 2003
TL;DR: This paper presents a family of methods for translating data in feature space, for use with kernel machines; the translations are performed using only kernel evaluations in input space.
Sketch recognition by fusion of temporal and image-based features
TL;DR: These results are the first to confirm the complementary nature of image-based and temporal recognition methods for full sketch recognition, which has long been suggested, but never supported by data.
References
•Book
Elements of information theory
Thomas M. Cover, Joy A. Thomas
- 01 Jan 1991
TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.
•Book
The Nature of Statistical Learning Theory
Vladimir Vapnik
- 01 Jan 1995
TL;DR: Covers the setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; and what is important in learning theory.
46K
Statistical learning theory
Vladimir Vapnik
- 01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
30.4K
A training algorithm for optimal margin classifiers
Bernhard E. Boser, Isabelle Guyon, Vladimir Vapnik
- 01 Jul 1992
TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented; it is applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions.
A tutorial on support vector regression
TL;DR: This tutorial gives an overview of the basic ideas underlying Support Vector (SV) machines for function estimation, and includes a summary of currently used algorithms for training SV machines, covering both the quadratic programming part and advanced methods for dealing with large datasets.