Kernel methods for high dimensional data analysis

Open Access • Dissertation

- 28 May 2019

TL;DR: This thesis aims at introducing a new point of view in the use of distances and probability measures defined on the data set, and shows that kernel methods, already used in the intrinsic low dimensional scenario in order to reduce dimensionality, can be investigated under purely high dimensional hypotheses.

Abstract: As data are collected with an increasing number of features, datasets are of increasingly high dimension. High dimensional data analysis has been affected by computational problems, related to the apparent dimension, i.e. the dimension of the vectors used to collect data, and by theoretical problems, which depend notably on the effective dimension of the dataset, the so-called intrinsic dimension. In order to provide a suitable approach to data analysis in high dimensions, we introduce a more comprehensive scenario in the framework of metric measure spaces. The aim of this thesis is to show how to take advantage of high dimensionality phenomena in the purely high dimensional regime. In particular, we aim at introducing a new point of view in the use of distances and probability measures defined on the data set. More specifically, we want to show that kernel methods, already used in the intrinsic low dimensional scenario in order to reduce dimensionality, can be investigated under purely high dimensional hypotheses and further applied to cases not covered by the literature.

References

•Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

Martin Ester, +3 more

- 02 Aug 1996

TL;DR: In this paper, a density-based notion of clusters is proposed to discover clusters of arbitrary shape; the resulting algorithm can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.

20.3K

•Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

Martin Ester, +3 more

- 01 Jan 1996

TL;DR: This paper presents DBSCAN, a clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape; it requires only one input parameter and supports the user in determining an appropriate value for it.

17.8K

•Proceedings Article•10.1145/37401.37422

Marching cubes: A high resolution 3D surface construction algorithm

William E. Lorensen, +1 more

- 01 Aug 1987

TL;DR: In this paper, a divide-and-conquer approach is used to generate inter-slice connectivity, a case table is created to define triangle topology, and triangle vertices are calculated using linear interpolation.

14.5K

•Proceedings Article•10.1145/276698.276876

Approximate nearest neighbors: towards removing the curse of dimensionality

Piotr Indyk, +1 more

- 23 May 1998

TL;DR: In this paper, the authors present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces, for data sets of size n living in R^d, which require space that is only polynomial in n and d.

4.7K

•Proceedings Article•10.1145/997817.997857

Locality-sensitive hashing scheme based on p-stable distributions

Mayur Datar, +3 more

- 08 Jun 2004

TL;DR: A novel Locality-Sensitive Hashing scheme for the Approximate Nearest Neighbor Problem under the l_p norm, based on p-stable distributions, that improves the running time of the earlier algorithm and yields the first known provably efficient approximate NN algorithm for the case p<1.

3.6K
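
As a rough illustration of the locality-sensitive hashing entry above, the sketch below implements the basic hash family h(v) = floor((a . v + b) / w) for the Euclidean case (p = 2), where the entries of a are drawn from a Gaussian, which is a 2-stable distribution. The function names, the bucket width w = 4.0, the dimension, and the test points are all hypothetical choices made for this example; it is a minimal sketch, not the implementation from the cited paper or from the thesis.

```python
import math
import random


def make_hash(dim, w, rng):
    """Draw one hash function h(v) = floor((a . v + b) / w)."""
    # Gaussian entries: the normal distribution is 2-stable,
    # so a . (u - v) is distributed as ||u - v||_2 times a standard normal.
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)  # random offset within one bucket

    def h(v):
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / w)

    return h


def collision_rate(u, v, w, trials, seed=0):
    """Estimate how often u and v land in the same bucket over random hashes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        h = make_hash(len(u), w, rng)
        if h(u) == h(v):
            hits += 1
    return hits / trials


# Hypothetical test points in 50 dimensions.
dim = 50
origin = [0.0] * dim
near = [0.1] * dim  # Euclidean distance 0.1 * sqrt(50), about 0.71
far = [1.0] * dim   # Euclidean distance sqrt(50), about 7.07

near_rate = collision_rate(origin, near, w=4.0, trials=2000)
far_rate = collision_rate(origin, far, w=4.0, trials=2000)
print(f"near: {near_rate:.2f}, far: {far_rate:.2f}")
```

Nearby points should collide noticeably more often than distant ones, which is the property a nearest-neighbor index built from many such hashes relies on.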