Open Access Dissertation
Kernel methods for high dimensional data analysis
Alba Chiara de Vitis
28 May 2019
TL;DR: This thesis aims to introduce a new point of view on the use of distances and probability measures defined on the data set, and shows that kernel methods, already used in the intrinsically low-dimensional scenario to reduce dimensionality, can be investigated under purely high-dimensional hypotheses.
Abstract: As data are collected using an increasing number of features, datasets are of increasingly high dimension. High-dimensional data analysis has been affected by computational problems, related to the apparent dimension (the dimension of the vectors used to collect the data), and by theoretical problems, which depend notably on the effective dimension of the dataset, the so-called intrinsic dimension. In order to provide a suitable approach to data analysis in high dimensions, we introduce a more comprehensive scenario in the framework of metric measure spaces. The aim of this thesis is to show how to take advantage of high-dimensionality phenomena in the purely high-dimensional regime. In particular, we aim to introduce a new point of view on the use of distances and probability measures defined on the data set. More specifically, we want to show that kernel methods, already used in the intrinsically low-dimensional scenario to reduce dimensionality, can be investigated under purely high-dimensional hypotheses and further applied to cases not covered by the literature.