M-tree

Papers published on a yearly basis

Papers

Proceedings Article•10.1145/93597.98741•

The R*-tree: an efficient and robust access method for points and rectangles

Norbert Beckmann¹, Hans-Peter Kriegel¹, Ralf Schneider¹, Bernhard Seeger¹•Institutions (1)

University of Bremen¹

1 May 1990

TL;DR: The R*-tree incorporates a combined optimization of area, margin and overlap of each enclosing rectangle in the directory, and clearly outperforms the existing R-tree variants.

Abstract: The R-tree, one of the most popular access methods for rectangles, is based on the heuristic optimization of the area of the enclosing rectangle in each inner node. By running numerous experiments in a standardized testbed under highly varying data, queries and operations, we were able to design the R*-tree which incorporates a combined optimization of area, margin and overlap of each enclosing rectangle in the directory. Using our standardized testbed in an exhaustive performance comparison, it turned out that the R*-tree clearly outperforms the existing R-tree variants: Guttman's linear and quadratic R-tree and Greene's variant of the R-tree. This superiority of the R*-tree holds for different types of queries and operations, such as map overlay, for both rectangles and multidimensional points in all experiments. From a practical point of view the R*-tree is very attractive for the following two reasons: (1) it efficiently supports point and spatial data at the same time and (2) its implementation cost is only slightly higher than that of other R-trees.
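The three quantities the abstract names (area, margin and overlap of enclosing rectangles) can be sketched for axis-aligned rectangles as follows. This is a minimal illustration, not the R*-tree insertion or split algorithm; the function names and the rectangle representation are assumptions made for the example, and margin is taken here as the sum of side lengths, which in 2D is half the perimeter.

```python
from typing import List, Tuple

# A rectangle is one (low, high) interval per dimension (illustrative layout).
Rect = Tuple[Tuple[float, float], ...]


def area(r: Rect) -> float:
    """Product of the side lengths."""
    result = 1.0
    for lo, hi in r:
        result *= hi - lo
    return result


def margin(r: Rect) -> float:
    """Sum of the side lengths (proportional to the perimeter in 2D)."""
    return sum(hi - lo for lo, hi in r)


def overlap(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles, 0.0 if they are disjoint."""
    result = 1.0
    for (alo, ahi), (blo, bhi) in zip(a, b):
        side = min(ahi, bhi) - max(alo, blo)
        if side <= 0:
            return 0.0
        result *= side
    return result


def enclose(rects: List[Rect]) -> Rect:
    """Smallest axis-aligned rectangle containing all given rectangles."""
    return tuple(
        (min(lo for lo, _ in dim), max(hi for _, hi in dim))
        for dim in zip(*rects)
    )
```

For example, the rectangles `((0, 2), (0, 2))` and `((1, 3), (1, 4))` share a unit square, and their common enclosing rectangle is `((0, 3), (0, 4))`.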

4,923 citations

Proceedings Article•

M-tree: An Efficient Access Method for Similarity Search in Metric Spaces

Paolo Ciaccia¹, Marco Patella, Pavel Zezula•Institutions (1)

University of Bologna¹

25 Aug 1997

TL;DR: The results demonstrate that the M-tree indeed extends the domain of applicability beyond the traditional vector spaces, performs reasonably well in high-dimensional data spaces, and scales well in case of growing files.

Abstract: A new access method, called M-tree, is proposed to organize and search large data sets from a generic “metric space”, i.e. where object proximity is only defined by a distance function satisfying the positivity, symmetry, and triangle inequality postulates. We detail algorithms for insertion of objects and split management, which keep the M-tree always balanced - several heuristic split alternatives are considered and experimentally evaluated. Algorithms for similarity (range and k-nearest neighbors) queries are also described. Results from extensive experimentation with a prototype system are reported, considering as the performance criteria the number of page I/O’s and the number of distance computations. The results demonstrate that the M-tree indeed extends the domain of applicability beyond the traditional vector spaces, performs reasonably well in high-dimensional data spaces, and scales well in case of growing files.
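The way a metric index saves distance computations during a range query can be sketched with a single flat level of "balls", each holding a center, its members and a covering radius. This is not the M-tree itself (no balanced tree, no insertion or split logic); the `Ball` class and `range_query` function are hypothetical names for the example. The pruning rule follows from the triangle inequality: if the query is farther from a center than the query radius plus the covering radius, no member of that ball can qualify.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


@dataclass
class Ball:
    """A center with its members and the radius that covers all of them."""
    center: Point
    members: Sequence[Point]
    dist: Metric
    radius: float = field(init=False)

    def __post_init__(self) -> None:
        self.radius = max(self.dist(self.center, m) for m in self.members)


def range_query(balls: List[Ball], query: Point, r: float, dist: Metric):
    """Return (hits, number of distance computations) for a range query."""
    hits, computed = [], 0
    for ball in balls:
        d_center = dist(query, ball.center)
        computed += 1
        # Triangle inequality: every member m satisfies
        # d(query, m) >= d(query, center) - radius, so the ball is skipped
        # when that lower bound already exceeds r.
        if d_center > r + ball.radius:
            continue
        for m in ball.members:
            computed += 1
            if dist(query, m) <= r:
                hits.append(m)
    return hits, computed


balls = [
    Ball((0.0, 0.0), [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], math.dist),
    Ball((10.0, 10.0), [(10.0, 10.0), (11.0, 10.0)], math.dist),
]
hits, computed = range_query(balls, (0.5, 0.0), 0.6, math.dist)
```

Euclidean distance is used only for convenience; any function satisfying the metric postulates could be passed as `dist`. In this example the second ball is rejected after a single distance computation.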

1,936 citations

Journal Article•10.1145/502807.502808•

Searching in metric spaces

Edgar Chávez¹, Gonzalo Navarro², Ricardo Baeza-Yates², Jose L. Marroquin³•Institutions (3)

Universidad Michoacana de San Nicolás de Hidalgo¹, University of Chile², Centro de Investigación en Matemáticas³

01 Sep 2001-ACM Computing Surveys

TL;DR: Presents a unified view of all the known proposals to organize metric spaces, so that they can be understood under a common framework, together with a quantitative definition of the elusive concept of "intrinsic dimensionality".

Abstract: The problem of searching the elements of a set that are close to a given query element under some similarity criterion has a vast number of applications in many branches of computer science, from pattern recognition to textual and multimedia information retrieval. We are interested in the rather general case where the similarity criterion defines a metric space, instead of the more restricted case of a vector space. Many solutions have been proposed in different areas, in many cases without cross-knowledge. Because of this, the same ideas have been reconceived several times, and very different presentations have been given for the same approaches. We present some basic results that explain the intrinsic difficulty of the search problem. This includes a quantitative definition of the elusive concept of "intrinsic dimensionality." We also present a unified view of all the known proposals to organize metric spaces, so as to be able to understand them under a common framework. Most approaches turn out to be variations on a few different concepts. We organize those works in a taxonomy that allows us to devise new algorithms from combinations of concepts not noticed before because of the lack of communication between different communities. We present experiments validating our results and comparing the existing approaches. We finish with recommendations for practitioners and open questions for future development.
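The abstract does not state the quantitative definition it mentions. The form usually attributed to this survey is ρ = μ² / (2σ²), where μ and σ² are the mean and variance of the histogram of pairwise distances; treat that formula as an assumption here. A minimal sketch that estimates it from a sample (the function name is illustrative):

```python
import itertools
import math
import statistics
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def intrinsic_dimensionality(
    points: Sequence[Point],
    dist: Callable[[Point, Point], float],
) -> float:
    """Estimate rho = mu^2 / (2 * sigma^2) from all pairwise distances."""
    distances = [dist(a, b) for a, b in itertools.combinations(points, 2)]
    mu = statistics.fmean(distances)
    var = statistics.pvariance(distances, mu)
    if var == 0:
        raise ValueError("all pairwise distances are equal; rho is undefined")
    return mu * mu / (2 * var)


# Three collinear points: distances are 1, 2 and 1.
rho = intrinsic_dimensionality([(0.0,), (1.0,), (2.0,)], math.dist)
```

Under this definition, a distance histogram that is concentrated around its mean (small variance relative to the mean) gives a large ρ, which corresponds to a harder search problem.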

1,480 citations

Proceedings Article•

Using the triangle inequality to accelerate k-means

Charles Elkan¹•Institutions (1)

University of California, San Diego¹

21 Aug 2003

TL;DR: Shows how to accelerate the k-means algorithm dramatically while still always computing exactly the same result as the standard algorithm; the accelerated algorithm is effective for datasets with up to 1000 dimensions, and becomes more and more effective as the number k of clusters increases.

Abstract: The k-means algorithm is by far the most widely used method for discovering clusters in data. We show how to accelerate it dramatically, while still always computing exactly the same result as the standard algorithm. The accelerated algorithm avoids unnecessary distance calculations by applying the triangle inequality in two different ways, and by keeping track of lower and upper bounds for distances between points and centers. Experiments show that the new algorithm is effective for datasets with up to 1000 dimensions, and becomes more and more effective as the number k of clusters increases. For k ≥ 20 it is many times faster than the best previously known accelerated k-means method.
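One of the ways the triangle inequality can remove distance calculations is sketched below for a single assignment step. If the distance between the current best center b and another center c is at least twice the distance from point x to b, then c cannot be closer to x than b, so d(x, c) need not be computed. This is a simplified illustration only: it omits the lower and upper bounds the abstract mentions, and the function names are assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def assign_pruned(points: Sequence[Point], centers: Sequence[Point]):
    """Assign each point to its nearest center, skipping provably useless checks.

    Returns (labels, number of point-to-center distances skipped).
    """
    k = len(centers)
    # Center-to-center distances are computed once per assignment step.
    cc = [[math.dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels: List[int] = []
    skipped = 0
    for x in points:
        best, best_d = 0, math.dist(x, centers[0])
        for j in range(1, k):
            # If d(best, j) >= 2 * d(x, best) then d(x, j) >= d(x, best).
            if cc[best][j] >= 2 * best_d:
                skipped += 1
                continue
            d = math.dist(x, centers[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped


def assign_brute_force(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Reference assignment that computes every distance."""
    return [
        min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
        for x in points
    ]


centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(1.0, 0.0), (9.0, 1.0), (0.5, 9.0)]
labels, skipped = assign_pruned(points, centers)
```

The pruned and brute-force assignments agree on this input, which mirrors the abstract's point that the acceleration does not change the result.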

904 citations

Book Chapter•10.1016/B978-012088469-8.50070-X•

On the marriage of Lp-norms and edit distance

Lei Chen¹, Raymond T. Ng²•Institutions (2)

University of Waterloo¹, University of British Columbia²

31 Aug 2004

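TL;DR: Proposes ERP ("Edit distance with Real Penalty"), a new distance function that marries the L1-norm and the edit distance, supports local time shifting, and is a metric; combined with pruning by the triangle inequality and by a lower bound indexable with a B+-tree, it dominates all existing strategies.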

Abstract: Existing studies on time series are based on two categories of distance functions. The first category consists of the Lp-norms. They are metric distance functions but cannot support local time shifting. The second category consists of distance functions which are capable of handling local time shifting but are nonmetric. The first contribution of this paper is the proposal of a new distance function, which we call ERP ("Edit distance with Real Penalty"). Representing a marriage of L1-norm and the edit distance, ERP can support local time shifting, and is a metric. The second contribution of the paper is the development of pruning strategies for large time series databases. Given that ERP is a metric, one way to prune is to apply the triangle inequality. Another way to prune is to develop a lower bound on the ERP distance. We propose such a lower bound, which has the nice computational property that it can be efficiently indexed with a standard B+-tree. Moreover, we show that these two ways of pruning can be used simultaneously for ERP distances. Specifically, the false positives obtained from the B+-tree can be further minimized by applying the triangle inequality. Based on extensive experimentation with existing benchmarks and techniques, we show that this combination delivers superb pruning power and search time performance, and dominates all existing strategies.
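A dynamic-programming sketch of an ERP-style distance is given below, under the usual formulation in which aligning two elements costs their absolute difference and leaving an element unmatched costs its absolute difference from a constant gap value g. The default g = 0 and the function name are assumptions made for the example; the abstract does not spell out these details, and the pruning strategies and B+-tree lower bound are not shown.

```python
from typing import Sequence


def erp(r: Sequence[float], s: Sequence[float], g: float = 0.0) -> float:
    """Edit distance with Real Penalty between two series (O(len(r) * len(s)))."""
    n, m = len(r), len(s)
    # d[i][j] holds the distance between the first i elements of r
    # and the first j elements of s.
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + abs(r[i - 1] - g)  # r[i-1] matched to a gap
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + abs(s[j - 1] - g)  # s[j-1] matched to a gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + abs(r[i - 1] - s[j - 1]),  # align both
                d[i - 1][j] + abs(r[i - 1] - g),             # gap in s
                d[i][j - 1] + abs(s[j - 1] - g),             # gap in r
            )
    return d[n][m]


a, b, c = [1.0, 2.0, 3.0], [1.0, 3.0], [2.0, 2.0]
d_ab = erp(a, b)
```

Because the gap penalty is measured against a fixed reference value rather than against the neighboring element, the function stays symmetric and is expected to satisfy the triangle inequality, which is what allows the metric pruning the abstract describes.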

871 citations

Year	Papers
2022	1
2021	1
2019	3
2018	1
2017	2
2016	4
