About: Minimum spanning tree is a research topic. Over its lifetime, 6,125 publications on this topic have received 136,299 citations. The topic is also known as MST and shortest spanning tree.
TL;DR: The authors present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces that, for data sets of size n in R^d, require space only polynomial in n and d.
Abstract: We present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces. For data sets of size n living in R^d, the algorithms require space that is only polynomial in n and d, while achieving query times that are sub-linear in n and polynomial in d. We also show applications to other high-dimensional geometric problems, such as the approximate minimum spanning tree. The article is based on the material from the authors' STOC'98 and FOCS'01 papers. It unifies, generalizes and simplifies the results from those papers.
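To give a feel for how sub-linear query time can coexist with polynomial space, here is a minimal sketch of locality-sensitive hashing with the classic bit-sampling family on 0/1 vectors under Hamming distance. It is an illustration under assumptions, not a reproduction of the article's algorithms: the class `BitSamplingLSH`, the parameters `k` (sampled coordinates per table) and `L` (number of tables), and the restriction to Hamming space are all hypothetical choices made here.

```python
import random
from collections import defaultdict


def hamming(a, b):
    """Number of coordinates where two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingLSH:
    """Hypothetical LSH index: L tables, each keyed by k randomly sampled bits."""

    def __init__(self, data, k, L, seed=0):
        rng = random.Random(seed)
        d = len(data[0])
        self.data = data
        # Each table looks only at its own k sampled coordinates.
        self.projections = [rng.sample(range(d), k) for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]
        for idx, v in enumerate(data):
            for proj, table in zip(self.projections, self.tables):
                table[self._key(v, proj)].append(idx)

    @staticmethod
    def _key(v, proj):
        return tuple(v[i] for i in proj)

    def query(self, q):
        """Scan only points that collide with q in some table.

        Returns the index of the closest candidate, or None if nothing collides.
        """
        candidates = set()
        for proj, table in zip(self.projections, self.tables):
            candidates.update(table.get(self._key(q, proj), ()))
        if not candidates:
            return None
        return min(candidates, key=lambda i: hamming(self.data[i], q))


rng = random.Random(1)
points = [tuple(rng.randint(0, 1) for _ in range(64)) for _ in range(200)]
index = BitSamplingLSH(points, k=12, L=8)
noisy = list(points[7])
noisy[0] ^= 1  # flip one bit to simulate a nearby query
print(index.query(tuple(noisy)))  # index of a nearby point, or None
```

Nearby vectors agree on all k sampled bits of some table with higher probability than distant ones, so the candidate set stays small; k and L trade query time against the chance of missing a near neighbor.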
TL;DR: A family of graph-theoretical algorithms based on the minimal spanning tree are capable of detecting several kinds of cluster structure in arbitrary point sets; description of the detected clusters is possible in some cases by extensions of the method.
Abstract: A family of graph-theoretical algorithms based on the minimal spanning tree are capable of detecting several kinds of cluster structure in arbitrary point sets; description of the detected clusters is possible in some cases by extensions of the method. Development of these clustering algorithms was based on examples from two-dimensional space because we wanted to copy the human perception of gestalts or point groupings. On the other hand, all the methods considered apply to higher dimensional spaces and even to general metric spaces. Advantages of these methods include determinacy, easy interpretation of the resulting clusters, conformity to gestalt principles of perceptual organization, and invariance of results under monotone transformations of interpoint distance. Brief discussion is made of the application of cluster detection to taxonomy and the selection of good feature spaces for pattern recognition. Detailed analyses of several planar cluster detection problems are illustrated by text and figures. The well-known Fisher iris data, in four-dimensional space, have been analyzed by these methods also. PL/1 programs to implement the minimal spanning tree methods have been fully debugged.
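A minimal sketch of the central idea, assuming points in Euclidean space: build the MST with Kruskal's algorithm, delete its k-1 heaviest edges, and report the remaining connected components as clusters. This "cut the longest edges" rule is a simplification chosen here for illustration (it coincides with single-linkage clustering); the methods described in the abstract are more varied, and the names `mst_edges` and `mst_clusters` are hypothetical.

```python
import math
from itertools import combinations


def _find(parent, a):
    """Union-find root lookup with path halving."""
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a


def mst_edges(points):
    """Kruskal's algorithm on the complete Euclidean graph.

    Returns MST edges as (weight, i, j) tuples in nondecreasing weight order.
    """
    n = len(points)
    parent = list(range(n))
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    tree = []
    for w, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:  # the edge joins two components, so it belongs to the MST
            parent[ri] = rj
            tree.append((w, i, j))
    return tree


def mst_clusters(points, k):
    """Label points by the components left after deleting the k-1 heaviest MST edges."""
    n = len(points)
    kept = mst_edges(points)[: n - k]  # MST edges are already sorted by weight
    parent = list(range(n))
    for _, i, j in kept:
        parent[_find(parent, i)] = _find(parent, j)
    labels = {}
    return [labels.setdefault(_find(parent, i), len(labels)) for i in range(n)]


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mst_clusters(pts, 2))  # [0, 0, 0, 1, 1, 1]
```

Because the MST depends only on the ordering of edge weights, the labels do not change if every distance is passed through a strictly increasing function, which matches the invariance under monotone transformations of interpoint distance mentioned in the abstract.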
TL;DR: It is shown that max_π w_π = C* precisely when a certain well-known linear program has an optimal solution in integers.
Abstract: This paper explores new approaches to the symmetric traveling-salesman problem in which 1-trees, which are a slight variant of spanning trees, play an essential role. A 1-tree is a tree together with an additional vertex connected to the tree by two edges. We observe that (i) a tour is precisely a 1-tree in which each vertex has degree 2, (ii) a minimum 1-tree is easy to compute, and (iii) the transformation on "intercity distances" c_ij → c_ij + π_i + π_j leaves the traveling-salesman problem invariant but changes the minimum 1-tree. Using these observations, we define an infinite family of lower bounds w_π on C*, the cost of an optimum tour. We show that max_π w_π = C* precisely when a certain well-known linear program has an optimal solution in integers. We give a column-generation method and an ascent method for computing max_π w_π, and construct a branch-and-bound method in which the lower bounds w_π control the search for an optimum tour.
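The bound w_π can be sketched as follows, assuming a symmetric cost matrix in which vertex 0 plays the role of the additional vertex: compute an MST on the other vertices under the transformed costs c_ij + π_i + π_j, attach vertex 0 by its two cheapest edges, and subtract 2·Σπ_i. The ascent loop adjusts π_i by the degree excess d_i - 2; the geometric step-size schedule and the names `min_one_tree` and `held_karp_bound` are assumptions for illustration, not the paper's exact column-generation or ascent procedures.

```python
import math


def min_one_tree(cost, pi):
    """Minimum 1-tree under transformed costs c_ij + pi_i + pi_j.

    Prim's algorithm builds an MST on vertices 1..n-1, then vertex 0 is
    attached by its two cheapest edges. Returns (w_pi, degrees), where
    w_pi = (1-tree cost) - 2 * sum(pi). Assumes n >= 3.
    """
    n = len(cost)

    def c(i, j):
        return cost[i][j] + pi[i] + pi[j]

    deg = [0] * n
    total = 0.0
    best = {v: (c(1, v), 1) for v in range(2, n)}  # cheapest link into the tree
    while best:
        v = min(best, key=lambda u: best[u][0])
        w, u = best.pop(v)
        total += w
        deg[u] += 1
        deg[v] += 1
        for x in best:
            if c(v, x) < best[x][0]:
                best[x] = (c(v, x), v)
    a, b = sorted(range(1, n), key=lambda j: c(0, j))[:2]
    total += c(0, a) + c(0, b)
    deg[0] = 2
    deg[a] += 1
    deg[b] += 1
    return total - 2 * sum(pi), deg


def held_karp_bound(cost, steps=200, step=1.0, decay=0.95):
    """Subgradient ascent on pi; returns the best lower bound w_pi found."""
    pi = [0.0] * len(cost)
    bound = float("-inf")
    for _ in range(steps):
        w, deg = min_one_tree(cost, pi)
        bound = max(bound, w)
        if all(d == 2 for d in deg):  # the 1-tree is a tour, so w equals C*
            break
        # raise pi_i where the degree exceeds 2, lower it where it is 1
        pi = [p + step * (d - 2) for p, d in zip(pi, deg)]
        step *= decay
    return bound


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
cost = [[math.dist(p, q) for q in square] for p in square]
print(held_karp_bound(cost))  # 4.0
```

The subtraction of 2·Σπ_i is what makes w_π a valid bound: every tour uses exactly two edges at each vertex, so its transformed cost is its original cost plus 2·Σπ_i, and since every tour is a 1-tree, w_π ≤ C* for every π.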
TL;DR: A set of analyses is developed using a hypothetical landscape mosaic of habitat patches in a nonhabitat matrix, and the results suggest that a simple graph construct, the minimum spanning tree, can serve as a powerful guide to decisions about the relative importance of individual patches to overall landscape connectivity.
Abstract: Ecologists are familiar with two data structures commonly used to represent landscapes. Vector-based maps delineate land cover types as polygons, while raster lattices represent the landscape as a grid. Here we adopt a third lattice data structure, the graph. A graph represents a landscape as a set of nodes (e.g., habitat patches) connected to some degree by edges that join pairs of nodes functionally (e.g., via dispersal). Graph theory is well developed in other fields, including geography (transportation networks, routing applications, siting problems) and computer science (circuitry and network optimization). We present an overview of basic elements of graph theory as it might be applied to issues of connectivity in heterogeneous landscapes, focusing especially on applications of metapopulation theory in conservation biology. We develop a general set of analyses using a hypothetical landscape mosaic of habitat patches in a nonhabitat matrix. Our results suggest that a simple graph construct, the minimum spanning tree, can serve as a powerful guide to decisions about the relative importance of individual patches to overall landscape connectivity. We then apply this approach to an actual conservation scenario involving the
TL;DR: This is the first formal analysis of the effect of instance-based noise in the context of data privacy; the authors show how to compute the required smooth sensitivity efficiently for several functions, including the median and the cost of the minimum spanning tree.
Abstract: We introduce a new, generic framework for private data analysis. The goal of private data analysis is to release aggregate information about a data set while protecting the privacy of the individuals whose information the data set contains. Our framework allows one to release functions f of the data with instance-based additive noise. That is, the noise magnitude is determined not only by the function we want to release, but also by the database itself. One of the challenges is to ensure that the noise magnitude does not leak information about the database. To address that, we calibrate the noise magnitude to the smooth sensitivity of f on the database x, a measure of variability of f in the neighborhood of the instance x. The new framework greatly expands the applicability of output perturbation, a technique for protecting individuals' privacy by adding a small amount of random noise to the released statistics. To our knowledge, this is the first formal analysis of the effect of instance-based noise in the context of data privacy. Our framework raises many interesting algorithmic questions. Namely, to apply the framework one must compute or approximate the smooth sensitivity of f on x. We show how to do this efficiently for several different functions, including the median and the cost of the minimum spanning tree. We also give a generic procedure based on sampling that allows one to release f(x) accurately on many databases x. This procedure is applicable even when no efficient algorithm for approximating smooth sensitivity of f is known or when f is given as a black box. We illustrate the procedure by applying it to k-SED (k-means) clustering and learning mixtures of Gaussians.
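As an illustration of the kind of computation the abstract describes, here is a minimal sketch that evaluates the smooth sensitivity of the median, assuming the closed form S = max_k e^(-k·beta) · max_{0≤t≤k+1} (x_{m+t} - x_{m+t-k-1}) over the sorted data. It further assumes the data lie in [0, upper], n is odd, and out-of-range order statistics are padded with 0 below and upper above. The names `smooth_sensitivity_median`, `beta`, and `upper` are hypothetical, and calibrating and adding the actual noise is left out.

```python
import math


def smooth_sensitivity_median(data, beta, upper):
    """beta-smooth sensitivity of the median for data in [0, upper], n odd.

    Evaluates max_k e^{-k*beta} * max_{0<=t<=k+1} (x_{m+t} - x_{m+t-k-1})
    over sorted, 1-indexed x, with x_i = 0 for i < 1 and x_i = upper for i > n.
    """
    x = sorted(data)
    n = len(x)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of points")
    m = (n + 1) // 2  # 1-indexed position of the median

    def at(i):
        if i < 1:
            return 0.0
        if i > n:
            return float(upper)
        return float(x[i - 1])

    result = 0.0
    for k in range(n + 1):
        # largest local sensitivity over data sets within k changes of x
        spread = max(at(m + t) - at(m + t - k - 1) for t in range(k + 2))
        result = max(result, math.exp(-k * beta) * spread)
    return result


print(smooth_sensitivity_median([0, 1, 2, 3, 4], beta=100.0, upper=10))  # 1.0
print(smooth_sensitivity_median([0, 1, 2, 3, 4], beta=0.0, upper=10))    # 10.0
```

The two calls show the range of the measure: with a large beta only the k = 0 term survives, giving the local sensitivity max(x_{m+1} - x_m, x_m - x_{m-1}), while beta = 0 recovers the global sensitivity, upper.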