Scispace (Formerly Typeset)
Showing papers presented at "Algorithm Engineering and Experimentation in 2010"
Proceedings Article•
StreamKM++: a clustering algorithm for data streams

[...]

Marcel R. Ackermann1, Christiane Lammersen2, Marcus Märtens1, Christoph Raupach1, Christian Sohler2, Kamil Swierkot1 •
University of Paderborn1, Technical University of Dortmund2
16 Jan 2010
TL;DR: A new k-means clustering algorithm for data streams of points from a Euclidean space that provides a good alternative to BIRCH and StreamLS, in particular, if the number of cluster centers is large.
Abstract: We develop a new k-means clustering algorithm for data streams, which we call StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the k-means++ algorithm [1]. To compute the small sample, we propose two new techniques. First, we use a non-uniform sampling approach similar to the k-means++ seeding procedure to obtain small coresets from the data stream. This construction is rather easy to implement and, unlike other coreset constructions, its running time has only a low dependency on the dimensionality of the data. Second, we propose a new data structure which we call a coreset tree. The use of these coreset trees significantly speeds up the time necessary for the non-uniform sampling during our coreset construction. We compare our algorithm experimentally with two well-known streaming implementations (BIRCH [16] and StreamLS [4, 9]). In terms of quality (sum of squared errors), our algorithm is comparable with StreamLS and significantly better than BIRCH (up to a factor of 2). In terms of running time, our algorithm is slower than BIRCH. Comparing the running time with StreamLS, it turns out that our algorithm scales much better with increasing number of centers. We conclude that, if the first priority is the quality of the clustering, then our algorithm provides a good alternative to BIRCH and StreamLS, in particular, if the number of cluster centers is large. We also give a theoretical justification of our approach by proving that our sample set is a small coreset in low dimensional spaces.

285 citations
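
The non-uniform sampling that the StreamKM++ abstract compares to k-means++ seeding can be illustrated with a small sketch. This is only the plain seeding-style rule (each new point is drawn with probability proportional to its squared distance from the nearest point already chosen). It is not the authors' coreset construction: the coreset tree and the sample weights are omitted, and the names `squared_distance` and `d2_sample` are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d2_sample(points, m, rng=None):
    """Draw m points: the first uniformly, each later one with probability
    proportional to its squared distance to the nearest chosen point."""
    rng = rng or random.Random()
    if not 0 < m <= len(points):
        raise ValueError("m must be between 1 and len(points)")
    chosen = [rng.choice(points)]
    # dist[i] holds the squared distance from points[i] to its nearest chosen point.
    dist = [squared_distance(p, chosen[0]) for p in points]
    while len(chosen) < m:
        if sum(dist) == 0:
            # Every point coincides with a chosen one; fall back to uniform.
            chosen.append(rng.choice(points))
        else:
            idx = rng.choices(range(len(points)), weights=dist, k=1)[0]
            chosen.append(points[idx])
        newest = chosen[-1]
        dist = [min(d, squared_distance(p, newest)) for d, p in zip(dist, points)]
    return chosen
```

Each round recomputes all distances, so this sketch costs time linear in the number of points per sample; the abstract's coreset tree is described as the device that speeds up this step.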

Proceedings Article•
Route planning with flexible objective functions

[...]

Robert Geisberger1, Moritz Kobitzsch1, Peter Sanders1•
Karlsruhe Institute of Technology1
16 Jan 2010
TL;DR: The first fast route planning algorithm that answers shortest paths queries for a customizable linear combination of two different metrics, e.g., travel time and energy cost, on large scale road networks.
Abstract: We present the first fast route planning algorithm that answers shortest paths queries for a customizable linear combination of two different metrics, e.g., travel time and energy cost, on large scale road networks. The precomputation receives as input a directed graph, two edge weight functions t(e) and c(e), and a discrete interval [L, U]. The resulting flexible query algorithm finds for a parameter p ∈ [L, U] an exact shortest path for the edge weight t(e)+p·c(e). This allows for different tradeoffs between the two edge weight functions at query time. We apply precomputation based on node contraction, which adds all necessary shortcuts for any parameter choice efficiently. To improve the node ordering, we developed the new concept of gradual parameter interval splitting. Additionally, we improve performance by combining node contraction and a goal-directed technique in our flexible scenario.

71 citations
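
To make the query-time objective t(e)+p·c(e) concrete, here is a baseline sketch that evaluates it with plain Dijkstra. It deliberately uses no precomputation, node contraction, or goal direction, so it shows only what is being optimized, not the paper's technique. The graph layout (a dict mapping each node to `(neighbor, t, c)` triples) and the function name are assumptions, and both weights are assumed non-negative.

```python
import heapq


def flexible_shortest_path_cost(graph, source, target, p):
    """Cost of a shortest source-target path under edge weight t + p * c.

    graph maps a node to a list of (neighbor, t, c) triples.
    Returns None if the target is unreachable.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, t, c in graph.get(u, []):
            nd = d + t + p * c
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None
```

Changing `p` between calls can change which route wins, which is the tradeoff the abstract describes being selectable at query time.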

Proceedings Article•
Exact solutions and bounds for general art gallery problems

[...]

Tobias Baumgartner1, Sándor P. Fekete1, Alexander Kröller1, Christiane Schmidt1•
Braunschweig University of Technology1
16 Jan 2010
TL;DR: A primal-dual algorithm based on linear programming that provides lower bounds on the necessary number of guards in every step and, in case of convergence and integrality, ends with an optimal solution to the classical Art Gallery Problem.
Abstract: The classical Art Gallery Problem asks for the minimum number of guards that achieve visibility coverage of a given polygon. This problem is known to be NP-hard, even for very restricted and discrete special cases. For the case of vertex guards and simple orthogonal polygons, Couto et al. have recently developed an exact method that is based on a set cover approach. For the general problem (in which both the set of possible guard positions and the point set to be guarded are uncountable), neither constant-factor approximation algorithms nor exact solution methods are known. We present a primal-dual algorithm based on linear programming that provides lower bounds on the necessary number of guards in every step and, in case of convergence and integrality, ends with an optimal solution. We describe our implementation and give results for an assortment of polygons, including non-orthogonal polygons with holes.

48 citations

Proceedings Article•
Fast local search for steiner trees in graphs

[...]

Eduardo Uchoa1, Renato F. Werneck2•
Federal Fluminense University1, Microsoft2
16 Jan 2010
TL;DR: Efficient algorithms that implement four local searches for the Steiner problem in graphs: vertex insertion, vertex elimination, key-path exchange, and key-vertex elimination are presented.
Abstract: We present efficient algorithms that implement four local searches for the Steiner problem in graphs: vertex insertion, vertex elimination, key-path exchange, and key-vertex elimination. In each case, we show how to find an improving solution (or prove that none exists in the neighborhood) in O(m log n) time on graphs with n vertices and m edges. Many of the techniques and data structures we use are relevant in the study of dynamic graphs in general, beyond Steiner trees. Besides the theoretical interest, our results have practical impact: these local searches have been shown to find good-quality solutions in practice, but high running times limited their applicability.

28 citations

Proceedings Article•
Simple and fast nearest neighbor search

[...]

Marcel Birn1, Manuel Holtgrewe1, Peter Sanders1, Johannes Singler1•
Karlsruhe Institute of Technology1
16 Jan 2010
TL;DR: A simple randomized data structure for two-dimensional point sets that allows fast nearest neighbor queries in many cases is presented and an implementation outperforms several previous implementations for commonly used benchmarks.
Abstract: We present a simple randomized data structure for two-dimensional point sets that allows fast nearest neighbor queries in many cases. An implementation outperforms several previous implementations for commonly used benchmarks.

25 citations

Proceedings Article•
Tabulation based 5-universal hashing and linear probing

[...]

Mikkel Thorup1, Yin Zhang2•
AT&T Labs1, University of Texas at Austin2
16 Jan 2010
TL;DR: If the pre-computed tables are made 5-universal, then the hash value becomes 5-universal without any other change to the computation, which leads to even bigger gains since the direct methods for 5-universal hashing use degree 4 polynomials.
Abstract: Previously [SODA'04] we devised the fastest known algorithm for 4-universal hashing. The hashing was based on small pre-computed 4-universal tables. This led to a five-fold improvement in speed over direct methods based on degree 3 polynomials. In this paper, we show that if the pre-computed tables are made 5-universal, then the hash value becomes 5-universal without any other change to the computation. Relatively this leads to even bigger gains since the direct methods for 5-universal hashing use degree 4 polynomials. Experimentally, we find that our method can gain up to an order of magnitude in speed over direct 5-universal hashing. Some of the most popular randomized algorithms have been proved to have the desired expected running time using 5-universal hashing, e.g., a non-recursive variant of quicksort takes O(n log n) expected time [Karloff Raghavan JACM'93], and linear probing does updates and searches in O(1) expected time [Pagh et al. SICOMP'09]. In contrast, inputs have been constructed leading to much worse expected performance with some of the classic primality based 2-universal hashing schemes. In the context of linear probing, we compare our new fast 5-universal hashing experimentally with the fastest known plain universal hashing. We know that any reasonable hashing scheme will work on random input, but from Pagh et al., we know that 5-universal hashing leads to good expected performance on all input. We use a dense interval as an example of a structured yet realistic input, wanting to see if this could push the fastest multiplication-shift based plain universal hashing into bad performance. Even though our 5-universal hashing itself is slower than the fast plain universal hashing, it makes linear probing much more robust.

23 citations
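
As background for the table-based hashing discussed in this abstract, the sketch below shows simple tabulation hashing: the key is split into 8-bit characters and one random table entry per character is XORed together. This is a simplified relative of the scheme in the paper, not the scheme itself; simple tabulation is known to be 3-universal but not 4-universal, and the additional machinery that yields 4- and 5-universality is not reproduced here. The function name and parameters are hypothetical.

```python
import random


def make_simple_tabulation(seed=None, chars=4, bits=32):
    """Return a hash function for keys of `chars` bytes.

    One table of 256 random `bits`-bit values is filled per key byte;
    hashing XORs the entries selected by each byte of the key.
    """
    rng = random.Random(seed)
    tables = [[rng.getrandbits(bits) for _ in range(256)] for _ in range(chars)]

    def h(key):
        value = 0
        for i in range(chars):
            value ^= tables[i][(key >> (8 * i)) & 0xFF]
        return value

    return h
```

A quick way to see why plain lookups alone stop at 3-universality: for the keys 0, 1, 256 and 257 every table entry is used exactly twice, so the four hash values always XOR to zero regardless of the random tables.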

Proceedings Article•
Implementation and parallelization of a reverse-search algorithm for Minkowski sums

[...]

Christophe Weibel1•
McGill University1
16 Jan 2010
TL;DR: An implementation of a reverse-search algorithm of Fukuda for computing Minkowski sums of polytopes efficiently; it uses the exact arithmetic library GMP, which ensures robustness of the program and exactness of the results.
Abstract: We present an implementation of a reverse-search algorithm of Fukuda for computing Minkowski sums of polytopes efficiently. The algorithm allows summing any number of polytopes in any dimension, and is complete in the sense that it does not assume general position. Its running time depends linearly on the size of the output. To the best of our knowledge, this is the only existing implementation that can efficiently compute Minkowski sums in higher dimensions. The implementation uses the exact arithmetic GMP, which ensures robustness of the program and exactness of the results. We furthermore present a parallel version of our implementation to demonstrate the simplicity and efficiency of performing the reverse search in parallel. The results of the performance tests show a near-linear acceleration of our parallel implementation.

17 citations

Proceedings Article•
Employing (1 − ε) dominating set partitions as backbones in wireless sensor networks

[...]

Dhia Mahjoub1, David W. Matula1•
Southern Methodist University1
16 Jan 2010
TL;DR: This paper introduces an efficient algorithm for selecting (δ + 1) backbones with disjoint node sets that are each independent (1 − ε) dominating sets of G, and provides an efficient topologically based centralized algorithm for determining the backbones.
Abstract: For a random geometric graph G(n, r) of minimum degree δ, we introduce an efficient algorithm for selecting (δ + 1) backbones with disjoint node sets that are each independent (1 − ε) dominating sets of G. The backbone node sets are determined by a graph coloring algorithm employing only the topology (not the geometry) of G(n, r), and the backbone links are selected with link lengths in a narrow window between r and 2r and further to form a planar graph backbone. For large vertex sets (n = 1600, 3200) the resulting backbones are shown to each cover typically over 99% of the vertices of G (i.e. ε < 0.01), with about 30% being fully dominating, which is consistent with the ¼ constant approximation factor algorithm proposed recently in [23] for the domatic partition problem in Unit Disk Graphs. We establish experimentally by measures of node degrees, link lengths, and interior triangular face counts that each individual backbone has most of the coverage behavior and routing convenience of the triangular "perfect packing" lattice. We further show for each sample G(n, r) that the relatively few vertices not covered by all (δ + 1) backbones are covered by most of the backbones. Hence backbone rotation in a wireless sensor network would reach all sensors (vertices) sufficiently frequently. Our novel backbone generation algorithm confirms experimentally the existence of these (δ + 1) backbones in a random geometric graph, and provides an efficient topologically based centralized algorithm for determining the backbones. We also point out that our novel backbone construction method is flexible such that any efficient coloring algorithm can be plugged into it. In this paper, we experiment with several coloring algorithms: Smallest Last, Largest First, Lexicographic, Radial Sweep and Random and we compare their respective performance. Our emphasis is, however, on SL since it offers robust properties and interesting expected behavior. 
We also experiment with several random node distributions: uniform, skewed and normal in both unit square and disk of which we also discuss the results.

15 citations

Book Chapter•10.1137/1.9781611972900.13•
Navigation in real-world complex networks through embedding in latent spaces

[...]

Xiaomeng Ban1, Jie Gao1, Arnout van de Rijt1•
Stony Brook University1
16 Jan 2010
TL;DR: Algorithmic methods are applied to embed nodes in a latent space, and greedy routing is used to deliver packages, in order to empirically investigate the navigability of five real-world complex networks from diverse contexts and of varying topology.
Abstract: Small-world experiments in which packages reach addressees unknown to the original sender through a forwarding chain confirm that acquaintance networks have short paths, a property that was later also discovered in many other networks. They further show that people can find these paths by passing the package on to the acquaintance most socially proximate to the target. This has led researchers to conjecture that perhaps also in many other networks some proximity-based algorithm can be used to find short paths, provided that nodes are given appropriate coordinates. Although potential applications are numerous, ranging from decentralized search to recommendation-based trust to disease control, this conjecture has remained largely unverified. In this paper we apply algorithmic methods to embed nodes in some latent space and employ greedy routing to deliver packages. Using these methods we empirically investigate the navigability of five real-world complex networks from diverse contexts and of varying topology. In each network, we deliver a majority of packages in fewer than six hops.

12 citations
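
The greedy routing step mentioned in this abstract can be sketched as follows, assuming every node already has coordinates in some Euclidean latent space. How those coordinates are obtained (the embedding) is the substantive part of the paper and is not shown. The data layout and the name `greedy_route` are assumptions made for illustration.

```python
import math


def greedy_route(neighbors, coords, source, target, max_hops=100):
    """Forward a package to the neighbor closest to the target in latent space.

    neighbors maps a node to a list of adjacent nodes; coords maps a node to
    a coordinate tuple. Returns the path as a list of nodes, or None if the
    package gets stuck or exceeds max_hops.
    """
    path = [source]
    current = source
    while current != target and len(path) - 1 < max_hops:
        here = math.dist(coords[current], coords[target])
        best = min(
            neighbors[current],
            key=lambda v: math.dist(coords[v], coords[target]),
            default=None,
        )
        # Stop if no neighbor is strictly closer to the target than we are.
        if best is None or math.dist(coords[best], coords[target]) >= here:
            return None
        path.append(best)
        current = best
    return path if current == target else None
```

The hop count of a successful delivery is `len(path) - 1`, which is the quantity the abstract reports as being below six for a majority of packages.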

Proceedings Article•
Algorithm engineering: an attempt at a definition using sorting as an example

[...]

Peter Sanders1•
Karlsruhe Institute of Technology1
16 Jan 2010
TL;DR: Algorithm engineering (AE) is described as a methodology for algorithmic research where design, analysis, implementation and experimental evaluation of algorithms form a feedback cycle driving the development of efficient algorithms.
Abstract: The talk describes algorithm engineering (AE) as a methodology for algorithmic research where design, analysis, implementation and experimental evaluation of algorithms form a feedback cycle driving the development of efficient algorithms. Additional important components of the methodology include realistic models, algorithm libraries, and collections of realistic benchmark instances. Examples are given for the fundamental problem of sorting with particular emphasis on huge data sets, advanced hardware, and energy efficiency.

7 citations

Proceedings Article•
A polynomial delay algorithm for enumerating approximate solutions to the interval constrained coloring problem

[...]

Stefan Canzar1, Khaled Elbassioni2, Julián Mestre2•
Centrum Wiskunde & Informatica1, Max Planck Society2
16 Jan 2010
TL;DR: This work studies the interval constrained coloring problem, a combinatorial problem arising in the interpretation of data on protein structure emanating from experiments based on hydrogen/deuterium exchange and mass spectrometry, and proposes a polynomial-delay polynomial-space algorithm for enumerating all exact solutions plus further approximate solutions, whose components are guaranteed to be within an absolute error of one of the optimum.
Abstract: We study the interval constrained coloring problem, a combinatorial problem arising in the interpretation of data on protein structure emanating from experiments based on hydrogen/deuterium exchange and mass spectrometry. The problem captures the challenging task of increasing the spatial resolution of experimental data in order to get a better picture of the protein structure. Since solutions proposed by any algorithmic framework have to ultimately be verified by biochemists, it is important to provide not just a single solution, but a valuable set of candidate solutions. Our contribution is a polynomial-delay polynomial-space algorithm for enumerating all exact solutions plus further approximate solutions, whose components are guaranteed to be within an absolute error of one of the optimum. Our experiments indicate that these approximate solutions are reasonably close to the optimal ones, in terms of the accumulative error. In addition, the experiments also confirm the effectiveness of the method in reducing the delay between two consecutive solutions considerably, compared to what it takes an integer programming solver to produce the next exact solution.
Proceedings Article•
Conjunctive filter: breaking the entropy barrier

[...]

Daisuke Okanohara1, Yuichi Yoshida2•
University of Tokyo1, Kyoto University2
16 Jan 2010
TL;DR: The objective is to break this entropy bound and construct more space-efficient data structures and show that many problems can be solved by using a conjunctive filter such as full-text search and database join queries.
Abstract: We consider the problem of storing a map that associates a key with a set of values. Storing n values from a universe of size m requires log2 (m choose n) bits of space, which can be approximated as n(1.44 + log2(m/n)) bits when n ≪ m. If we allow an ε fraction of errors in outputs, we can store it with roughly n log2(1/ε) bits, which matches the entropy bound. The Bloom filter is a well-known example of such a data structure. Our objective is to break this entropy bound and construct more space-efficient data structures. In this paper, we propose a novel data structure called a conjunctive filter, which supports conjunctive queries on k distinct keys for fixed k. Although a conjunctive filter cannot return the set of values itself associated with a queried key, it can perform conjunctive queries with an O(1/√m) fraction of errors. Also, the consumed space is n/k log2 m bits, which is significantly smaller than the entropy bound n/2 log2 m when k ≥ 3. We will show that many problems, such as full-text search and database join queries, can be solved by using a conjunctive filter. Also, we conducted experiments using a real-world data set, and show that a conjunctive filter answers conjunctive queries almost correctly using about 1/2 to 1/4 of the space given by the entropy bound.
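
The space bounds quoted in this abstract are easy to check numerically. The sketch below compares the exact information-theoretic cost of storing an n-element subset of an m-element universe, log2 of the binomial coefficient, with the standard Stirling-style estimate n(log2(m/n) + log2 e), where log2 e ≈ 1.44, and also evaluates the n log2(1/ε) cost of approximate storage. The function names are hypothetical and the example sizes are arbitrary.

```python
import math


def exact_subset_bits(m, n):
    """Bits needed to identify one n-element subset of an m-element universe."""
    return math.log2(math.comb(m, n))


def approx_subset_bits(m, n):
    """Estimate n * (log2(m / n) + log2(e)), reasonable when n is far below m."""
    return n * (math.log2(m / n) + math.log2(math.e))


def approximate_membership_bits(n, eps):
    """The n * log2(1 / eps) bound for storage with an eps fraction of errors."""
    return n * math.log2(1 / eps)


if __name__ == "__main__":
    m, n = 10 ** 6, 1000
    print(exact_subset_bits(m, n), approx_subset_bits(m, n))
    print(approximate_membership_bits(n, 0.01))
```

For m much larger than n the two subset figures agree closely, while the approximate-membership bound depends only on n and ε and not on the universe size.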
Proceedings Article•
Untangling the braid: finding outliers in a set of streams

[...]

Chiranjeeb Buragohain1, Luca Foschini2, Subhash Suri2•
Amazon.com1, University of California, Santa Barbara2
16 Jan 2010
TL;DR: This paper investigates the space complexity of one-pass algorithms for approximating outliers of this kind, proves lower bounds using multi-party communication complexity, and proposes small-memory heuristic algorithms that perform quite well for a variety of synthetic data.
Abstract: Monitoring the performance of large shared computing systems such as the cloud computing infrastructure raises many challenging algorithmic problems. One common problem is to track users with the largest deviation from the norm (outliers), for some measure of performance. Taking a stream-computing perspective, we can think of each user's performance profile as a stream of numbers (such as response times), and the aggregate performance profile of the shared infrastructure as a "braid" of these intermixed streams. The monitoring system's goal then is to untangle this braid sufficiently to track the top k outliers. This paper investigates the space complexity of one-pass algorithms for approximating outliers of this kind, proves lower bounds using multi-party communication complexity, and proposes small-memory heuristic algorithms. On one hand, stream outliers are easily tracked for simple measures, such as max or min, but our theoretical results rule out even good approximations for most of the natural measures such as average, median, or the quantiles. On the other hand, we show through simulation that our proposed heuristics perform quite well for a variety of synthetic data.
Proceedings Article•
Budgeted maximum coverage with overlapping costs: monitoring the emerging infections network

[...]

Donald E. Curtis1, Sriram V. Pemmaraju1, Philip M. Polgreen1•
University of Iowa1
16 Jan 2010
TL;DR: This work models the problem of monitoring a listserv, such as the EIN, as a type of budgeted maximum coverage problem called Budgeted Maximization with Overlapping Costs (BMOC), and identifies small sets of "bellwether" users who are good predictors of important discussions.
Abstract: The Emerging Infections Network (EIN) (http://ein.idsociety.org/) is a CDC supported "sentinel" network of over 1400 members (currently), designed to connect clinical infectious disease specialists and public health officials. Members primarily communicate through an EIN managed listserv and discuss disease outbreaks, treatment protocols, effectiveness of vaccinations and other disease-control and prevention mechanisms, etc. Recently, researchers at Google and Yahoo! Research have used search engine query logs to tap into the online "wisdom of crowds" and produce disease outbreak trends for flu. Following this work, there is now interest in trying to monitor EIN discussions more carefully to disseminate timely and accurate information on clinical events of possible interest to health officials. We model the problem of monitoring a listserv, such as the EIN, as a type of budgeted maximum coverage problem that we call Budgeted Maximization with Overlapping Costs (BMOC). Even though BMOC seems superficially similar to the budgeted maximum coverage problem considered by Khuller et al. (Inf. Process. Lett., 1999), our problem is fundamentally different from an algorithmic point of view, due to its cost structure. We observe that the greedy algorithm that provides a constant-factor approximation to the budgeted maximum coverage problem can be arbitrarily bad for BMOC. We also present a reduction to BMOC from the k-densest subgraph problem that provides evidence indicating that obtaining a constant-factor approximation for our problem might be quite challenging. Nevertheless, experimental runs of the greedy algorithm on the EIN data show that greedy performs remarkably well relative to OPT. We identify a feature of our EIN data, that we call the overlap condition, and show that the greedy algorithm does indeed yield a constant-factor approximation guarantee if the overlap condition is satisfied. 
Using an implementation of the greedy algorithm for BMOC on the EIN data, we identify small sets of "bellwether" users who are good predictors of important discussions. We provide evidence to show that tracking just these users reduces the cost of monitoring the EIN significantly without causing any important discussions to be missed.
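
For orientation, here is a sketch of the ratio-greedy rule for the standard budgeted maximum coverage problem, where each set has its own independent cost. It is not the BMOC variant from the abstract, whose overlapping cost structure is not modeled here, and it is only the greedy selection step; any approximation guarantee for the standard problem relies on further steps that are not shown. All names and the input layout are assumptions, and costs are assumed to be positive.

```python
def greedy_budgeted_coverage(sets, costs, budget):
    """Repeatedly pick the affordable set with the best new-elements-per-cost.

    sets maps a name to a set of elements; costs maps a name to a positive
    cost. Returns (chosen names in pick order, covered elements).
    """
    covered, chosen, remaining = set(), [], budget
    candidates = set(sets)
    while True:
        best, best_ratio = None, 0.0
        for name in sorted(candidates):  # sorted for deterministic tie-breaking
            gain = len(sets[name] - covered)
            if gain > 0 and costs[name] <= remaining:
                ratio = gain / costs[name]
                if ratio > best_ratio:
                    best, best_ratio = name, ratio
        if best is None:
            return chosen, covered
        chosen.append(best)
        covered |= sets[best]
        remaining -= costs[best]
        candidates.remove(best)
```

On small inputs it is easy to build cases where this rule misses the optimum, for example by spending budget on cheap sets first so that a larger set no longer fits.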
Proceedings Article•
Implementing streaming simplification for large labeled meshes

[...]

Catalin Constantin1, Shawn Brown1, Jack Snoeyink1•
University of North Carolina at Chapel Hill1
16 Jan 2010
TL;DR: Garland and Heckbert's quadric error metric in conjunction with edge contraction gives a greedy approach to simplify a mesh that can fit in memory, and is applied to streaming meshes, suggested by Isenburg.
Abstract: Data capture technologies like airborne LIDAR produce extremely large models of digital terrain, which must be simplified to be useful. Garland and Heckbert's quadric error metric in conjunction with edge contraction gives a greedy approach to simplify a mesh that can fit in memory; we adapt it to work with boundaries and labels (e.g., object ID, ground vs. building, or some discrimination between parts of the mesh that is to be preserved during simplification). More importantly, we apply it to streaming meshes, suggested by Isenburg, which are represented as an intermixed sequence of vertices, triangles, and finalization tags indicating the last use of any vertex. These tags essentially document spatial locality in the stream. We discuss the engineering decisions that allow our algorithm to achieve fast, high-quality simplification of gigabyte datasets using a small memory footprint.
Proceedings Article•
Succinct trees in practice

[...]

Diego Arroyuelo1, Rodrigo Cánovas2, Gonzalo Navarro2, Kunihiko Sadakane3•
Yahoo!1, University of Chile2, National Institute of Informatics3
16 Jan 2010
TL;DR: A recent BP-based proposal (K. Sadakane and G. Navarro, SODA'10) stands out as an excellent practical combination of space occupancy, time performance, and functionality, whereas others, particularly LOUDS, are still interesting in some limited-functionality niches.
Abstract: We implement and compare the major current techniques for representing general trees in succinct form. This is important because a general tree of n nodes is usually represented in pointer form, requiring O(n log n) bits, whereas the succinct representations we study require just 2n + o(n) bits and carry out many sophisticated operations in constant time. Yet, there is no exhaustive study in the literature comparing the practical magnitudes of the o(n)-space and the O(1)-time terms. The techniques can be classified into three broad trends: those based on BP (balanced parentheses in preorder), those based on DFUDS (depth-first unary degree sequence), and those based on LOUDS (level-ordered unary degree sequence). BP and DFUDS require a balanced parentheses representation that supports the core operations findopen, findclose, and enclose, for which we implement and compare three major algorithmic proposals. All the tree representations require also core operations rank and select on bitmaps, which are already well studied in the literature. We show how to predict the time and space performance of most variants via combining these core operations, and also study some tree operations for which specialized implementations exist. This is especially relevant for a recent proposal (K. Sadakane and G. Navarro, SODA'10) which, although belonging to class BP, deviates from the main techniques in some cases in order to achieve constant time for the widest range of operations. We experiment over various types of real-life trees and of traversals, and conclude that the latter technique stands out as an excellent practical combination of space occupancy, time performance, and functionality, whereas others, particularly LOUDS, are still interesting in some limited-functionality niches.
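
The balanced parentheses (BP) encoding and the findclose operation named in this abstract can be illustrated with a deliberately naive sketch. It stores the sequence as a Python string and scans linearly, so it has none of the 2n + o(n)-bit space or constant-time behavior that the compared implementations aim for; it only shows what the operations compute. The function names other than findclose are hypothetical.

```python
def to_balanced_parentheses(children, root):
    """Preorder BP encoding: '(' when a node is entered, ')' when it is left.

    children maps a node to the ordered list of its children.
    """
    out = []

    def visit(node):
        out.append("(")
        for child in children.get(node, []):
            visit(child)
        out.append(")")

    visit(root)
    return "".join(out)


def findclose(bp, i):
    """Index of the ')' matching the '(' at position i (linear scan)."""
    if bp[i] != "(":
        raise ValueError("position i must hold an opening parenthesis")
    depth = 0
    for j in range(i, len(bp)):
        depth += 1 if bp[j] == "(" else -1
        if depth == 0:
            return j
    raise ValueError("sequence is not balanced")


def subtree_size(bp, i):
    """Number of nodes in the subtree whose '(' is at position i."""
    return (findclose(bp, i) - i + 1) // 2
```

A tree with n nodes produces exactly 2n parentheses, which is where the leading 2n term in the space bound comes from.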
