Top 15 papers presented at Algorithm Engineering and Experimentation in 2009

Proceedings Article

Time-dependent contraction hierarchies

G. Veit Batz¹, Daniel Delling¹, Peter Sanders¹, Christian Vetter¹

3 Jan 2009

TL;DR: This is the first hierarchical speedup technique for time-dependent routing that allows bidirectional query algorithms and outperforms previous techniques with respect to query time using comparable or lower preprocessing time.

Abstract: Contraction hierarchies are a simple hierarchical routing technique that has proved extremely efficient for static road networks. We explain how to generalize them to networks with time-dependent edge weights. This is the first hierarchical speedup technique for time-dependent routing that allows bidirectional query algorithms. For large realistic networks with considerable time-dependence (Germany, weekdays) our method outperforms previous techniques with respect to query time using comparable or lower preprocessing time.

120 citations

Proceedings Article

Tuning BNDM with q-grams

Branislav Ďurian, Jan Holub¹, Hannu Peltola², Jorma Tarhio²

Czech Technical University in Prague¹, Helsinki University of Technology²

3 Jan 2009

TL;DR: These algorithms are variations of the BNDM and Shift-Or algorithms, and many of the new variations are substantially faster than any previous string matching algorithm on x86 processors for English and DNA data.

Abstract: We develop bit-parallel algorithms for exact string matching. Our algorithms are variations of the BNDM and Shift-Or algorithms. At each alignment, the algorithms read a q-gram before testing the state variable. In addition, we read a 2-gram in a single instruction. Our experiments show that many of the new variations are substantially faster than any previous string matching algorithm on x86 processors for English and DNA data.
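For readers unfamiliar with the bit-parallel baseline, the sketch below shows classic Shift-Or in Python. It is only an illustrative assumption of the starting point: it reads one character per step and includes neither the q-gram reading nor BNDM's backward scanning that the paper's variants add. The function name `shift_or_search` is hypothetical.

```python
# Illustrative sketch of classic Shift-Or; not the paper's q-gram variants.
def shift_or_search(text: str, pattern: str) -> list[int]:
    """Return the start positions of all exact occurrences of pattern in text."""
    m = len(pattern)
    if m == 0:
        return list(range(len(text) + 1))
    all_ones = (1 << m) - 1
    # masks[c] has bit i cleared exactly when pattern[i] == c
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, all_ones) & ~(1 << i)
    state = all_ones  # bit i == 0 means pattern[:i + 1] matches the text ending here
    hits = []
    for j, c in enumerate(text):
        # shift in a 0 (the empty prefix always matches), then kill prefixes that fail on c
        state = ((state << 1) | masks.get(c, all_ones)) & all_ones
        if (state & (1 << (m - 1))) == 0:  # full pattern matched, ending at position j
            hits.append(j - m + 1)
    return hits
```

Python's unbounded integers hide the word-size limit; a machine-word implementation would restrict the pattern length to the word size (or use several words).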

44 citations

Book Chapter, DOI 10.1137/1.9781611972894.9

Design and implementation of a practical I/O-efficient shortest paths algorithm

Ulrich Meyer¹, Vitaly Osipov²

Goethe University Frankfurt¹, Karlsruhe Institute of Technology²

3 Jan 2009

TL;DR: Reports initial experimental results for a practical I/O-efficient Single-Source Shortest-Paths (SSSP) algorithm on general undirected sparse graphs where the ratio between the largest and the smallest edge weight is reasonably bounded and the realistic assumption holds that main memory is big enough to keep one bit per vertex.

Abstract: We report on initial experimental results for a practical I/O-efficient Single-Source Shortest-Paths (SSSP) algorithm on general undirected sparse graphs where the ratio between the largest and the smallest edge weight is reasonably bounded (for example, integer weights in {1, ..., 2³²}) and the realistic assumption holds that main memory is big enough to keep one bit per vertex. While our implementation only guarantees average-case efficiency, i.e., assuming randomly chosen edge weights, it turns out that its performance on real-world instances with non-random edge weights is actually even better than on the respective inputs with random weights. Furthermore, compared to the currently best implementation for external-memory BFS [6], which in a sense constitutes a lower bound for SSSP, the running time of our approach always stayed within a factor of five; for the most difficult graph classes, the difference was even less than a factor of two. We are not aware of any previous I/O-efficient implementation for the classic general SSSP in a (semi-)external setting: in two recent projects [10, 23], Kumar/Schwabe-like SSSP approaches on graphs of at most 6 million vertices have been tested, forcing the authors to artificially restrict the main memory size, M, to a rather unrealistic 4 to 16 MB in order not to leave the semi-external setting or produce huge running times for larger graphs: for random graphs of 2²⁰ vertices, the best previous approach needed over six hours. In contrast, for a similar ratio of input size vs. M, but on a 128 times larger and even sparser random graph, our approach was less than seven times slower, a relative gain of nearly 20. On a real-world 24 million node street graph, our implementation was over 40 times faster. Even larger gains of over 500 can be estimated for random line graphs based on previous experimental results for Munagala/Ranade-BFS.
Finally, we also report on early results of experiments in which we replace the hard disk by a solid state disk (flash memory).

38 citations

Proceedings Article

Four-dimensional Hilbert curves for R-trees

Herman Haverkort¹, Freek van Walderveen¹

Eindhoven University of Technology¹

3 Jan 2009

TL;DR: By selecting a curve that has certain properties and choosing the right rotation, one can combine the strengths of the two-dimensional and the four-dimensional approach into one, while avoiding their apparent weaknesses.

Abstract: Two-dimensional R-trees are a class of spatial index structures in which objects are arranged to enable fast window queries: report all objects that intersect a given query window. One of the most successful methods of arranging the objects in the index structure is based on sorting the objects according to the positions of their centres along a two-dimensional Hilbert space-filling curve. Alternatively, one may use the coordinates of the objects' bounding boxes to represent each object by a four-dimensional point, and sort these points along a four-dimensional Hilbert-type curve. In experiments by Kamel and Faloutsos and by Arge et al., the first solution consistently outperformed the second when applied to point data, while the second clearly outperformed the first on certain artificial rectangle data. These authors did not specify which four-dimensional Hilbert-type curve was used; many exist. In this paper we show that the results of the previous papers can be explained by the choice of the four-dimensional Hilbert-type curve that was used and by the way it was rotated in four-dimensional space. By selecting a curve that has certain properties and choosing the right rotation, one can combine the strengths of the two-dimensional and the four-dimensional approach into one, while avoiding their apparent weaknesses. The effectiveness of our approach is demonstrated with experiments on various data sets. For real data taken from VLSI design, our new curve yields R-trees with query times that are better than those of R-trees obtained with previously used curves.
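As an illustration of the two-dimensional baseline described above (sorting objects by the Hilbert position of their centres), here is a minimal Python sketch. It uses the widely used iterative conversion from grid cell to 2D Hilbert index; the four-dimensional curves and rotations that are the paper's contribution are not reproduced. The assumption that coordinates lie in the unit square and the names `hilbert_index` and `hilbert_sort` are ours.

```python
# Illustrative sketch of 2D Hilbert sorting; not the paper's four-dimensional curves.
def hilbert_index(order: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along a 2D Hilbert curve filling a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)  # offset of the quadrant, in curve order
        if ry == 0:  # rotate/flip so the sub-curve has the standard orientation
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_sort(rects, order: int = 16):
    """Sort rectangles (xmin, ymin, xmax, ymax), coordinates assumed in [0, 1),
    by the Hilbert index of their centres."""
    n = 1 << order

    def key(r):
        cx = (r[0] + r[2]) / 2
        cy = (r[1] + r[3]) / 2
        # clamp so a centre at exactly 1.0 still maps into the grid
        return hilbert_index(order, min(int(cx * n), n - 1), min(int(cy * n), n - 1))

    return sorted(rects, key=key)
```

In a bulk-loaded R-tree, consecutive runs of the sorted list (one node capacity each) would then become the leaves.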

32 citations

Proceedings Article

Theory and practise of monotone minimal perfect hashing

Djamal Belazzougui¹, Paolo Boldi², Rasmus Pagh³, Sebastiano Vigna²

École Normale Supérieure¹, University of Milan², University of Copenhagen³

3 Jan 2009

TL;DR: This work analyzes experimentally the data structures proposed in [1], and proposes some new methods that, albeit asymptotically equivalent or worse, perform very well in practise, and provide a balance between access speed, ease of construction, and space usage.

Abstract: Minimal perfect hash functions have been shown to be useful to compress data in several data management tasks. In particular, order-preserving minimal perfect hash functions [12] have been used to retrieve the position of a key in a given list of keys; however, the ability to preserve any given order leads to an unavoidable Ω(n log n) lower bound on the number of bits required to store the function. Recently, it was observed [1] that very frequently the keys to be hashed are sorted in their intrinsic (i.e., lexicographical) order. This is typically the case for dictionaries of search engines, lists of URLs of web graphs, etc. We refer to this restricted version of the problem as monotone minimal perfect hashing. We analyse experimentally the data structures proposed in [1], and along the way we propose some new methods that, albeit asymptotically equivalent or worse, perform very well in practise and provide a balance between access speed, ease of construction, and space usage.

29 citations

Book Chapter, DOI 10.1137/1.9781611972894.11

Drawing binary tanglegrams: an experimental evaluation

Martin Nöllenburg¹, Markus Völker¹, Alexander Wolff², Danny Holten²

Karlsruhe Institute of Technology¹, Eindhoven University of Technology²

3 Jan 2009

TL;DR: The tanglegram layout problem is NP-hard even for complete binary trees, and for general binary trees it is hard to approximate if the Unique Games Conjecture holds.

Abstract: A tanglegram is a pair of trees whose leaf sets are in one-to-one correspondence; matching leaves are connected by inter-tree edges. In applications such as phylogenetics or hierarchical clustering, it is required that the individual trees are drawn crossing-free. A natural optimization problem, denoted the tanglegram layout problem, is thus to minimize the number of crossings between inter-tree edges. The tanglegram layout problem is NP-hard even for complete binary trees; for general binary trees, the problem is hard to approximate if the Unique Games Conjecture holds. In this paper, we present an extensive experimental comparison of a new and several known heuristics for the general binary case. We measure the performance of the heuristics with a simple integer linear program and a new exact branch-and-bound algorithm. The new heuristic returns the first solution that the branch-and-bound algorithm computes (in quadratic time). Surprisingly, in most cases this simple heuristic is at least as good as the best of the other heuristics.
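To make the objective concrete: once leaf orders are fixed for both trees, the crossings between straight inter-tree edges are exactly the inversions between the two orders. The Python sketch below counts them and, for very small trees only, finds the best layout of one tree against a fixed leaf order of the other by exhaustive child swapping. This one-sided brute force is an illustrative assumption; it is neither the paper's new heuristic nor its branch-and-bound algorithm, and the nested-tuple tree encoding and function names are hypothetical.

```python
# Illustrative sketch; hypothetical names, exhaustive one-sided layout only.
from itertools import product


def crossings(left_order, right_order):
    """Crossings between straight inter-tree edges when the leaves are drawn in these orders.
    Two edges cross exactly when their endpoints appear in opposite relative order."""
    pos = {leaf: i for i, leaf in enumerate(right_order)}
    seq = [pos[leaf] for leaf in left_order]
    # count inversions; O(n^2) is fine for a sketch
    return sum(1 for i in range(len(seq)) for j in range(i + 1, len(seq)) if seq[i] > seq[j])


def leaf_orders(tree):
    """Yield every leaf order reachable by swapping the children of internal nodes.
    A tree is either a leaf label or a pair (left_subtree, right_subtree)."""
    if not isinstance(tree, tuple):
        yield [tree]
        return
    for a, b in product(list(leaf_orders(tree[0])), list(leaf_orders(tree[1]))):
        yield a + b  # children in original order
        yield b + a  # children swapped


def best_one_sided(left_tree, right_order):
    """Exhaustively lay out the left tree against a fixed right leaf order (exponential)."""
    return min(leaf_orders(left_tree), key=lambda order: crossings(order, right_order))
```

The number of layouts doubles with every internal node, which is why the paper relies on heuristics, an ILP, and branch-and-bound instead.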

27 citations

Proceedings Article

Quasirandom rumor spreading: an experimental analysis

Benjamin Doerr¹, Tobias Friedrich², Marvin Künnemann³, Thomas Sauerwald²

Max Planck Society¹, International Computer Science Institute², Saarland University³

3 Jan 2009

TL;DR: Two versions of the well-known “randomized rumor spreading” protocol for disseminating a piece of information in networks are analyzed empirically; the quasirandom model is generally faster, and its runtime is also more concentrated around the mean.

Abstract: We empirically analyze two versions of the well-known "randomized rumor spreading" protocol to disseminate a piece of information in networks. In the classical model, in each round each informed node informs a random neighbor. At SODA 2008, three of the authors proposed a quasirandom variant. Here, each node has a (cyclic) list of its neighbors. Once informed, it starts at a random position of the list, but from then on informs its neighbors in the order of the list. While for sparse random graphs a better performance of the quasirandom model could be proven, all other results show that, independent of the structure of the lists, the same asymptotic performance guarantees hold as for the classical model. In this work, we compare the two models experimentally. This not only shows that the quasirandom model generally is faster (which was expected, though maybe not to this extent), but also that the runtime is more concentrated around the mean value (which is surprising given that far fewer random bits are used in the quasirandom process). These advantages are also observed in a lossy communication model, where each transmission fails to reach its target with a certain probability, and in an asynchronous model, where nodes send at random times drawn from an exponential distribution. We also show that the particular structure of the lists has little influence on the efficiency. In particular, there is no problem if all nodes use an identical order to inform their neighbors.
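The two protocols differ only in how an informed node picks whom to call. The Python sketch below simulates both in synchronous rounds on a graph given as adjacency lists, where the list order doubles as the quasirandom cyclic list. It is a simplified model under our own assumptions (nodes informed in round t start calling in round t+1, no message loss, no asynchrony), and the function name is hypothetical.

```python
# Illustrative sketch; simplified synchronous model, hypothetical function name.
import random


def rounds_to_inform_all(adj, start, quasirandom, rng):
    """Simulate synchronous push rumor spreading; return the number of rounds until all
    nodes are informed. adj maps each node to a non-empty list of neighbours; in the
    quasirandom model that list is the node's cyclic call order. The graph must be connected."""
    informed = {start}
    nxt = {start: rng.randrange(len(adj[start]))}  # next list position (quasirandom only)
    rounds = 0
    while len(informed) < len(adj):
        rounds += 1
        newly = set()
        for u in informed:
            nbrs = adj[u]
            if quasirandom:
                v = nbrs[nxt[u]]  # continue through the list from a random start
                nxt[u] = (nxt[u] + 1) % len(nbrs)
            else:
                v = rng.choice(nbrs)  # classical: a fresh random neighbour each round
            if v not in informed:
                newly.add(v)
        for v in newly:
            nxt[v] = rng.randrange(len(adj[v]))  # random starting position in its list
        informed |= newly
    return rounds
```

Averaging `rounds_to_inform_all` over many seeds for both settings of `quasirandom` gives the kind of comparison the paper reports; we make no claim here about the numbers such a small sketch would produce.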

22 citations

Book Chapter, DOI 10.1137/1.9781611972894.16

Randomized rounding in the presence of a cardinality constraint

Benjamin Doerr¹, Magnus Wahlström¹

Max Planck Society¹

3 Jan 2009

TL;DR: The authors' experiments show that adding a single cardinality constraint typically reduces the rounding errors and does not seriously increase the running times. In general, the derandomization of the tree-based approach is superior to the derandomized bitwise one, while the two randomized versions produce very similar rounding errors.

Abstract: We consider the problem of generating randomized roundings with a single cardinality constraint. This is motivated by recent results of Srinivasan (FOCS 2001), Gandhi et al. (FOCS 2002, J. ACM 2006) and the first author (STACS 2005, STACS 2006). Our work results in (a) an improved version of the bitwise derandomization given by the first author, (b) the first derandomization of Srinivasan's tree-based randomized approach, together with a proof of its correctness, and (c) an experimental comparison of the resulting algorithms. Our experiments show that adding a single cardinality constraint typically reduces the rounding errors and does not seriously increase the running times. In general, our derandomization of the tree-based approach is superior to the derandomized bitwise one, while the two randomized versions produce very similar rounding errors. When implementing the derandomized tree-based approach, however, the choice of the tree is important.
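The cardinality constraint can be met with dependent rounding: repeatedly pick two fractional entries and shift probability mass between them so that their sum is unchanged, each expectation is preserved, and at least one of them becomes 0 or 1. The Python sketch below pairs fractional entries in index order, which is our simplifying assumption. It is a randomized rounding only, not the bitwise or tree-based derandomizations studied in the paper, and the name `dependent_round` is hypothetical.

```python
# Illustrative sketch of pairwise dependent rounding; pairing order is an assumption.
import random


def dependent_round(x, rng, eps=1e-12):
    """Round x in [0, 1]^n to 0/1 so that E[y_i] = x_i for every i and sum(y) equals
    sum(x) whenever sum(x) is an integer (otherwise it is its floor or ceiling)."""
    y = [float(v) for v in x]
    frac = [i for i, v in enumerate(y) if eps < v < 1 - eps]
    while len(frac) >= 2:
        i, j = frac[0], frac[1]  # pair the first two fractional entries
        up = min(1 - y[i], y[j])    # largest shift of mass from j to i
        down = min(y[i], 1 - y[j])  # largest shift of mass from i to j
        # these probabilities make E[change of y_i] = E[change of y_j] = 0
        if rng.random() < down / (up + down):
            y[i], y[j] = y[i] + up, y[j] - up
        else:
            y[i], y[j] = y[i] - down, y[j] + down
        frac = [k for k in frac if eps < y[k] < 1 - eps]  # at least one of i, j is now integral
    for k in frac:  # at most one entry left, only when sum(x) is not an integer
        y[k] = 1.0 if rng.random() < y[k] else 0.0
    return [int(round(v)) for v in y]
```

Every pairwise step keeps the total unchanged, so when sum(x) is an integer each output has exactly that many ones.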

20 citations

Proceedings Article

Dealing with large hidden constants: engineering a planar Steiner tree PTAS

Siamak Tazari¹, Matthias Müller-Hannemann²

Technische Universität Darmstadt¹, Martin Luther University of Halle-Wittenberg²

3 Jan 2009

TL;DR: The first attempt at implementing a highly theoretical polynomial-time approximation scheme (PTAS) with huge hidden constants, namely the PTAS for Steiner tree in planar graphs by Borradaile, Klein, and Mathieu, is presented.

Abstract: We present the first attempt at implementing a highly theoretical polynomial-time approximation scheme (PTAS) with huge hidden constants, namely the PTAS for Steiner tree in planar graphs by Borradaile, Klein, and Mathieu (SODA 2007, WADS 2007). Whereas this result, and several other PTAS results of recent years, are of high theoretical importance, no practical applications or even implementation attempts have been known to date, due to the extremely large constants involved. We describe techniques for circumventing the challenges in implementing such a scheme. Our main contribution is the engineering of several details of the original algorithm to make it work in practice. With today's limitations on processing power and space, we still have to sacrifice approximation guarantees for improved running times by choosing some parameters empirically. But our experiments show that with our choice of parameters, we do get the desired approximation ratios, suggesting that a much tighter analysis might be possible. Hence, we show that it is possible to actually implement and run this algorithm, even on large instances, already today, but under some compromises. Further improvements, both in theory and practice, might make these great theoretical works finally bear practical fruit in the future. First computational experiments with benchmark instances from SteinLib and large artificial instances well exceeded our own expectations. We demonstrate that we are able to handle instances with up to a million nodes and several hundred terminals in 1.5 hours on a standard PC. On the rectilinear preprocessed instances from SteinLib, we observe a monotone improvement for smaller values of ε, with an average gap below 1% for ε = 0.1. We compare our implementation against the well-known batched 1-Steiner heuristic and observe that on very large instances, we are able to produce comparable solutions much faster.

11 citations

Proceedings Article

Solving maximum flow problems on real world bipartite graphs

Cosmin Silvestru Negruşeri¹, Mircea Bogdan Paşoi², Barbara Stanley¹, Clifford Stein³, Cristian George Strat²

Google¹, University of Bucharest², Columbia University³

3 Jan 2009

TL;DR: In this paper, the authors present an experimental study of several push-relabel algorithms in the context of unbalanced bipartite networks and show how the two-edge push rule improves the running time.

Abstract: In this paper we present an experimental study of several maximum flow algorithms in the context of unbalanced bipartite networks. Our experiments are motivated by a real world problem of managing reservation-based inventory in Google content ad systems. We are interested in observing the performance of several push-relabel algorithms on our real world data sets and also on some generated ones. Previous work suggested an important improvement for push-relabel algorithms on unbalanced bipartite networks: the two-edge push rule. We show how the two-edge push rule improves the running time. While no single algorithm dominates the results, we show there is one that has very robust performance in practice.

8 citations

Proceedings Article

The domination heuristic for LP-type problems

Taras Galkovskyi¹, Bernd Gärtner², Bogdan Rublev¹

Taras Shevchenko National University of Kyiv¹, ETH Zurich²

3 Jan 2009

TL;DR: A new speed-up heuristic that can easily be integrated into the known linear-time algorithms, without decreasing their worst-case performance, is introduced.

Abstract: Certain geometric optimization problems, for example finding the smallest enclosing ellipse of a set of points, can be solved in linear time by simple randomized (or complicated deterministic) combinatorial algorithms. In practice, these algorithms are enhanced or replaced with heuristic variants that are faster but do not come with a theoretical runtime guarantee. In this paper, we introduce a new speed-up heuristic that can easily be integrated into the known linear-time algorithms, without decreasing their worst-case performance. The heuristic can actually be defined for every problem in the well-known abstract class of LP-type problems; its effectiveness in practice depends on whether and how fast the heuristic can be implemented for the specific problem at hand, and on whether the input distribution is favorable. We provide test results showing that for two concrete problems, the new heuristic may lead to significant speedups compared to state-of-the-art implementations that are available in the Computational Geometry Algorithms Library (CGAL).

Proceedings Article

An experimental study of minimum mean cycle algorithms

Loukas Georgiadis¹, Andrew V. Goldberg², Robert E. Tarjan³, Renato F. Werneck²

Hewlett-Packard¹, Microsoft², Princeton University³

3 Jan 2009

TL;DR: In the experiments, the tree-based method and two implementations of the cycle-based method outperformed other approaches, including binary search.

Abstract: We study algorithms for the minimum mean cycle problem, a parametric version of shortest path feasibility (SPF). The three basic approaches to the problem are cycle-based, binary search, and tree-based. The first two use an SPF algorithm as a subroutine, while the third uses a parametric approach. When implementing the SPF-based methods, one has a choice of SPF algorithms and incremental optimization strategies. There are also several ways to handle precision issues. This leads to dozens of variants, which we systematically compare. Our experimental setup is more comprehensive than in previous studies. In our experiments, the tree-based method and two implementations of the cycle-based method outperformed other approaches, including binary search.
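The binary-search approach mentioned above rests on one observation: a cycle with mean weight below λ exists exactly when the graph with weights w(e) - λ contains a negative cycle, which an SPF algorithm can detect. The Python sketch below uses a Bellman-Ford style check as the SPF subroutine. It assumes the graph contains at least one cycle and uses a fixed iteration count and a small tolerance rather than the careful precision handling the paper discusses; the function names are hypothetical.

```python
# Illustrative sketch of the binary-search approach; hypothetical names and tolerances.
def has_negative_cycle(n, edges, lam, tol=1e-12):
    """Is there a cycle of negative total weight when every edge weight w becomes w - lam?
    Bellman-Ford from a virtual source joined to all n vertices by zero-weight edges."""
    dist = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if dist[u] + (w - lam) < dist[v] - tol:
                dist[v] = dist[u] + (w - lam)
                changed = True
        if not changed:
            return False  # distances converged, so no negative cycle
    return True  # still relaxing after n passes: a negative cycle exists


def min_mean_cycle(n, edges, iters=60):
    """Approximate the minimum cycle mean by binary search on lam.
    edges: list of (u, v, w) with vertices 0..n-1; the graph must contain a cycle."""
    lo = min(w for _, _, w in edges)  # no cycle mean is below the lightest edge
    hi = max(w for _, _, w in edges)  # nor above the heaviest one
    for _ in range(iters):
        mid = (lo + hi) / 2
        if has_negative_cycle(n, edges, mid):
            hi = mid  # some cycle has mean < mid
        else:
            lo = mid  # every cycle has mean >= mid (up to the tolerance)
    return hi
```

Each probe costs a full SPF run, which hints at why the paper's experiments favour the cycle-based and tree-based methods.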

Proceedings Article

The Filter-Kruskal minimum spanning tree algorithm

Vitaly Osipov¹, Peter Sanders¹, Johannes Singler¹

Karlsruhe Institute of Technology¹

3 Jan 2009

TL;DR: A simple modification of Kruskal's algorithm that avoids sorting edges that are "obviously" not in the MST, and has very good practical performance over the entire range of edge densities.

Abstract: We present Filter-Kruskal, a simple modification of Kruskal's algorithm that avoids sorting edges that are "obviously" not in the MST. For arbitrary graphs with random edge weights, Filter-Kruskal runs in time O(m + n log n log(m/n)), i.e., in linear time for graphs that are not too sparse. Experiments indicate that the algorithm has very good practical performance over the entire range of edge densities. An equally simple parallelization seems to be the currently best practical algorithm on multicore machines.
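A minimal sequential sketch of the Filter-Kruskal idea as summarized above: partition the edges around a random pivot weight, recurse on the light edges first, then discard heavy edges whose endpoints are already connected before recursing on the rest. The base-case threshold, the union-find details, and all names are our assumptions, and the parallel version is not covered.

```python
# Illustrative sequential sketch; threshold and names are assumptions.
import random


class DisjointSets:
    """Union-find with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def kruskal(edges, ds, mst):
    """Plain Kruskal on (w, u, v) edges, extending the forest held in ds and mst."""
    for w, u, v in sorted(edges):
        if ds.union(u, v):
            mst.append((w, u, v))


def filter_kruskal(edges, ds, mst, rng, threshold=32):
    if len(edges) <= threshold:
        kruskal(edges, ds, mst)
        return
    pivot = rng.choice(edges)[0]
    light = [e for e in edges if e[0] <= pivot]
    heavy = [e for e in edges if e[0] > pivot]
    if not heavy:  # pivot was a maximum weight; avoid recursing on the same set
        kruskal(light, ds, mst)
        return
    filter_kruskal(light, ds, mst, rng, threshold)
    # filtering step: drop heavy edges whose endpoints are already connected
    heavy = [e for e in heavy if ds.find(e[1]) != ds.find(e[2])]
    filter_kruskal(heavy, ds, mst, rng, threshold)


def minimum_spanning_forest(n, edges, seed=0):
    ds, mst = DisjointSets(n), []
    filter_kruskal(list(edges), ds, mst, random.Random(seed))
    return mst
```

The filtering step is what saves work: heavy edges that already close a cycle are removed before they are ever sorted.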

Proceedings Article

Experimental comparison of the two Fredman-Khachiyan-algorithms

Matthias Hagen¹, Peter Horatschek¹, Martin Mundhenk¹

University of Jena¹

3 Jan 2009

TL;DR: This paper experimentally shows algorithm B to be competitive with, and on many instances even superior to, algorithm A, challenging the assumption that the operations performed by algorithm B to ensure recursion on smaller sub-problems pay off only in theory.

Abstract: We experimentally compare the two algorithms A and B by Fredman and Khachiyan [FK96] for the problem Monet: given two monotone Boolean formulas φ in DNF and ψ in CNF, decide whether they are equivalent. Currently, algorithm B is the Monet algorithm with the best known worst-case performance. However, there has been no experimental evaluation of its practical performance yet, mainly for two reasons. First, implementing algorithm B is usually considered more involved than implementing algorithm A. Second, and probably more importantly, there is the assumption that the operations performed by algorithm B to ensure recursion on smaller sub-problems pay off only in theory. In this paper, we challenge this assumption by showing experimentally that algorithm B is competitive with, and on many instances even superior to, algorithm A.
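To pin down the Monet problem itself (not either Fredman-Khachiyan algorithm), the Python sketch below checks equivalence of a monotone DNF and a monotone CNF by enumerating all assignments. This exponential check only makes the problem statement concrete; algorithms A and B exist precisely to avoid it. Encoding formulas as lists of variable sets, and the function name, are our own assumptions.

```python
# Illustrative brute-force definition of Monet; not algorithm A or B.
from itertools import product


def monet_brute_force(dnf, cnf, variables):
    """Decide whether a monotone DNF phi and a monotone CNF psi are equivalent by trying
    every assignment. dnf: list of terms, cnf: list of clauses, each a set of variable names.
    Exponential in len(variables); for illustration only."""
    for bits in product((False, True), repeat=len(variables)):
        true_vars = {v for v, b in zip(variables, bits) if b}
        phi = any(term <= true_vars for term in dnf)  # some term has all its variables true
        psi = all(clause & true_vars for clause in cnf)  # every clause has a true variable
        if phi != psi:
            return False  # this assignment separates the two formulas
    return True
```

For example, (x ∧ y) ∨ z and (x ∨ z) ∧ (y ∨ z) are equivalent, while dropping the clause (y ∨ z) breaks equivalence at x = true, y = z = false.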

Proceedings Article

Rank aggregation: together we're strong

Frans Schalekamp¹, Anke van Zuylen¹

Tsinghua University¹

3 Jan 2009

TL;DR: This work gives theoretical and practical evidence that a combination of these different approaches gives algorithms that are superior to the individual algorithms, and performs an extensive evaluation of the "pure" algorithms and combinations of different approaches.

Abstract: We consider the problem of finding a ranking of a set of elements that is "closest to" a given set of input rankings of the elements; more precisely, we want to find a permutation that minimizes the Kendall-tau distance to the input rankings, where the Kendall-tau distance is defined as the sum over all input rankings of the number of pairs of elements that are in a different order in the input ranking than in the output ranking. If the input rankings are permutations, this problem is known as the Kemeny rank aggregation problem. This problem arises, for example, in building meta-search engines for Web search, aggregating viewers' rankings of movies, or giving recommendations to a user based on several different criteria, where we can think of having one ranking of the alternatives for each criterion. Many of the approximation algorithms and heuristics that have been proposed in the literature are positional, comparison-sort, or local-search algorithms. The rank aggregation problem is a special case of the (weighted) feedback arc set problem, but in the feedback arc set problem we use only information about the preferred relative ordering of pairs of elements to find a ranking of the elements, whereas in the case of the rank aggregation problem, we have additional information in the form of the complete input rankings. The positional methods are the only algorithms that use this additional information. Since the rank aggregation problem is NP-hard, none of these algorithms is guaranteed to find the optimal solution, and different algorithms will provide different solutions. We give theoretical and practical evidence that a combination of these different approaches gives algorithms that are superior to the individual algorithms. Theoretically, we give lower bounds on the performance for many of the "pure" methods. Practically, we perform an extensive evaluation of the "pure" algorithms and combinations of different approaches.
We give three recommendations for which (combination of) methods to use based on whether a user wants to have a very fast, fast or reasonably fast algorithm.
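The objective and two of the algorithm families mentioned above can be sketched in a few lines of Python: the Kendall-tau distance, a Borda-style positional method, an adjacent-swap local search that can be run on the positional output (one simple way of combining approaches), and an exhaustive Kemeny solver for tiny inputs. These are generic versions chosen by us for illustration, not the specific algorithms or combinations evaluated in the paper, and all function names are hypothetical.

```python
# Illustrative sketch; generic methods with hypothetical names, not the paper's recommendations.
from itertools import combinations, permutations


def kendall_tau(a, b):
    """Number of element pairs that rankings a and b put in different orders."""
    pos = {x: i for i, x in enumerate(b)}
    return sum(1 for x, y in combinations(a, 2) if pos[x] > pos[y])


def total_distance(ranking, inputs):
    return sum(kendall_tau(ranking, r) for r in inputs)


def borda(inputs):
    """Positional method: order elements by the sum of their positions (ties by label)."""
    score = {}
    for r in inputs:
        for i, x in enumerate(r):
            score[x] = score.get(x, 0) + i
    return sorted(score, key=lambda x: (score[x], x))


def local_search(ranking, inputs):
    """Apply improving adjacent transpositions until none is left."""
    r = list(ranking)
    improved = True
    while improved:
        improved = False
        for i in range(len(r) - 1):
            cand = r[:i] + [r[i + 1], r[i]] + r[i + 2:]
            if total_distance(cand, inputs) < total_distance(r, inputs):
                r, improved = cand, True
    return r


def kemeny_brute_force(inputs):
    """Exact optimum by trying every permutation; tiny inputs only."""
    return list(min(permutations(inputs[0]), key=lambda p: total_distance(p, inputs)))
```

A combined run such as `local_search(borda(inputs), inputs)` can never do worse than the positional ranking it starts from, since every accepted swap strictly lowers the total distance.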