TL;DR: A novel distributed evolutionary algorithm, KaFFPaE, is presented to solve the Graph Partitioning Problem; it makes use of KaFFPa (Karlsruhe Fast Flow Partitioner), which provides new effective crossover and mutation operators.
Abstract: We present a novel distributed evolutionary algorithm, KaFFPaE, to solve the Graph Partitioning Problem, which makes use of KaFFPa (Karlsruhe Fast Flow Partitioner). The use of our multilevel graph partitioner KaFFPa provides new effective crossover and mutation operators. By combining these with a scalable communication protocol we obtain a system that is able to improve the best known partitioning results for many inputs in a very short amount of time. For example, in Walshaw's well-known benchmark tables we are able to improve or recompute 76% of entries for the tables with 1%, 3% and 5% imbalance.
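The quantity being improved in these benchmark tables is the edge cut of a k-way partition, subject to an imbalance bound ε (1%, 3% or 5%). The sketch below scores a partition that way. It assumes unit vertex weights and the common balance rule "every block holds at most (1 + ε)⌈n/k⌉ vertices"; the function names are hypothetical and this is not KaFFPaE's code.

```python
import math
from typing import Dict, List, Tuple

def edge_cut(edges: List[Tuple[int, int]], block: Dict[int, int]) -> int:
    """Number of edges whose endpoints lie in different blocks."""
    return sum(1 for u, v in edges if block[u] != block[v])

def is_balanced(block: Dict[int, int], k: int, eps: float) -> bool:
    """Assumed balance rule: every block holds at most (1 + eps) * ceil(n / k) vertices."""
    limit = (1 + eps) * math.ceil(len(block) / k)
    sizes = [0] * k
    for b in block.values():
        sizes[b] += 1
    return max(sizes) <= limit

# A 6-cycle split into two paths of three vertices each.
cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
halves = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(edge_cut(cycle, halves), is_balanced(halves, k=2, eps=0.03))  # 2 True
```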
TL;DR: This work introduces RAPTOR, a novel round-based public transit router that computes all Pareto-optimal journeys between two random locations an order of magnitude faster than previous approaches, which easily enables interactive applications.
Abstract: We study the problem of computing all Pareto-optimal journeys in a dynamic public transit network for two criteria: arrival time and number of transfers. Existing algorithms consider this as a graph problem, and solve it using variants of Dijkstra's algorithm. Unfortunately, this leads to either high query times or suboptimal solutions. We take a different approach. We introduce RAPTOR, our novel round-based public transit router. Unlike previous algorithms, it is not Dijkstra-based, looks at each route (such as a bus line) in the network at most once per round, and can be made even faster with simple pruning rules and parallelization using multiple cores. Because it does not rely on preprocessing, RAPTOR works in fully dynamic scenarios. Moreover, it can be easily extended to handle flexible departure times or arbitrary additional criteria, such as fare zones. When run on London's complex public transportation network, RAPTOR computes all Pareto-optimal journeys between two random locations an order of magnitude faster than previous approaches, which easily enables interactive applications.
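The round structure can be sketched as follows. Round k computes, for every stop, the earliest arrival using at most k trips, and each route is scanned at most once per round, starting from its first stop that improved in the previous round. This is a simplified sketch under stated assumptions: no footpaths, zero transfer time, and the trips of each route sorted by departure and never overtaking one another. The `Route` type and the `raptor` function are hypothetical names, and the paper's data layout, pruning details and parallelization are not reproduced.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

INF = math.inf

@dataclass
class Route:
    """Hypothetical route type: trips[t][i] = (arrival, departure) of trip t at stops[i].
    Trips are assumed sorted by departure and never overtake each other."""
    stops: List[str]
    trips: List[List[Tuple[int, int]]]

def raptor(routes: List[Route], source: str, target: str, dep_time: int, max_rounds: int) -> List[float]:
    """Entry k-1 of the result is the earliest arrival at target using at most k trips."""
    stops = {s for r in routes for s in r.stops} | {source, target}
    prev = {s: INF for s in stops}          # tau_{k-1}: arrivals from the previous round
    prev[source] = dep_time
    best = dict(prev)                        # tau*: best arrival over all rounds so far
    marked = {source}
    result: List[float] = []
    for _ in range(max_rounds):
        cur = dict(prev)                     # tau_k starts from tau_{k-1}
        new_marked = set()
        for r in routes:
            # Scan each route once, starting at its first marked stop.
            start = next((i for i, s in enumerate(r.stops) if s in marked), None)
            if start is None:
                continue
            trip = None
            for i in range(start, len(r.stops)):
                s = r.stops[i]
                if trip is not None:
                    arr = r.trips[trip][i][0]
                    # Local and target pruning: keep only strict improvements.
                    if arr < min(best[s], best[target]):
                        cur[s] = best[s] = arr
                        new_marked.add(s)
                # Switch to the earliest trip catchable with last round's arrival at s.
                if prev[s] < INF and (trip is None or prev[s] <= r.trips[trip][i][1]):
                    trip = next((t for t, times in enumerate(r.trips) if times[i][1] >= prev[s]), trip)
        result.append(cur[target])
        if not new_marked:
            break                            # nothing improved, so later rounds change nothing
        marked, prev = new_marked, cur
    return result

r1 = Route(["A", "B", "C"], [[(8, 8), (9, 9), (10, 10)], [(9, 9), (10, 10), (11, 11)]])
r2 = Route(["B", "D"], [[(9, 9), (10, 10)], [(11, 11), (12, 12)]])
r3 = Route(["A", "D"], [[(8, 8), (15, 15)]])
print(raptor([r1, r2, r3], "A", "D", dep_time=8, max_rounds=3))  # [15, 10, 10]
```

The output reads as a Pareto set over (number of trips, arrival time): one trip reaches D at 15, two trips reach it at 10, and a third trip brings no further gain.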
TL;DR: A new streaming approximation algorithm for computing Hierarchical Heavy Hitters that improves on the worst-case time and space bounds of earlier algorithms, is conceptually simple and substantially easier to implement, offers improved accuracy guarantees, and can be efficiently implemented in commodity hardware such as ternary content addressable memories (TCAMs).
Abstract: The Hierarchical Heavy Hitters problem extends the notion of frequent items to data arranged in a hierarchy. This problem has applications to network traffic monitoring, anomaly detection, and DDoS detection. We present a new streaming approximation algorithm for computing Hierarchical Heavy Hitters that has several advantages over previous algorithms. It improves on the worst-case time and space bounds of earlier algorithms, is conceptually simple and substantially easier to implement, offers improved accuracy guarantees, is easily adapted to a distributed or parallel setting, and can be efficiently implemented in commodity hardware such as ternary content addressable memories (TCAMs). We present experimental results showing that for parameters of primary practical interest, our two-dimensional algorithm is superior to existing algorithms in terms of speed and accuracy, and competitive in terms of space, while our one-dimensional algorithm is also superior in terms of speed and accuracy for a more limited range of parameters.
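A common building block for this problem, assumed here rather than taken from the abstract, is one Space-Saving summary per level of the hierarchy. The sketch below shows that per-level counting for one-dimensional IPv4 prefixes. A complete HHH algorithm would additionally discount traffic already attributed to heavy descendants (for instance, report `10.0.0.*` only if its traffic outside `10.0.0.1` is itself heavy); that step is omitted. All class and function names are hypothetical.

```python
from typing import Dict, List, Tuple

class SpaceSaving:
    """Space-Saving summary with m counters; estimates overcount by at most N / m."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.counts: Dict[str, int] = {}

    def update(self, item: str) -> None:
        if item in self.counts or len(self.counts) < self.m:
            self.counts[item] = self.counts.get(item, 0) + 1
        else:
            # Evict the minimum counter; the newcomer inherits its count plus one.
            victim = min(self.counts, key=self.counts.__getitem__)
            self.counts[item] = self.counts.pop(victim) + 1

def prefix(ip: str, level: int) -> str:
    """Generalize an IPv4 address to its first `level` octets (level 0 is the root)."""
    octets = ip.split(".")[:level]
    return ".".join(octets + ["*"] * (4 - level))

class PrefixSketch:
    """One Space-Saving instance per level of the one-dimensional prefix hierarchy."""

    def __init__(self, m: int) -> None:
        self.levels = [SpaceSaving(m) for _ in range(5)]

    def update(self, ip: str) -> None:
        for level, summary in enumerate(self.levels):
            summary.update(prefix(ip, level))

    def heavy_prefixes(self, threshold: int) -> List[Tuple[str, int]]:
        """Prefixes whose estimated count reaches threshold (no descendant discounting,
        and overestimates may produce false positives)."""
        return sorted((p, c) for s in self.levels for p, c in s.counts.items() if c >= threshold)

sketch = PrefixSketch(m=4)
for ip in ["10.0.0.1"] * 6 + ["10.0.0.2"] * 3 + ["192.168.1.1"]:
    sketch.update(ip)
print(sketch.heavy_prefixes(5))
```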
TL;DR: In this article, the authors explore new succinct representations of path-decomposed tries and experimentally evaluate the corresponding reduction in space usage and memory latency, comparing with the state of the art.
Abstract: Tries are popular data structures for storing a set of strings, where common prefixes are represented by common root-to-node paths. Over fifty years of usage have produced many variants and implementations to overcome some of their limitations. We explore new succinct representations of path-decomposed tries and experimentally evaluate the corresponding reduction in space usage and memory latency, comparing with the state of the art. We study two applications: (1) a compressed dictionary for (compressed) strings, and (2) a monotone minimal perfect hash for strings that preserves their lexicographic order.
For (1), we obtain data structures that outperform other state-of-the-art compressed dictionaries in space efficiency, while obtaining predictable query times that are competitive with data structures preferred by the practitioners. In (2), our tries perform several times faster than other trie-based monotone perfect hash functions, while occupying nearly the same space.
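A path decomposition splits a trie into disjoint root-to-leaf paths; each path becomes one node of a new tree, whose children are the branches hanging off that path. The sketch below assumes the centroid rule (always continue into the child with the most leaves) and computes only the root's path. Applying it recursively to every recorded branch would yield the full decomposition. The terminator `$`, the dictionary-of-dictionaries trie and all function names are illustrative assumptions, and nothing here is succinct.

```python
from typing import Dict, List, Tuple

Trie = Dict[str, "Trie"]
END = "$"  # assumed terminator; stored words must not contain it

def build_trie(words: List[str]) -> Trie:
    root: Trie = {}
    for w in words:
        node = root
        for ch in w + END:
            node = node.setdefault(ch, {})
    return root

def leaves(node: Trie) -> int:
    """Number of words stored below node (each word ends in its own END leaf)."""
    return 1 if not node else sum(leaves(child) for child in node.values())

def centroid_path(node: Trie) -> Tuple[str, List[Tuple[int, str]]]:
    """Follow the heaviest child from node; record (depth, char) for every branch not taken.
    Ties between equally heavy children go to the smallest character."""
    label: List[str] = []
    branches: List[Tuple[int, str]] = []
    depth = 0
    while node:
        heavy = max(sorted(node), key=lambda ch: leaves(node[ch]))
        branches.extend((depth, ch) for ch in sorted(node) if ch != heavy)
        label.append(heavy)
        node = node[heavy]
        depth += 1
    return "".join(label), branches

trie = build_trie(["car", "cart", "cat", "dog"])
print(centroid_path(trie))  # ('car$', [(0, 'd'), (2, 't'), (3, 't')])
```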
TL;DR: By carefully adapting node contraction, a common ingredient of many speedup techniques on road networks, this work is able to compute point-to-point queries on a continental network that combines roads, railroads, and flights several orders of magnitude faster than Dijkstra's algorithm.
Abstract: In the multi-modal route planning problem we are given multiple transportation networks (e.g., pedestrian, road, public transit) and ask for a best integrated journey between two points. The main challenge is that a seemingly optimal journey may have changes between networks that do not reflect the user's modal preferences. In fact, quickly computing reasonable multi-modal routes remains a challenging problem: previous approaches either suffer from poor query performance or offer only a limited choice of modal preferences at query time. In this work we focus on computing exact multi-modal journeys that can be restricted by specifying arbitrary modal sequences at query time. For example, a user can specify whether they want to use only public transit, are also willing to take a taxi or walk at the beginning or end of the journey, or have no restrictions at all. By carefully adapting node contraction, a common ingredient of many speedup techniques on road networks, we are able to compute point-to-point queries on a continental network that combines roads, railroads, and flights several orders of magnitude faster than Dijkstra's algorithm. At the same time, we require little space overhead and obtain fast preprocessing times.
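Plain node contraction, as used on road networks, removes a node v and inserts a shortcut u → w for every pair of neighbors whose shortest path might pass through v; a local "witness" search decides whether the shortcut is needed. The sketch below shows only this basic operation on a weighted directed graph stored as nested dictionaries (an assumption). It does not reproduce the paper's adaptation to multi-modal networks or to modal-sequence restrictions, and the function names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[int, Dict[int, float]]  # graph[u][w] = weight of arc u -> w

def witness_distance(graph: Graph, source: int, target: int, skip: int, limit: float) -> float:
    """Dijkstra from source that never enters `skip` and gives up beyond `limit`."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale heap entry
        if u == target or d > limit:
            return d                      # exact distance, or proof that it exceeds limit
        for w, c in graph.get(u, {}).items():
            if w != skip and d + c < dist.get(w, float("inf")):
                dist[w] = d + c
                heapq.heappush(heap, (d + c, w))
    return float("inf")

def contract(graph: Graph, v: int) -> List[Tuple[int, int, float]]:
    """Remove v; add shortcut u -> w unless a path avoiding v is as short as u -> v -> w."""
    incoming = [(u, arcs[v]) for u, arcs in graph.items() if u != v and v in arcs]
    outgoing = [(w, c) for w, c in graph.get(v, {}).items() if w != v]
    shortcuts = []
    for u, c_in in incoming:
        for w, c_out in outgoing:
            via_v = c_in + c_out
            if u != w and witness_distance(graph, u, w, v, via_v) > via_v:
                shortcuts.append((u, w, via_v))
    graph.pop(v, None)
    for arcs in graph.values():
        arcs.pop(v, None)
    for u, w, c in shortcuts:
        if c < graph[u].get(w, float("inf")):
            graph[u][w] = c               # keep the shorter of shortcut and existing arc
    return shortcuts

g = {1: {2: 1, 3: 5}, 2: {3: 1}, 3: {}}
print(contract(g, 2), g)  # [(1, 3, 2)] {1: {3: 2}, 3: {}}
```

If the direct arc 1 → 3 had weight 2, the witness search would find it and no shortcut would be added.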
TL;DR: A novel exact algorithm for the minimum graph bisection problem, whose goal is to partition a graph into two equally-sized cells while minimizing the number of edges between them, based on the branch-and-bound framework.
Abstract: We present a novel exact algorithm for the minimum graph bisection problem, whose goal is to partition a graph into two equally-sized cells while minimizing the number of edges between them. Our algorithm is based on the branch-and-bound framework and, unlike most previous approaches, it is fully combinatorial. We present stronger lower bounds, improved branching rules, and a new decomposition technique that contracts entire regions of the graph without losing optimality guarantees. In practice, our algorithm works particularly well on instances with relatively small minimum bisections, solving large real-world graphs (with tens of thousands to millions of vertices) to optimality.
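The branch-and-bound skeleton can be illustrated on tiny graphs. In the sketch below, vertices are assigned to cells one at a time, the number of edges already cut serves as the lower bound, and a branch is pruned as soon as that bound reaches the best bisection found so far. This bound is deliberately naive and stands in for the paper's much stronger lower bounds, branching rules and contraction-based decomposition; `min_bisection` is a hypothetical name.

```python
from typing import List, Optional, Tuple

def min_bisection(n: int, edges: List[Tuple[int, int]]) -> Tuple[int, Optional[List[int]]]:
    """Exact minimum bisection by naive branch-and-bound (tiny graphs only, n >= 1)."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    cap = (n + 1) // 2                    # no cell may exceed ceil(n / 2) vertices
    side = [-1] * n                       # -1 means not yet assigned
    sizes = [0, 0]
    best_cut, best_side = len(edges) + 1, None

    def branch(i: int, cut: int) -> None:
        nonlocal best_cut, best_side
        if cut >= best_cut:
            return                        # bound: cut edges stay cut in every completion
        if i == n:
            best_cut, best_side = cut, side[:]
            return
        for s in (0, 1):
            if sizes[s] == cap:
                continue
            added = sum(1 for w in adj[i] if side[w] == 1 - s)
            side[i], sizes[s] = s, sizes[s] + 1
            branch(i + 1, cut + added)
            side[i], sizes[s] = -1, sizes[s] - 1

    side[0], sizes[0] = 0, 1              # symmetry breaking: vertex 0 stays in cell 0
    branch(1, 0)
    return best_cut, best_side

# Two triangles joined by the single edge (2, 3).
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(min_bisection(6, two_triangles))  # (1, [0, 0, 0, 1, 1, 1])
```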
TL;DR: This paper reports on the ongoing effort to develop a general purpose software tool designed to solve MSO-definable optimization and decision problems on graphs of small treewidth and presents experimental results, which indicate that for some natural optimization problems MSO-based approaches might be a suitable alternative to ILP solvers.
Abstract: A fundamental theorem of Courcelle states that every problem definable in Monadic Second-Order Logic (MSO) is solvable in linear time on graphs of bounded treewidth. In this paper, we report on our ongoing effort to develop a general purpose software tool designed to solve MSO-definable optimization and decision problems on graphs of small treewidth. We discuss the theoretical underpinnings of our tool and present experimental results, which indicate that for some natural optimization problems MSO-based approaches might be a suitable alternative to ILP solvers.
TL;DR: This work enhances the map of known order constraints by proving an extended version of a constraint conjectured by Mondal and Sen more than a decade ago, and systematically evaluates the influence of different kinds of order constraints on the performance of exact algorithms.
Abstract: We consider the problem of scheduling jobs on a single machine. Given a quadratic cost function, we aim to compute a schedule minimizing the weighted total cost, where the cost of each job is defined as the cost function value at the job's completion time. Throughout the past decades, great effort has been made to develop fast exact algorithms for the case of quadratic costs. The efficiency of these methods heavily depends on the utilization of structural properties of optimal schedules such as order constraints, i.e., sufficient conditions for pairs of jobs to appear in a certain order. A considerable number of different kinds of such constraints have been proposed. In this work we enhance the map of known order constraints by proving an extended version of a constraint that has been conjectured by Mondal and Sen more than a decade ago.
Besides proving this conjecture, our main contribution is an extensive experimental study in which the influence of different kinds of order constraints on the performance of exact algorithms is systematically evaluated. In addition to a best-first graph search algorithm, we test a Quadratic Integer Programming formulation that allows order constraints to be added as additional linear constraints. We also evaluate the optimality gap of the well-known Smith's rule for different monomial cost functions. Our experiments are based on sets of problem instances generated with a new method that allows us to adjust the degree of difficulty of the instances.
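The optimality gap of Smith's rule can be made concrete on a toy instance. Smith's rule orders jobs by non-increasing ratio w_j / p_j; this is optimal for the linear cost f(C) = C but only a heuristic for the quadratic cost f(C) = C². The sketch compares it with a brute-force optimum. The two-job instance and the function names are made up for illustration and are not taken from the study's instance sets.

```python
from itertools import permutations
from typing import Callable, List, Sequence, Tuple

Job = Tuple[int, int]  # (processing time p_j, weight w_j)

def total_cost(order: Sequence[Job], f: Callable[[float], float]) -> float:
    """Sum of w_j * f(C_j), where C_j is the completion time of job j."""
    t, cost = 0, 0
    for p, w in order:
        t += p
        cost += w * f(t)
    return cost

def smith_order(jobs: List[Job]) -> List[Job]:
    """Smith's rule: sort jobs by non-increasing weight / processing-time ratio."""
    return sorted(jobs, key=lambda job: job[1] / job[0], reverse=True)

def brute_force_optimum(jobs: List[Job], f: Callable[[float], float]) -> float:
    """Exact optimum by trying every order (tiny instances only)."""
    return min(total_cost(order, f) for order in permutations(jobs))

def square(c: float) -> float:
    return c ** 2

def linear(c: float) -> float:
    return c

jobs = [(1, 10), (3, 29)]
print(total_cost(smith_order(jobs), square), brute_force_optimum(jobs, square))  # 474 421
print(total_cost(smith_order(jobs), linear), brute_force_optimum(jobs, linear))  # 126 126
```

Under quadratic cost, Smith's order is about 12.6% above the optimum on this instance (474 versus 421), while under linear cost it is optimal.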
TL;DR: Two challenging problems that arise in the context of computing a consensus of a collection of multilabeled trees are considered, namely (1) selecting a compatible collection of clusters on a multiset from an ordered list of such clusters and (2) optimally refining high degree vertices in a multilabeled tree.
Abstract: In this paper we consider two challenging problems that arise in the context of computing a consensus of a collection of multilabeled trees, namely (1) selecting a compatible collection of clusters on a multiset from an ordered list of such clusters and (2) optimally refining high degree vertices in a multilabeled tree. Forming such a consensus is part of an approach to reconstruct the evolutionary history of a set of species for which events such as genome duplication and hybridization have occurred in the past. We present exact algorithms for solving (1) and (2) that have an exponential run-time in the worst case. To give some impression of their performance in practice, we apply them to simulated input and to a real biological data set highlighting the impact of several structural properties of the input on the performance.
TL;DR: This work reconsiders the concept of transit nodes as introduced by Bast et al. and, for the first time, constructs instance-based lower bounds on the size of transit node sets by interpreting an LP formulation of the problem and its dual.
Abstract: We reconsider the concept of transit nodes as introduced by Bast et al. [3] and for the first time construct instance-based lower bounds on the size of transit node sets by interpreting an LP formulation of the problem and its dual. As a by-product, we achieve considerably smaller access node sets, which directly influences the query time for non-local queries.
TL;DR: This work presents an approach that combines the best of both worlds of route planning on mobile devices: the server performs the route computation but, instead of sending only the route to the user, it sends a corridor that is robust against deviations.
Abstract: We study the problem of route planning on mobile devices. There are two current approaches to this problem. One option is to have all the routing data on the device, which can then compute routes by itself. This makes it hard to incorporate traffic updates, leading to suboptimal routes. An alternative approach outsources the route computation to a server, which then sends only the route to the device. The downside is that a user is lost when deviating from the proposed route in an area with limited connectivity. In this work, we present an approach that combines the best of both worlds. The server performs the route computation but, instead of sending only the route to the user, it sends a corridor that is robust against deviations. We formally define these corridors and show that their size can be theoretically bounded in road networks. We evaluate their quality experimentally in terms of size and robustness on a continental road network. Finally, we introduce several algorithms to compute corridors efficiently. Our experimental analysis shows that our corridors are small but very robust against deviations, and can be computed quickly on a standard server.
TL;DR: This study documents that a greedy strategy based on local movement is superior to one based on merging, and reveals that the former approach generally outperforms alternative setups and reference algorithms from the literature in terms of its own objective.
Abstract: Clustering a graph means identifying internally dense subgraphs which are only sparsely interconnected. Formalizations of this notion lead to measures that quantify the quality of a clustering and to algorithms that actually find clusterings. Since the corresponding optimization problems are, in general, hard, practice relies on heuristic clustering algorithms or on other approaches that are not based on an objective function. In this work we conduct a comprehensive experimental evaluation of the qualitative behavior of greedy bottom-up heuristics driven by cut-based objectives and constrained by intracluster density, using both real-world data and artificial instances. Our study documents that a greedy strategy based on local movement is superior to one based on merging. We further reveal that the former approach generally outperforms alternative setups and reference algorithms from the literature in terms of its own objective, while a modularity-based algorithm competes surprisingly well. Finally, we exhibit which combinations of cut-based inter- and intracluster measures are suitable for identifying a hidden reference clustering in synthetic random graphs. Our results serve as a guideline to the usage of bicriterial, cut-based measures for graph clusterings.
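A local-movement heuristic can be sketched generically: start from singleton clusters and repeatedly move a single vertex into a neighboring cluster whenever that strictly improves the objective, until no move helps. Modularity is used below purely as a stand-in objective; the study's cut-based, density-constrained objectives are not reproduced, and the function names are hypothetical. The objective is passed in as a parameter so that another measure could be plugged in.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[int, int]

def modularity(edges: List[Edge], cluster: Dict[int, int]) -> float:
    """Newman modularity: sum over clusters c of e_c / m - (d_c / 2m)^2."""
    m = len(edges)
    intra: Dict[int, int] = defaultdict(int)
    degree: Dict[int, int] = defaultdict(int)
    for u, v in edges:
        degree[cluster[u]] += 1
        degree[cluster[v]] += 1
        if cluster[u] == cluster[v]:
            intra[cluster[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

def local_moving(edges: List[Edge], objective: Callable[[List[Edge], Dict[int, int]], float]) -> Dict[int, int]:
    """Greedy local movement: move single vertices while the objective strictly improves."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    cluster = {v: v for v in adj}         # start from singletons
    improved = True
    while improved:
        improved = False
        for v in sorted(adj):
            home = cluster[v]
            best_c, best_q = home, objective(edges, cluster)
            for c in sorted({cluster[w] for w in adj[v]} - {home}):
                cluster[v] = c            # tentatively move v into a neighboring cluster
                q = objective(edges, cluster)
                if q > best_q + 1e-12:
                    best_c, best_q = c, q
            cluster[v] = best_c           # commit the best move (or restore v's home cluster)
            improved |= best_c != home
    return cluster

two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(local_moving(two_triangles, modularity))
```

On two triangles joined by one edge, the sketch ends with one cluster per triangle.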
TL;DR: It is shown for the first time that non-trivial MSC instances can be solved to provable optimality in reasonable time and an alternative flow-based ILP formulation of polynomial size is proposed, whose structure is particularly favorable for a Lagrangian relaxation approach.
Abstract: A string cover C of a set of strings S is a set of substrings from S such that every string in S can be written as a concatenation of the strings in C. Given costs assigned to each substring from S, the Minimum String Cover (MSC) problem asks for a cover of minimum total cost. This NP-hard problem has so far only been approached from a purely theoretical perspective. A previous integer linear programming (ILP) formulation was designed for a special case, in which each string in S must be generated by a (small) constant number of substrings. If this restriction is removed, the ILP has an exponential number of variables, for which we show the pricing problem to be NP-hard. We propose an alternative flow-based ILP formulation of polynomial size, whose structure is particularly favorable for a Lagrangian relaxation approach. By making use of the strong bounds obtained through a repeated shortest path computation in a branch-and-bound manner, we show for the first time that non-trivial MSC instances can be solved to provable optimality in reasonable time. We also provide and solve real-world instances derived from the classic text "Alice in Wonderland". On almost all instances, our Lagrangian relaxation approach outperforms a CPLEX-based implementation by an order of magnitude. Our software is available under the terms of the GNU general public license.
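Two ingredients of the problem are easy to show in isolation. Checking that a set C covers a string s is a reachability question over the positions 0..|s| of s: a piece c that occurs at position i leads from i to i + |c|. For toy inputs, the minimum-cost cover can then be found by exhaustive search over subsets of substrings. The sketch below does exactly that. It is not the paper's flow-based ILP or its Lagrangian relaxation, the cost function is an arbitrary example, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Optional, Set, Tuple

def decomposes(s: str, cover: Set[str]) -> bool:
    """True if s is a concatenation of strings from cover.
    Position i is reachable if s[:i] can be built; piece c at position i reaches i + len(c)."""
    reachable = [True] + [False] * len(s)
    for i in range(len(s)):
        if reachable[i]:
            for piece in cover:
                if s.startswith(piece, i):
                    reachable[i + len(piece)] = True
    return reachable[len(s)]

def min_string_cover(strings: List[str], cost: Callable[[str], float]) -> Tuple[float, Optional[Set[str]]]:
    """Exhaustive search over subsets of substrings (toy instances only)."""
    candidates = sorted({s[i:j] for s in strings for i in range(len(s)) for j in range(i + 1, len(s) + 1)})
    best: Tuple[float, Optional[Set[str]]] = (float("inf"), None)
    for k in range(1, len(candidates) + 1):
        for combo in combinations(candidates, k):
            total = sum(cost(c) for c in combo)
            if total < best[0] and all(decomposes(s, set(combo)) for s in strings):
                best = (total, set(combo))
    return best

# Example cost: one unit per cover element plus its length.
print(min_string_cover(["abab", "ba"], cost=lambda c: len(c) + 1))
```

With this cost, the cheapest cover of {"abab", "ba"} is {"a", "b"} at total cost 4. Under a different cost, a cover with longer pieces such as {"ab", "ba"} could win instead.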
TL;DR: It is proved that there is an O(n^{3/2}) min-cost flow algorithm for networks that, after removing one node, are planar, have bounded degrees, and have bounded capacities.
Abstract: Motivated by an application in image processing, we introduce the grid-leveling problem. It turns out to be the dual of a minimum cost flow problem for an apex graph with a grid graph as its basis. We present an O(n^{3/2}) algorithm for this problem. The optimum solution recovers missing DC coefficients from image and video coding by Discrete Cosine Transform used in popular standards like JPEG and MPEG. More generally, we prove that there is an O(n^{3/2}) min-cost flow algorithm for networks that, after removing one node, are planar, have bounded degrees, and have bounded capacities. The costs may be arbitrary.
TL;DR: Empirical studies show that the algorithms considered are both efficient and practical on actual simulated and biological networks, and that the clique covers obtained on real networks yield biological insights.
Abstract: We consider the problem of edge clique cover on sparse networks and study an application to the identification of overlapping protein complexes for a network of binary protein-protein interactions. We first give an algorithm whose running time is linear in the size of the graph, provided the treewidth is bounded. We then provide an algorithm for planar graphs with bounded branchwidth upon which we build a PTAS for planar graphs. Empirical studies show that our algorithms are both efficient and practical on actual simulated and biological networks, and that the clique covers obtained on real networks yield biological insights.
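To make the problem statement concrete: an edge clique cover is a family of cliques such that every edge lies inside at least one of them, and the goal is to use as few cliques as possible. The sketch below is a simple greedy heuristic. It grows a clique around an uncovered edge, preferring vertices that cover many still-uncovered edges. It is not the paper's treewidth-based or planar algorithm, carries no optimality guarantee, and assumes a simple undirected graph given as an edge list; the function name is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

def greedy_edge_clique_cover(edges: List[Tuple[int, int]]) -> List[Set[int]]:
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    uncovered: Set[FrozenSet[int]] = {frozenset(e) for e in edges}
    cover: List[Set[int]] = []
    while uncovered:
        u, v = min(tuple(sorted(e)) for e in uncovered)   # deterministic seed edge
        clique = {u, v}
        candidates = adj[u] & adj[v]                      # vertices adjacent to the whole clique
        while candidates:
            # Prefer the vertex closing the most uncovered edges; ties go to the smallest id.
            w = max(sorted(candidates), key=lambda x: sum(frozenset((x, c)) in uncovered for c in clique))
            clique.add(w)
            candidates &= adj[w]
        uncovered -= {frozenset((a, b)) for a in clique for b in clique if a != b}
        cover.append(clique)
    return cover

# A 4-clique {0, 1, 2, 3} with a pendant edge (3, 4).
k4_plus_tail = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
print(greedy_edge_clique_cover(k4_plus_tail))  # [{0, 1, 2, 3}, {3, 4}]
```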