TL;DR: In this paper, the authors introduce a new set system induced by the minimality condition of the hitting sets, which enables them to use efficient pruning methods and construct time efficient and polynomial space dualization algorithms.
Abstract: A hypergraph F is a set family defined on vertex set V. The dual of F is the set of minimal subsets H of V such that F ∩ H ≠ ∅ for every F ∈ F. The computation of the dual is equivalent to many problems, such as minimal hitting set enumeration of a subset family, minimal set cover enumeration, and the enumeration of hypergraph transversals. In this paper, we introduce a new set system induced by the minimality condition of the hitting sets, which enables us to use efficient pruning methods. We further propose an efficient algorithm for checking the minimality, which enables us to construct time-efficient and polynomial-space dualization algorithms. The computational experiments show that our algorithms are quite fast even for large-scale input for which existing algorithms do not terminate in practical time.
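To make the definition of the dual concrete, the following is a minimal brute-force sketch in Python. It is not the pruning-based algorithm of the paper: it simply enumerates candidate subsets by increasing size and keeps those that hit every hyperedge and contain no smaller hitting set. The function name `minimal_hitting_sets` and the input representation are assumptions made for illustration, and the running time is exponential in |V|.

```python
from itertools import combinations


def minimal_hitting_sets(vertices, hyperedges):
    """Enumerate all minimal hitting sets by brute force (exponential time).

    Illustrative only: the paper's algorithms use pruning and a fast
    minimality check instead of this exhaustive search.
    """
    edges = [frozenset(e) for e in hyperedges]
    found = []
    # Candidates are visited by increasing size, so every hitting set that is
    # a proper subset of the current candidate has already been considered.
    for k in range(len(vertices) + 1):
        for cand in combinations(sorted(vertices), k):
            h = frozenset(cand)
            # h must intersect every hyperedge.
            if not all(h & e for e in edges):
                continue
            # h is minimal iff it contains no previously found hitting set.
            if any(f <= h for f in found):
                continue
            found.append(h)
    return found
```

For the hypergraph with hyperedges {1, 2} and {2, 3} on V = {1, 2, 3}, the dual consists of {2} and {1, 3}.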
TL;DR: Simple and fast algorithms for computing the LZ77 factorization are described, which consistently outperform all previous approaches in practice, use less memory, and still offer strong worst-case performance guarantees.
Abstract: For decades the Lempel-Ziv (LZ77) factorization has been a cornerstone of data compression and string processing algorithms, and uses for it are still being uncovered. For example, LZ77 is central to several recent text indexing data structures designed to search highly repetitive collections. However, in many applications computation of the factorization remains a bottleneck in practice. In this paper we describe simple and fast algorithms for computing the LZ77 factorization. These new methods consistently outperform all previous approaches in practice, use less memory, and still offer strong worst-case performance guarantees. A common feature of the new algorithms is their avoidance of the longest-common-prefix array, essential to nearly all prior art.
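For readers unfamiliar with the factorization itself, here is a naive sketch in Python. It only illustrates what is being computed: each factor is either a fresh character or the longest prefix of the remaining text that also starts at an earlier position (overlaps allowed). It runs in quadratic time or worse and is not one of the algorithms described in the paper. The output encoding, `(char, 0)` for a literal and `(position, length)` for a copy, is an assumption chosen for readability.

```python
def lz77_factorize(text):
    """Naive LZ77 factorization; for illustration only, not the paper's method.

    Returns a list of factors: (char, 0) for a character that has not
    occurred before, or (pos, length) for a copy of `length` characters
    starting at the earlier position `pos`.
    """
    factors = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_pos = 0, -1
        for j in range(i):  # every earlier starting position is a candidate
            length = 0
            # The match may run past position i (self-overlapping copy).
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_pos = length, j
        if best_len == 0:
            factors.append((text[i], 0))
            i += 1
        else:
            factors.append((best_pos, best_len))
            i += best_len
    return factors
```

For example, `lz77_factorize("abababab")` yields two literals followed by one self-overlapping copy of length 6 from position 0.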
TL;DR: In this paper, the authors consider multicriteria shortest path problems and show that contraction hierarchies can be constructed efficiently for the case of arbitrary conic combinations of the edge costs.
Abstract: We consider multicriteria shortest path problems and show that contraction hierarchies -- a very powerful speed-up technique originally developed for standard shortest path queries in [7] -- can be constructed efficiently for the case of arbitrary conic combinations of the edge costs. This extends previous results in [5] which considered only the bicriteria case and discrete weights for the objective functions. On the theory side we prove a polynomial time bound for determining whether a path π is part of the lower envelope of all Pareto-optimal paths via some polyhedral arguments. Experiments complement these results by showing the practicability of our approach.
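The notion of a conic combination of edge costs can be illustrated with a plain Dijkstra query in Python. This sketch does not build a contraction hierarchy; it only shows how a non-negative weight vector turns a vector-valued edge cost into a single scalar cost at query time. The graph representation (a dict mapping a node to a list of `(neighbor, cost_vector)` pairs) and the function name are assumptions made for this example.

```python
import heapq


def shortest_path_cost(graph, source, target, weights):
    """Dijkstra on the scalarized cost sum(w_i * c_i), with all w_i >= 0.

    Illustrative baseline only; the paper accelerates such queries with
    contraction hierarchies, which are not implemented here.
    """
    if any(w < 0 for w in weights):
        raise ValueError("a conic combination requires non-negative weights")
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, costs in graph.get(u, []):
            nd = d + sum(w * c for w, c in zip(weights, costs))
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")
```

Different weight vectors select different Pareto-optimal paths: in a graph with one route that is cheap in the first criterion and another that is cheap in the second, the weights (1, 0) and (0, 1) pick different routes.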
TL;DR: Specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology, are used to design very fast string matching algorithms in the case of short patterns.
Abstract: Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed up the performance of classical string matching algorithms. In this model an algorithm operates on words of length w, grouping blocks of characters, and arithmetic and logic operations on the words take one unit of time.
In this paper we use specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology, to design very fast string matching algorithms in the case of short patterns. Our experimental results show that, despite their quadratic worst-case time complexity, the newly presented algorithms are the clear winners on average for short patterns when compared against the most effective algorithms known in the literature.
TL;DR: This paper uses the new tool MaLiJAn to confirm that asymmetric pivot choices are preferable to symmetric ones for this Quicksort variant asymptotically for combinatorial cost measures such as the total number of executed instructions.
Abstract: Recent results on Java 7's dual pivot Quicksort have revealed its highly asymmetric nature. These insights suggest that asymmetric pivot choices are preferable to symmetric ones for this Quicksort variant. From a theoretical point of view, this should allow us to improve on the current implementation in Oracle's Java 7 runtime library. In this paper, we use our new tool MaLiJAn to confirm this asymptotically for combinatorial cost measures such as the total number of executed instructions. However, the observed running times show converse behavior. With the support of data provided by MaLiJAn we are able to identify the profiling capabilities of Oracle's just-in-time compiler to be responsible for this unexpected outcome.
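As background for the Quicksort variant discussed above, the following Python sketch shows a dual-pivot partitioning scheme: two pivots p ≤ q split the array into elements smaller than p, elements between p and q, and elements larger than q. It is a simplified illustration, not the Java 7 library code, and it naively uses the two outermost elements as pivots. The paper's question of symmetric versus asymmetric pivot choice concerns how the pivots are selected from a sample, which this sketch does not model.

```python
def dual_pivot_quicksort(a, lo=0, hi=None):
    """Sort list `a` in place with a simple dual-pivot Quicksort.

    Simplified sketch: pivots are the outermost elements, which is an
    assumption for illustration and not the library's selection rule.
    """
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]  # p <= q
    lt, gt = lo + 1, hi - 1
    i = lo + 1
    # Invariant: a[lo+1..lt-1] < p, a[lt..i-1] in [p, q], a[gt+1..hi-1] > q.
    while i <= gt:
        if a[i] < p:
            a[i], a[lt] = a[lt], a[i]
            lt += 1
            i += 1
        elif a[i] > q:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1  # the swapped-in element is examined next
        else:
            i += 1
    lt -= 1
    gt += 1
    # Move the pivots to their final positions.
    a[lo], a[lt] = a[lt], a[lo]
    a[hi], a[gt] = a[gt], a[hi]
    dual_pivot_quicksort(a, lo, lt - 1)
    dual_pivot_quicksort(a, lt + 1, gt - 1)
    dual_pivot_quicksort(a, gt + 1, hi)
```

The three recursive calls handle the three partitions; how evenly they are sized depends on the pivot choice, which is exactly the parameter the paper investigates.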
TL;DR: This paper presents a practical external hashing scheme that supports fast lookup (7 microseconds) for large datasets with a small memory footprint (2.5 bits/item) and fast index construction (151 K items/s for 1-KiB key-value pairs).
Abstract: This paper presents a practical external hashing scheme that supports fast lookup (7 microseconds) for large datasets (millions to billions of items) with a small memory footprint (2.5 bits/item) and fast index construction (151 K items/s for 1-KiB key-value pairs). Our scheme combines three key techniques: (1) a new index data structure (Entropy-Coded Tries); (2) the use of sorting as the main data manipulation method; and (3) support for incremental index construction for dynamic datasets. We evaluate our scheme by building an external dictionary on flash-based drives and demonstrate our scheme's high performance, compactness, and practicality.
TL;DR: In this paper, the authors propose the VAT model (virtual address translation) to account for the cost of address translations, analyze simple algorithms such as random array scans, binary search, and heapsort in the model, and also analyze the VAT-cost of cache-oblivious algorithms.
Abstract: Modern computers are not random access machines (RAMs). They have a memory hierarchy, multiple cores, and virtual memory. In this paper, we address the computational cost of address translation in virtual memory. The starting point for our work is the observation that the analysis of some simple algorithms (random scan of an array, binary search, heapsort) in either the RAM model or the EM model (external memory model) does not correctly predict growth rates of actual running times. We propose the VAT model (virtual address translation) to account for the cost of address translations and analyze the algorithms mentioned above and others in the model. The predictions agree with the measurements. We also analyze the VAT-cost of cache-oblivious algorithms.
TL;DR: In this article, the authors study the combinatorial and algebraic complexity of normal surfaces from both the theoretical and experimental viewpoints and obtain new exponential lower bounds on the worst-case complexities in a variety of settings that are important for practical computation.
Abstract: In three-dimensional computational topology, the theory of normal surfaces is a tool of great theoretical and practical significance. Although this theory typically leads to exponential time algorithms, very little is known about how these algorithms perform in "typical" scenarios, or how far the best known theoretical bounds are from the real worst-case scenarios. Here we study the combinatorial and algebraic complexity of normal surfaces from both the theoretical and experimental viewpoints. Theoretically, we obtain new exponential lower bounds on the worst-case complexities in a variety of settings that are important for practical computation. Experimentally, we study the worst-case and average-case complexities over a comprehensive body of roughly three billion input triangulations. Many of our lower bounds are the first known exponential lower bounds in these settings, and experimental evidence suggests that many of our theoretical lower bounds on worst-case growth rates may indeed be asymptotically tight.
TL;DR: In this article, an implementation of a cycle separator algorithm for triangulated planar graphs is presented; to the authors' knowledge, it is the first such implementation with a worst-case guarantee on the cycle length.
Abstract: We provide an implementation of an algorithm that, given a triangulated planar graph with m edges, returns a simple cycle that is a 2/3-balanced separator consisting of at most √(8m) edges. Efficient construction of a short and balanced separator that forms a simple cycle is essential in numerous planar graph algorithms, e.g., for computing shortest paths, minimum cuts, or maximum flows. To the best of our knowledge, this is the first implementation of such a cycle separator algorithm with a worst-case guarantee on the cycle length.
We evaluate the performance of our algorithm and compare it to the planar separator algorithms recently studied by Holzer et al. [ESA 2005, ACM Journal of Experimental Algorithms 2009]. Out of these algorithms, only the Fundamental Cycle Separator (FCS) produces a simple cycle separator. However, FCS does not provide a worst-case size guarantee. We demonstrate that (i) our algorithm is competitive across all test cases in terms of running time, balance and cycle length, (ii) it provides worst-case guarantees on the cycle length, significantly outperforming FCS on some instances, and (iii) it scales to large graphs.
TL;DR: This work describes an extension of the kinetic data structures framework from Delaunay triangulations to fixed-radius alpha complexes and reports on several techniques to accelerate the computation that make the implementation applicable to the underlying biological problem.
Abstract: Motivated by an application in cell biology, we describe an extension of the kinetic data structures framework from Delaunay triangulations to fixed-radius alpha complexes. Our algorithm is implemented using CGAL, following the exact geometric computation paradigm. We report on several techniques to accelerate the computation that make our implementation applicable to the underlying biological problem.
TL;DR: This work presents a surprisingly simple method for "consistent" parallel processing of sparse outer products (column-row vector products) over several processors, in a communication-avoiding setting where each processor has a copy of the input.
Abstract: We consider the problem of sparse matrix multiplication by the column-row method in a distributed setting where the matrix product is not necessarily sparse. We present a surprisingly simple method for "consistent" parallel processing of sparse outer products (column-row vector products) over several processors, in a communication-avoiding setting where each processor has a copy of the input. The method is consistent in the sense that a given output entry is always assigned to the same processor independently of the specific structure of the outer product. We show guarantees on the work done by each processor, and achieve linear speedup down to the point where the cost is dominated by reading the input. Our method gives a way of distributing (or parallelizing) matrix product computations in settings where the main bottlenecks are storing the result matrix and inter-processor communication. Motivated by observations on real data that the absolute values of the entries in the product often adhere to a power law, we combine our approach with frequent items mining algorithms and show how to obtain a tight approximation of the weight of the heaviest entries in the product matrix.
As a case study we present the application of our approach to frequent pair mining in transactional data streams, a problem that can be phrased in terms of sparse {0, 1}-integer matrix multiplication by the column-row method. Experimental evaluation of the proposed method on real-life data supports the theoretical findings.
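The idea of a consistent assignment of output entries can be sketched in Python as follows. The matrix product is computed by the column-row method (a sum of outer products of column k of A with row k of B), and each processor keeps only the entries it owns. The ownership rule `(i + j) % num_procs` is an assumption made for this illustration and is not claimed to be the authors' scheme; the sparse representation (one dict per column of A and per row of B) and all names are likewise hypothetical.

```python
from collections import defaultdict


def local_product(a_cols, b_rows, proc, num_procs):
    """Entries of A*B owned by processor `proc` under a fixed assignment.

    a_cols[k] maps row index -> value for column k of A.
    b_rows[k] maps column index -> value for row k of B.
    Entry (i, j) is owned by processor (i + j) % num_procs in every outer
    product, so no communication is needed to combine partial sums.
    (The ownership rule is an illustrative assumption.)
    """
    out = defaultdict(int)
    for col, row in zip(a_cols, b_rows):
        # Bucket the row's non-zeros by column index modulo num_procs so that
        # each row index i only visits the column indices this processor owns.
        buckets = defaultdict(list)
        for j, y in row.items():
            buckets[j % num_procs].append((j, y))
        for i, x in col.items():
            for j, y in buckets[(proc - i) % num_procs]:
                out[(i, j)] += x * y
    return dict(out)
```

Because the owner of an entry does not depend on which outer product produced it, the per-processor results are disjoint and their union is the full product.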
TL;DR: This work introduces the Cov-MECF framework, a special case of minimum-edge cost flow in which the input graph is bipartite, and introduces a new heuristic LPO for any problem in this framework, empirically establishing that this heuristic returns solutions that are higher in quality than those of Wolsey's algorithm.
Abstract: In this work, we introduce the Cov-MECF framework, a special case of minimum-edge cost flow in which the input graph is bipartite. We observe that several important covering (and multi-covering) problems are captured in this unifying model and introduce a new heuristic LPO for any problem in this framework. In essence, LPO uses the fractional solution as an oracle in deciding how to greedily modify the partial solution. We empirically establish that this heuristic returns solutions that are higher in quality than those of Wolsey's algorithm. We also apply the analogs of Leskovec et al.'s [25] optimization to LPO and introduce a further freezing optimization to both algorithms. We observe that the former optimization generally benefits LPO more than Wolsey's algorithm, and that the additional freezing step often corrects suboptimalities while further reducing the number of subroutine calls. We tested these implementations on randomly generated testbeds, several instances from the Second DIMACS Implementation Challenge and a couple of networks modeling real-world dynamics.
TL;DR: An evaluation of the performance and limitations in practical computations of the gossip-based aggregation algorithms with the most promising theoretical fault tolerance properties shows that for some failure types (such as permanent node failures) further algorithmic advances are required to achieve resilience with a reasonably small overhead and acceptable performance.
Abstract: In recent years, several gossip-based aggregation algorithms have been developed which focus on providing resilience in failure-prone distributed systems. The main objective of such algorithms is the efficient in-network computation of aggregates even in the case when system failures occur during runtime. In this paper, we evaluate performance and limitations in practical computations of those gossip-based aggregation algorithms with the most promising theoretical fault tolerance properties.
Theoretical analyses of these algorithms usually address only the principal ability of handling or overcoming a certain kind of system failure. Most of the time, there are no formal results on the concrete impact of failure handling on the performance of the algorithms, e.g., in terms of convergence speed. This leaves a wide gap between theory and practice, as we illustrate in this paper. In order to bridge this gap, we first categorize common system failures of interest. Then, we experimentally investigate how well these common failure types are handled in practice by the considered algorithms and to what extent these state-of-the-art methods provide a reasonable degree of fault tolerance in practice. Our experimental studies reveal (i) that certain failure handling approaches which work in theory exhibit unacceptable performance in practice and (ii) that in some cases the failure handling mechanisms used introduce new problems, e.g., numerical inaccuracy.
Our investigations illustrate that for some failure types (such as permanent node failures) further algorithmic advances are required to achieve resilience with a reasonably small overhead and acceptable performance.
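As a point of reference for what gossip-based aggregation computes, the following Python sketch simulates the basic push-sum protocol for averaging on a fully connected network without failures. It is a well-known baseline and not one of the fault-tolerant variants evaluated in the paper; the synchronous round structure, the fixed random seed, and the function name are assumptions made for this illustration.

```python
import random


def push_sum(values, rounds, seed=0):
    """Simulate failure-free push-sum averaging on a complete graph.

    Each node keeps a pair (s, w). In every round it keeps half of the pair
    and sends the other half to a uniformly random node. The ratio s / w at
    each node converges to the global average.
    """
    rng = random.Random(seed)
    n = len(values)
    s = [float(v) for v in values]
    w = [1.0] * n
    for _ in range(rounds):
        new_s = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            j = rng.randrange(n)  # receiver (may be the sender itself)
            half_s, half_w = s[i] / 2, w[i] / 2
            new_s[i] += half_s
            new_w[i] += half_w
            new_s[j] += half_s
            new_w[j] += half_w
        s, w = new_s, new_w
    return [si / wi for si, wi in zip(s, w)]
```

The total mass (the sums of all s and of all w) is conserved in every round. A lost message or a failed node breaks this conservation, which is why the failure handling mechanisms studied above are needed.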
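TL;DR: In this article, the authors propose an inducing algorithm for suffix array construction in external memory (EM) that outperforms the previous best EM suffix sorter by a factor of about two, extend it to also construct the LCP array, and scale to a text size of 80 GiB using only 4 GiB of RAM in their experiments.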
Abstract: We consider text index construction in external memory (EM). Our first contribution is an inducing algorithm for suffix arrays in external memory. Practical tests show that this outperforms the previous best EM suffix sorter [Dementiev et al., ALENEX 2005] by a factor of about two in time and I/O-volume. Our second contribution is to augment the first algorithm to also construct the array of longest common prefixes (LCPs). This yields the first EM construction algorithm for LCP arrays. The overhead in time and I/O volume for this extended algorithm over plain suffix array construction is roughly two. Our algorithms scale far beyond problem sizes previously considered in the literature (text size of 80 GiB using only 4 GiB of RAM in our experiments).
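To clarify the two arrays being constructed, here is a small in-memory Python sketch. The suffix array is built by naively sorting suffixes, and the LCP array is computed with the classic linear-time algorithm of Kasai et al. Neither routine is the external-memory inducing algorithm of the paper; they only define the expected output on small inputs. The function names are chosen for this example.

```python
def suffix_array(text):
    """Naive suffix array: starting positions of suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """LCP array via the algorithm of Kasai et al. (in memory).

    lcp[r] is the length of the longest common prefix of the suffixes at
    ranks r - 1 and r in the suffix array; lcp[0] is 0 by convention.
    """
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix preceding suffix i in sorted order
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 characters
        else:
            h = 0
    return lcp
```

For the text "banana", the suffix array is [5, 3, 1, 0, 4, 2] and the LCP array is [0, 1, 3, 0, 0, 2].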
TL;DR: This paper abstracts a set of steps for building an ℓ0-sampler, based on sampling, recovery and selection, and shows how prior constructions of ℓ0-samplers can all be expressed in terms of these steps.
Abstract: The problem of building an ℓ0-sampler is to sample near-uniformly from the support set of a dynamic multiset. This problem has a variety of applications within data analysis, computational geometry and graph algorithms. In this paper, we abstract a set of steps for building an ℓ0-sampler, based on sampling, recovery and selection. We analyze the implementation of an ℓ0-sampler within this framework, and show how prior constructions of ℓ0-samplers can all be expressed in terms of these steps. Our experimental contribution is to provide a first detailed study of the accuracy and computational cost of ℓ0-samplers.
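The three steps named above (sampling, recovery, selection) can be illustrated with a deliberately simplified Python sketch. It makes several assumptions that are not taken from the paper: items are non-negative integers, all final frequencies are non-negative, subsampling uses a BLAKE2 hash from `hashlib`, recovery is exact 1-sparse recovery from three running sums, and selection returns the first level whose recovery succeeds. Practical constructions typically use s-sparse recovery and limited-independence hash functions, so this sketch may return `None` on a non-empty support more often than a real sampler would.

```python
import hashlib


class L0Sampler:
    """Toy sampler following the sampling / recovery / selection outline."""

    def __init__(self, levels=32, seed=0):
        self.levels = levels
        self.seed = seed
        # Per level: [sum f_i, sum f_i * i, sum f_i * i^2].
        self.sums = [[0, 0, 0] for _ in range(levels)]

    def _depth(self, item):
        # Sampling: an item belongs to levels 0..depth, where depth is the
        # number of trailing zero bits of its hash (about 2^-j of all items
        # reach level j).
        digest = hashlib.blake2b(
            f"{self.seed}:{item}".encode(), digest_size=8
        ).digest()
        h = int.from_bytes(digest, "big")
        depth = 0
        while depth + 1 < self.levels and (h >> depth) & 1 == 0:
            depth += 1
        return depth

    def update(self, item, delta):
        """Add `delta` (possibly negative) to the frequency of `item`."""
        for level in range(self._depth(item) + 1):
            s = self.sums[level]
            s[0] += delta
            s[1] += delta * item
            s[2] += delta * item * item

    def sample(self):
        """Return an item from the support, or None if recovery fails."""
        for s0, s1, s2 in self.sums:
            # Recovery: with non-negative frequencies, s1^2 == s0 * s2 holds
            # exactly when a single item is present at this level.
            if s0 > 0 and s1 * s1 == s0 * s2:
                return s1 // s0  # Selection: first level that recovers
        return None
```

Because deletions cancel the corresponding insertions in every level, the sampler depends only on the current multiset and not on the update history.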