TL;DR: The adaptive cuckoo filter (ACF) as mentioned in this paper is a data structure for approximate set membership that extends Cuckoo filters by reacting to false positives, removing them for future queries.
Abstract: We introduce the adaptive cuckoo filter (ACF), a data structure for approximate set membership that extends cuckoo filters by reacting to false positives, removing them for future queries. As an example application, in packet processing queries may correspond to flow identifiers, so a search for an element is likely to be followed by repeated searches for that element. Removing false positives can therefore significantly lower the false-positive rate. The ACF, like the cuckoo filter, uses a cuckoo hash table to store fingerprints. We allow fingerprint entries to be changed in response to a false positive in a manner designed to minimize the effect on the performance of the filter. We show that the ACF is able to significantly reduce the false-positive rate by presenting both a theoretical model for the false-positive rate and simulations using both synthetic data sets and real packet traces.
TL;DR: In this article, a linear-time algorithm based on cluster contraction using label propagation and Padberg and Rinaldi contraction heuristics is proposed to compute near-minimum cuts.
Abstract: The minimum cut problem for an undirected edge-weighted graph asks us to divide its set of nodes into two blocks while minimizing the weight sum of the cut edges. Here, we introduce a linear-time algorithm to compute near-minimum cuts. Our algorithm is based on cluster contraction using label propagation and Padberg and Rinaldi’s contraction heuristics [SIAM Review, 1991]. We give both sequential and shared-memory parallel implementations of our algorithm. Extensive experiments on both real-world and generated instances show that our algorithm finds the optimal cut on nearly all instances significantly faster than other state-of-the-art exact algorithms, and our error rate is lower than that of other heuristic algorithms. In addition, our parallel algorithm runs a factor 7.5× faster on average when using 32 threads. To further speed up computations, we also give a version of our algorithm that performs random edge contractions as preprocessing. This version achieves a lower running time and better parallel scalability at the expense of a higher error rate.
TL;DR: This work gives a distributed memory parallel algorithm to compute high-quality edge partitions in a scalable way and has high quality on large real-world networks and large hyperbolic random graphs, which have a power law degree distribution and are therefore specifically targeted by edge partitioning.
Abstract: Edge-centric distributed computations have appeared as a recent technique to improve the shortcomings of think-like-a-vertex algorithms on large scale-free networks. In order to increase parallelism on this model, edge partitioning - partitioning edges into roughly equally sized blocks - has emerged as an alternative to traditional (node-based) graph partitioning. In this work, we give a distributed memory parallel algorithm to compute high-quality edge partitions in a scalable way. Our algorithm scales to networks with billions of edges, and runs efficiently on thousands of PEs. Our technique is based on a fast parallelization of split graph construction and a use of advanced node partitioning algorithms. Our extensive experiments show that our algorithm has high quality on large real-world networks and large hyperbolic random graphs, which have a power law degree distribution and are therefore specifically targeted by edge partitioning
TL;DR: This work proposes to use a simple framework (the augmented map) to model the range, segment and rectangle query problems, and develops both multi-level tree structures and sweepline algorithms supporting range, segments and rectangle queries in two dimensions.
Abstract: The range, segment and rectangle query problems are fundamental problems in computational geometry, and have extensive applications in many domains. Despite the significant theoretical work on these problems, efficient implementations can be complicated. We know of very few practical implementations of the algorithms in parallel, and most implementations do not have tight theoretical bounds. We focus on simple and efficient parallel algorithms and implementations for these queries, which have tight worst-case bound in theory and good parallel performance in practice. We propose to use a simple framework (the augmented map) to model the problem. Based on the augmented map interface, we develop both multi-level tree structures and sweepline algorithms supporting range, segment and rectangle queries in two dimensions. For the sweepline algorithms, we propose a parallel paradigm and show corresponding cost bounds. All of our data structures are work-efficient to build in theory and achieve a low parallel depth. The query time is almost linear to the output size.
We have implemented all the data structures described in the paper using a parallel augmented map library. Based on the library each data structure only requires about 100 lines of C++ code. We test their performance on large data sets (up to $10^8$ elements) and a machine with 72-cores (144 hyperthreads). The parallel construction achieves 32-68x speedup. Speedup numbers on queries are up to 126-fold. Our sequential implementation outperforms the CGAL library by at least 2x in both construction and queries. Our sequential implementation can be slightly slower than the R-tree in the Boost library in some cases (0.6-2.5x), but has significantly better query performance (1.6-1400x) than Boost.
TL;DR: In this paper, the authors prove that the problem is in FPT when parameterized by the number of paths by giving a practical linear fpt algorithm and evaluate its performance on RNA-sequence data.
Abstract: The Flow Decomposition problem, which asks for the smallest set of weighted paths that "covers" a flow on a DAG, has recently been used as an important computational step in transcript assembly. We prove the problem is in FPT when parameterized by the number of paths by giving a practical linear fpt algorithm. Further, we implement and engineer a Flow Decomposition solver based on this algorithm, and evaluate its performance on RNA-sequence data. Crucially, our solver finds exact solutions while achieving runtimes competitive with a state-of-the-art heuristic. Finally, we contextualize our design choices with two hardness results related to preprocessing and weight recovery. Specifically, $k$-Flow Decomposition does not admit polynomial kernels under standard complexity assumptions, and the related problem of assigning (known) weights to a given set of paths is NP-hard.
TL;DR: In this paper, the authors presented new practical sequential and parallel algorithms for wavelet tree construction, which can be adapted to wavelet matrices in the same (asymptotic) time, using only little extra space.
Abstract: The wavelet tree (Grossi et al. [SODA, 2003]) and wavelet matrix (Claude et al. [Inf. Syst., 47:15--32, 2015]) are compact indices for texts over an alphabet $[0,\sigma)$ that support rank, select and access queries in $O(\lg \sigma)$ time. We first present new practical sequential and parallel algorithms for wavelet tree construction. Their unifying characteristics is that they construct the wavelet tree bottomup}, i.e., they compute the last level first. We also show that this bottom-up construction can easily be adapted to wavelet matrices. In practice, our best sequential algorithm is up to twice as fast as the currently fastest sequential wavelet tree construction algorithm (Shun [DCC, 2015]), simultaneously saving a factor of 2 in space. This scales up to 32 cores, where we are about equally fast as the currently fastest parallel wavelet tree construction algorithm (Labeit et al. [DCC, 2016]), but still use only about 75 % of the space. An additional theoretical result shows how to adapt any wavelet tree construction algorithm to the wavelet matrix in the same (asymptotic) time, using only little extra space.
TL;DR: In this model, a sorting algorithm maintains an approximation to the sorted order of a list of data items while simultaneously, with each comparison made by the algorithm, an adversary randomly swaps the order of adjacent items in the true sorted order as mentioned in this paper.
Abstract: We empirically study sorting in the evolving data model. In this model, a sorting algorithm maintains an approximation to the sorted order of a list of data items while simultaneously, with each comparison made by the algorithm, an adversary randomly swaps the order of adjacent items in the true sorted order. Previous work studies only two versions of quicksort, and has a gap between the lower bound of Omega(n) and the best upper bound of O(n log log n). The experiments we perform in this paper provide empirical evidence that some quadratic-time algorithms such as insertion sort and bubble sort are asymptotically optimal for any constant rate of random swaps. In fact, these algorithms perform as well as or better than algorithms such as quicksort that are more efficient in the traditional algorithm analysis model.
TL;DR: This work presents different algorithms for generating graph embeddings into the hyperbolic plane that are well suited for greedy routing, and presents the best algorithm, which greatly improves upon the previous best known embedding, whose creation required substantial manual intervention.
Abstract: Greedy routing computes paths between nodes in a network by successively moving to the neighbor closest to the target with respect to coordinates given by an embedding into some metric space. Its advantage is that only local information is used for routing decisions. We present different algorithms for generating graph embeddings into the hyperbolic plane that are well suited for greedy routing. In particular, our embeddings guarantee that greedy routing always succeeds in reaching the target, and we try to minimize the lengths of the resulting greedy paths. We evaluate our algorithm on multiple generated and real-world networks. For networks that are generally assumed to have a hidden underlying hyperbolic geometry, such as the Internet graph [3], we achieve near-optimal results (i.e., the resulting greedy paths are only slightly longer than the corresponding shortest paths). In the case of the Internet graph, they are only 6% longer when using our best algorithm, which greatly improves upon the previous best known embedding, whose creation required substantial manual intervention. In addition to measuring the stretch, we empirically evaluate our algorithms regarding the size of the coordinates of the resulting embeddings and observe how it impacts the success rate when coordinates are not represented with very high precision. Since numerical difficulties are a major issue when performing computations in the hyperbolic plane, we consider variations of our algorithm that improve the success rate when using coordinates with lower precision.
TL;DR: In this article, the authors present an algorithmic software framework that facilitates implementation of heuristics as independent extensions to a common core algorithm, making it easy to perform a detailed comparison of the performance and behavior of different algorithmic ideas.
Abstract: The state-of-the-art tools for practical graph canonization are all based on the individualization-refinement paradigm, and their difference is primarily in the choice of heuristics they include and in the actual tool implementation. It is thus not possible to make a direct comparison of how individual algorithmic ideas affect the performance on different graph classes. We present an algorithmic software framework that facilitates implementation of heuristics as independent extensions to a common core algorithm. It therefore becomes easy to perform a detailed comparison of the performance and behavior of different algorithmic ideas. Implementations are provided of a range of algorithms for tree traversal, target cell selection, and node invariant, including choices from the literature and new variations. The framework readily supports extraction and visualization of detailed data from separate algorithm executions for subsequent analysis and development of new heuristics. Using collections of different graph classes, we investigate the effect of varying the selections of heuristics, often revealing exactly which individual algorithmic choice is responsible for particularly good or bad performance. On several benchmark collections, including a newly proposed class of difficult instances, we additionally find that our implementation performs better than the current state-of-the-art tools.
TL;DR: In this paper, a two-pivot variant of Lomuto's partitioning scheme is proposed to reduce the average sorting cost of the original two pivot quicksort algorithm.
Abstract: This paper presents simple variants of the BlockQuicksort algorithm described by Edelkamp and Weiss (ESA 2016). The simplification is achieved by using Lomuto's partitioning scheme instead of Hoare's crossing pointer technique to partition the input. To achieve a robust sorting algorithm that works well on many different input types, the paper introduces a novel two-pivot variant of Lomuto's partitioning scheme. A surprisingly simple twist to the generic two-pivot quicksort approach makes the algorithm robust. The paper provides an analysis of the theoretical properties of the proposed algorithms and compares them to their competitors. The analysis shows that Lomuto-based approaches incur a higher average sorting cost than the Hoare-based approach of BlockQuicksort. Moreover, the analysis is particularly useful to reason about pivot choices that suit the two-pivot approach. An extensive experimental study shows that, despite their worse theoretical behavior, the simpler variants perform as well as the original version of BlockQuicksort.
TL;DR: This work develops an algorithm for enumerating all pairwise non-isomorphic 1-face embeddings of graphs based on a novel canonical form invariant for ASTs, gap vector, and provides a linear-time isomorphism test forASTs and thus, also for orientable 1- Faces of graphs.
Abstract: Antiparallel strong traces (ASTs) are a type of walks in graphs which use every edge exactly twice. They correspond to 1-face embeddings in orientable surfaces and can be used to design self-assembling protein or DNA strands. Based on a novel canonical form invariant for ASTs, gap vector, we provide a linear-time isomorphism test for ASTs and thus, also for orientable 1-face embeddings of graphs. Using the canonical form, we develop an algorithm for enumerating all pairwise non-isomorphic 1-face embeddings of graphs. We compare our algorithm with an independent implementation of a recent algebraic approach (Basi? et al., MATCH Commun. Math. Comput. Chem. 78 (3), 2017) on large data sets. Our results yield the first large-scale enumeration of non-isomorphic embeddings and investigation of their properties.
TL;DR: A new binary linear program with polynomial size based on partial-ordering is introduced, which for the first time solves all STPRBH instances from the DIMACS benchmark set to optimality.
Abstract: The Steiner tree problem with revenues, budgets and hop constraints (STPRBH) is a variant of the classical Steiner tree problem. This problem asks for a subtree in a given graph with maximum revenues corresponding to its nodes, where its total edge costs respect the given budget, and the number of edges between each node and its root does not exceed the hop limit. We introduce a new binary linear program with polynomial size based on partial-ordering, which (up to our knowledge) for the first time solves all STPRBH instances from the DIMACS benchmark set to optimality. The set contains graphs with up to 500 nodes and 12500 edges.
TL;DR: Agarwal et al. as discussed by the authors showed that the Euler tour tree, an existing sequential dynamic trees data structure, can be parallelized in the batch setting, achieving a self-relative speedup of 67-96x on 72 cores with hyper-threading on large batch sizes.
Abstract: The dynamic trees problem is to maintain a forest undergoing edge insertions and deletions while supporting queries for information such as connectivity. There are many existing data structures for this problem, but few of them are capable of exploiting parallelism in the batch-setting, in which large batches of edges are inserted or deleted from the forest at once. In this paper, we demonstrate that the Euler tour tree, an existing sequential dynamic trees data structure, can be parallelized in the batch setting. For a batch of $k$ updates over a forest of $n$ vertices, our parallel Euler tour trees perform $O(k \log (1 + n/k))$ expected work with $O(\log n)$ depth with high probability. Our work bound is asymptotically optimal, and we improve on the depth bound achieved by Acar et al. for the batch-parallel dynamic trees problem.
The main building block for parallelizing Euler tour trees is a batch-parallel skip list data structure, which we believe may be of independent interest. Euler tour trees require a sequence data structure capable of joins and splits. Sequentially, balanced binary trees are used, but they are difficult to join or split in parallel. We show that skip lists, on the other hand, support batches of joins or splits of size $k$ over $n$ elements with $O(k \log (1 + n/k))$ work in expectation and $O(\log n)$ depth with high probability. We also achieve the same efficiency bounds for augmented skip lists, which allows us to augment our Euler tour trees to support subtree queries.
Our data structures achieve between 67--96x self-relative speedup on 72 cores with hyper-threading on large batch sizes. Our data structures also outperform the fastest existing sequential dynamic trees data structures empirically.