TL;DR: A framework for unbiased approximation of betweenness is proposed that generalizes a previous approach by Brandes and yields significantly better approximation than before for many real world inputs and good approximations for the betweenness of unimportant nodes.
Abstract: Estimating the importance or centrality of the nodes in large networks has recently attracted increased interest. Betweenness is one of the most important centrality indices, which basically counts the number of shortest paths going through a node. Betweenness has been used in diverse applications, e.g., social network analysis or route planning. Since exact computation is prohibitive for large networks, approximation algorithms are important. In this paper, we propose a framework for unbiased approximation of betweenness that generalizes a previous approach by Brandes. Our best new schemes yield significantly better approximation than before for many real world inputs. In particular, we also get good approximations for the betweenness of unimportant nodes.
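The pivot-sampling idea behind the Brandes-style approach mentioned above can be sketched as follows. This is a minimal illustration, not the schemes proposed in the paper: it assumes an unweighted graph given as an adjacency dictionary, samples pivots uniformly with replacement, and scales each pivot's dependency scores by n/k so that the estimate is unbiased. The function names `dependencies` and `estimate_betweenness` are hypothetical.

```python
import random
from collections import deque


def dependencies(adj, s):
    """Dependency of source s on every node (Brandes' accumulation, unweighted)."""
    sigma = {v: 0 for v in adj}    # number of shortest s-v paths
    dist = {v: -1 for v in adj}    # BFS distance from s, -1 = unvisited
    preds = {v: [] for v in adj}   # predecessors on shortest paths
    sigma[s], dist[s] = 1, 0
    order = []                     # nodes in non-decreasing distance from s
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if dist[w] < 0:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = {v: 0.0 for v in adj}
    for w in reversed(order):      # accumulate from the farthest nodes inward
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[s] = 0.0                 # the source itself is not an inner node
    return delta


def estimate_betweenness(adj, num_pivots, rng):
    """Unbiased estimate from num_pivots uniformly sampled sources."""
    nodes = list(adj)
    n = len(nodes)
    estimate = {v: 0.0 for v in nodes}
    for _ in range(num_pivots):
        pivot = rng.choice(nodes)
        for v, d in dependencies(adj, pivot).items():
            estimate[v] += d * n / num_pivots
    return estimate


# Path graph a - b - c: only b lies between other nodes.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
scores = estimate_betweenness(path, num_pivots=4, rng=random.Random(0))
```

Summing `dependencies` over all sources gives the exact value, counting each ordered pair of endpoints once (so an undirected pair is counted twice); the estimate trades that full pass for `num_pivots` traversals.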
TL;DR: This work presents the algorithmic core of a full text data base that allows fast Boolean queries, phrase queries, and document reporting using less space than the input text.
Abstract: We present the algorithmic core of a full text data base that allows fast Boolean queries, phrase queries, and document reporting using less space than the input text. The system uses a carefully choreographed combination of classical data compression techniques and inverted index based search data structures. It outperforms suffix array based techniques for all the above operations for real world (natural language) texts.
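To make the query types above concrete, here is a minimal sketch of a positional inverted index that answers Boolean AND queries and phrase queries. It deliberately omits the compression that is central to the system described in the abstract, and the whitespace tokenization and the function names (`build_index`, `boolean_and`, `phrase_query`) are illustrative assumptions.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def boolean_and(index, terms):
    """Ids of documents that contain every term."""
    postings = [set(index.get(t.lower(), {})) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


def phrase_query(index, phrase):
    """Ids of documents that contain the terms of phrase consecutively."""
    terms = phrase.lower().split()
    hits = []
    for doc_id in boolean_and(index, terms):
        # Candidate start positions: occurrences of the first term.
        starts = set(index[terms[0]][doc_id])
        for offset, term in enumerate(terms[1:], start=1):
            # Keep a start only if the next term appears at start + offset.
            starts &= {p - offset for p in index[term][doc_id]}
        if starts:
            hits.append(doc_id)
    return hits


docs = ["the quick brown fox", "the brown quick fox", "quick thinking"]
index = build_index(docs)
```

With these three documents, the AND query for "quick" and "brown" matches the first two documents, while the phrase "quick brown" matches only the first.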
TL;DR: In this article, the authors formulate and study the airspace sectorization problem from an algorithmic point of view, modeling the problem of optimal sectorization as a geometric partition problem with constraints.
Abstract: The National Airspace System (NAS) is designed to accommodate a large number of flights over North America. For purposes of workload limitations for air traffic controllers, the airspace is partitioned into approximately 600 sectors; each sector is observed by one or more controllers. In order to satisfy workload limitations for controllers, it is important that sectors be designed carefully according to the traffic patterns of flights, so that no sector becomes overloaded. We formulate and study the airspace sectorization problem from an algorithmic point of view, modeling the problem of optimal sectorization as a geometric partition problem with constraints. The novelty of the problem is that it partitions data consisting of trajectories of moving points, rather than the static point sets that are commonly studied. First, we formulate and solve the 1D version of the problem, showing how to partition a line into "sectors" (intervals) according to historical trajectory data. Then, we apply the 1D solution framework to design a 2D sectorization heuristic based on binary space partitions. We also devise partitions based on balanced "pie partitions" of a convex polygon.
We evaluate our 2D algorithms experimentally. We conduct experiments using actual historical flight track data for the NAS as the basis of our partitioning. We compare the workload balance of our methods to that of the existing set of sectors for the NAS and find that our resectorization yields competitive and improved workload balancing. In particular, our methods yield an improvement by a factor between 2 and 3 over the current sectorization in terms of the time-average and the worst-case workloads of the maximum workload sector. An even better improvement is seen in the standard deviations (over all sectors) of both time-average and worst-case workloads.
TL;DR: The results show that the exact ILP-based algorithm presented here has the advantage of proving the optimality of the computed solution, but also often outperforms the metaheuristic approaches in terms of running time.
Abstract: Given an undirected graph G = (V,E) with edge weights and a positive integer number k, the k-Cardinality Tree problem consists of finding a subtree T of G with exactly k edges and the minimum possible weight. Many algorithms have been proposed to solve this NP-hard problem, resulting in mainly heuristic and metaheuristic approaches.
In this paper we present an exact ILP-based algorithm using directed cuts. We mathematically compare the strength of our formulation to the previously known ILP formulations of this problem, and give an extensive study on the algorithm's practical performance compared to the state-of-the-art metaheuristics.
In contrast to the widespread assumption that such a problem cannot be efficiently tackled by exact algorithms for medium and large graphs (between 200 and 5000 nodes), our results show that our algorithm not only has the advantage of proving the optimality of the computed solution, but also often outperforms the metaheuristic approaches in terms of running time.
TL;DR: In this paper, the authors re-examine many existing algorithms for the Feedback Arc Set (FAS) problem and develop some new techniques for solving it; the algorithms are tested on both synthetic and Rank Aggregation-based datasets.
Abstract: Ranking data is a fundamental organizational activity. Given advice, we may wish to rank a set of items to satisfy as much of that advice as possible. In the Feedback Arc Set (FAS) problem, advice takes the form of pairwise ordering statements, 'a should be ranked before b'. An instance in which there is advice about every pair of items is known as a tournament. This task is equivalent to ordering the nodes of a given directed graph to minimize the number of arcs pointing in one direction.
In the past, much work focused on finding good, effective heuristics for solving the problem. Recently, a proof of the NP-completeness of the problem (even when restricted to tournaments) has accompanied new algorithms with approximation guarantees, culminating in the development of a PTAS (polynomial time approximation scheme) for solving FAS on tournaments.
In this paper we re-examine many of these existing algorithms and develop some new techniques for solving FAS. The algorithms are tested on both synthetic and Rank Aggregation-based datasets. We find that, in practice, local-search algorithms are very powerful, even though we prove that they do not have approximation guarantees. Our new algorithm is based on reversing arcs whose nodes have large indegree differences, eventually leading to a total ordering. Combining this with a powerful local-search technique yields an algorithm that beats existing techniques on a variety of data sets.
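The objective and a simple local search for it can be sketched as follows. This is a generic single-node-move local search under assumed names (`back_arcs`, `local_search`), not the indegree-based algorithm or the specific local-search technique of the paper: an arc (a, b) means "a should be ranked before b", and the cost of an ordering is the number of arcs it violates.

```python
def back_arcs(order, arcs):
    """Number of arcs (a, b) for which a is ranked after b in order."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(1 for a, b in arcs if pos[a] > pos[b])


def local_search(order, arcs):
    """Move single nodes to new positions while that lowers the cost."""
    order = list(order)
    best = back_arcs(order, arcs)
    improved = True
    while improved:
        improved = False
        for i in range(len(order)):
            for j in range(len(order)):
                if i == j:
                    continue
                candidate = order[:]
                candidate.insert(j, candidate.pop(i))
                cost = back_arcs(candidate, arcs)
                if cost < best:
                    # Accept the first improving move and rescan from scratch.
                    order, best, improved = candidate, cost, True
                    break
            if improved:
                break
    return order, best


# A tournament on three items with consistent advice: a < b < c.
advice = [("a", "b"), ("b", "c"), ("a", "c")]
ranking, violations = local_search(["c", "b", "a"], advice)
```

The loop terminates because every accepted move strictly decreases the cost. Recomputing the cost from scratch for each candidate keeps the sketch short; an efficient implementation would update it incrementally.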
TL;DR: The computational results of the implementations of the Seymour and Thomas algorithm show that the branchwidth of a planar graph can be computed in practical time and memory space for some instances of size about one hundred thousand edges.
Abstract: We propose efficient implementations of the Seymour and Thomas algorithm which, given a planar graph and an integer β, decides whether the graph has branchwidth at least β. The computational results of our implementations show that the branchwidth of a planar graph can be computed in practical time and memory space for some instances of size about one hundred thousand edges. Previous studies report that a straightforward implementation of the algorithm is memory consuming, which could be a bottleneck for solving instances with more than a few thousand edges. Our results suggest that with efficient implementations, the memory space required by the algorithm may not be a bottleneck in practice. Applying our implementations, an optimal branch decomposition of a planar graph of practical size can be computed in a reasonable time. Branch-decomposition based algorithms have been explored as an approach for solving many NP-hard problems on graphs. The results of this paper suggest that the approach could be practical.
TL;DR: In this article, the problem of hotlink assignment in web sites has been studied and a considerable number of approximation algorithms have been proposed and worst-case bounds for the quality of the computed solutions have been given.
Abstract: The concept of hotlink assignment aims at enhancing the structure of web sites such that the user's expected navigation effort is minimized. We concentrate on sites that are representable by trees and assume that each leaf carries a weight representing its popularity.
The problem of optimally adding at most one additional outgoing edge ("hotlink") to each inner node has been widely studied. A considerable number of approximation algorithms have been proposed and worst-case bounds for the quality of the computed solutions have been given. However, little is known so far about the practical behaviour of most of these algorithms.
This paper helps to close this gap by evaluating all recent strategies experimentally. Our experiments are based on trees extracted from real websites as well as on synthetic instances. The latter are generated by a new method that simulates the growth of a web site over time. We also propose a memory-efficient way to implement an optimal hotlink assignment algorithm, making it possible to compute optimal solutions for larger instances than before. Finally, we present a new approximation algorithm that is easy to implement and exhibits an excellent behaviour in practice.
TL;DR: This paper presents the first working implementation of partial persistence in the object-oriented language Java and uses aspect-oriented programming, a modularization technique which allows the existing code to be instrumented with the hooks needed for the persistence implementation.
Abstract: A partially persistent data structure is a data structure which preserves previous versions of itself when it is modified. General theoretical schemes are known (e.g. the fat node method) for making any data structure partially persistent. To our knowledge, however, no general implementation of these theoretical methods exists to date. This paper evaluates different methods to achieve this goal and presents the first working implementation of partial persistence in the object-oriented language Java. Our approach is transparent, i.e., it allows any existing data structure to become persistent without changing its implementation, whereas all previous solutions require an extensive modification of the code by hand. This transparency is important in view of the large number of algorithmic results that rely on persistence. Our implementation uses aspect-oriented programming, a modularization technique which allows us to instrument the existing code with the needed hooks for the persistence implementation. The implementation is then validated by running benchmarks to analyze both the cost of persistence and of the aspect-oriented approach. We also illustrate its applicability by implementing a random binary search tree and making it persistent, and then using the resulting structure to implement a point location data structure in just a few lines.
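The fat node method named above can be illustrated with a single versioned field. The sketch below is written in Python for brevity rather than in Java with aspects as in the paper, and the class name `FatField` and its methods are hypothetical: each field keeps its whole history of (version, value) pairs, writes are only allowed at the newest version, and a read at any version returns the latest value written at or before it.

```python
import bisect


class FatField:
    """One field of a node, storing every value it has ever held."""

    def __init__(self):
        self._versions = []  # strictly increasing version stamps
        self._values = []    # _values[i] was written at _versions[i]

    def write(self, version, value):
        """Record value at version; only the newest version may be modified."""
        if self._versions and version < self._versions[-1]:
            raise ValueError("partial persistence: old versions are read-only")
        if self._versions and version == self._versions[-1]:
            self._values[-1] = value
        else:
            self._versions.append(version)
            self._values.append(value)

    def read(self, version):
        """Return the value the field had at the given version."""
        i = bisect.bisect_right(self._versions, version)
        if i == 0:
            raise KeyError("field did not exist at version %r" % (version,))
        return self._values[i - 1]


field = FatField()
field.write(1, "left child = A")
field.write(3, "left child = B")
```

Reading `field` at version 2 still yields the value written at version 1, which is what lets older versions of a structure be traversed after later updates. The binary search makes each read logarithmic in the number of writes to that field.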
TL;DR: Approaches to the multi-period newsvendor problem are developed based on two machine learning algorithms, Weighted Majority of Warmuth and Littlestone and Follow the Perturbed Leader of Kalai and Vempala; tests indicate that such online learning algorithms can perform well in comparison to stochastic approaches, even when the stochastic approaches are given perfect information.
Abstract: The multi-period newsvendor problem describes the dilemma of a newspaper salesman: how many papers should he purchase each day to resell, when he doesn't know the demand? We develop approaches for this well known problem based on two machine learning algorithms: Weighted Majority of Warmuth and Littlestone, and Follow the Perturbed Leader of Kalai and Vempala. With some modified analysis, it isn't hard to show theoretical bounds for our modified versions of these algorithms. More importantly, we test the algorithms in a variety of simulated conditions, and compare the results to those given by traditional stochastic approaches which assume more information about the demands than is typically known. Our tests indicate that such online learning algorithms can perform well in comparison to stochastic approaches, even when the stochastic approaches are given perfect information.
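A multiplicative-weights scheme in the spirit of Weighted Majority can be sketched for this setting as follows. It is not the modified algorithm analyzed in the paper: the experts are assumed to be fixed candidate order quantities, the cost model (per-unit overage and underage costs), the exponential update with rate `eta`, and the normalizing `cost_bound` are all illustrative assumptions.

```python
import math


def newsvendor_cost(order, demand, overage=1.0, underage=2.0):
    """Cost of unsold papers plus cost of unmet demand for one day."""
    return overage * max(order - demand, 0) + underage * max(demand - order, 0)


def online_orders(demands, candidates, cost_bound, eta=0.5):
    """Order the weight-averaged candidate each day, then reweight."""
    weights = [1.0] * len(candidates)
    orders = []
    for demand in demands:
        # Decide before the day's demand is revealed.
        total = sum(weights)
        orders.append(sum(w * q for w, q in zip(weights, candidates)) / total)
        # Penalize each candidate by the cost it would have incurred,
        # scaled into [0, 1] by the assumed bound cost_bound.
        weights = [
            w * math.exp(-eta * newsvendor_cost(q, demand) / cost_bound)
            for w, q in zip(weights, candidates)
        ]
    return orders, weights


candidates = [0, 5, 10, 15, 20]
orders, weights = online_orders([10] * 50, candidates, cost_bound=40.0)
```

In this toy run the demand is constant at 10, so the weight of the candidate 10 is never reduced and the daily order drifts from the initial average toward it. Averaging the candidates is reasonable here because the cost is convex in the order quantity.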
TL;DR: A new design for the 3D triangulation package is described that makes it easy to add functionality to compute triangulations in other spaces, and benchmarks are shown to demonstrate that the new design does not affect the efficiency.
Abstract: The Computational Geometry Algorithms Library CGAL currently provides packages to compute triangulations in R^2 and R^3. In this paper we describe a new design for the 3D triangulation package that makes it easy to add functionality to compute triangulations in other spaces. These design changes have been implemented and validated on the case of the periodic space T^3. We give a detailed description of the realized changes together with their motivation. Finally, we show benchmarks to demonstrate that the new design does not affect the efficiency.
TL;DR: This lecture deals with material flow problems from the viewpoint of network flow theory and reports on two industrial applications: (1) controlling material flow with automated guided vehicles in a container terminal and (2) timetabling in public transport.
Abstract: Material flow problems are complex logistic optimization problems. We want to utilize the available logistic network in such a way that the load is minimized or the throughput is maximized. This lecture deals with these optimization problems from the viewpoint of network flow theory and reports on two industrial applications: (1) controlling material flow with automated guided vehicles in a container terminal (cooperation with HHLA), and (2) timetabling in public transport (cooperation with Deutsche Bahn and Berlin Public Transport). The key ingredient for (1) is a very fast real-time algorithm which avoids collisions, deadlocks, and other conflicts already at route computation, while for (2) it is the use of integer programs based on special bases of the cycle space of the routing graph.
TL;DR: A refined consensus clustering heuristic is developed for the occasions when the given clusterings may be too disparate and their consensus may not be representative of any one of them, and it is shown that in practice the refined consensus clusterings can be much superior to the general consensus clustering.
Abstract: Consensus clustering is the problem of reconciling clustering information about the same data set coming from different sources or from different runs of the same algorithm. Cast as an optimization problem, consensus clustering is known as median partition, and has been shown to be NP-complete. A number of heuristics have been proposed as approximate solutions, some with performance guarantees. In practice, the problem is apparently easy to approximate, but guidance is necessary as to which heuristic to use depending on the number of elements and clusterings given. We have implemented a number of heuristics for the consensus clustering problem, and here we compare their performance, independent of data size, in terms of efficacy and efficiency, on both simulated and real data sets. We find that based on the underlying algorithms and their behavior in practice the heuristics can be categorized into two distinct groups, with ramification as to which one to use in a given situation, and that a hybrid solution is the best bet in general. We have also developed a refined consensus clustering heuristic for the occasions when the given clusterings may be too disparate, and their consensus may not be representative of any one of them, and we show that in practice the refined consensus clusterings can be much superior to the general consensus clustering.
TL;DR: This is an experimental study of algorithms for the shortest-path feasibility problem: Given a directed weighted graph, find a negative cycle or present a short proof that none exists.
Abstract: This is an experimental study of algorithms for the shortest path feasibility problem: Given a directed weighted graph, find a negative cycle or present a short proof that none exists. We study previously known and new algorithms. Our testbed is more extensive than those previously used, including both static and incremental problems, as well as worst-case instances. We show that, while no single algorithm dominates, a small subset (including a new algorithm) has very robust performance in practice. Our work advances the state of the art in the area.
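The problem statement above can be made concrete with a textbook Bellman-Ford pass, which is only a baseline and not one of the algorithms studied or introduced in the paper. The sketch assumes nodes numbered 0..n-1 and arcs given as (tail, head, weight) triples; the function name is hypothetical. It returns either a negative cycle or a vector of potentials d with d[v] <= d[u] + w for every arc (u, v, w), which serves as the short proof that no negative cycle exists.

```python
def negative_cycle_or_potentials(n, arcs):
    """Return (cycle, None) or (None, potentials) for a graph on nodes 0..n-1."""
    dist = [0] * n      # as if a virtual source reached every node at cost 0
    pred = [None] * n   # tail of the arc that last lowered dist[v]
    last = None
    for _ in range(n):
        last = None
        for u, v, w in arcs:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                last = v
        if last is None:
            # No arc can be relaxed: dist is a feasible set of potentials.
            return None, dist
    # A relaxation in the n-th pass implies a negative cycle.
    # Walking back n predecessor steps is guaranteed to land on it.
    v = last
    for _ in range(n):
        v = pred[v]
    cycle = [v]
    u = pred[v]
    while u != v:
        cycle.append(u)
        u = pred[u]
    cycle.reverse()     # list the nodes in the direction of the arcs
    return cycle, None


negative = [(0, 1, 1), (1, 2, -3), (2, 0, 1)]   # cycle weight -1
feasible = [(0, 1, 1), (1, 2, -3), (2, 0, 3)]   # cycle weight +1
cycle, _ = negative_cycle_or_potentials(3, negative)
_, potentials = negative_cycle_or_potentials(3, feasible)
```

The potentials can be checked arc by arc in linear time, which is what makes them a short certificate of feasibility.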
TL;DR: A new reconstruction algorithm is presented, one of whose main novelties is to throw away geometry information early on in the reconstruction process and to mainly operate combinatorially on a graph structure; this makes it less susceptible to robustness problems due to round-off errors and gives faster running times because no expensive exact arithmetic is required.
Abstract: Known algorithms for reconstructing a 2-manifold from a point sample in R^3 are naturally based on decisions/predicates that take the geometry of the point sample into account. Facing the always present problem of round-off errors that easily compromise the exactness of those predicate decisions, an exact and robust implementation of these algorithms is far from trivial and typically requires the employment of advanced datatypes for exact arithmetic as provided by libraries like CORE, LEDA or GMP. In this paper we present a new reconstruction algorithm, one of whose main novelties is to throw away geometry information early on in the reconstruction process and to mainly operate combinatorially on a graph structure. As such it is less susceptible to robustness problems due to round-off errors, and it achieves faster running times because it does not require expensive exact arithmetic. A more theoretical view on our algorithm including correctness proofs under suitable sampling conditions can be found in a companion paper [3].
TL;DR: This work presents a unidirectional speed-up technique, which competes with bidirectional approaches, and shows how to exploit the advantage of unidirectional routing for fast exact queries in timetable information systems and for fast approximative queries in time-dependent scenarios.
Abstract: In recent years, impressive speed-up techniques for Dijkstra's algorithm have been developed. Unfortunately, the most advanced techniques use bidirectional search, which makes it hard to use them in scenarios where a backward search is prohibited. Even worse, such scenarios are widespread, e.g., timetable-information systems or time-dependent networks.
In this work, we present a unidirectional speed-up technique which competes with bidirectional approaches. Moreover, we show how to exploit the advantage of unidirectional routing for fast exact queries in timetable information systems and for fast approximative queries in time-dependent scenarios. By running experiments on several inputs other than road networks, we show that our approach is very robust to the input.