Algorithm Engineering and Experimentation

Papers published on a yearly basis

Papers

Proceedings Article•

Practical entropy-compressed rank/select dictionary

Daisuke Okanohara¹, Kunihiko Sadakane²•Institutions (2)

University of Tokyo¹, Kyushu University²

6 Jan 2007

TL;DR: Four novel Rank/Select dictionaries are proposed: esp, recrank, vcode and sdarray. Each is small when S has few elements, and in practice its size is close to nH0(S), where H0(S) ≤ 1 is the zero-th order empirical entropy of S.

Abstract: Rank/Select dictionaries are data structures for an ordered set S ⊂ {0, 1, ..., n − 1} to compute rank(x, S) (the number of elements in S that are no greater than x), and select(i, S) (the i-th smallest element in S), which are the fundamental components of succinct data structures of strings, trees, graphs, etc. In these data structures, however, only asymptotic behavior has been considered and their performance for real data is not satisfactory. In this paper, we propose four novel Rank/Select dictionaries: esp, recrank, vcode and sdarray, each of which is small if the number of elements in S is small, and indeed close to nH0(S) (H0(S) ≤ 1 is the zero-th order empirical entropy of S) in practice. Furthermore, their query times are superior to those of existing structures. Experimental results reveal the characteristics of our data structures and also show that these data structures are superior to existing implementations, both in terms of size and query time.
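
The rank and select operations defined in the abstract can be illustrated with a plain, uncompressed reference version. This is only a sketch of the query semantics over a sorted Python list, not any of the four structures the paper proposes; the 1-based indexing of select and the helper name n_h0_bits are assumptions made for illustration. The helper evaluates the standard zero-th order entropy bound nH0(S) for a set of m elements drawn from a universe of size n.

```python
from bisect import bisect_right
from math import log2


def rank(x, s):
    """Number of elements in the sorted list s that are no greater than x."""
    return bisect_right(s, x)


def select(i, s):
    """i-th smallest element of the sorted list s (1-based, an assumed convention)."""
    if not 1 <= i <= len(s):
        raise IndexError("select index out of range")
    return s[i - 1]


def n_h0_bits(s, n):
    """n * H0(S) in bits, treating S as a bit vector of length n with len(s) ones."""
    m = len(s)
    if m == 0 or m == n:
        return 0.0
    p = m / n
    return n * (-p * log2(p) - (1 - p) * log2(1 - p))


if __name__ == "__main__":
    S = [2, 3, 5, 9]          # ordered subset of {0, ..., 15}
    print(rank(5, S))         # elements <= 5
    print(select(4, S))       # 4th smallest element
    print(n_h0_bits(S, 16))   # entropy bound in bits
```

Both queries cost O(log m) or O(1) here only because the set is stored explicitly; the point of the compressed dictionaries is to answer the same queries while using space near the entropy bound.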

298 citations

Proceedings Article•

StreamKM++: a clustering algorithm for data streams

Marcel R. Ackermann¹, Christiane Lammersen², Marcus Märtens¹, Christoph Raupach¹, Christian Sohler², Kamil Swierkot¹•Institutions (2)

University of Paderborn¹, Technical University of Dortmund²

16 Jan 2010

TL;DR: A new k-means clustering algorithm for data streams of points from a Euclidean space that provides a good alternative to BIRCH and StreamLS, in particular, if the number of cluster centers is large.

Abstract: We develop a new k-means clustering algorithm for data streams, which we call StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the k-means++ algorithm [1]. To compute the small sample, we propose two new techniques. First, we use a non-uniform sampling approach similar to the k-means++ seeding procedure to obtain small coresets from the data stream. This construction is rather easy to implement and, unlike other coreset constructions, its running time has only a low dependency on the dimensionality of the data. Second, we propose a new data structure which we call a coreset tree. The use of these coreset trees significantly speeds up the time necessary for the non-uniform sampling during our coreset construction. We compare our algorithm experimentally with two well-known streaming implementations (BIRCH [16] and StreamLS [4, 9]). In terms of quality (sum of squared errors), our algorithm is comparable with StreamLS and significantly better than BIRCH (up to a factor of 2). In terms of running time, our algorithm is slower than BIRCH. Comparing the running time with StreamLS, it turns out that our algorithm scales much better with increasing number of centers. We conclude that, if the first priority is the quality of the clustering, then our algorithm provides a good alternative to BIRCH and StreamLS, in particular, if the number of cluster centers is large. We also give a theoretical justification of our approach by proving that our sample set is a small coreset in low dimensional spaces.
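
The non-uniform sampling the abstract compares its approach to is the k-means++ seeding procedure, in which each new center is drawn with probability proportional to its squared distance from the nearest center chosen so far. The sketch below shows only that standard seeding step on an in-memory list of points; it is not the StreamKM++ coreset construction or the coreset tree, and the function names are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seeding(points, k, rng=None):
    """Pick up to k centers from points by D^2 (squared-distance) sampling."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            break  # every point coincides with a chosen center
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        d2 = [min(old, squared_distance(p, points[idx]))
              for old, p in zip(d2, points)]
    return centers


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
    print(kmeanspp_seeding(data, 3, random.Random(0)))
```

Each round recomputes the sampling weights in time linear in the number of points, which is the cost the abstract says the coreset tree is designed to reduce.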

285 citations

Proceedings Article•

Better approximation of betweenness centrality

Robert Geisberger¹, Peter Sanders¹, Dominik Schultes¹•Institutions (1)

Karlsruhe Institute of Technology¹

19 Jan 2008

TL;DR: A framework for unbiased approximation of betweenness is proposed that generalizes a previous approach by Brandes and yields significantly better approximation than before for many real world inputs and good approximations for the betweenness of unimportant nodes.

Abstract: Estimating the importance or centrality of the nodes in large networks has recently attracted increased interest. Betweenness is one of the most important centrality indices, which basically counts the number of shortest paths going through a node. Betweenness has been used in diverse applications, e.g., social network analysis or route planning. Since exact computation is prohibitive for large networks, approximation algorithms are important. In this paper, we propose a framework for unbiased approximation of betweenness that generalizes a previous approach by Brandes. Our best new schemes yield significantly better approximation than before for many real world inputs. In particular, we also get good approximations for the betweenness of unimportant nodes.
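
To make the quantity concrete, the sketch below computes betweenness in an unweighted graph with the well-known single-source dependency accumulation due to Brandes, and optionally estimates it from a subset of source nodes ("pivots") scaled by n divided by the number of pivots. The uniform-pivot estimator is assumed here to be representative of the earlier approach the abstract mentions; it is not the new framework proposed in the paper. The graph is assumed to be a dict mapping every node to a list of neighbors, and scores count ordered source-target pairs.

```python
from collections import deque


def dependencies_from(graph, s):
    """Dependency of source s on every node, via BFS and back-propagation."""
    dist = {s: 0}
    sigma = {v: 0 for v in graph}    # number of shortest paths from s
    sigma[s] = 1
    preds = {v: [] for v in graph}   # predecessors on shortest paths
    order = []
    queue = deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = {v: 0.0 for v in graph}
    for w in reversed(order):        # farthest nodes first
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[s] = 0.0                   # a source is not "between" its own paths
    return delta


def betweenness(graph, pivots=None):
    """Exact betweenness if pivots is None, else a scaled pivot-based estimate."""
    nodes = list(graph)
    sources = nodes if pivots is None else list(pivots)
    scale = len(nodes) / len(sources)
    score = {v: 0.0 for v in nodes}
    for s in sources:
        for v, d in dependencies_from(graph, s).items():
            score[v] += scale * d
    return score


if __name__ == "__main__":
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(betweenness(path))                 # exact
    print(betweenness(path, pivots=["a"]))   # estimate from one pivot
```

Averaging the single-pivot estimates over all possible pivots recovers the exact value, which is the sense in which uniform pivot sampling is unbiased.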

279 citations

Book Chapter•10.1007/3-540-45643-0_4•

Using Multi-level Graphs for Timetable Information in Railway Systems

Frank Schulz², Dorothea Wagner², Christos D. Zaroliagis¹•Institutions (2)

University of Patras¹, University of Konstanz²

4 Jan 2002

TL;DR: This paper performs a detailed analysis and experimental evaluation of shortest path computations based on multi-level graph decomposition for one specific application scenario from the field of timetable information in public transport.

Abstract: In many fields of application, shortest path finding problems in very large graphs arise. Scenarios where large numbers of on-line queries for shortest paths have to be processed in real-time appear for example in traffic information systems. In such systems, the techniques considered to speed up the shortest path computation are usually based on precomputed information. One approach proposed often in this context is a space reduction, where precomputed shortest paths are replaced by single edges with weight equal to the length of the corresponding shortest path. In this paper, we give a first systematic experimental study of such a space reduction approach. We introduce the concept of multi-level graph decomposition. For one specific application scenario from the field of timetable information in public transport, we perform a detailed analysis and experimental evaluation of shortest path computations based on multi-level graph decomposition.
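
The replacement of precomputed shortest paths by single weighted edges, as described in the abstract, can be sketched for one level as follows. This is a deliberately simplified illustration and not the paper's multi-level decomposition: it runs Dijkstra's algorithm from each node in a chosen subset and connects it to every other reachable selected node with an edge whose weight is the shortest-path length. The function names and the dict-of-dicts graph format are assumptions.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source; graph maps node -> {neighbor: weight}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for w, weight in graph.get(v, {}).items():
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist


def overlay_graph(graph, selected):
    """Graph on the selected nodes whose edges carry shortest-path lengths."""
    overlay = {}
    for s in selected:
        dist = dijkstra(graph, s)
        overlay[s] = {t: dist[t] for t in selected if t != s and t in dist}
    return overlay


if __name__ == "__main__":
    g = {"A": {"x": 2}, "x": {"y": 3}, "y": {"B": 1}, "B": {}}
    print(overlay_graph(g, ["A", "B"]))
```

A query between two selected nodes can then be answered on the smaller overlay graph, at the price of the precomputation and the extra edges.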

139 citations

Book Chapter•10.1007/3-540-45643-0_13•

Acceleration of K-Means and Related Clustering Algorithms

Steven J. Phillips¹•Institutions (1)

AT&T Labs¹

4 Jan 2002

TL;DR: Two simple modifications of K-means and related clustering algorithms are described that improve the running time without changing the output; the two resulting algorithms are called Compare-means and Sort-means.

Abstract: This paper describes two simple modifications of K-means and related algorithms for clustering that improve the running time without changing the output. The two resulting algorithms are called Compare-means and Sort-means. The time for an iteration of K-means is reduced from O(ndk), where n is the number of data points, k the number of clusters and d the dimension, to O(ndγ + k²d + k² log k) for Sort-means. Here γ ≤ k is the average over all points p of the number of means that are no more than twice as far as p is from the mean p was assigned to in the previous iteration. Compare-means performs a similar number of distance calculations as Sort-means, and is faster when the number of means is very large. Both modifications are extremely simple, and could easily be added to existing clustering implementations. We investigate the empirical performance of the algorithms on three datasets drawn from practical applications. As a primary test case, we use the Isodata variant of K-means on a sample of 2.3 million 6-dimensional points drawn from a Landsat-7 satellite image. For this dataset, γ quickly drops to less than log₂ k, and the running time decreases accordingly. For example, a run with k = 100 drops from an hour and a half to sixteen minutes for Compare-means and six and a half minutes for Sort-means. Further experiments show similar improvements on datasets derived from a forestry application and from the analysis of BGP updates in an IP network.
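
The "no more than twice as far" condition in the abstract suggests a triangle-inequality test: if a candidate mean lies at least twice as far from the current best mean as the point does, it cannot be strictly closer to the point, so the distance computation can be skipped. The sketch below applies that test in a single assignment step, starting from each point's previous assignment. It is an illustration consistent with the abstract, not a verified reproduction of Compare-means or Sort-means, and the function name and return values are assumptions.

```python
import math


def assign_with_pruning(points, means, previous):
    """Assign each point to its nearest mean, skipping provably farther means.

    previous[i] is the index of the mean that points[i] was assigned to in the
    last iteration. Returns the new assignment and the number of skipped
    point-to-mean distance computations.
    """
    k = len(means)
    # Pairwise distances between means, computed once per iteration.
    between = [[math.dist(a, b) for b in means] for a in means]
    assignment = []
    skipped = 0
    for p, start in zip(points, previous):
        best, best_d = start, math.dist(p, means[start])
        for j in range(k):
            if j == best:
                continue
            # Triangle inequality: d(p, m_j) >= d(m_best, m_j) - d(p, m_best).
            if between[best][j] >= 2 * best_d:
                skipped += 1
                continue
            d = math.dist(p, means[j])
            if d < best_d:
                best, best_d = j, d
        assignment.append(best)
    return assignment, skipped


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 5.5)]
    mus = [(0.5, 0), (10.5, 0), (5, 6)]
    print(assign_with_pruning(pts, mus, [0, 0, 1, 1, 2]))
```

Because a mean is skipped only when it provably cannot be closer, the resulting assignment matches a brute-force nearest-mean search whenever there are no exact distance ties.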

121 citations

Year	Papers
2021	7
2020	16
2019	17
2018	20
2017	24
2016	16
