Conference
Algorithm Engineering and Experimentation
About: Algorithm Engineering and Experimentation is an academic conference. The conference publishes mainly in the areas of computer science and data structures. Over its lifetime, the conference has published 319 papers, which have received 8901 citations.
Papers
Proceedings Article
6 Jan 2007
TL;DR: Four novel Rank/Select dictionaries are proposed: esp, recrank, vcode and sdarray, each of which is small if the number of elements in S is small, and indeed close to nH0(S) (H0(S) ≤ 1 is the zero-th order empirical entropy of S) in practice.
Abstract: Rank/Select dictionaries are data structures for an ordered set S ⊂ {0, 1, ..., n − 1} to compute rank(x, S) (the number of elements in S that are no greater than x), and select(i, S) (the i-th smallest element in S), which are the fundamental components of succinct data structures of strings, trees, graphs, etc. In these data structures, however, only asymptotic behavior has been considered and their performance for real data is not satisfactory. In this paper, we propose four novel Rank/Select dictionaries: esp, recrank, vcode and sdarray, each of which is small if the number of elements in S is small, and indeed close to nH0(S) (H0(S) ≤ 1 is the zero-th order empirical entropy of S) in practice. Furthermore, their query times are superior to those of existing structures. Experimental results reveal the characteristics of our data structures and also show that these data structures are superior to existing implementations, both in terms of size and query time.
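To make the two operations in this abstract concrete, here is a minimal Python sketch of rank and select over a plain sorted list. It is not one of the paper's structures (esp, recrank, vcode, sdarray) and is not succinct. The class name `SortedSetDictionary` is hypothetical, and treating `select` as 1-based is an assumption.

```python
from bisect import bisect_right


class SortedSetDictionary:
    """Naive rank/select over a sorted list (illustrative, not succinct)."""

    def __init__(self, elements):
        # Store the distinct elements of S in increasing order.
        self.items = sorted(set(elements))

    def rank(self, x):
        # Number of elements in S that are no greater than x.
        return bisect_right(self.items, x)

    def select(self, i):
        # The i-th smallest element in S (assumed 1-based here).
        if not 1 <= i <= len(self.items):
            raise IndexError("select index out of range")
        return self.items[i - 1]
```

With this convention, `select(rank(x))` returns `x` whenever `x` is a member of S, which is the identity that succinct structures must also preserve.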
298 citations
Proceedings Article
16 Jan 2010
TL;DR: A new k-means clustering algorithm for data streams of points from a Euclidean space that provides a good alternative to BIRCH and StreamLS, in particular, if the number of cluster centers is large.
Abstract: We develop a new k-means clustering algorithm for data streams, which we call StreamKM++. Our algorithm computes a small weighted sample of the data stream and solves the problem on the sample using the k-means++ algorithm [1]. To compute the small sample, we propose two new techniques. First, we use a non-uniform sampling approach similar to the k-means++ seeding procedure to obtain small coresets from the data stream. This construction is rather easy to implement and, unlike other coreset constructions, its running time has only a low dependency on the dimensionality of the data. Second, we propose a new data structure which we call a coreset tree. The use of these coreset trees significantly speeds up the time necessary for the non-uniform sampling during our coreset construction.
We compare our algorithm experimentally with two well-known streaming implementations (BIRCH [16] and StreamLS [4, 9]). In terms of quality (sum of squared errors), our algorithm is comparable with StreamLS and significantly better than BIRCH (up to a factor of 2). In terms of running time, our algorithm is slower than BIRCH. Comparing the running time with StreamLS, it turns out that our algorithm scales much better with increasing number of centers. We conclude that, if the first priority is the quality of the clustering, then our algorithm provides a good alternative to BIRCH and StreamLS, in particular, if the number of cluster centers is large.
We also give a theoretical justification of our approach by proving that our sample set is a small coreset in low dimensional spaces.
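The non-uniform sampling mentioned in this abstract resembles k-means++ seeding, which can be sketched as follows. This is only a plain in-memory illustration of that seeding idea: it does not implement the coreset tree or the streaming part. The function names are hypothetical, and weighting each sample point by the number of input points nearest to it is an assumption made for illustration.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def d2_sample(points, m, rng):
    """Pick up to m points: the first uniformly, each later one with
    probability proportional to its squared distance to the nearest
    point chosen so far (the k-means++ seeding rule)."""
    chosen = [rng.choice(points)]
    nearest = [squared_distance(p, chosen[0]) for p in points]
    # Stop early if every point coincides with an already chosen one.
    while len(chosen) < m and sum(nearest) > 0:
        pick = rng.choices(points, weights=nearest, k=1)[0]
        chosen.append(pick)
        nearest = [min(d, squared_distance(p, pick))
                   for p, d in zip(points, nearest)]
    return chosen


def weigh_sample(points, sample):
    """Assumed weighting: count the input points closest to each sample point."""
    weights = [0] * len(sample)
    for p in points:
        closest = min(range(len(sample)),
                      key=lambda i: squared_distance(p, sample[i]))
        weights[closest] += 1
    return weights
```

The weighted sample could then be handed to any weighted k-means solver; the weights always sum to the number of input points.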
285 citations
Proceedings Article
19 Jan 2008
TL;DR: A framework for unbiased approximation of betweenness is proposed that generalizes a previous approach by Brandes and yields significantly better approximation than before for many real world inputs and good approximations for the betweenness of unimportant nodes.
Abstract: Estimating the importance or centrality of the nodes in large networks has recently attracted increased interest. Betweenness is one of the most important centrality indices, which basically counts the number of shortest paths going through a node. Betweenness has been used in diverse applications, e.g., social network analysis or route planning. Since exact computation is prohibitive for large networks, approximation algorithms are important. In this paper, we propose a framework for unbiased approximation of betweenness that generalizes a previous approach by Brandes. Our best new schemes yield significantly better approximation than before for many real world inputs. In particular, we also get good approximations for the betweenness of unimportant nodes.
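As a rough illustration of sampling-based betweenness approximation, the sketch below estimates betweenness on an unweighted graph from a uniform random set of source pivots and scales the result by n/k. This is a sketch of basic pivot sampling, not the generalized framework the paper proposes. The function names are hypothetical, and counting ordered (s, t) pairs, so that undirected values come out doubled, is an assumption of this example.

```python
import random
from collections import deque


def source_dependencies(adj, s):
    """Dependency of source s on every node (Brandes-style accumulation)."""
    sigma = {v: 0 for v in adj}    # number of shortest s-v paths
    dist = {v: -1 for v in adj}
    preds = {v: [] for v in adj}
    sigma[s], dist[s] = 1, 0
    order, queue = [], deque([s])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if dist[w] < 0:                # first visit
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:     # v precedes w on a shortest path
                sigma[w] += sigma[v]
                preds[w].append(v)
    delta = {v: 0.0 for v in adj}
    for w in reversed(order):
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
    delta[s] = 0.0                         # a source does not count for itself
    return delta


def estimate_betweenness(adj, num_pivots, rng):
    """Scale the summed dependencies of sampled pivots by n / num_pivots."""
    nodes = list(adj)
    pivots = rng.sample(nodes, num_pivots)
    scale = len(nodes) / num_pivots
    estimate = {v: 0.0 for v in nodes}
    for s in pivots:
        for v, d in source_dependencies(adj, s).items():
            estimate[v] += scale * d
    return estimate
```

When every node is used as a pivot the estimate coincides with the exact value; with fewer pivots it trades accuracy for running time.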
279 citations
4 Jan 2002
TL;DR: This paper performs a detailed analysis and experimental evaluation of shortest path computations based on multi-level graph decomposition for one specific application scenario from the field of timetable information in public transport.
Abstract: In many fields of application, shortest path finding problems in very large graphs arise. Scenarios where large numbers of on-line queries for shortest paths have to be processed in real-time appear for example in traffic information systems. In such systems, the techniques considered to speed up the shortest path computation are usually based on precomputed information. One approach proposed often in this context is a space reduction, where precomputed shortest paths are replaced by single edges with weight equal to the length of the corresponding shortest path. In this paper, we give a first systematic experimental study of such a space reduction approach. We introduce the concept of multi-level graph decomposition. For one specific application scenario from the field of timetable information in public transport, we perform a detailed analysis and experimental evaluation of shortest path computations based on multi-level graph decomposition.
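The space reduction described in this abstract, where precomputed shortest paths are replaced by single weighted edges, can be sketched on a small directed graph. This is a single-level illustration only, not the multi-level decomposition studied in the paper. The graph representation (a dict of dicts) and the function names are assumptions.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source; graph maps node -> {neighbour: weight}."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def build_overlay(graph, selected):
    """Replace shortest paths between selected nodes by single edges
    whose weight equals the length of the corresponding shortest path."""
    overlay = {u: {} for u in selected}
    for u in selected:
        dist = dijkstra(graph, u)
        for v in selected:
            if v != u and v in dist:
                overlay[u][v] = dist[v]
    return overlay
```

Queries between selected nodes can then run on the much smaller overlay graph and return the same distances as on the original graph.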
139 citations
4 Jan 2002
TL;DR: Two simple modifications of K-means and related clustering algorithms that improve the running time without changing the output are described; the two resulting algorithms are called Compare-means and Sort-means.
Abstract: This paper describes two simple modifications of K-means and related algorithms for clustering that improve the running time without changing the output. The two resulting algorithms are called Compare-means and Sort-means. The time for an iteration of K-means is reduced from O(ndk), where n is the number of data points, k the number of clusters and d the dimension, to O(ndγ + k²d + k² log k) for Sort-means. Here γ ≤ k is the average over all points p of the number of means that are no more than twice as far as p is from the mean p was assigned to in the previous iteration. Compare-means performs a similar number of distance calculations as Sort-means, and is faster when the number of means is very large. Both modifications are extremely simple, and could easily be added to existing clustering implementations.
We investigate the empirical performance of the algorithms on three datasets drawn from practical applications. As a primary test case, we use the Isodata variant of K-means on a sample of 2.3 million 6-dimensional points drawn from a Landsat-7 satellite image. For this dataset, γ quickly drops to less than log₂ k, and the running time decreases accordingly. For example, a run with k = 100 drops from an hour and a half to sixteen minutes for Compare-means and six and a half minutes for Sort-means. Further experiments show similar improvements on datasets derived from a forestry application and from the analysis of BGP updates in an IP network.
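The "no more than twice as far" condition in this abstract suggests a triangle-inequality test: if a mean c_j lies at least 2·d(p, c_i) away from p's previous mean c_i, then c_j cannot be closer to p than c_i. The sketch below applies that test in a Sort-means-like assignment step. It is one possible reading under that assumption, not the paper's implementation, and the function names are hypothetical.

```python
import math


def sorted_neighbours(means):
    """For each mean, the other means ordered by increasing distance from it."""
    table = []
    for i, ci in enumerate(means):
        others = [(math.dist(ci, cj), j)
                  for j, cj in enumerate(means) if j != i]
        table.append(sorted(others))
    return table


def assign_with_pruning(points, means, previous):
    """Assign each point to its nearest mean, skipping means ruled out
    by the triangle inequality. Returns (assignment, distance evaluations)."""
    table = sorted_neighbours(means)
    assignment, evaluations = [], 0
    for p, i in zip(points, previous):
        d_prev = math.dist(p, means[i])
        evaluations += 1
        best, best_d = i, d_prev
        for d_ij, j in table[i]:
            # Means this far from c_i, and all later ones in sorted order,
            # are at least as far from p as c_i is.
            if d_ij >= 2.0 * d_prev:
                break
            d = math.dist(p, means[j])
            evaluations += 1
            if d < best_d:
                best, best_d = j, d
        assignment.append(best)
    return assignment, evaluations
```

Apart from tie-breaking, the assignment matches a brute-force nearest-mean search, while points that stay close to their previous mean need fewer distance evaluations.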
121 citations
Performance Metrics
| Year | Papers |
|---|---|
| 2021 | 7 |
| 2020 | 16 |
| 2019 | 17 |
| 2018 | 20 |
| 2017 | 24 |
| 2016 | 16 |