Understanding query performance in Accumulo

doi:10.1109/HPEC.2013.6670330

Proceedings Article

Scott M. Sawyer, +3 more

- 21 Nov 2013

- pp 1-6

26

TL;DR: This paper studies an Apache Accumulo-based big data system designed for a network situational awareness application, analyzes its storage schema and data retrieval requirements, and characterizes the corresponding Accumulo performance bottlenecks.

Citations

Proceedings Article•10.1109/HPEC.2014.7040945

Achieving 100,000,000 database inserts per second using Accumulo and D4M

Jeremy Kepner, +12 more

- 19 Jun 2014

- arXiv: Databases

TL;DR: Apache Accumulo is an open source, relaxed-consistency database that is widely used for government applications and is designed to deliver high performance on unstructured data such as graphs of network data.

52

Proceedings Article•10.1109/HPEC.2015.7322448

Graphulo implementation of server-side sparse matrix multiply in the Accumulo database

Dylan Hutchison, +3 more

- 12 Nov 2015

TL;DR: A server-side implementation of GraphBLAS sparse matrix multiplication that leverages Accumulo's native, high-performance iterators is presented as a core component of the Graphulo library, which will deliver matrix math primitives for graph analytics within Accumulo.

46

Book Chapter•10.4018/978-1-4666-9834-5.CH006

Modeling and Indexing Spatiotemporal Trajectory Data in Non-Relational Databases

Berkay Aydin, +2 more

- 01 Jan 2016

TL;DR: In this chapter, the important aspects of non-relational (NoSQL) databases for storing large-scale spatiotemporal trajectory data are investigated and two data storage schemata are proposed for storing trajectories.

18

Proceedings Article•10.1109/HPEC.2015.7322476

Lustre, Hadoop, Accumulo

Jeremy Kepner, +13 more

- 01 Sep 2015

TL;DR: In this article, the authors compare Lustre, Hadoop, and Accumulo on a hypothetical common cluster and show that Lustre provides 2x more storage capacity, is less likely to lose data during 3 simultaneous drive failures, and provides higher bandwidth on general purpose workloads.

15

Journal Article•10.1016/J.DATAK.2019.101732

CloudDBGuard: A framework for encrypted data storage in NoSQL wide column stores

Lena Wiese, +2 more

- 01 Mar 2020

TL;DR: This article comprehensively presents the CloudDBGuard framework, which allows property-preserving encryption to be used in unmodified wide column stores, hides the complexity of the encryption and decryption process, and allows various adjustments for specific use cases in order to achieve a maximum of security, functionality and performance.

14

References

Journal Article•10.1145/1365815.1365816

Bigtable: A Distributed Storage System for Structured Data

Fay W. Chang, +8 more

- 01 Jun 2008

- ACM Transactions on Computer Systems

TL;DR: This paper describes the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, along with the design and implementation of Bigtable.

3.5K

Journal Article•10.14778/1920841.1920908

Hadoop++: making a yellow elephant run like a cheetah (without it even noticing)

Jens Dittrich, +5 more

- 01 Sep 2010

TL;DR: This paper proposes a new type of system named Hadoop++: it boosts task performance without changing the Hadoop framework at all (Hadoop does not even 'notice it'), and shows the superiority of Hadoop++ over both Hadoop and HadoopDB for tasks related to indexing and join processing.

747

Proceedings Article•10.1145/2038916.2038925

YCSB++: benchmarking and performance debugging advanced features in scalable table stores

Swapnil Patil, +8 more

- 26 Oct 2011

TL;DR: YCSB++ is described, a set of extensions to the Yahoo! Cloud Serving Benchmark that includes multi-tester coordination for increased load and eventual consistency measurement, multi-phase workloads to quantify the consequences of work deferment and the benefits of anticipatory configuration optimization, and abstract APIs for explicit incorporation of advanced features in benchmark tests.

205

Proceedings Article•10.1109/ICASSP.2012.6289129

Dynamic distributed dimensional data model (D4M) database and computation system

Jeremy Kepner, +15 more

- 25 Mar 2012

TL;DR: D4M (Dynamic Distributed Dimensional Data Model) has been developed to provide a mathematically rich interface to tuple stores (and structured query language “SQL” databases). Using D4M, it is possible to create composable analytics with significantly less effort than with traditional approaches.

168

Proceedings Article•10.1109/HPEC.2012.6408678

Driving big data with big compute

Chansup Byun, +13 more

- 01 Sep 2012

TL;DR: The LLGrid team has developed and deployed a number of technologies that aim to provide the best of both worlds, including LLGrid MapReduce, which allows the map/reduce parallel programming model to be used quickly and efficiently in any language on any compute cluster.

60