Associative array

Compaction-aware zone allocation for LSM based key-value store on ZNS SSDs

TL;DR: This paper proposes a Compaction-Aware Zone Allocation algorithm (CAZA) that allows newly created SSTables to be deleted together after a future compaction merge, significantly reducing write amplification (WA) overhead compared to the Lifetime-based Zone Allocation algorithm (LIZA).


Abstract: Unlike traditional block-based SSDs, Zoned Namespace (ZNS) SSDs expose storage through the zoned block interface, completely eliminating the need for in-device garbage collection (GC) and relinquishing this responsibility to applications. As a result, application-aware data placement decisions give the opportunity for applications on the host to perform efficient GC. Meanwhile, RocksDB for ZNS SSD places data with similar invalidation times (lifetimes) in the same zone through ZenFS (a user-level file system) using the Lifetime-based Zone Allocation algorithm (LIZA), and minimizes the GC overhead of valid data copy when reclaiming a zone. However, LIZA, which allocates zones by predicting the lifetime of each SSTable according to the level of the hierarchical structure of the LSM-tree, is very inefficient in minimizing the write amplification (WA) problem due to inaccurate predictions of SSTable lifetimes. Instead, based on our observation that the deletion time of SSTables in the LSM-tree is solely determined by the compaction process, we propose a novel Compaction-Aware Zone Allocation algorithm (CAZA) that allows the newly created SSTables to be deleted together after merging in the future. CAZA is implemented in RocksDB's ZenFS and our extensive evaluations show that CAZA significantly reduces the WA overhead compared to LIZA.
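
The effect described in the abstract can be illustrated with a toy model. This is only a sketch under stated assumptions, not the CAZA or LIZA implementation: each zone has unlimited capacity, reclaiming a zone requires copying out every still-valid SSTable in it, and the `group` field is a hypothetical label for "SSTables that one future compaction will merge and delete together". Placing by `level` stands in for a lifetime-by-level policy; placing by `group` stands in for a compaction-aware policy.

```python
from collections import defaultdict

def place(sstables, key):
    """Assign SSTables to zones, one zone per distinct value of key(sst)."""
    zones = defaultdict(list)
    for sst in sstables:
        zones[key(sst)].append(sst)
    return zones

def gc_copy_cost(zones, deleted_group):
    """Size of still-valid data that must be copied to reclaim every zone
    that holds at least one SSTable from the deleted group."""
    copied = 0
    for members in zones.values():
        if any(s["group"] == deleted_group for s in members):
            copied += sum(s["size"] for s in members
                          if s["group"] != deleted_group)
    return copied

# Hypothetical SSTables: two compaction groups spread over two LSM-tree levels.
sstables = [
    {"name": "a", "level": 1, "group": 0, "size": 64},
    {"name": "b", "level": 2, "group": 0, "size": 64},
    {"name": "c", "level": 1, "group": 1, "size": 64},
    {"name": "d", "level": 2, "group": 1, "size": 64},
]

by_level = place(sstables, key=lambda s: s["level"])   # level-based placement
by_group = place(sstables, key=lambda s: s["group"])   # compaction-aware placement

level_cost = gc_copy_cost(by_level, deleted_group=0)
group_cost = gc_copy_cost(by_group, deleted_group=0)
print(level_cost, group_cost)
```

In this toy setup, level-based placement leaves valid SSTables behind in every zone touched by the compaction, while group-based placement empties a zone completely, so nothing has to be copied before the zone is reset.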


34 citations

Book

LUCAS associative array processor


Christer Fernstrom, Ivan Kruzela, Bertil Svensson

1 Jan 1986

29 citations

10.1109/AFIPS.1972.135

A production implementation of an associative array processor - STARAN


Jack A. Rudolph¹•Institutions (1)

Goodyear Aerospace¹

1972

TL;DR: Associative (content-addressed) memory has attracted computer designers ever since Slade and McMahon's 1957 paper described a "catalog" memory; this paper describes STARAN, a production implementation of an associative array processor.


Abstract: The associative or content-addressed memory has been an attractive concept to computer designers ever since Slade and McMahon's 1957 paper described a "catalog" memory. Associative memories offered relief from the continuing problem presented by the typical coordinate-addressed memory which requires that an "address" be obtained or calculated before data stored at that address may be retrieved. The associative memory could acquire in a single memory access any data from memory without pre-knowledge of its location. Ordered files and sorting operations could be eliminated. Unfortunately, early associative memories were expensive, hence none found their way as the "main frame" memory into any commercial computer design.
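
The contrast the abstract draws between coordinate-addressed and content-addressed retrieval can be sketched in software. This is a minimal illustration, not a model of STARAN: the function names and sample records are hypothetical, and the linear scan below only emulates what an associative memory would do by comparing all words at once in hardware.

```python
def coordinate_read(memory, address):
    """Coordinate-addressed read: the caller must already know the address."""
    return memory[address]

def associative_read(memory, **pattern):
    """Content-addressed read: return every word whose fields match the
    pattern. Software scans word by word; associative hardware would
    compare all words in parallel."""
    return [word for word in memory
            if all(word.get(field) == value for field, value in pattern.items())]

# Hypothetical memory contents: one record per word.
memory = [
    {"part": "bolt", "shelf": 3, "qty": 120},
    {"part": "nut", "shelf": 7, "qty": 0},
    {"part": "gear", "shelf": 3, "qty": 15},
]

by_address = coordinate_read(memory, 2)        # needs the address up front
by_content = associative_read(memory, shelf=3)  # needs only the content
print(by_address["part"], [w["part"] for w in by_content])
```

The content-addressed lookup needs no index, ordering, or precomputed address, which is why the abstract notes that ordered files and sorting operations could be eliminated.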


24 citations

Proceedings Article•10.1109/HPEC.2019.8916508

Streaming 1.9 Billion Hypersparse Network Updates per Second with D4M


Jeremy Kepner¹, Michael Houle¹, Michael Jones¹, Anne Klein¹, Peter Michaleas¹, Julie Mullen¹, Andrew Prout¹, Antonio Rosa¹, Charles Yee¹, Albert Reuther¹, Vijay Gadepally¹, Lauren Milechin¹, Siddharth Samsi¹, William Arcand¹, David Bestor¹, William Bergeron¹, Chansup Byun¹, Matthew Hubbell¹•Institutions (1)

Massachusetts Institute of Technology¹

6 Jul 2019

TL;DR: This work describes the design and performance optimization of an implementation of hierarchical associative arrays that reduces memory pressure and dramatically increases the update rate into an associative array.


Abstract: The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in a variety of languages (Python, Julia, and Matlab/Octave) and provides a lightweight in-memory database implementation of hypersparse arrays that are ideal for analyzing many types of network data. D4M relies on associative arrays which combine properties of spreadsheets, databases, matrices, graphs, and networks, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of D4M associative arrays put enormous pressure on the memory hierarchy. This work describes the design and performance optimization of an implementation of hierarchical associative arrays that reduces memory pressure and dramatically increases the update rate into an associative array. The parameters of hierarchical associative arrays rely on controlling the number of entries in each level in the hierarchy before an update is cascaded. The parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical arrays achieve over 40,000 updates per second in a single instance. Scaling to 34,000 instances of hierarchical D4M associative arrays on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 1,900,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.
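
The cascading scheme described in the abstract can be sketched as follows. This is a simplified illustration that does not use the D4M API: the class name `HierarchicalAssoc`, its methods, and the cutoff values are all hypothetical, plain dictionaries stand in for associative arrays, and values are combined by addition. Each level holds a bounded number of entries; once a level exceeds its cutoff, its contents are added into the next level and the level is cleared, so most updates touch only the small first level.

```python
class HierarchicalAssoc:
    """Toy hierarchical associative array (hypothetical; not the D4M API)."""

    def __init__(self, cutoffs):
        # cutoffs[i] is the maximum number of entries level i may hold
        # before it is cascaded into level i + 1; the last level is unbounded.
        self.cutoffs = list(cutoffs)
        self.levels = [dict() for _ in range(len(self.cutoffs) + 1)]

    def update(self, key, value=1):
        """Add value at key in the smallest level, then cascade if needed."""
        level0 = self.levels[0]
        level0[key] = level0.get(key, 0) + value
        self._cascade()

    def _cascade(self):
        for i, cutoff in enumerate(self.cutoffs):
            if len(self.levels[i]) <= cutoff:
                break
            nxt = self.levels[i + 1]
            for k, v in self.levels[i].items():
                nxt[k] = nxt.get(k, 0) + v
            self.levels[i].clear()

    def total(self):
        """Sum all levels into one dictionary (the full array)."""
        out = {}
        for level in self.levels:
            for k, v in level.items():
                out[k] = out.get(k, 0) + v
        return out

# Stream a few hypothetical (source, destination) network updates.
h = HierarchicalAssoc(cutoffs=[2, 4])
for edge in [("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]:
    h.update(edge)
print(h.total())
```

Because addition is associative and commutative, summing the levels gives the same result as applying every update to a single array, which is why the cutoffs can be tuned for speed without changing the answer.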


22 citations

Papers published on a yearly basis

Year	Papers
2025	3
2024	2
2023	12
2022	22
2021	8
2020	5

Papers

The Role of Associative Array Processors in Data Base Machine Architecture

Compaction-aware zone allocation for LSM based key-value store on ZNS SSDs

LUCAS associative array processor

A production implementation of an associative array processor - STARAN

Streaming 1.9 Billion Hypersparse Network Updates per Second with D4M

Related Topics (5)

Performance Metrics