Near-Data-Processing Architectures Performance Estimation and Ranking using Machine Learning Predictors

doi:10.1109/DSD53832.2021.00033

Proceedings Article

Veronia Iskandar, +2 more

- 01 Sep 2021

- pp 158-165

Cited by 5

TL;DR: In this paper, a machine learning framework is proposed to predict the performance of near-data processing (NDP) applications on 3D-stacked DRAM systems, based on an input set of application characteristics.

Citations

Journal Article • 10.1016/j.vlsi.2023.06.002

AI/ML Algorithms and Applications in VLSI Design and Technology

Deepthi Amuru, +6 more

- 21 Feb 2022

- Integration

TL;DR: In this article, the authors discuss the future scope of AI/ML applications to revolutionize the field of VLSI design, aiming for high-speed, highly intelligent, and efficient implementations.

Cited by 26

Journal Article • 10.1016/j.micpro.2022.104707

NDP-RANK: Prediction and ranking of NDP systems performance using machine learning

Veronia Iskandar, +2 more

- 01 Oct 2022

- Microprocessors and Microsystems

TL;DR: In this article, a machine learning framework is proposed to decide which near-data processing (NDP) system is suitable for an application based on an input set of application and micro-architecture characteristics.

Cited by 2

Journal Article • 10.1109/fpl60245.2023.00065

Performance Estimation and Prototyping of Reconfigurable Near-Memory Computing Systems

Veronia Iskandar, +2 more

- 04 Sep 2023

TL;DR: This work addresses the research questions of how to leverage the full bandwidth of 3D-stacked high-bandwidth memory and how to facilitate the adoption of the near-memory computing paradigm.

Cited by 1

Proceedings Article • 10.1109/dsd60849.2023.00085

Auto-DOK: Compiler-Assisted Automatic Detection of Offload Kernels for FPGA-HBM Architectures

Veronia Iskandar, +2 more

- 06 Sep 2023

TL;DR: Auto-DOK is a compiler-assisted tool for identifying offload kernels for FPGA-HBM architectures. It analyzes application code based on hardware design goals and automatically identifies kernels suitable for offloading.

Journal Article • 10.1145/3771723

An End-to-End Framework for Compiling Dense and Sparse Matrix-Vector Multiplications for FPGA-HBM Acceleration

Veronia Iskandar, +2 more

- 13 Oct 2025

- ACM Transactions on Architecture and Cod...

Abstract: The bandwidth improvement provided by high-bandwidth memory (HBM) and the capability of FPGAs to customize the processing and memory hierarchy result in a considerable performance increase for memory-intensive workloads such as graph processing, sorting, machine learning, and database analytics. Modern systems integrating 3D-stacked DRAM can be leveraged to realize the Near-Memory Computing (NMC) paradigm by offloading some computations to accelerators placed near the HBM. Matrix-vector multiplication (MVM) kernels, which are memory-bound, can benefit significantly from execution on FPGA-HBM platforms. MVM kernels fall into two broad types: dense (General Matrix-Vector Multiplication, GEMV) and sparse (Sparse Matrix-Vector Multiplication, SpMV). Recent literature has predominantly focused on optimizing SpMV for FPGA-HBM, leaving a unified solution relatively unexplored. In this work, we introduce an end-to-end framework for compiling MVM kernels for FPGA-HBM acceleration. It consists of software and hardware components. The software component introduces the MATIO compiler, a novel toolflow that detects MVM and matrix multiplication (MM) kernels in C or C++ code and replaces MVM kernels with a call to our FPGA accelerator. MATIO detects 90% of the MVM and MM kernels in real-world benchmarks collected from GitHub and is at least 45x faster than state-of-the-art detection methods. On the hardware side, we introduce VecMADS, a novel FPGA architecture designed to handle both GEMV and SpMV operations efficiently. The architecture leverages the high memory bandwidth of HBM to overcome memory bottlenecks, providing a comprehensive solution for accelerating matrix-vector multiplication on FPGAs. Evaluation results show that VecMADS delivers 1.5x higher throughput and 4.8x higher energy efficiency than the cuSPARSE library on a GPU. On dense benchmarks, VecMADS achieves 1.26x higher throughput than the hipBLAS library running on a GPU.
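
To make the dense/sparse distinction in the abstract concrete, the following is a minimal Python sketch of the two kernel types: a dense GEMV and an SpMV over the Compressed Sparse Row (CSR) format. CSR is assumed here only as a common sparse format for illustration; the abstract does not state which format VecMADS uses, and this code is not the MATIO or VecMADS implementation. The function names `gemv`, `to_csr`, and `spmv_csr` are hypothetical.

```python
from typing import List, Sequence, Tuple

def gemv(a: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Dense y = A @ x: every element of A is read, including zeros."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def to_csr(a: Sequence[Sequence[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense matrix to CSR arrays (values, col_idx, row_ptr), storing only nonzeros."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        # row_ptr[i+1] marks where row i ends in values/col_idx
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def spmv_csr(values: List[float], col_idx: List[int], row_ptr: List[int],
             x: Sequence[float]) -> List[float]:
    """Sparse y = A @ x over CSR: only stored nonzeros are read,
    but x is accessed indirectly through col_idx (irregular access)."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y

A = [[4.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 2.0, 0.0],
     [0.0, 3.0, 0.0, 0.0]]
x = [1.0, 2.0, 3.0, 4.0]

if __name__ == "__main__":
    print(gemv(A, x))                # [8.0, 6.0, 6.0]
    print(spmv_csr(*to_csr(A), x))   # [8.0, 6.0, 6.0]
```

Both kernels perform very few arithmetic operations per byte loaded from memory, which is why MVM is memory-bound; the CSR variant skips zeros but adds index arrays and an indirect, irregular access pattern on `x`.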

References

Proceedings Article • 10.5555/977395.977673

LLVM: a compilation framework for lifelong program analysis & transformation

Chris Lattner, +1 more

- 20 Mar 2004

TL;DR: The design of the LLVM representation and compiler framework is evaluated in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.

Cited by 5.4K

Proceedings Article • 10.1145/2749469.2750386

A scalable processing-in-memory accelerator for parallel graph processing

Junwhan Ahn, +4 more

- 13 Jun 2015

TL;DR: This work argues that the conventional concept of processing-in-memory (PIM) can be a viable solution to achieve memory-capacity-proportional performance and designs a programmable PIM accelerator for large-scale graph processing called Tesseract.

Cited by 937

Journal Article • 10.1109/LCA.2015.2414456

Ramulator: A Fast and Extensible DRAM Simulator

Yoongu Kim, +2 more

- 01 Jan 2016

- IEEE Computer Architecture Letters

TL;DR: This paper presents Ramulator, a fast and cycle-accurate DRAM simulator that is built from the ground up for extensibility, and is able to provide out-of-the-box support for a wide array of DRAM standards.

Cited by 775