Open Access · Posted Content
Optimizing the Linear Fascicle Evaluation Algorithm for Multi-Core and Many-Core Systems
Karan Aggarwal, Uday Bondhugula, +1 more
TL;DR: In this article, target-independent optimizations are proposed for sparse matrix-vector multiplication (SpMV) operations on both CPUs and GPUs. The performance of SpMV often depends on exploiting regularity patterns in the matrix.
Abstract: Sparse matrix-vector multiplication (SpMV) operations are common in scientific applications. The performance of SpMV often depends on exploiting regularity patterns in the matrix. Various representations have been proposed to reduce the memory bandwidth bottleneck that arises from the irregular memory access pattern of SpMV. Among recent representation techniques, tensor decomposition is a popular choice for very large but sparse matrices. After sparse tensor decomposition, however, the new representation involves indirect accesses, which makes it challenging to optimize for multi-cores and GPUs.
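The irregular access pattern described above is visible even in a minimal SpMV over the common CSR (compressed sparse row) format. CSR is a standard representation chosen here only for illustration, not necessarily one used in the paper, and the function name `csr_spmv` is our own. The inner loop reads `x` through `col_idx`, so which element is loaded depends on the sparsity pattern rather than on a regular stride.

```python
from typing import List


def csr_spmv(row_ptr: List[int], col_idx: List[int], vals: List[float],
             x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR format (illustrative sketch).

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx[k] is the column of the k-th nonzero and vals[k] its value.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access: the element of x that is read depends on col_idx,
            # so consecutive reads of x can jump around in memory.
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


# 3x4 example matrix:
# [[1 0 2 0]
#  [0 0 0 3]
#  [4 5 0 0]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 3, 0, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
print(csr_spmv(row_ptr, col_idx, vals, [1.0, 2.0, 3.0, 4.0]))  # → [7.0, 12.0, 14.0]
```

Reads of `vals` and `col_idx` are sequential, but the gather from `x` is not; this gather is the kind of irregular reference that compact representations and data restructuring try to tame.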
Computational neuroscience algorithms often operate on sparse datasets while still performing long-running computations on them. The LiFE (Linear Fascicle Evaluation) application is a popular neuroscience algorithm used for pruning brain connectivity graphs. The datasets it uses are represented with the Sparse Tucker Decomposition (STD), a widely used tensor decomposition method. This decomposition leads to irregular array references, which makes the code very difficult to optimize for both CPUs and GPUs. Recent implementations of the LiFE algorithm show that its SpMV operations are the key bottleneck for performance and scaling. In this work, we first propose target-independent optimizations for these SpMV operations, followed by target-dependent optimizations for CPU and GPU systems. The target-independent techniques include: (1) standard compiler optimizations, (2) data restructuring methods, and (3) methods to partition computations among threads. We then present CPU- and GPU-specific optimizations that exploit the features of each platform. Our highly optimized CPU code obtains a speedup of 27.12x over the original sequential CPU code on a 16-core Intel Xeon (Skylake-based) system, and our optimized GPU code achieves a speedup of 5.2x over a reference optimized GPU code version on an NVIDIA GeForce RTX 2080 Ti GPU.
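To make the SpMV operations and two of the target-independent techniques more concrete, here is a pure-Python sketch under stated assumptions. It follows the publicly described LiFE/ENCODE formulation as we understand it: a sparse core tensor Φ (atoms × voxels × fascicles) and a dense dictionary D (gradient directions × atoms) together stand in for the matrix M, so the predicted signal is Y = Φ ×₁ D ×₃ w, and weight fitting also needs the transpose product Mᵀy. The function names, the coordinate (COO) storage of Φ, and the voxel-based partitioning are hypothetical illustrative choices, not the paper's code. Sorting nonzeros by voxel stands in for "data restructuring", and cutting the sorted list only at voxel boundaries stands in for "partitioning computations among threads" without write conflicts.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical layout (assumption): Phi is stored as parallel coordinate lists,
# nonzero k is Phi[atoms[k], voxels[k], fibers[k]] = vals[k].
# D is a dense dictionary in row-major order: D[theta][atom].


def std_spmv(D, atoms, voxels, fibers, vals, w, n_voxels) -> List[List[float]]:
    """Y[t][v] = sum over nonzeros of D[t][a] * phi * w[f]  (Y is M w, reshaped)."""
    n_theta = len(D)
    Y = [[0.0] * n_voxels for _ in range(n_theta)]
    for a, v, f, phi in zip(atoms, voxels, fibers, vals):
        s = phi * w[f]                  # indirect read of w through fibers
        for t in range(n_theta):
            Y[t][v] += D[t][a] * s      # indirect read of D, indirect write to Y
    return Y


def std_spmv_t(D, atoms, voxels, fibers, vals, Y, n_fibers) -> List[float]:
    """g[f] = sum over nonzeros of phi * sum_t D[t][a] * Y[t][v]  (g is M^T y)."""
    g = [0.0] * n_fibers
    for a, v, f, phi in zip(atoms, voxels, fibers, vals):
        dot = sum(D[t][a] * Y[t][v] for t in range(len(D)))
        g[f] += phi * dot               # scattered write: many nonzeros share a fiber
    return g


def std_spmv_partitioned(D, atoms, voxels, fibers, vals, w, n_voxels, n_workers=2):
    """Same result as std_spmv, after restructuring and partitioning by voxel."""
    # Data restructuring: sort nonzeros by voxel so each voxel's entries are contiguous.
    order = sorted(range(len(vals)), key=lambda k: voxels[k])
    atoms = [atoms[k] for k in order]
    voxels = [voxels[k] for k in order]
    fibers = [fibers[k] for k in order]
    vals = [vals[k] for k in order]

    # Partitioning: cut the sorted nonzeros into roughly equal chunks, moving each
    # cut forward so it never splits a voxel. Workers then write disjoint columns
    # of Y and need no locks or atomics.
    nnz = len(vals)
    bounds = [0]
    for p in range(1, n_workers):
        k = max(bounds[-1], p * nnz // n_workers)
        while 0 < k < nnz and voxels[k] == voxels[k - 1]:
            k += 1
        bounds.append(k)
    bounds.append(nnz)

    n_theta = len(D)
    Y = [[0.0] * n_voxels for _ in range(n_theta)]

    def work(lo, hi):
        for k in range(lo, hi):
            a, v, s = atoms[k], voxels[k], vals[k] * w[fibers[k]]
            for t in range(n_theta):
                Y[t][v] += D[t][a] * s

    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        list(ex.map(work, bounds[:-1], bounds[1:]))  # list() re-raises worker errors
    return Y


# Tiny example: 2 directions, 2 atoms, 3 voxels, 2 fascicles, 4 nonzeros.
D = [[1.0, 2.0], [3.0, 4.0]]
atoms, voxels = [0, 1, 0, 1], [1, 0, 0, 2]
fibers, vals = [0, 1, 1, 0], [1.0, 2.0, 1.0, 0.5]
Y = std_spmv(D, atoms, voxels, fibers, vals, [2.0, 1.0], n_voxels=3)
print(Y)  # → [[5.0, 2.0, 2.0], [11.0, 6.0, 4.0]]
print(std_spmv_t(D, atoms, voxels, fibers, vals, Y, n_fibers=2))  # → [30.0, 146.0]
print(std_spmv_partitioned(D, atoms, voxels, fibers, vals, [2.0, 1.0], 3) == Y)  # → True
```

In CPython the threads will not make this loop faster because of the global interpreter lock; the sketch only shows the data layout and the race-free split, which would carry over to native threads or GPU thread blocks. For general floating-point data the partitioned version may differ from `std_spmv` in the last bits, because sorting changes the summation order. The transpose product scatters into `g` by fascicle, so a voxel-based split alone would not make it race-free; it would need its own restructuring (for example, sorting by fascicle).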
Citations
Optimization Techniques for GPU Programming
TL;DR: This survey discusses optimization techniques found in 450 articles published over the last 14 years and analyzes them from different perspectives. It shows that the optimizations are highly interrelated, which explains the need for techniques such as auto-tuning.
Optimizing the linear fascicle evaluation algorithm for many-core systems
Karan Aggarwal, Uday Bondhugula, +1 more · 26 Jun 2019
TL;DR: Proposes data restructuring techniques to minimize the effects of irregular accesses, optimizations that map threads at the granularity of warps, thread blocks, and the grid, and methods to partition the computation among thread blocks to obtain fine-grained parallelism and data reuse.