Proceedings Article, DOI: 10.1145/1854273.1854348
An integer programming framework for optimizing shared memory use on GPUs
Wenjing Ma,Gagan Agrawal +1 more
- 11 Sep 2010
- pp 553-554
34
Citations
Unified on-chip memory allocation for SIMT architecture
Ari B. Hayes,Eddy Z. Zhang +1 more
- 10 Jun 2014
TL;DR: The authors find that an on-chip memory resource allocation that maximizes concurrency while ensuring good single-thread performance can be determined automatically at compile time.
Type-safe runtime code generation: accelerate to LLVM
Trevor L. McDonell,Manuel M. T. Chakravarty,Vinod Grover,Ryan R. Newton +3 more
- 30 Aug 2015
TL;DR: This paper discusses the compilation pipeline of Accelerate, a high-performance array language targeting both multicore CPUs and GPUs. The pipeline preserves types from the source language down to a low-level register language in SSA form, and the paper creates a new type-safe interface to the industrial-strength LLVM compiler infrastructure.
19
Benchmarking the GPU memory at the warp level
Minquan Fang,Jianbin Fang,Weimin Zhang,Haifang Zhou,Jianxing Liao,Yuangang Wang +5 more
- 01 Jan 2018
TL;DR: This work discloses the characteristics of GPU memories at the warp level, summarizes optimization guidelines for different types of memories, and builds an optimization framework on GPU memories.
19
Combined model predictive control and scheduling with dominant time constant compensation
TL;DR: The proposed methods are time-scaling of the linear dynamics based on throughput rates, and grade-based objectives for product scheduling based on a mathematical program with complementarity constraints, used to both control and optimize a product grade schedule.
17
Optimizing Data Placement on GPU Memory: A Portable Approach
TL;DR: This article provides a comprehensive description of this method, and presents several extensions that significantly improve the scalability of PORPLE, which include a novel algorithm design for efficiently searching for the best data placements, the use of active profiling for reducing the online-profiling overhead, and a systematic examination of a path-based performance model.
15
References
LLVM: a compilation framework for lifelong program analysis & transformation
Chris Lattner,Vikram Adve +1 more
- 20 Mar 2004
TL;DR: The design of the LLVM representation and compiler framework is evaluated in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.