Journal Article · DOI: 10.1145/3656411
Descend: A Safe GPU Systems Programming Language
Bastian Köpcke,Sergei Gorlatch,Michel Steuwer +2 more
2
TL;DR: Descend is a safe GPU systems programming language that enforces safe CPU and GPU memory management in the type system, preventing data races and deadlocks.
Abstract: Graphics Processing Units (GPU) offer tremendous computational power by following a throughput-oriented paradigm where many thousand computational units operate in parallel. Programming such massively parallel hardware is challenging. Programmers must correctly and efficiently coordinate thousands of threads and their accesses to various shared memory spaces. Existing mainstream GPU programming languages, such as CUDA and OpenCL, are based on C/C++, inheriting their fundamentally unsafe ways to access memory via raw pointers. This facilitates easy-to-make but hard-to-detect bugs, such as data races and deadlocks. In this paper, we present Descend: a safe GPU programming language. In contrast to prior safe high-level GPU programming approaches, Descend is an imperative GPU systems programming language in the spirit of Rust, enforcing safe CPU and GPU memory management in the type system by tracking Ownership and Lifetimes. Descend introduces a new holistic GPU programming model where computations are hierarchically scheduled over the GPU’s execution resources: grid, blocks, warps, and threads. Descend’s extended Borrow checking ensures that execution resources safely access memory regions without data races. For this, we introduced views describing safe parallel access patterns of memory regions, as well as atomic variables. For memory accesses that can’t be checked by our type system, users can annotate limited code sections as unsafe. We discuss the memory safety guarantees offered by Descend and evaluate our implementation using multiple benchmarks, demonstrating that Descend is capable of expressing real-world GPU programs showing competitive performance compared to manually written CUDA programs lacking Descend’s safety guarantees.
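The abstract's core idea, that a borrow checker can rule out data races when parallel workers receive provably disjoint parts of a memory region, can be illustrated with an analogy in plain CPU Rust. The sketch below is not Descend syntax and does not run on a GPU; the function name `scale_in_parallel` and its parameters are hypothetical and chosen only for illustration. It uses the standard library's `chunks_mut` to split a slice into non-overlapping mutable chunks, loosely playing the role the abstract assigns to views, and hands each chunk to its own scoped thread.

```rust
use std::thread;

/// Multiplies every element of `data` by `factor`, using one scoped thread
/// per disjoint chunk. Hypothetical helper: a CPU analogy, not Descend code.
fn scale_in_parallel(data: &mut [f32], chunk_len: usize, factor: f32) {
    assert!(chunk_len > 0, "chunk_len must be positive");
    thread::scope(|s| {
        // `chunks_mut` yields non-overlapping `&mut [f32]` slices, so the
        // borrow checker accepts giving each one to a different thread.
        for chunk in data.chunks_mut(chunk_len) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x *= factor;
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
}

fn main() {
    let mut data: Vec<f32> = (0..8).map(|i| i as f32).collect();
    scale_in_parallel(&mut data, 2, 2.0);
    println!("{:?}", data);
}
```

If two threads were instead given the same `&mut` slice, the program would be rejected at compile time. According to the abstract, Descend's extended borrow checking provides this kind of static guarantee for the GPU's execution resources (grid, blocks, warps, and threads).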
Citations
Shard: Securing GPU Kernels with Lightweight Formal Methods
Jiacheng Zhao,Bao-Jian Hua +1 more
- 16 Jul 2025
SafeRace: Assessing and Addressing WebGPU Memory Safety in the Presence of Data Races
Reese Levine,Ashley Lee,Neha Abbas,Kyle Little,Tyler Sorensen +4 more
Abstract: In untrusted execution environments such as web browsers, code from remote sources is regularly executed. To harden these environments against attacks, constituent programming languages and their implementations must uphold certain safety properties, such as memory safety. These properties must be maintained across the entire compilation stack, which may include intermediate languages that do not provide the same safety guarantees. Any case where properties are not preserved could lead to a serious security vulnerability. In this work, we identify a specification vulnerability in the WebGPU Shading Language (WGSL) where code with data races can be compiled to intermediate representations in which an optimizing compiler could legitimately remove memory safety guardrails. To address this, we present SafeRace, a collection of threat assessments and specification proposals across the WGSL execution stack. While our threat assessment showed that this vulnerability does not appear to be exploitable on current systems, it creates a “ticking time bomb”, especially as compilers in this area are rapidly evolving. Given this, we introduce the SafeRace Memory Safety Guarantee (SMSG), two components that preserve memory safety in the WGSL execution stack even in the presence of data races. The first component specifies that program slices contributing to memory indexing must be race free and is implemented via a compiler pass for WGSL programs. The second component is a requirement on intermediate representations that limits the effects of data races so that they cannot impact race-free program slices. While the first component is not guaranteed to apply to all possible WGSL programs due to limitations on how some data types can be accessed, we show that existing language constructs are sufficient to implement this component with minimal performance overhead on many existing important WebGPU applications. We test the second component by performing a fuzzing campaign of 81 hours across 21 compilation stacks; our results show violations on only one (likely buggy) machine, thus providing evidence that lower-level GPU frameworks could relatively straightforwardly support this constraint. Finally, our assessments discovered GPU memory isolation vulnerabilities in Apple and AMD GPUs, as well as a security-critical miscompilation of WGSL in a pre-release version of Firefox.
References
Scalable Parallel Programming with CUDA: Is CUDA the parallel programming model that application developers have been waiting for?
TL;DR: In this article, the authors present a framework to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism on manycore GPUs with widely varying numbers of cores.
1.4K
PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation
Andreas Klöckner,Nicolas Pinto,Yunsup Lee,Bryan Catanzaro,Paul Ivanov,Ahmed Fasih +5 more
- 01 Mar 2012
TL;DR: This article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems.
676
GPUVerify: a verifier for GPU kernels
Adam Betts,Nathan Chong,Alastair F. Donaldson,Shaz Qadeer,Paul Thomson +4 more
- 19 Oct 2012
TL;DR: An efficient encoding for data race detection and a method for automatically inferring loop invariants required for verification are described and implemented as a practical verification tool, GPUVerify, which can be applied directly to OpenCL and CUDA source code.
Futhark: purely functional GPU-programming with nested parallelism and in-place array updates
Troels Henriksen,Niels Gustav Westphal Serup,Martin Elsman,Fritz Henglein,Cosmin E. Oancea +4 more
- 14 Jun 2017
TL;DR: This paper presents the design and implementation of three key features of Futhark that seek a suitable middle ground with imperative approaches and presents a flattening transformation aimed at enhancing the degree of parallelism.
156
An investigation of Unified Memory Access performance in CUDA
Raphael Landaverde,Tiansheng Zhang,Ayse K. Coskun,Martin C. Herbordt +3 more
- 01 Sep 2014
TL;DR: It is found that beyond on-demand data transfers to the CPU, the GPU is also able to request subsets of data it requires on demand, which allows UMA to outperform full data transfer methods for certain parallel applications and small data sizes.
125