Descend: A Safe GPU Systems Programming Language

doi:10.1145/3656411

Journal Article•10.1145/3656411

Bastian Köpcke, +2 more

- 20 Jun 2024

- Proceedings of the ACM on programming la...

- Vol. 8, Iss: PLDI, pp 841-864

2

TL;DR: Descend is a safe GPU systems programming language that enforces safe CPU and GPU memory management in the type system, preventing data races and deadlocks.

Abstract: Graphics Processing Units (GPUs) offer tremendous computational power by following a throughput-oriented paradigm where many thousand computational units operate in parallel. Programming such massively parallel hardware is challenging. Programmers must correctly and efficiently coordinate thousands of threads and their accesses to various shared memory spaces. Existing mainstream GPU programming languages, such as CUDA and OpenCL, are based on C/C++, inheriting their fundamentally unsafe ways to access memory via raw pointers. This facilitates easy-to-make but hard-to-detect bugs, such as data races and deadlocks. In this paper, we present Descend: a safe GPU programming language. In contrast to prior safe high-level GPU programming approaches, Descend is an imperative GPU systems programming language in the spirit of Rust, enforcing safe CPU and GPU memory management in the type system by tracking Ownership and Lifetimes. Descend introduces a new holistic GPU programming model where computations are hierarchically scheduled over the GPU’s execution resources: grid, blocks, warps, and threads. Descend’s extended Borrow checking ensures that execution resources safely access memory regions without data races. For this, we introduced views describing safe parallel access patterns of memory regions, as well as atomic variables. For memory accesses that can’t be checked by our type system, users can annotate limited code sections as unsafe. We discuss the memory safety guarantees offered by Descend and evaluate our implementation using multiple benchmarks, demonstrating that Descend is capable of expressing real-world GPU programs showing competitive performance compared to manually written CUDA programs lacking Descend’s safety guarantees.
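
The idea behind views, where each execution resource gets access to its own disjoint part of a memory region, can be loosely illustrated in plain Rust on the CPU. The sketch below is not Descend code and does not reflect Descend's syntax or its GPU execution model; the function name `scale_in_chunks` and its parameters are hypothetical, chosen only for illustration. It uses the standard library's `chunks_mut` to split a slice into non-overlapping mutable parts and hands each part to a separate scoped thread, so the borrow checker can rule out data races at compile time.

```rust
use std::thread;

/// Multiplies every element of `data` by `factor` in place.
/// Each worker thread receives exclusive access to one disjoint chunk,
/// which is a CPU-side analogy (an assumption, not Descend's API) for a
/// view that describes a safe parallel access pattern.
fn scale_in_chunks(data: &mut [f32], chunk_len: usize, factor: f32) {
    assert!(chunk_len > 0, "chunk_len must be positive");
    thread::scope(|s| {
        // `chunks_mut` yields non-overlapping mutable sub-slices, so no two
        // threads can ever hold a reference to the same element.
        for chunk in data.chunks_mut(chunk_len) {
            s.spawn(move || {
                for x in chunk.iter_mut() {
                    *x *= factor;
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
}
```

Calling `scale_in_chunks(&mut v, 2, 2.0)` on a five-element vector spawns three threads, the last of which handles a chunk of one element. Handing the same mutable sub-slice to two threads would be rejected by the compiler, which is the kind of static guarantee the abstract describes for GPU memory regions.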

Citations

Journal Article•10.1109/qrs65678.2025.00038

Shard: Securing GPU Kernels with Lightweight Formal Methods

Jiacheng Zhao, +1 more

- 16 Jul 2025

Journal Article•10.1145/3763075

SafeRace: Assessing and Addressing WebGPU Memory Safety in the Presence of Data Races

Reese Levine, +4 more

- 09 Oct 2025

- Proceedings of the ACM on programming la...

Abstract: In untrusted execution environments such as web browsers, code from remote sources is regularly executed. To harden these environments against attacks, constituent programming languages and their implementations must uphold certain safety properties, such as memory safety. These properties must be maintained across the entire compilation stack, which may include intermediate languages that do not provide the same safety guarantees. Any case where properties are not preserved could lead to a serious security vulnerability. In this work, we identify a specification vulnerability in the WebGPU Shading Language (WGSL) where code with data races can be compiled to intermediate representations in which an optimizing compiler could legitimately remove memory safety guardrails. To address this, we present SafeRace, a collection of threat assessments and specification proposals across the WGSL execution stack. While our threat assessment showed that this vulnerability does not appear to be exploitable on current systems, it creates a “ticking time bomb”, especially as compilers in this area are rapidly evolving. Given this, we introduce the SafeRace Memory Safety Guarantee (SMSG), two components that preserve memory safety in the WGSL execution stack even in the presence of data races. The first component specifies that program slices contributing to memory indexing must be race free and is implemented via a compiler pass for WGSL programs. The second component is a requirement on intermediate representations that limits the effects of data races so that they cannot impact race-free program slices. While the first component is not guaranteed to apply to all possible WGSL programs due to limitations on how some data types can be accessed, we show that existing language constructs are sufficient to implement this component with minimal performance overhead on many existing important WebGPU applications. We test the second component by performing a fuzzing campaign of 81 hours across 21 compilation stacks; our results show violations on only one (likely buggy) machine, thus providing evidence that lower-level GPU frameworks could relatively straightforwardly support this constraint. Finally, our assessments discovered GPU memory isolation vulnerabilities in Apple and AMD GPUs, as well as a security-critical miscompilation of WGSL in a pre-release version of Firefox.

References

Journal Article•10.1145/1365490.1365500

Scalable Parallel Programming with CUDA: Is CUDA the parallel programming model that application developers have been waiting for?

John R. Nickolls, +3 more

- 01 Mar 2008

- ACM Queue

TL;DR: In this article, the authors present a framework to develop mainstream application software that transparently scales its parallelism to leverage the increasing number of processor cores, much as 3D graphics applications transparently scale their parallelism on manycore GPUs with widely varying numbers of cores.

1.4K

Journal Article•10.1016/J.PARCO.2011.09.001

PyCUDA and PyOpenCL: A scripting-based approach to GPU run-time code generation

Andreas Klöckner, +5 more

- 01 Mar 2012

TL;DR: This article proposes the combination of a dynamic, high-level scripting language with the massive performance of a GPU as a compelling two-tiered computing platform, potentially offering significant performance and productivity advantages over conventional single-tier, static systems.

676

Proceedings Article•10.1145/2384616.2384625

GPUVerify: a verifier for GPU kernels

Adam Betts, +4 more

- 19 Oct 2012

TL;DR: An efficient encoding for data race detection and a method for automatically inferring loop invariants required for verification are described and implemented as a practical verification tool, GPUVerify, which can be applied directly to OpenCL and CUDA source code.

173

Proceedings Article•10.1145/3062341.3062354

Futhark: purely functional GPU-programming with nested parallelism and in-place array updates

Troels Henriksen, +4 more

- 14 Jun 2017

TL;DR: This paper presents the design and implementation of three key features of Futhark that seek a suitable middle ground with imperative approaches and presents a flattening transformation aimed at enhancing the degree of parallelism.

156

Proceedings Article•10.1109/HPEC.2014.7040988

An investigation of Unified Memory Access performance in CUDA

Raphael Landaverde, +3 more

- 01 Sep 2014

TL;DR: It is found that beyond on-demand data transfers to the CPU, the GPU is also able to request subsets of data it requires on demand, which allows UMA to outperform full data transfer methods for certain parallel applications and small data sizes.

125
