Compiler Auto-Vectorization with Imitation Learning

Open Access • Proceedings Article

- 01 Jan 2019

Vol. 32, pp 14625-14635

39

TL;DR: This work explores whether it is feasible to imitate the optimal decisions made by the ILP solution of Mendis & Amarasinghe (2018) by fitting a graph neural network policy, and shows that the learnt policy produces a vectorization scheme that is better than industry-standard compiler heuristics in terms of both static measures and runtime performance.

Abstract: Modern microprocessors are equipped with single instruction multiple data (SIMD) or vector instruction sets which allow compilers to exploit fine-grained data-level parallelism. To exploit this parallelism, compilers employ auto-vectorization techniques to automatically convert scalar code into vector code. Larsen & Amarasinghe (2000) first introduced superword level parallelism (SLP) based vectorization, which is one form of vectorization popularly used by compilers. Current compilers employ hand-crafted heuristics and typically follow only one SLP vectorization strategy, which can be suboptimal. Recently, Mendis & Amarasinghe (2018) formulated the instruction packing problem of SLP vectorization by leveraging an integer linear programming (ILP) solver, achieving superior runtime performance. In this work, we explore whether it is feasible to imitate optimal decisions made by their ILP solution by fitting a graph neural network policy. We show that the learnt policy produces a vectorization scheme which is better than industry-standard compiler heuristics both in terms of static measures and runtime performance. More specifically, the learnt agent produces a vectorization scheme which has a 22.6% higher average reduction in cost compared to the LLVM compiler when measured using its own cost model and achieves a geometric mean runtime speedup of 1.015× on the NAS benchmark suite when compared to LLVM’s SLP vectorizer.
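To make the idea of SLP-style instruction packing concrete, the following is a minimal illustrative sketch, not the paper's method or LLVM's implementation. It assumes plain Python lists stand in for vector registers, and the names `scalar_version`, `vector_add`, and `packed_version` are hypothetical. It shows four isomorphic, independent scalar additions on adjacent elements being replaced by a single packed operation.

```python
# Illustrative only: lists stand in for SIMD registers; all names are hypothetical.

def scalar_version(a, b):
    """Four isomorphic, independent scalar statements on adjacent elements."""
    c = [0.0] * 4
    c[0] = a[0] + b[0]
    c[1] = a[1] + b[1]
    c[2] = a[2] + b[2]
    c[3] = a[3] + b[3]
    return c


def vector_add(x, y):
    """Stand-in for one SIMD add that operates on every lane at once."""
    return [xi + yi for xi, yi in zip(x, y)]


def packed_version(a, b):
    """The same computation after packing the four statements into one op."""
    return vector_add(a[0:4], b[0:4])


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0, 40.0]
    # Both versions compute the same result; the packed one issues one "vector" op.
    print(scalar_version(a, b) == packed_version(a, b))
```

Choosing which statements to pack together, and in what order, is the decision problem that the abstract describes as being handled by hand-crafted heuristics, by an ILP solver, or by a learnt policy.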

Citations

•Journal Article•10.1145/3494523

A Survey of Machine Learning for Computer Architecture and Systems

03 Feb 2022

- ACM Computing Surveys

TL;DR: In this article, the authors present a comprehensive review of work that applies ML to computer architecture and system design, and summarize the common problems in architecture and system design that ML techniques can solve and the typical ML techniques employed for each of them.

121

•Journal Article•10.48550/arXiv.2302.07867

Learning Performance-Improving Code Edits

Aman Madaan, +7 more

- 15 Feb 2023

- arXiv.org

TL;DR: In this paper, a large-scale dataset of Performance-Improving Edits (PIE) is used to evaluate and improve the capacity of large language models (LLMs) to suggest functionally correct, performance-improving code edits.

50

•Journal Article•10.1145/3418463

IR2Vec: LLVM IR based Scalable Program Embeddings

S. VenkataKeerthy, +5 more

- 13 Sep 2019

- arXiv: Programming Languages

TL;DR: IR2Vec is a distributed encoding infrastructure that combines representation learning methods with flow information to capture the syntax as well as the semantics of input programs, and it achieves state-of-the-art performance on heterogeneous device mapping and thread coarsening.

47

•Proceedings Article

ProGraML: A Graph-based Program Representation for Data Flow Analysis and Compiler Optimizations

Christopher C. Cummins, +5 more

- 18 Jul 2021

19

•Proceedings Article•10.1109/PACT52795.2021.00014

HERTI: A Reinforcement Learning-Augmented System for Efficient Real-Time Inference on Heterogeneous Embedded Systems

Myeonggyun Han, +1 more

- 01 Sep 2021

TL;DR: This article proposes HERTI, a reinforcement learning-augmented system for efficient real-time inference on heterogeneous embedded systems, which achieves high inference efficiency in multiple metrics (i.e., energy and energy-delay product) with a strong deadline guarantee.

11

References

•Proceedings Article•10.3115/V1/W14-4012

On the Properties of Neural Machine Translation: Encoder--Decoder Approaches

Kyunghyun Cho, +5 more

- 03 Sep 2014

TL;DR: This paper analyzes neural machine translation using an RNN Encoder-Decoder and a newly proposed gated recursive convolutional neural network. Neural machine translation performs relatively well on short sentences without unknown words, but its performance degrades rapidly as sentence length and the number of unknown words increase. The gated recursive convolutional network learns a grammatical structure of a sentence automatically.

6.9K

•Proceedings Article•10.5555/977395.977673

LLVM: a compilation framework for lifelong program analysis & transformation

Chris Lattner, +1 more

- 20 Mar 2004

TL;DR: The design of the LLVM representation and compiler framework is evaluated in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.

5.4K

•Posted Content

Graph Neural Networks: A Review of Methods and Applications

Jie Zhou, +8 more

- 20 Dec 2018

- arXiv: Learning

TL;DR: This review covers existing graph neural network models in detail, systematically categorizes their applications, and proposes four open problems for future research.

4.3K

•Proceedings Article

Gated Graph Sequence Neural Networks.

Yujia Li, +3 more

- 01 Apr 2016

TL;DR: This work studies feature learning techniques for graph-structured inputs and achieves state-of-the-art performance on a problem from program verification, in which subgraphs need to be matched to abstract data structures.

3.5K

•Journal Article

The NAS Parallel Benchmarks

David H. Bailey

- 14 Jul 2010

- Lawrence Berkeley National Laboratory

TL;DR: The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world computing applications.

2.1K

Related Papers (5)

goSLP: globally optimized superword level parallelism framework

A Compiler Approach for Exploiting Partial SIMD Parallelism

Vectorization past dependent branches through speculation

VeGen: a vectorizer generator for SIMD and beyond

Compilation techniques for multimedia processors