Evaluating and Explaining Large Language Models for Code Using Syntactic Structures

doi:10.48550/arxiv.2308.03873

Journal Article

David N. Palacio, +4 more

- 07 Aug 2023

- arXiv.org

- Vol. abs/2308.03873

5

TL;DR: ASTxplainer is introduced, an explainability method specific to LLMs for code that enables both new methods for LLM evaluation and visualizations of LLM predictions that aid end-users in understanding model predictions.

Abstract: Large Language Models (LLMs) for code are a family of high-parameter, transformer-based neural networks pre-trained on massive datasets of both natural and programming languages. These models are rapidly being employed in commercial AI-based developer tools, such as GitHub Copilot. However, measuring and explaining their effectiveness on programming tasks is a challenging proposition, given their size and complexity. The methods for evaluating and explaining LLMs for code are inextricably linked. That is, in order to explain a model's predictions, they must be reliably mapped to fine-grained, understandable concepts. Once this mapping is achieved, new methods for detailed model evaluations are possible. However, most current explainability techniques and evaluation benchmarks focus on model robustness or individual task performance, as opposed to interpreting model predictions. To this end, this paper introduces ASTxplainer, an explainability method specific to LLMs for code that enables both new methods for LLM evaluation and visualizations of LLM predictions that aid end-users in understanding model predictions. At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes, by extracting and aggregating normalized model logits within AST structures. To demonstrate the practical benefit of ASTxplainer, we illustrate the insights that our framework can provide by performing an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects. Additionally, we perform a user study examining the usefulness of an ASTxplainer-derived visualization of model predictions aimed at enabling model users to explain predictions. The results of these studies illustrate the potential for ASTxplainer to provide insights into LLM effectiveness, and aid end-users in understanding predictions.
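The alignment idea in the abstract (mapping token-level predictions onto AST nodes and aggregating them per node) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `token_spans` and `aggregate_by_node`, the hand-written subword tokens, the made-up probability values, and the choice of the mean as the aggregation function are all assumptions for illustration. No model is run; the probabilities stand in for normalized logits. The sketch uses Python's standard `ast` module and assumes ASCII source.

```python
import ast
from statistics import mean


def token_spans(tokens):
    """Character span of each token's non-whitespace part in the joined text."""
    spans, pos = [], 0
    for tok in tokens:
        start = pos + len(tok) - len(tok.lstrip())
        end = pos + len(tok.rstrip())
        spans.append((start, end))  # start >= end for whitespace-only tokens
        pos += len(tok)
    return spans


def aggregate_by_node(tokens, probs):
    """Average token probabilities over the tokens that fall inside each AST node.

    Returns a list of (node_type, start_offset, end_offset, mean_probability).
    """
    source = "".join(tokens)
    # Offset of the first character of every line. ASCII source is assumed,
    # because ast column offsets are counted in UTF-8 bytes.
    line_starts, pos = [], 0
    for line in source.splitlines(keepends=True):
        line_starts.append(pos)
        pos += len(line)

    spans = token_spans(tokens)
    results = []
    for node in ast.walk(ast.parse(source)):
        if getattr(node, "end_lineno", None) is None:
            continue  # nodes such as Module or operators carry no position
        start = line_starts[node.lineno - 1] + node.col_offset
        end = line_starts[node.end_lineno - 1] + node.end_col_offset
        inside = [
            p for (s, e), p in zip(spans, probs)
            if s < e and start <= s and e <= end
        ]
        if inside:
            results.append((type(node).__name__, start, end, mean(inside)))
    return results


# Hypothetical subword tokens and made-up per-token probabilities.
TOKENS = ["def", " add", "(", "a", ",", " b", "):",
          "\n    return", " a", " +", " b", "\n"]
PROBS = [0.9, 0.5, 1.0, 0.8, 1.0, 0.6, 1.0, 0.7, 0.4, 0.9, 0.2, 1.0]

for name, start, end, score in aggregate_by_node(TOKENS, PROBS):
    print(f"{name:12s} [{start:2d}, {end:2d})  {score:.2f}")
```

Each printed row gives a node type, its character span, and the average probability of the tokens inside it, so a low score on a node such as `BinOp` would point to a syntactic structure the (hypothetical) model predicted with less confidence.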

Citations

Journal Article•10.48550/arxiv.2311.10372

A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends

Zibin Zheng, +6 more

- 17 Nov 2023

- arXiv.org

TL;DR: This survey gives practitioners insight into key improvement directions for Code LLMs and compiles the performance of LLMs across multiple mainstream benchmarks to identify the best-performing LLMs for each software engineering task.

29

10.1186/s42400-025-00361-w

When LLMs Meet Cybersecurity: A Systematic Literature Review

Jie Zhang, +5 more

TL;DR: This systematic literature review of 300+ works on LLMs in cybersecurity covers 25 models and 10 scenarios, addressing construction, applications, challenges, and future research to enhance cybersecurity practices and provide a valuable resource for the field.

15

Journal Article•10.48550/arxiv.2402.05980

Do Large Code Models Understand Programming Concepts? A Black-box Approach

Ashish Hooda, +5 more

- 08 Feb 2024

- arXiv.org

TL;DR: This work proposes Counterfactual Analysis for Programming Concept Predicates (CACP) as a counterfactual testing framework to evaluate whether Large Code Models understand programming concepts, and suggests that current models lack understanding of concepts such as data flow and control flow.

4

Journal Article•10.48550/arxiv.2407.12830

Knowledge-based Consistency Testing of Large Language Models

Sai Sathiesh Rajan, +2 more

- 03 Jul 2024

- arXiv.org

TL;DR: This study proposes KONTEST, an automated testing framework that exposes inconsistencies and knowledge gaps in Large Language Models (LLMs) via semantically-equivalent queries and test oracles, revealing 16.5% knowledge gaps and inducing 19.2% errors in four state-of-the-art LLMs.

1

Preprint•10.48550/arxiv.2405.03644

When LLMs Meet Cybersecurity: A Systematic Literature Review

Jie Zhang, +5 more

- 06 May 2024

TL;DR: A systematic literature review exploring the potential of LLMs in cybersecurity, encompassing over 180 works and addressing key research questions.

References

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.

94.2K

•Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

- arXiv: Computation and Language

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

81.7K

Monograph•10.1017/CBO9780511803161

Causality: models, reasoning, and inference

Judea Pearl

- 14 Sep 2009

- Tijdschrift Voor Filosofie

TL;DR: This book treats the art and science of cause and effect, covering topics such as the theory of inferred causation, causal diagrams, and the identification of causal effects.

14.9K

•Book

Introduction to Automata Theory, Languages, and Computation

John E. Hopcroft, +3 more

- 01 Jan 1979

TL;DR: This book is a rigorous exposition of formal languages and models of computation, with an introduction to computational complexity, appropriate for upper-level computer science undergraduates who are comfortable with mathematical arguments.

14.5K

•Journal Article•10.1162/153244303322533223

A neural probabilistic language model

Yoshua Bengio, +3 more

- 01 Mar 2003

- Journal of Machine Learning Research

TL;DR: The authors propose to learn a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, which can be expressed in terms of these representations.

8K
