Journal Article, DOI: 10.48550/arxiv.2308.03873
Evaluating and Explaining Large Language Models for Code Using Syntactic Structures
David N. Palacio, Alejandro Velasco, Daniel Rodríguez-Cárdenas, Kevin Moran, Denys Poshyvanyk
TL;DR: This paper introduces ASTxplainer, an explainability method specific to LLMs for code that enables both new methods for LLM evaluation and visualizations of LLM predictions that help end-users understand model predictions.
Abstract: Large Language Models (LLMs) for code are a family of high-parameter, transformer-based neural networks pre-trained on massive datasets of both natural and programming languages. These models are rapidly being employed in commercial AI-based developer tools, such as GitHub Copilot. However, measuring and explaining their effectiveness on programming tasks is a challenging proposition, given their size and complexity. The methods for evaluating and explaining LLMs for code are inextricably linked. That is, in order to explain a model's predictions, they must be reliably mapped to fine-grained, understandable concepts. Once this mapping is achieved, new methods for detailed model evaluations are possible. However, most current explainability techniques and evaluation benchmarks focus on model robustness or individual task performance, as opposed to interpreting model predictions. To this end, this paper introduces ASTxplainer, an explainability method specific to LLMs for code that enables both new methods for LLM evaluation and visualizations of LLM predictions that aid end-users in understanding model predictions. At its core, ASTxplainer provides an automated method for aligning token predictions with AST nodes, by extracting and aggregating normalized model logits within AST structures. To demonstrate the practical benefit of ASTxplainer, we illustrate the insights that our framework can provide by performing an empirical evaluation on 12 popular LLMs for code using a curated dataset of the most popular GitHub projects. Additionally, we perform a user study examining the usefulness of an ASTxplainer-derived visualization of model predictions aimed at enabling model users to explain predictions. The results of these studies illustrate the potential for ASTxplainer to provide insights into LLM effectiveness, and aid end-users in understanding predictions.
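The abstract describes aligning token predictions with AST nodes by extracting and aggregating normalized model logits. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. It assumes that "normalized logits" means softmax probabilities, uses Python's built-in `ast` module rather than whatever parser the paper relies on, credits each token to the innermost AST node whose character span contains it, and aggregates by taking the mean probability per node type. The helper names (`token_probability`, `node_spans`, `align_tokens`, `aggregate_by_node_type`) and the token offsets and probabilities in the example are hypothetical.

```python
import ast
import math
from collections import defaultdict
from statistics import mean


def token_probability(logits, token_id):
    """Softmax-normalize one decoding step's logits; return the emitted token's probability."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[token_id] / sum(exps)


def node_spans(source):
    """Return (start, end, node_type, bfs_index) character spans for located AST nodes.

    Assumes ASCII source: CPython's col_offset counts UTF-8 bytes, not characters.
    """
    tree = ast.parse(source)
    line_starts = [0]  # absolute offset at which each line begins
    for line in source.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))
    spans = []
    # ast.walk is breadth-first, so an ancestor always gets a smaller index than its descendants.
    for index, node in enumerate(ast.walk(tree)):
        if getattr(node, "end_lineno", None) is None:
            continue  # Module, operators and contexts carry no position
        start = line_starts[node.lineno - 1] + node.col_offset
        end = line_starts[node.end_lineno - 1] + node.end_col_offset
        spans.append((start, end, type(node).__name__, index))
    return spans


def align_tokens(spans, tokens):
    """Map each (start, end, prob) token to the innermost AST node whose span contains it."""
    aligned = []
    for tok_start, tok_end, prob in tokens:
        covering = [s for s in spans if s[0] <= tok_start and tok_end <= s[1]]
        if not covering:
            continue  # token lies outside every located node (e.g. a newline)
        # Shortest span wins; on equal spans, the deeper node (larger BFS index) wins.
        innermost = min(covering, key=lambda s: (s[1] - s[0], -s[3]))
        aligned.append((innermost[2], prob))
    return aligned


def aggregate_by_node_type(aligned):
    """Average the token probabilities that fall under each AST node type."""
    groups = defaultdict(list)
    for node_type, prob in aligned:
        groups[node_type].append(prob)
    return {node_type: mean(probs) for node_type, probs in groups.items()}


# Hypothetical example: offsets and probabilities are made up for illustration.
source = "x = 1\nprint(x)\n"
tokens = [
    (0, 1, 0.90),    # x
    (2, 3, 0.80),    # =
    (4, 5, 0.50),    # 1
    (6, 11, 0.70),   # print
    (11, 12, 0.60),  # (
    (12, 13, 0.95),  # x
    (13, 14, 0.99),  # )
]
scores = aggregate_by_node_type(align_tokens(node_spans(source), tokens))
print(scores)
```

In a real pipeline the token offsets would come from the model's tokenizer and the probabilities from its logits via something like `token_probability`. The innermost-span rule is a simplifying choice: it credits a token such as `(` to the `Call` node rather than to the enclosing `Expr` statement that has the same span.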
Citations
A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends
Zibin Zheng, Kaiwen Ning, Yanlin Wang, Jingwen Zhang, Dewu Zheng, Mingxi Ye, Jiachi Chen
TL;DR: This research provides insights that help practitioners understand key improvement directions for Code LLMs, and compiles the performance of LLMs across multiple mainstream benchmarks to identify the best-performing LLMs for each software engineering task.
When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang, Haoyu Bu, Hui Wen, Yu Chen, Lun Li, Hongsong Zhu
TL;DR: This systematic literature review of 300+ works on LLMs in cybersecurity covers 25 models and 10 scenarios, addressing construction, applications, challenges, and future research to enhance cybersecurity practices and provide a valuable resource for the field.
Do Large Code Models Understand Programming Concepts? A Black-box Approach
Ashish Hooda, Mihai Christodorescu, Miltos Allamanis, Aaron Wilson, Kassem Fawaz, Somesh Jha
TL;DR: This work proposes Counterfactual Analysis for Programming Concept Predicates (CACP) as a counterfactual testing framework to evaluate whether Large Code Models understand programming concepts, and suggests that current models lack understanding of concepts such as data flow and control flow.
Knowledge-based Consistency Testing of Large Language Models
Sai Sathiesh Rajan, Ezekiel O. Soremekun, Sudipta Chattopadhyay
TL;DR: This study proposes KONTEST, an automated testing framework that exposes inconsistencies and knowledge gaps in Large Language Models (LLMs) via semantically-equivalent queries and test oracles, revealing a 16.5% knowledge gap and a 19.2% rate of error-inducing inputs across four state-of-the-art LLMs.
When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang, H. Bu, Hong Wen, Yu Chen, Lun Li, Hongsong Zhu
06 May 2024
TL;DR: A systematic literature review exploring the potential of LLMs in cybersecurity, encompassing over 180 works and addressing key research questions.