IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators

doi:10.48550/arxiv.2403.03894

Journal Article

Indraneil Paul, +3 more

- 06 Mar 2024

- arXiv.org

- Vol. abs/2403.03894

5

TL;DR: The prospect of leveraging readily available compiler intermediate representations (IR) - shared across programming languages - to improve the multilingual capabilities of Code-LMs and facilitate cross-lingual transfer is investigated.

Citations

Journal Article•10.48550/arxiv.2406.00515

A Survey on Large Language Models for Code Generation

J.-H.R. Jiang, +4 more

- 01 Jun 2024

TL;DR: This survey provides a comprehensive review of Large Language Models (LLMs) for code generation, introducing a taxonomy, historical overview, and empirical comparison of recent developments, highlighting advancements, challenges, and opportunities in this burgeoning field.

75

Preprint•10.48550/arxiv.2404.16789

Continual Learning of Large Language Models: A Comprehensive Survey

Henry X. Shi, +6 more

- 25 Apr 2024

TL;DR: A survey on continual learning of large language models focusing on integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences.

23

Journal Article•10.1145/3770084

A Survey on LLM-based Code Generation for Low-Resource and Domain-Specific Programming Languages

Jie Wu, +1 more

- 07 Oct 2025

- ACM Transactions on Software Engineering...

Abstract: Large Language Models (LLMs) have shown remarkable capabilities in code generation for popular programming languages. However, their performance in Low-Resource Programming Languages (LRPLs) and Domain-Specific Languages (DSLs) remains a critical challenge. This gap affects millions of developers - with Rust alone having 3.5 million users - who are currently unable to fully leverage LLM capabilities. LRPLs and DSLs face unique challenges, including severe data scarcity and, for DSLs, highly specialized syntax and semantics that are poorly represented in general-purpose datasets. Addressing these challenges is crucial as LRPLs and DSLs significantly enhance development efficiency in specialized domains and applications, including financial and scientific works. While several surveys on LLMs for software engineering and code exist, none comprehensively address the challenges and opportunities specific to LRPLs and DSLs. Our survey fills this gap by providing a systematic review of the current state, methodologies, and challenges in leveraging LLMs for code generation in LRPL and DSL. We filtered 111 papers from over 27,000 published studies from 2020 – 2024 to understand the capabilities and limitations of LLMs in these specialized domains. We also expanded our literature search to include 5 recent papers from 2024 – 2025. We report LLMs used, benchmarks, and metrics to evaluate code generation in LRPLs and DSLs, as well as strategies used to enhance LLM performance, and the collected datasets and curation methods in this context. We identified four main evaluation techniques used in the literature, along with several metrics to assess code generation in LRPL and DSL. We categorized the methods used for LLM improvement into six main groups and summarized the novel methods and architectures proposed by the researchers. We also classified different approaches used for data collection and preparation. 
While different techniques, metrics, and datasets are used, there is a lack of a standard approach and a benchmark dataset to evaluate code generation in several LRPLs and DSLs. We discuss several distinctions of the studied approaches with the ones used in high-resource programming languages (HRPLs), as well as several challenges unique to these languages, especially DSLs. The challenges stem from the scarcity of data, the unique requirements, and specialized domains, which often need expertise guidelines or domain-specific tools. Accordingly, we provide insights into different research opportunities for the studied aspects. This survey serves as a comprehensive resource for researchers and practitioners working at the intersection of LLMs, software engineering, and specialized programming languages, providing a foundation for future advancements in LRPL and DSL code generation. A GitHub repository was created to organize the papers of this survey at https://github.com/jie-jw-wu/Survey-CodeLLM4LowResource-DSL .

4

Journal Article•10.2478/acss-2025-0013

Analysing Software Quality of AI-Translated Code: A Comparative Study of Large Language Models Using Static Analysis

Vikram Bhutani, +2 more

- 01 Jan 2025

- Applied Computer Systems

Abstract: Context: Source code translation enables cross-platform compatibility, code reusability, legacy system migration, and developer collaboration. Numerous state-of-the-art techniques have emerged to address demand for efficient and accurate translation methodologies. Objective: This study compares code translation capabilities of Large Language Models (LLMs), specifically DeepSeek R1 and ChatGPT 4.1, evaluating their proficiency in translating code between programming languages. We systematically assess model outputs through quantitative and qualitative measures, focusing on translation accuracy, execution efficiency, and coding standard conformity. By examining each model’s strengths and limitations, this work provides insights into their applicability for various translation scenarios and contributes to discourse on LLM potential in software engineering. Method: We evaluated translation quality from ChatGPT 4.1 and DeepSeek R1 using SonarQube Analyzer to identify strengths and weaknesses through comprehensive software metrics including translation accuracy, code quality, and clean code attributes. SonarQube’s framework enables objective quantification of maintainability, reliability, technical debt, and code smells which are critical factors in software quality measurement. The protocol involved randomly sampling 500 code instances from 1695 Java programming problems. Java samples were translated to Python by both models, then analysed quantitatively using SonarQube metrics to evaluate adherence to software engineering best practices. Results: This comparative analysis reveals capabilities and limitations of state-of-the-art LLM-based translation systems, providing developers, researchers, and practitioners actionable guidance for model selection. Identified gaps highlight future research directions in automated code translation.
Results demonstrate that DeepSeek R1 consistently generates superior software quality compared to ChatGPT 4.1 across SonarQube metrics.

Journal Article•10.48550/arxiv.2410.03351

Generating Equivalent Representations of Code By A Self-Reflection Approach

Jia Li, +4 more

- 04 Oct 2024

- arXiv.org

TL;DR: This paper proposes a self-reflection approach to generate Equivalent Representations (ERs) of code using Large Language Models (LLMs), enabling ERs in open and constrained settings, and presents eight findings on ER generation and its applications in software engineering tasks.

References

•Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

138.5K

•Proceedings Article•10.18653/V1/2020.ACL-MAIN.747

Unsupervised Cross-lingual Representation Learning at Scale

Alexis Conneau, +9 more

- 01 Jul 2020

TL;DR: It is shown that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the possibility of multilingual modeling without sacrificing per-language performance is shown for the first time.

6.9K

•Proceedings Article•10.5555/977395.977673

LLVM: a compilation framework for lifelong program analysis & transformation

Chris Lattner, +1 more

- 20 Mar 2004

TL;DR: The design of the LLVM representation and compiler framework is evaluated in three ways: the size and effectiveness of the representation, including the type information it provides; compiler performance for several interprocedural problems; and illustrative examples of the benefits LLVM provides for several challenging compiler problems.

5.4K

•Proceedings Article•10.18653/V1/2021.NAACL-MAIN.41

mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer

Linting Xue, +7 more

- 01 Jun 2021

TL;DR: This paper proposed a multilingual variant of T5, mT5, which was pre-trained on a new Common Crawl-based dataset covering 101 languages and achieved state-of-the-art performance on many multilingual benchmarks.

2.5K

•Proceedings Article•10.1109/SEQUEN.1997.666900

On the resemblance and containment of documents

Andrei Z. Broder

- 11 Jun 1997

- Sequence

TL;DR: The basic idea is to reduce these issues to set intersection problems that can be easily evaluated by a process of random sampling that could be done independently for each document.

2.3K
