Legacy code

Topic Tools

Papers published on a yearly basis

Papers

Proceedings Article•10.18653/V1/2021.NAACL-MAIN.211•

Unified Pre-training for Program Understanding and Generation

Wasi Uddin Ahmad¹, Saikat Chakraborty², Baishakhi Ray², Kai-Wei Chang¹•Institutions (2)

University of California, Los Angeles¹, Columbia University²

1 Jun 2021

TL;DR: Analysis reveals that PLBART learns program syntax, style, and logical flow that are crucial to program semantics and thus excels even with limited annotations, and outperforms or rivals state-of-the-art models.

Abstract: Code summarization and generation empower conversion between programming language (PL) and natural language (NL), while code translation avails the migration of legacy code from one PL to another. This paper introduces PLBART, a sequence-to-sequence model capable of performing a broad spectrum of program and language understanding and generation tasks. PLBART is pre-trained on an extensive collection of Java and Python functions and associated NL text via denoising autoencoding. Experiments on code summarization in the English language, code generation, and code translation in seven programming languages show that PLBART outperforms or rivals state-of-the-art models. Moreover, experiments on discriminative tasks, e.g., program repair, clone detection, and vulnerable code detection, demonstrate PLBART’s effectiveness in program understanding. Furthermore, analysis reveals that PLBART learns program syntax, style (e.g., identifier naming convention), and logical flow (e.g., an “if” block inside an “else” block is equivalent to an “else if” block) that are crucial to program semantics and thus excels even with limited annotations.

770 citations
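
The abstract above says only that PLBART is pre-trained "via denoising autoencoding". As a rough illustration of that general idea, and not of the authors' actual noising procedure, the sketch below corrupts a token sequence by masking and pairs it with the clean sequence a sequence-to-sequence model would be trained to reconstruct. The function name `mask_tokens`, the `<mask>` symbol, and the 0.35 masking ratio are illustrative assumptions.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, chosen for this sketch


def mask_tokens(tokens, mask_ratio=0.35, seed=0):
    """Return (noisy_input, clean_target) for one denoising training pair.

    A fixed fraction of positions is replaced by MASK; the target is the
    untouched sequence. The ratio is an arbitrary illustrative value.
    """
    rng = random.Random(seed)  # local RNG so the example is reproducible
    n_mask = round(len(tokens) * mask_ratio)
    positions = set(rng.sample(range(len(tokens)), n_mask))
    noisy = [MASK if i in positions else tok for i, tok in enumerate(tokens)]
    return noisy, list(tokens)


# Usage: corrupt a tokenized Python function header and body.
source = "def add ( a , b ) : return a + b".split()
noisy, target = mask_tokens(source)
print(" ".join(noisy))
print(" ".join(target))
```

A model trained on such pairs has to infer the hidden tokens from context, which is one plausible reason a denoising objective exposes it to syntax and naming regularities.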

Proceedings Article•

The end of an architectural era: (it's time for a complete rewrite)

Michael Stonebraker¹, Samuel Madden¹, Daniel J. Abadi¹, Stavros Harizopoulos¹, Nabil Hachem, Pat Helland²•Institutions (2)

Massachusetts Institute of Technology¹, Microsoft²

23 Sep 2007

TL;DR: In this paper, the authors show that the current RDBMS code lines, while attempting to be a "one size fits all" solution, in fact excel at nothing, and they are 25 year old legacy code lines that should be retired in favor of a collection of "from scratch" specialized engines.

Abstract: In previous papers [SC05, SBC+07], some of us predicted the end of "one size fits all" as a commercial relational DBMS paradigm. These papers presented reasons and experimental evidence that showed that the major RDBMS vendors can be outperformed by 1-2 orders of magnitude by specialized engines in the data warehouse, stream processing, text, and scientific database markets. Assuming that specialized engines dominate these markets over time, the current relational DBMS code lines will be left with the business data processing (OLTP) market and hybrid markets where more than one kind of capability is required. In this paper we show that current RDBMSs can be beaten by nearly two orders of magnitude in the OLTP market as well. The experimental evidence comes from comparing a new OLTP prototype, H-Store, which we have built at M.I.T., to a popular RDBMS on the standard transactional benchmark, TPC-C. We conclude that the current RDBMS code lines, while attempting to be a "one size fits all" solution, in fact, excel at nothing. Hence, they are 25-year-old legacy code lines that should be retired in favor of a collection of "from scratch" specialized engines. The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow's requirements, not continue to push code lines and architectures designed for yesterday's needs.

596 citations

Journal Article•10.1145/602382.602403•

The verifying compiler: A grand challenge for computing research

Tony Hoare¹•Institutions (1)

Microsoft¹

01 Jan 2003-Journal of the ACM

TL;DR: This contribution proposes a set of criteria that distinguish a grand challenge in science or engineering from the many other kinds of short-term or long-term research problems that engage the interest of scientists and engineers.

Abstract: This contribution proposes a set of criteria that distinguish a grand challenge in science or engineering from the many other kinds of short-term or long-term research problems that engage the interest of scientists and engineers. As an example drawn from Computer Science, it revives an old challenge: the construction and application of a verifying compiler that guarantees correctness of a program before running it.

288 citations

Proceedings Article•10.1109/ICSM.1997.624243•

Identifying modules via concept analysis

Michael Siff¹, Thomas Reps•Institutions (1)

Sarah Lawrence College¹

1 Oct 1997

TL;DR: An algorithmic framework to construct a lattice of concepts from a program, where each concept represents a potential module in legacy code is presented.

Abstract: Describes a general technique for identifying modules in legacy code. The method is based on concept analysis, a branch of lattice theory that can be used to identify similarities among a set of objects based on their attributes. We discuss how concept analysis can identify potential modules using both “positive” and “negative” information. We present an algorithmic framework to construct a lattice of concepts from a program, where each concept represents a potential module.

230 citations
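
To make the idea of concept analysis in the abstract above concrete, here is a minimal sketch of formal concept analysis on a toy input. It is not the paper's algorithmic framework: it enumerates concepts by brute force (exponential in the number of objects), and the choice of functions as objects, the `uses_*` attributes, and all function names are assumptions made for illustration.

```python
from itertools import combinations


def common_attributes(objects, context):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def common_objects(attributes, context):
    """Objects that possess every attribute in `attributes`."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


def concepts(context):
    """Enumerate all (extent, intent) concepts by closing every object subset.

    Brute force for illustration only; practical tools use incremental
    lattice-construction algorithms instead.
    """
    found = set()
    objs = list(context)
    for size in range(len(objs) + 1):
        for subset in combinations(objs, size):
            intent = common_attributes(subset, context)
            extent = common_objects(intent, context)
            found.add((extent, intent))
    return found


# Hypothetical context: which data structure each legacy function touches.
context = {
    "push": {"uses_stack"},
    "pop": {"uses_stack"},
    "enqueue": {"uses_queue"},
    "dequeue": {"uses_queue"},
    "dump": {"uses_stack", "uses_queue"},
}

for extent, intent in sorted(concepts(context), key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

Each printed pair is one concept: a maximal group of functions together with the attributes they all share. In this toy input, the concepts for `uses_stack` and `uses_queue` suggest two candidate modules, while `dump` belongs to both and would need manual attention.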

Proceedings Article•10.1145/1572272.1572287•

Detecting code clones in binary executables

Andreas Sæbjørnsen¹, Jeremiah Willcock², Thomas Panas³, Daniel J. Quinlan³, Zhendong Su¹•Institutions (3)

University of California, Davis¹, Indiana University², Lawrence Livermore National Laboratory³

19 Jul 2009

TL;DR: The first practical clone detection algorithm for binary executables is described, which extends an existing tree similarity framework based on clustering of characteristic vectors of labeled trees with novel techniques to normalize assembly instructions and to accurately and compactly model their structural information.

Abstract: Large software projects contain significant code duplication, mainly due to copying and pasting code. Many techniques have been developed to identify duplicated code to enable applications such as refactoring, detecting bugs, and protecting intellectual property. Because source code is often unavailable, especially for third-party software, finding duplicated code in binaries becomes particularly important. However, existing techniques operate primarily on source code, and no effective tool exists for binaries. In this paper, we describe the first practical clone detection algorithm for binary executables. Our algorithm extends an existing tree similarity framework based on clustering of characteristic vectors of labeled trees with novel techniques to normalize assembly instructions and to accurately and compactly model their structural information. We have implemented our technique and evaluated it on Windows XP system binaries totaling over 50 million assembly instructions. Results show that it is both scalable and precise: it analyzed Windows XP system binaries in a few hours and produced few false positives. We believe our technique is a practical, enabling technology for many applications dealing with binary code.

210 citations

Year	Papers
2025	4
2024	2
2023	1
2022	6
2021	25
2020	27
