Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

doi:10.48550/arxiv.2404.02575

Journal Article•10.48550/arxiv.2404.02575


Hyungjoo Chae, +10 more

- 03 Apr 2024

- arXiv.org

- Vol. abs/2404.02575

8

TL;DR: Think-and-Execute framework improves algorithmic reasoning in LLMs by decomposing the reasoning process into task-level logic and instance-specific code execution.

Abstract: Algorithmic reasoning refers to the ability to understand the complex patterns behind a problem and decompose them into a sequence of reasoning steps towards the solution. This nature of algorithmic reasoning makes it a challenge for large language models (LLMs), even though they have demonstrated promising performance in other reasoning tasks. Within this context, some recent studies use programming languages (e.g., Python) to express the necessary logic for solving a given instance/question (e.g., Program-of-Thought), inspired by their strict and precise syntax. However, it is non-trivial to write executable code that expresses the correct logic on the fly within a single inference call. Also, the code generated specifically for an instance cannot be reused for others, even if they are from the same task and might require identical logic to solve. This paper presents Think-and-Execute, a novel framework that decomposes the reasoning process of language models into two steps: (1) in Think, we discover a task-level logic that is shared across all instances for solving a given task and then express the logic with pseudocode; (2) in Execute, we further tailor the generated pseudocode to each instance and simulate the execution of the code. With extensive experiments on seven algorithmic reasoning tasks, we demonstrate the effectiveness of Think-and-Execute. Our approach improves LMs' reasoning more than several strong baselines performing instance-specific reasoning (e.g., CoT and PoT), suggesting the helpfulness of discovering task-level logic. Also, we show that compared to natural language, pseudocode can better guide the reasoning of LMs, even though they are trained to follow natural language instructions.
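The two phases described in the abstract can be sketched as a prompting pipeline. This is a minimal, hypothetical illustration rather than the authors' implementation: the prompt wording, the function names `think`, `execute`, and `think_and_execute`, and the `llm` callable (any function mapping a prompt string to a completion string) are all assumptions. What it shows is the structural point of the framework: the pseudocode is produced once per task and reused for every instance, and only the Execute step runs per instance.

```python
"""Hypothetical sketch of a Think-and-Execute-style pipeline.

Assumptions (not from the paper): prompt wording, function names,
and the `llm` interface (prompt string in, completion string out).
"""
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical model interface


def think(task_description: str, examples: List[str], llm: LLM) -> str:
    """Think: ask the model ONCE per task for pseudocode that captures
    the task-level logic shared by all instances."""
    prompt = (
        "Task description:\n" + task_description + "\n\n"
        "Example instances:\n" + "\n".join(examples) + "\n\n"
        "Write pseudocode that solves any instance of this task."
    )
    return llm(prompt)


def execute(pseudocode: str, instance: str, llm: LLM) -> str:
    """Execute: tailor the shared pseudocode to one instance and ask the
    model to simulate its execution step by step, then answer."""
    prompt = (
        "Pseudocode:\n" + pseudocode + "\n\n"
        "Input:\n" + instance + "\n\n"
        "Simulate the execution of the pseudocode on this input, "
        "printing intermediate values, then state the final answer."
    )
    return llm(prompt)


def think_and_execute(task_description: str, examples: List[str],
                      instances: List[str], llm: LLM) -> List[str]:
    # Task-level logic is generated once...
    pseudocode = think(task_description, examples, llm)
    # ...and reused for every instance of the task.
    return [execute(pseudocode, inst, llm) for inst in instances]
```

With a stub `llm`, three instances trigger four model calls: one Think call and three Execute calls. An instance-specific approach such as PoT would instead generate a fresh program for each instance.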


Citations

Journal Article•10.48550/arxiv.2409.12183

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Zayne Sprague, +9 more

- 18 Sep 2024

- arXiv.org

TL;DR: Chain-of-thought (CoT) via prompting significantly improves performance on math and symbolic reasoning tasks, but offers minimal benefits on other tasks, suggesting selective application and a need to explore new paradigms beyond prompt-based CoT.


23

Preprint•10.48550/arxiv.2406.16386

Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach

Yuting Wan, +6 more

- 24 Jun 2024

TL;DR: Manually converting UI designs from screenshots into code is a time-consuming process. DCGen is a divide-and-conquer-based approach that effectively mitigates issues in generating UI code by focusing on smaller visual segments.


5

Journal Article•10.48550/arXiv.2301.06178

Bike Frames: Understanding the Implicit Portrayal of Cyclists in the News

Xingmeng Zhao, +3 more

- 15 Jan 2023

- arXiv.org

TL;DR: In this paper, the authors explore how cyclists are perceived in news headlines and compare these perceptions with those in motorcyclist-related headlines, grounding the findings in a related activity, for both male- and female-related posts.


1

Journal Article•10.48550/arxiv.2406.02470

Meta-Designing Quantum Experiments with Language Models

Sören Arlt, +5 more

- 04 Jun 2024

TL;DR: A language model trained on synthetic data generates meta-solutions for designing quantum experiments, producing interpretable code for entire classes of quantum systems and uncovering general design rules for infinitely large classes of quantum states.


Preprint•10.48550/arxiv.2405.16337

Learning to Reason via Program Generation, Emulation, and Search

Nathaniel Weir, +4 more

- 25 May 2024

TL;DR: CoGEX extends the program synthesis capabilities of LMs to tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding by generating pseudo-programs and searching over them.


References

Proceedings Article

Chain of Thought Prompting Elicits Reasoning in Large Language Models

Jason Wei, +7 more

- 28 Jan 2022

TL;DR: Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks.


4.8K
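A few-shot chain-of-thought prompt places a worked rationale before each exemplar's answer, so the model imitates "reason first, then answer". The sketch below assumes a plain `Q:`/`A:` layout whose exemplars end in "The answer is X."; the single exemplar, the helper names `build_cot_prompt` and `extract_answer`, and the answer-parsing rule are illustrative assumptions, not the paper's released prompts.

```python
from typing import List, Optional, Tuple

# (question, step-by-step rationale, final answer)
Exemplar = Tuple[str, str, str]

# Illustrative exemplar; a real prompt would typically include several.
EXEMPLARS: List[Exemplar] = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11.",
        "11",
    ),
]


def build_cot_prompt(question: str, exemplars: List[Exemplar]) -> str:
    """Put each exemplar's rationale before its answer, then leave the
    new question's answer open for the model to complete."""
    shots = [f"Q: {q}\nA: {r} The answer is {a}." for q, r, a in exemplars]
    shots.append(f"Q: {question}\nA:")
    return "\n\n".join(shots)


def extract_answer(completion: str,
                   marker: str = "The answer is") -> Optional[str]:
    """Return the text after the last `marker` up to the end of that line,
    without a trailing period; None if the marker never appears."""
    if marker not in completion:
        return None
    tail = completion.rsplit(marker, 1)[1].strip()
    first_line = tail.splitlines()[0] if tail else ""
    return first_line.rstrip(".").strip() or None
```

Parsing only the last "The answer is" keeps the extractor robust when a model restates intermediate results before committing to a final answer.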

Proceedings Article

Large Language Models are Zero-Shot Reasoners

Takeshi Kojima, +4 more

- 24 May 2022

TL;DR: Experimental results demonstrate that Zero-shot-CoT, using the same single prompt template, significantly outperforms standard zero-shot LLM performance on diverse benchmark reasoning tasks, including arithmetic, symbolic reasoning, and other logical reasoning tasks, without any hand-crafted few-shot examples.


2.3K
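Zero-shot-CoT is commonly run as two prompts: the first appends the trigger "Let's think step by step." to elicit reasoning, and the second feeds that reasoning back followed by an answer-extraction phrase. The sketch assumes a hypothetical `llm` callable; the default answer trigger is a simplified assumption, since the original work uses task-specific answer phrasings.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical prompt -> completion interface

# The single, task-agnostic reasoning trigger.
TRIGGER = "Let's think step by step."


def zero_shot_cot(question: str, llm: LLM,
                  answer_trigger: str = "Therefore, the answer is") -> str:
    """Two-stage zero-shot chain-of-thought (sketch)."""
    # Stage 1: reasoning extraction with the fixed trigger phrase.
    reasoning_prompt = f"Q: {question}\nA: {TRIGGER}"
    reasoning = llm(reasoning_prompt)
    # Stage 2: answer extraction, conditioned on the generated reasoning.
    answer_prompt = f"{reasoning_prompt} {reasoning}\n{answer_trigger}"
    return llm(answer_prompt).strip()
```

No exemplars appear in either prompt; the same two templates are reused across tasks, which is what makes the method zero-shot.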

Posted Content

HellaSwag: Can a Machine Really Finish Your Sentence?

Rowan Zellers, +4 more

- 19 May 2019

- arXiv: Computation and Language

TL;DR: HellaSwag is a commonsense NLI dataset built with adversarial filtering, in which a series of discriminators iteratively select an adversarial set of machine-generated wrong answers. The key insight is to scale up the length and complexity of the dataset examples towards a critical 'Goldilocks' zone where generated text is ridiculous to humans, yet often misclassified by state-of-the-art models.


1.2K

Journal Article•10.48550/arXiv.2211.12588

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks

Wenhu Chen, +3 more

- 22 Nov 2022

- arXiv.org

TL;DR: Chen et al. propose Program of Thoughts (PoT) prompting, which disentangles computation from reasoning by using language models (mainly Codex) to express the reasoning process as a program.


426
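In PoT-style prompting, the model writes the computation as code and an interpreter evaluates it, so the arithmetic itself is not done by the model. Below, `GENERATED_PROGRAM` stands in for a model's output on a hypothetical compound-interest question, and storing the result in a variable named `ans` is an assumption of this sketch, as is the helper `run_program`.

```python
# Stand-in for model output on a hypothetical question:
# "$10,000 is invested at 5% annual interest, compounded yearly.
#  How much is there after 3 years?"
GENERATED_PROGRAM = """
principal = 10_000
rate = 0.05
years = 3
ans = principal * (1 + rate) ** years
"""


def run_program(program: str, result_var: str = "ans"):
    """Execute model-written code and return the value of `result_var`.

    Emptying __builtins__ limits what the snippet can call, but it is
    NOT a security sandbox; untrusted code needs real isolation.
    """
    namespace: dict = {}
    exec(program, {"__builtins__": {}}, namespace)
    return namespace[result_var]


print(round(run_program(GENERATED_PROGRAM), 2))  # 11576.25
```

The reasoning (which quantities matter and how they combine) stays with the model, while the exponentiation is delegated to Python, which is the disentangling the TL;DR describes.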

Journal Article•10.48550/arXiv.2211.10435

PAL: Program-aided Language Models

Luyu Gao, +7 more

- 18 Nov 2022

- arXiv.org

TL;DR: Program-Aided Language Models (PAL) use the LLM to read natural language problems and generate programs as intermediate reasoning steps, but offload the solution step to a runtime such as a Python interpreter.


291
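PAL's division of labor can be sketched as a small harness: the model emits a program, and the Python runtime, not the model, produces the answer. The `solution()` entry point, the cookie word problem, and `run_pal_program` are hypothetical choices for this sketch; a real deployment would also need sandboxing and a timeout, both omitted here.

```python
from typing import Any, Optional

# Stand-in for a program the LLM might generate for a word problem.
GENERATED = '''
def solution():
    """A baker makes 4 trays of 12 cookies and sells 30. How many are left?"""
    trays = 4
    cookies_per_tray = 12
    sold = 30
    result = trays * cookies_per_tray - sold
    return result
'''


def run_pal_program(code: str, entry_point: str = "solution") -> Optional[Any]:
    """Define the generated function in a fresh namespace, call it, and
    return its result; return None if the program fails in any way."""
    namespace: dict = {}
    try:
        exec(code, namespace)
        return namespace[entry_point]()
    except Exception:
        return None


print(run_pal_program(GENERATED))  # 18
```

Returning `None` on any exception lets a caller treat a crashing program as an unanswered instance (or retry generation) instead of aborting a whole evaluation run.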