Journal Article, DOI: 10.48550/arXiv.2305.08360
Improving ChatGPT Prompt for Code Generation
TL;DR: The authors design prompts that combine the chain-of-thought strategy with multi-step optimizations to improve ChatGPT's performance on text-to-code and code-to-code generation.
Abstract: Automated code generation can be a powerful technique for software development, significantly reducing the effort and time developers need to create new code by generating it automatically from requirements. Recently, OpenAI's language model ChatGPT has emerged as a powerful tool for generating human-like responses to a wide range of textual inputs (i.e., prompts), including those related to code generation. However, the effectiveness of ChatGPT for code generation is not well understood, and the generation performance could be heavily influenced by the choice of prompt. To investigate these issues, we conducted experiments using the CodeXGLUE dataset to evaluate ChatGPT's capabilities on two code generation tasks: text-to-code and code-to-code generation. We designed prompts by leveraging the chain-of-thought strategy with multi-step optimizations. Our results showed that carefully designing prompts to guide ChatGPT can improve the generation performance substantially. We also analyzed the factors that influenced the prompt design and provided insights that could guide future research.
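The abstract does not reproduce the prompts themselves, so the following is only a minimal sketch of what a two-step, chain-of-thought style prompt for text-to-code generation might look like. The function name `build_cot_prompts`, the prompt wording, the default target language, and the example API hint are all hypothetical; only the ideas of an API hint, a conciseness request, and sending the prompts within one session are taken from the table captions on this page.

```python
from typing import List, Optional


def build_cot_prompts(requirement: str,
                      api_hints: Optional[List[str]] = None,
                      concise: bool = False,
                      language: str = "Java") -> List[str]:
    """Return two prompts meant to be sent in order within one chat session.

    Hypothetical wording: this is not the prompt text used in the paper.
    """
    # Step 1: ask the model to reason about the task before writing code.
    step1 = (
        "Read the following requirement and list, step by step, "
        "what the method must do. Do not write code yet.\n"
        f"Requirement: {requirement}"
    )
    # Step 2: ask for code that follows the reasoning produced in step 1.
    step2 = f"Now write a {language} method that follows the steps you listed."
    if api_hints:
        # Optional hint naming APIs the generated code is expected to call.
        step2 += " Use these APIs where appropriate: " + ", ".join(api_hints) + "."
    if concise:
        # Optional conciseness request appended to the final instruction.
        step2 += " Keep the code concise and return only the code."
    return [step1, step2]


prompts = build_cot_prompts(
    "Return the larger of two integers.",
    api_hints=["Math.max"],
    concise=True,
)
for prompt in prompts:
    print(prompt)
    print("---")
```

Sending both prompts in the same conversation lets the second request build on the model's own step list; sending them in separate sessions would lose that context.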
Figures

TABLE III TESTING THREE BASELINES ON T2C AND C2C GENERATION TASKS. 
TABLE IV TESTING THE BEST BASELINES IN RQ1 WITH (OR WITHOUT) THE CONCISENESS REQUEST (C) ON T2C AND C2C GENERATION TASKS. 
TABLE VI TESTING THE BEST BASELINES IN RQ3 ON T2C AND C2C GENERATION TASKS IN FIVE ROUNDS (R1-R5), WHERE "MIN", "MAX", "AVG", "STD" STAND FOR THE MINIMUM, MAXIMUM, AVERAGE AND STANDARD DEVIATION OF THE GENERATION ACCURACY.
TABLE V TESTING THE BEST BASELINE IN RQ2 WITH (OR WITHOUT) THE SESSION SETTING (S) ON T2C AND C2C GENERATION TASKS. 
TABLE IX QUALITY ANALYSIS OF THE CODE GENERATED BY CHATGPT-BEST AND THE CORRESPONDING GROUND-TRUTH ON T2C AND C2C GENERATION TASKS. 
TABLE II TESTING DIFFERENT PROMPT COMBINATIONS IN TABLE I ON 100 SAMPLES RANDOMLY SELECTED FROM TRAINING DATA OF EACH GENERATION TASK. NOTE THAT P5(API) INDICATES THAT WE ONLY USED THE API PART OF THE PROMPT P5.
Citations
A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair
Quanjun Zhang, Tongke Zhang, Juan Zhai, Chunrong Fang, Bo-Chen Yu, Weisong Sun, Zhenyu Chen
TL;DR: This paper seeks to review the bug-fixing capabilities of ChatGPT on a clean APR benchmark with different research objectives, and calls for more research on the reevaluation of the achievements obtained by existing black-box LLMs across various SE tasks, not limited to ChatGPT on APR.
Large Language Models are Complex Table Parsers
Bowen Zhao, Changkai Ji, Yuejie Zhang, Wen He, Yingwen Wang, Qing Wang, Rui Feng, Xiaobo Zhang
- 13 Dec 2023
TL;DR: This paper enhances the prompt template with an explanatory description of the meaning of each tuple and the logical reasoning process of the task, which effectively improves the hierarchical structure awareness capability of GPT-3.5 to better parse the complex tables.
ClarifyGPT: A Framework for Enhancing LLM-Based Code Generation via Requirements Clarification
Fangwen Mu, Lin Shi, Song Wang, Zhuohao Yu, Binquan Zhang, ChenXue Wang, Shuning Liu, Qing Wang
- 12 Jul 2024
TL;DR: A novel framework named ClarifyGPT, which aims to enhance code generation by empowering LLMs with the ability to identify ambiguous requirements and ask targeted clarifying questions, and can effectively facilitate the practical application of LLMs in real-world development environments.
OMPGPT: A Generative Pre-trained Transformer Model for OpenMP
Le Chen, Arijit Bhattacharjee, Nesreen K. Ahmed, Niranjan Hasabnis, Gal Oren, Vy Vo, Ali Jannesari
TL;DR: This paper introduces OMPGPT, a novel model meticulously designed to harness the inherent strengths of language models for OpenMP pragma generation, and adopts and adapts prompt engineering techniques from the NLP domain to create chain-of-OMP, an innovative strategy designed to enhance OMPGPT's effectiveness.
Using LLMs to Customize the UI of Webpages
Amanda Li, Jason Wu, Jeffrey P Bigham
- 29 Oct 2023
TL;DR: It is observed that specific prompts referring to color or targeted components can succeed, while vague requests and complex websites tend to perform poorly.
References
•Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
- 12 Jun 2017
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.
•Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Bleu: a Method for Automatic Evaluation of Machine Translation
Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu
- 06 Jul 2002
TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
•Posted Content
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Michael Lewis, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
•Proceedings Article
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Thomas Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Samuel McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
- 28 May 2020
TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.