Communicative Agents for Software Development
Chen Qian, Cheng Yang, Weize Chen, Yusheng Su, Zhiyuan Liu, Maosong Sun
- 16 Jul 2023
154
TL;DR: This paper presents ChatDev, a virtual chat-powered software development company that mirrors the established waterfall model, dividing the development process into four chronological stages: designing, coding, testing, and documenting.
Abstract: Software engineering is a domain characterized by intricate decision-making processes, often relying on nuanced intuition and consultation. Recent advancements in deep learning have started to revolutionize software engineering practices through elaborate designs implemented at various stages of software development. In this paper, we present an innovative paradigm that leverages large language models (LLMs) throughout the entire software development process, streamlining and unifying key processes through natural language communication, thereby eliminating the need for specialized models at each phase. At the core of this paradigm lies ChatDev, a virtual chat-powered software development company that mirrors the established waterfall model, meticulously dividing the development process into four distinct chronological stages: designing, coding, testing, and documenting. Each stage engages a team of agents, such as programmers, code reviewers, and test engineers, fostering collaborative dialogue and facilitating a seamless workflow. The chat chain acts as a facilitator, breaking down each stage into atomic subtasks. This enables dual roles, allowing for proposing and validating solutions through context-aware communication, leading to efficient resolution of specific subtasks. The instrumental analysis of ChatDev highlights its remarkable efficacy in software generation, enabling the completion of the entire software development process in under seven minutes at a cost of less than one dollar. It not only identifies and alleviates potential vulnerabilities but also rectifies potential hallucinations while maintaining commendable efficiency and cost-effectiveness. The potential of ChatDev unveils fresh possibilities for integrating LLMs into the realm of software development.
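The chat chain described in the abstract can be pictured with a minimal sketch. The code below is not the authors' implementation: the subtask names, the role names `designer` and `writer`, the `PROPOSE`/`REVISE`/`APPROVE` message format, and the turn limit are all assumptions made for illustration, and the agents are deterministic stubs standing in for LLM calls. It only shows the shape of the idea: each waterfall stage is split into atomic subtasks, and each subtask is a short dialogue between a role that proposes a solution and a role that validates it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# An agent maps the dialogue so far to its next message.
# A real system would call an LLM here; these are stubs (assumption).
Agent = Callable[[List[str]], str]


@dataclass
class Subtask:
    phase: str      # waterfall stage named in the abstract
    name: str       # hypothetical subtask label
    proposer: str   # role that proposes a solution
    validator: str  # role that checks the proposal


def proposer_stub(role: str) -> Agent:
    """Stub that emits a numbered draft on every turn."""
    def reply(history: List[str]) -> str:
        drafts = sum(1 for m in history if m.startswith("PROPOSE"))
        return f"PROPOSE[{role}] draft {drafts + 1}"
    return reply


def validator_stub(role: str, approve_after: int) -> Agent:
    """Stub that approves once it has seen `approve_after` drafts."""
    def reply(history: List[str]) -> str:
        drafts = sum(1 for m in history if m.startswith("PROPOSE"))
        verdict = "APPROVE" if drafts >= approve_after else "REVISE"
        return f"{verdict}[{role}]"
    return reply


def run_subtask(task: Subtask, agents: Dict[str, Agent],
                max_turns: int = 3) -> Tuple[bool, List[str]]:
    """Alternate proposer and validator until approval or the turn limit."""
    history = [f"TASK {task.phase}/{task.name}"]
    for _ in range(max_turns):
        history.append(agents[task.proposer](history))
        verdict = agents[task.validator](history)
        history.append(verdict)
        if verdict.startswith("APPROVE"):
            return True, history
    return False, history


def run_chain(chain: List[Subtask],
              agents: Dict[str, Agent]) -> List[Tuple[str, bool, int]]:
    """Run subtasks in order; stop at the first one that is not approved."""
    results = []
    for task in chain:
        ok, history = run_subtask(task, agents)
        results.append((task.name, ok, len(history)))
        if not ok:
            break
    return results


# Hypothetical chain: one subtask per stage, for brevity.
CHAIN = [
    Subtask("designing", "design", "designer", "reviewer"),
    Subtask("coding", "code", "programmer", "reviewer"),
    Subtask("testing", "test", "programmer", "tester"),
    Subtask("documenting", "document", "writer", "reviewer"),
]
AGENTS: Dict[str, Agent] = {
    "designer": proposer_stub("designer"),
    "programmer": proposer_stub("programmer"),
    "writer": proposer_stub("writer"),
    "reviewer": validator_stub("reviewer", approve_after=1),
    "tester": validator_stub("tester", approve_after=2),
}

results = run_chain(CHAIN, AGENTS)
for name, ok, n_messages in results:
    print(name, ok, n_messages)
```

In this sketch the tester stub asks for one revision before approving, so the testing subtask takes one more round than the others. Keeping each dialogue scoped to a single subtask is the design choice the abstract attributes to the chat chain; how ChatDev actually passes context between subtasks is not specified here.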
Citations
A Survey on Large Language Model based Autonomous Agents
Lei Wang, Cheng-jian Ma, Xueyang Feng, Zeyu Zhang, Hao-ran Yang, Jingsen Zhang, Zhi-Yang Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, Ji-Rong Wen
TL;DR: This survey presents a systematic review of the field of LLM-based autonomous agents from a holistic perspective and proposes a unified framework that encompasses a majority of the previous work.
MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
Sirui Hong, Xiawu Zheng, Jonathan P. Chen, Yuheng Cheng, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Z. Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu
TL;DR: This paper introduces MetaGPT, an innovative framework that incorporates efficient human workflows as a meta programming approach into LLM-based multi-agent collaboration and leverages the assembly line paradigm to assign diverse roles to various agents, thereby establishing a framework that can effectively and cohesively deconstruct complex multi-agent collaborative problems.
339
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shan Zhang, Jie Fu, Zhiyuan Liu
TL;DR: A multi-agent referee team called ChatEval is constructed to autonomously discuss and evaluate the quality of generated responses from different models on open-ended questions and traditional natural language generation (NLG) tasks, offering a human-mimicking evaluation process for reliable assessments.
207
UltraFeedback: Boosting Language Models with High-quality Feedback
Ganqu Cui, Lifan Yuan, Ning Ding, Guanming Yao, Wei Zhu, Yuan Ni, Guotong Xie, Zhiyuan Liu, Maosong Sun
TL;DR: This study proposes ULTRAFEEDBACK, a large-scale, high-quality, and diversified preference dataset designed to overcome limitations and foster RLHF development, and trains various models to demonstrate its effectiveness.
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Cheng Qian, Chi-Min Chan, Yujia Qin, Ya-Ting Lu, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou
TL;DR: AgentVerse facilitates multi-agent collaboration and explores emergent behaviors in agents, enabling the creation of complex systems and scenarios.
References
Proceedings Article
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Thomas Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Samuel McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
- 28 May 2020
TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Training language models to follow instructions with human feedback
Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke E. Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, Ryan Lowe
- 04 Mar 2022
TL;DR: The results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent, yielding improvements in truthfulness and reductions in toxic output generation with minimal performance regressions on public NLP datasets.
7.1K
Proceedings Article
Chain of Thought Prompting Elicits Reasoning in Large Language Models
Jason Loh Seong Wei, Xuezhi Wang, D. Schuurmans, Maarten Bosma, Ed H. Chi, Fei Xia, Quoc Le, Denny Zhou
- 28 Jan 2022
TL;DR: Experiments on three large language models show that chain-of-thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks.
Hierarchical Text-Conditional Image Generation with CLIP Latents
TL;DR: This work proposes a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. It shows that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity.
4.3K
A Systematic Review of Software Development Cost Estimation Studies
Magne Jørgensen, Martin Shepperd
TL;DR: A systematic review of previous work identifies 304 software cost estimation papers in 76 journals and classifies the papers according to research topic, estimation approach, research approach, study context and data set to provide a basis for the improvement of software-estimation research.