Journal Article, DOI: 10.48550/arXiv.2303.17760
CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society
TL;DR: Role-playing, as proposed by the authors, is a communicative agent framework that guides chat agents toward task completion while maintaining consistency with human intentions. It can also be used to generate conversational data for studying the behaviors and capabilities of chat agents.
Abstract: The rapid advancement of conversational and chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents and provide insight into their "cognitive" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of chat agents, providing a valuable resource for investigating conversational language models. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond. The GitHub repository of this project is publicly available at: https://github.com/lightaime/camel.
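The cooperation loop described in the abstract can be illustrated with a minimal sketch. It assumes two agents, each holding a fixed system prompt (standing in for the inception prompt), that exchange instructions and solutions until a termination token appears or a turn cap is reached. Every name here (`ChatAgent`, `role_play`, the `<DONE>` token, the scripted reply functions) is hypothetical and chosen for illustration; this is not the API of the authors' library, and the reply functions are stubs where a real chat model would be called.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical reply signature: (system_prompt, history) -> reply text.
ReplyFn = Callable[[str, List[str]], str]


@dataclass
class ChatAgent:
    role: str
    system_prompt: str  # stands in for the inception prompt given once at the start
    reply_fn: ReplyFn   # stub; a real setup would call a chat model here
    history: List[str] = field(default_factory=list)

    def step(self, incoming: str) -> str:
        """Record the incoming message, produce a reply, and record it too."""
        self.history.append(incoming)
        reply = self.reply_fn(self.system_prompt, self.history)
        self.history.append(reply)
        return reply


def role_play(user: ChatAgent, assistant: ChatAgent, task: str,
              max_turns: int = 5, done_token: str = "<DONE>") -> List[Tuple[str, str]]:
    """Alternate user instructions and assistant solutions.

    The turn cap is an assumed safeguard against conversations that never end.
    """
    transcript: List[Tuple[str, str]] = []
    assistant_msg = f"Task: {task}"
    for _ in range(max_turns):
        instruction = user.step(assistant_msg)
        if done_token in instruction:
            break
        assistant_msg = assistant.step(instruction)
        transcript.append((instruction, assistant_msg))
    return transcript


def scripted_user(system_prompt: str, history: List[str]) -> str:
    # History length is 1, 3, 5, ... when called, so this yields turn 1, 2, 3, ...
    turn = len(history) // 2 + 1
    return f"Instruction {turn}" if turn <= 2 else "<DONE>"


def scripted_assistant(system_prompt: str, history: List[str]) -> str:
    return f"Solution: handled '{history[-1]}'"


user = ChatAgent("user", "You instruct the assistant.", scripted_user)
assistant = ChatAgent("assistant", "You solve each instruction.", scripted_assistant)
transcript = role_play(user, assistant, task="Summarize a dataset")
for instruction, solution in transcript:
    print(instruction, "->", solution)
```

Keeping the model call behind a plain function makes the loop testable without network access; swapping the stubs for real model calls would leave the control flow unchanged.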
Figures

Figure 6: Challenges in Cooperative Role-Playing. Our analysis of our generated sets revealed four main challenges, namely, role flipping, assistant repeats instruction, flake replies and infinite conversation. 
Figure 10: AI Society Instructions Information Cartography. The information cartography for the instructions generated in the AI Society dataset reveals coverage of multiple diverse topics. The map was generated using Nomic Atlas. 
Figure 2: Inception Prompt of AI Society Role-Playing. This shows the task specifier prompt, assistant system prompt, and user system prompt which are used for studying the AI society scenario. 
Figure 5: Generated Meta Data. The meta data generated by LLMs for the AI Society and Code datasets. 50 assistant roles and 50 user roles are generated for AI Society. 20 programming languages and 50 domains are generated for Code. 
Figure 9: Flake Message Distribution (AI Society). We quantify and visualize the number of flake messages, i.e., ones that start with “I will ...” and do not progress towards task completion. Our original prompt produces fewer flake messages than either of the presented ablations. 
Figure 13: Code Tasks Information Cartography. The information cartography for the tasks generated in the Code dataset reveals coverage of multiple diverse topics. The map was generated using Nomic Atlas.
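The flake-message criterion in the Figure 9 caption can be approximated with a simple prefix check. The sketch below is an assumption-laden illustration, not the authors' counting procedure: it flags only messages that open with "I will", and it cannot verify the second half of the definition (that the message does not progress toward task completion), so its results are best read as candidates. The function names and sample messages are hypothetical.

```python
from typing import Iterable


def is_flake_candidate(message: str) -> bool:
    """Return True if the message opens with "I will" (case-insensitive).

    This checks only the surface pattern from the caption; whether the
    message actually advances the task is not assessed here.
    """
    return message.lstrip().lower().startswith("i will")


def count_flake_candidates(messages: Iterable[str]) -> int:
    """Count messages matching the surface pattern."""
    return sum(is_flake_candidate(m) for m in messages)


# Hypothetical sample messages for illustration only.
sample = [
    "I will look into the data and get back to you.",
    "Solution: load the file with the csv module, then compute the mean.",
    "  i will do that next.",
]
flake_count = count_flake_candidates(sample)
print(flake_count)
```

A stricter version would pair the prefix check with some test of progress, for example whether the reply contains a concrete solution, but that judgment is left out here because the caption does not specify how it was made.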
Citations
A Survey on Large Language Model based Autonomous Agents
Lei Wang, Cheng-jian Ma, Xueyang Feng, Zeyu Zhang, Hao-ran Yang, Jingsen Zhang, Zhi-Yang Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, Ji-Rong Wen
TL;DR: A systematic review of the field of LLM-based autonomous agents from a holistic perspective, proposing a unified framework that encompasses a majority of the previous work.
MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
Sirui Hong, Xiawu Zheng, Jonathan P. Chen, Yuheng Cheng, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Z. Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu
TL;DR: MetaGPT is introduced, an innovative framework that incorporates efficient human workflows as a meta programming approach into LLM-based multi-agent collaboration and leverages the assembly line paradigm to assign diverse roles to various agents, thereby establishing a framework that can effectively and cohesively deconstruct complex multi-agent collaborative problems.
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
TL;DR: UltraChat is a large-scale dataset of instructional conversations that does not involve human queries. It contains 1.5 million high-quality multi-turn dialogues and covers a wide range of topics and instructions.
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shan Zhang, Jie Fu, Zhiyuan Liu
TL;DR: A multi-agent referee team called ChatEval is constructed to autonomously discuss and evaluate the quality of generated responses from different models on open-ended questions and traditional natural language generation (NLG) tasks, offering a human-mimicking evaluation process for reliable assessments.
Aligning Large Language Models with Human: A Survey
Yufei Wang, Wanjun Zhong, Liangyou Li, Fei Mi, Xingshan Zeng, Wenyong Huang, Lifeng Shang, Xin Jiang, Qun Liu
TL;DR: This survey presents a comprehensive overview of these human-aligned LLMs, shedding light on several promising future research avenues in the field and serves as a valuable resource for anyone invested in understanding and advancing the alignment of LLMs to better suit human-oriented tasks and expectations.
References
•Proceedings Article
The dynamics of reinforcement learning in cooperative multiagent systems
Caroline Claus, Craig Boutilier
- 01 Jul 1998
TL;DR: This work distinguishes reinforcement learners that are unaware of (or ignore) the presence of other agents from those that explicitly attempt to learn the value of joint actions and the strategies of their counterparts, and proposes alternative optimistic exploration strategies that increase the likelihood of convergence to an optimal equilibrium.
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai, Andy Jones, Kamal K. Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, Nicholas Joseph, Saurav Kadavath, John Kernion, Tom Conerly, Sheer El-Showk, Nelson Elhage, Zac Hatfield-Dodds, Danny Hernandez, Tristan Hume, Scott Johnston, S. M. Kravec, Liane Lovitt, Neel Nanda, Catherine Anne White Olsson, Dario Amodei, Tom B. Brown, Jack Clark, Samuel McCandlish, Chris Olah, Benjamin Mann, Jared Kaplan
TL;DR: An iterated online mode of training is explored, where preference models and RL policies are updated on a weekly cadence with fresh human feedback data, and a roughly linear relation between the RL reward and the square root of the KL divergence between the policy and its initialization is identified.
Training Compute-Optimal Large Language Models
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre
TL;DR: This paper trains a predicted compute-optimal model, Chinchilla, that uses the same compute budget as Gopher but with 70B parameters and 4× more data, and reaches a state-of-the-art average accuracy on the MMLU benchmark.
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Yizhong Wang, Yeganeh Kordi, Swaroop Mishra, Alisa Liu, Noah A. Smith, Daniel Khashabi, Hannaneh Hajishirzi
- 20 Dec 2022
TL;DR: The authors propose Self-Instruct, a framework for improving the instruction-following capabilities of pre-trained language models by bootstrapping off their own generations, which generates instructions, input, and output samples from a language model, then filters invalid or similar ones before using them to finetune the original model.
•Posted Content
Extracting Training Data from Large Language Models
Nicholas Carlini, Florian Tramèr, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom B. Brown, Dawn Song, Úlfar Erlingsson, Alina Oprea, Colin Raffel
TL;DR: This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model, and finds that larger models are more vulnerable than smaller models.