Journal Article, DOI: 10.48550/arXiv.2209.14375
Improving alignment of dialogue agents via targeted human judgements
A. Glaese,Nathan McAleese,Maja Trębacz,John Aslanides,Vlad Firoiu,Timo Ewalds,Maribeth Rauh,Laura Weidinger,Martin Chadwick,Phoebe Thacker,Lucy Campbell-Gillingham,Jonathan Uesato,Po-Sen Huang,Ramona Comanescu,Fan Yang,Abigail See,Sumanth Dathathri,Rory Greig,Charlie Chen,Doug Fritz,Jaume Sanchez Elias,Richard Green,Sona Mokra,Nicholas Fernando,Boxi Wu,Rachel Foley,Susannah Young,Iason Gabriel,William S. Isaac,John F. J. Mellor,Demis Hassabis,Koray Kavukcuoglu,Lisa Anne Hendricks,Geoffrey Irving
349
TL;DR: This paper presents Sparrow, an information-seeking dialogue agent trained with reinforcement learning from human feedback to be more helpful, correct, and harmless than prompted language model baselines, using rule-by-rule rater judgements and supporting evidence for factual claims.
Abstract: We present Sparrow, an information-seeking dialogue agent trained to be more helpful, correct, and harmless compared to prompted language model baselines. We use reinforcement learning from human feedback to train our models with two new additions to help human raters judge agent behaviour. First, to make our agent more helpful and harmless, we break down the requirements for good dialogue into natural language rules the agent should follow, and ask raters about each rule separately. We demonstrate that this breakdown enables us to collect more targeted human judgements of agent behaviour and allows for more efficient rule-conditional reward models. Second, our agent provides evidence from sources supporting factual claims when collecting preference judgements over model statements. For factual questions, evidence provided by Sparrow supports the sampled response 78% of the time. Sparrow is preferred more often than baselines while being more resilient to adversarial probing by humans, violating our rules only 8% of the time when probed. Finally, we conduct extensive analyses showing that though our model learns to follow our rules it can exhibit distributional biases.
Citations
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
Taiming Lu,Lingfeng Shen,Xinyu Yang,Weiting Tan,Beidi Chen,Huaxiu Yao
- 12 Jun 2024
TL;DR: The interaction between policy and reward models in RLHF is intricate, and their seamlessness is crucial for optimal performance. However, current models exhibit a significant mismatch with human preferences, highlighting the need for improvements. To address this issue, we propose an automatic metric called SEAM to measure seamlessness and demonstrate its effectiveness in data selection and model augmentation.
Clinical Reading Comprehension with Encoder-Decoder Models Enhanced by Direct Preference Optimization
TL;DR: This paper combines encoder-decoder models with direct preference optimization to improve reading comprehension on the RadQA radiology question answering task, achieving a 12-15 F1 point gain over prior state-of-the-art, using novel heuristics to generate preference data without human input.
Augmenting Ad-Hoc IR Dataset for Interactive Conversational Search
Pierre Erbacher,Jian-Yun Nie,P. Preux,Laure Soulier
TL;DR: This paper shows the feasibility and utility of augmenting ad-hoc IR datasets for conversational IR with query clarification and answer simulations for MsMarco.
SOAP: Enhancing Efficiency of Generated Code via Self-Optimization
Hui Dong,Jianbo Dai,Han‐Rong Weng,Peng Wu,Yuhao Qing,Jie M. Zhang,Heming Cui,Zhiling Guo
- 23 May 2024
TL;DR: SOAP enhances the efficiency of LLM-generated code by iteratively optimizing code based on execution overhead profiles. It significantly reduces execution time and memory usage for various models.
Revision Transformers: Instructing Language Models to Change Their Values
Felix Friedrich,Wolfgang Stammer,Patrick Schramowski,Kristian Kersting
- 19 Oct 2022
TL;DR: This work questions the current common practice of storing all information in the model parameters and proposes the Revision Transformer (RiT) to facilitate easy model updating and pave the way for more transparent AI models.
References
•Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma,Jimmy Ba
- 01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
138.5K
•Book
Reinforcement Learning: An Introduction
Richard S. Sutton,Andrew G. Barto
- 01 Jan 1998
TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, which ranges from the history of the field's intellectual foundations to the most recent developments and applications.
•Proceedings Article
Asynchronous methods for deep reinforcement learning
Volodymyr Mnih,Adrià Puigdomènech Badia,Mehdi Mirza,Alex Graves,Tim Harley,Timothy P. Lillicrap,David Silver,Koray Kavukcuoglu
- 19 Jun 2016
TL;DR: This paper proposes a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent to optimize deep neural network controllers, and shows that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.
Logic and Conversation
Siobhan Chapman
- 01 Jan 2005
TL;DR: For instance, Grice was interested in Quine's logical approach to language, although he differed from Quine over certain specific questions, such as the viability of the distinction between analytic and synthetic statements.
8.9K
The case for motivated reasoning.
TL;DR: It is proposed that motivation may affect reasoning through reliance on a biased set of cognitive processes--that is, strategies for accessing, constructing, and evaluating beliefs--that are considered most likely to yield the desired conclusion.