Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples $(x, f(x))$ presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context. Using linear regression as a prototypical problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form ridge regression. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers non-linearly encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may rediscover standard estimation algorithms. Code and reference implementations are released at https://github.com/ekinakyurek/google-research/blob/master/incontext.

What learning algorithm is in-context learning? Investigations with linear models
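
As a rough illustration of two of the reference predictors the abstract names, the sketch below fits a linear model to a small set of in-context examples $(x, f(x))$, once with closed-form ridge regression and once with gradient descent on the same objective. It is not the paper's released implementation: the function names, the synthetic data, the regularization strength `lam`, and the step size are all assumptions chosen for the example.

```python
import numpy as np


def ridge_predictor(X, y, lam):
    """Closed-form ridge weights: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def gd_predictor(X, y, lam, lr, steps):
    """Gradient descent on 0.5 * ||Xw - y||^2 + 0.5 * lam * ||w||^2."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y) + lam * w
        w = w - lr * grad
    return w


# Hypothetical in-context dataset: n noisy examples of a linear function.
rng = np.random.default_rng(0)
d, n, lam = 4, 16, 0.1
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)

w_ridge = ridge_predictor(X, y, lam)

# Step size 1/L, where L is the largest eigenvalue of the objective's Hessian.
lr = 1.0 / (np.linalg.eigvalsh(X.T @ X).max() + lam)
w_gd = gd_predictor(X, y, lam, lr, steps=5000)

# Predictions for a new query input under each predictor.
x_query = rng.normal(size=d)
print(x_query @ w_ridge, x_query @ w_gd)
```

With enough steps, gradient descent converges to the ridge solution, so the two predictions agree; stopping after only a few steps gives a different predictor.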

We observe that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences -- from arbitrary ones procedurally generated by probabilistic context-free grammars (PCFG), to richer spatial patterns found in the Abstract Reasoning Corpus (ARC), a general AI benchmark, prompted in the style of ASCII art. Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary. These results suggest that without any additional training, LLMs can serve as general sequence modelers, driven by in-context learning. In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics -- from extrapolating sequences of numbers that represent states over time to complete simple motions, to least-to-most prompting of reward-conditioned trajectories that can discover and represent closed-loop policies (e.g., a stabilizing controller for CartPole). While difficult to deploy today for real systems due to latency, context size limitations, and compute costs, the approach of using LLMs to drive low-level control may provide an exciting glimpse into how the patterns among words could be transferred to actions.

Large Language Models as General Pattern Machines

Foundation models pretrained on diverse data at scale have demonstrated extraordinary capabilities in a wide range of vision and language tasks. When such models are deployed in real world environments, they inevitably interface with other entities and agents. For example, language models are often used to interact with human beings through dialogue, and visual perception models are used to autonomously navigate neighborhood streets. In response to these developments, new paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning. These paradigms leverage the existence of ever-larger datasets curated for multimodal, multitask, and generalist interaction. Research at the intersection of foundation models and decision making holds tremendous promise for creating powerful new systems that can interact effectively across a diverse range of applications such as dialogue, autonomous driving, healthcare, education, and robotics. In this manuscript, we examine the scope of foundation models for decision making, and provide conceptual tools and technical background for understanding the problem space and exploring new research directions. We review recent approaches that ground foundation models in practical decision making applications through a variety of methods such as prompting, conditional generative modeling, planning, optimal control, and reinforcement learning, and discuss common challenges and open problems in the field.

Foundation Models for Decision Making: Problems, Methods, and Opportunities

Learning from human preferences is important for language models to be helpful and useful for humans, and to align with human and social values. Prior work has achieved remarkable successes by learning from human feedback to understand and follow instructions. Nonetheless, these methods are either founded on hand-picked model generations that are favored by human annotators, rendering them ineffective in terms of data utilization and challenging to apply in general, or they depend on reward functions and reinforcement learning, which are prone to imperfect reward functions and extremely challenging to optimize. In this work, we propose a novel technique, Chain of Hindsight, that is easy to optimize and can learn from any form of feedback, regardless of its polarity. Our idea is inspired by how humans learn from extensive feedback presented in the form of language. We convert all types of feedback into sentences, which are then used to fine-tune the model, allowing us to take advantage of the language comprehension capabilities of language models. We condition the model on a sequence of model generations paired with feedback. By doing so, models are trained to generate outputs based on feedback, and models can learn to identify and correct negative attributes or errors. Applying our method to large language models, we observed that Chain of Hindsight significantly surpasses previous methods in aligning language models with human preferences. We observed significant improvements on summarization and dialogue tasks, and our approach is markedly preferred in human evaluations.

Chain of Hindsight Aligns Language Models with Feedback

It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model. 

Language Modeling Is Compression
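
The link between prediction and compression mentioned in the abstract can be sketched with a toy model: a predictor that assigns probability $p$ to the next symbol implies an ideal code length of $-\log_2 p$ bits for it, and an arithmetic coder can get within a few bits of the total. The example below uses a simple adaptive byte-frequency model as a stand-in for a language model. The model, the function name, and the sample input are assumptions for illustration, not the paper's setup.

```python
import math


def ideal_code_length_bits(data: bytes) -> float:
    """Total of -log2 p(byte | earlier bytes) under an adaptive order-0 model.

    The model starts with a count of 1 for each of the 256 byte values
    (add-one smoothing) and updates its counts after every byte.
    """
    counts = [1] * 256
    total = 256
    bits = 0.0
    for b in data:
        bits += -math.log2(counts[b] / total)
        counts[b] += 1
        total += 1
    return bits


# Hypothetical input: a highly repetitive byte string.
sample = b"abababababababababababababababab"
bits = ideal_code_length_bits(sample)
ratio = bits / (8 * len(sample))  # compressed size relative to raw size
print(f"{bits:.1f} bits, ratio {ratio:.2f}")
```

A better predictor assigns higher probability to what actually comes next, which lowers the total code length; this is the sense in which stronger predictive models are stronger compressors.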

We introduce Genie, the first generative interactive environment trained in an unsupervised manner from unlabelled Internet videos. The model can be prompted to generate an endless variety of action-controllable virtual worlds described through text, synthetic images, photographs, and even sketches. At 11B parameters, Genie can be considered a foundation world model. It comprises a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model. Genie enables users to act in the generated environments on a frame-by-frame basis despite training without any ground-truth action labels or other domain-specific requirements typically found in the world model literature. Further, the resulting learned latent action space facilitates training agents to imitate behaviors from unseen videos, opening the path for training generalist agents of the future.

Genie: Generative Interactive Environments

Building generalist agents that can accomplish many goals in rich open-ended environments is one of the research frontiers for reinforcement learning. A key limiting factor for building generalist agents with RL has been the need for a large number of reward functions for achieving different goals. We investigate the feasibility of using off-the-shelf vision-language models, or VLMs, as sources of rewards for reinforcement learning agents. We show how rewards for visual achievement of a variety of language goals can be derived from the CLIP family of models, and used to train RL agents that can achieve a variety of language goals. We showcase this approach in two distinct visual domains and present a scaling trend showing how larger VLMs lead to more accurate rewards for visual goal achievement, which in turn produces more capable RL agents.

Vision-Language Models as a Source of Rewards

We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model. Algorithm Distillation treats learning to reinforcement learn as an across-episode sequential prediction problem. A dataset of learning histories is generated by a source RL algorithm, and then a causal transformer is trained by autoregressively predicting actions given their preceding learning histories as context. Unlike sequential policy prediction architectures that distill post-learning or expert sequences, AD is able to improve its policy entirely in-context without updating its network parameters. We demonstrate that AD can reinforcement learn in-context in a variety of environments with sparse rewards, combinatorial task structure, and pixel-based observations, and find that AD learns a more data-efficient RL algorithm than the one that generated the source data.

In-context Reinforcement Learning with Algorithm Distillation

Richie Steigerwald

Papers

Genie: Generative Interactive Environments

Vision-Language Models as a Source of Rewards

In-context Reinforcement Learning with Algorithm Distillation