Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples $(x, f(x))$ presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context. Using linear regression as a prototypical problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form ridge regression. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers non-linearly encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may rediscover standard estimation algorithms. Code and reference implementations are released at https://github.com/ekinakyurek/google-research/blob/master/incontext.

What learning algorithm is in-context learning? Investigations with linear models
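
As a minimal illustration of two of the estimators the abstract above names, the following Python sketch fits a one-dimensional linear model with closed-form ridge regression and with gradient descent on the same objective. It is not the paper's transformer construction or its released code; the function names, the scalar-weight setting, the synthetic data, and the values of `lam`, `lr`, and `steps` are all assumptions chosen for illustration.

```python
import random


def ridge_closed_form(xs, ys, lam):
    """Closed-form ridge solution for a scalar weight (no intercept)."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)


def ridge_gradient_descent(xs, ys, lam, lr=0.01, steps=2000):
    """Minimise sum_i (w*x_i - y_i)^2 + lam*w^2 by plain gradient descent."""
    w = 0.0
    for _ in range(steps):
        # Gradient of the squared-error term plus the L2 penalty.
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) + 2.0 * lam * w
        w -= lr * grad
    return w


# Hypothetical synthetic task: y = true_w * x + small Gaussian noise.
random.seed(0)
true_w = 1.5
xs = [random.gauss(0.0, 1.0) for _ in range(20)]
ys = [true_w * x + random.gauss(0.0, 0.1) for x in xs]

w_ridge = ridge_closed_form(xs, ys, lam=0.5)
w_gd = ridge_gradient_descent(xs, ys, lam=0.5)
w_ols = ridge_closed_form(xs, ys, lam=0.0)  # lam=0 gives ordinary least squares
print(w_ridge, w_gd, w_ols)
```

With enough steps, the two regularized estimates agree to numerical precision, and both are shrunk toward zero relative to the unregularized least-squares fit (`lam=0`).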

In the past decade, advances in deep learning have resulted in breakthroughs in a variety of areas, including computer vision, natural language understanding, speech recognition, and reinforcement learning. Specialized, high-performing neural architectures are crucial to the success of deep learning in these areas. Neural architecture search (NAS), the process of automating the design of neural architectures for a given task, is an inevitable next step in automating machine learning and has already outpaced the best human-designed architectures on many tasks. In the past few years, research in NAS has been progressing rapidly, with over 1000 papers released since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized and comprehensive guide to neural architecture search. We give a taxonomy of search spaces, algorithms, and speedup techniques, and we discuss resources such as benchmarks, best practices, other surveys, and open-source libraries.

Neural Architecture Search: Insights from 1000 Papers

Tabular data is one of the most commonly used types of data in machine learning. Despite recent advances in neural nets (NNs) for tabular data, there is still an active discussion on whether or not NNs generally outperform gradient-boosted decision trees (GBDTs) on tabular data, with several recent works arguing either that GBDTs consistently outperform NNs on tabular data, or vice versa. In this work, we take a step back and ask, 'does it matter?' We conduct the largest tabular data analysis to date, by comparing 19 algorithms across 176 datasets, and we find that the 'NN vs. GBDT' debate is overemphasized: for a surprisingly high number of datasets, either the performance difference between GBDTs and NNs is negligible, or light hyperparameter tuning on a GBDT is more important than selecting the best algorithm. Next, we analyze 965 metafeatures to determine what properties of a dataset make NNs or GBDTs better-suited to perform well. For example, we find that GBDTs are much better than NNs at handling skewed feature distributions, heavy-tailed feature distributions, and other forms of dataset irregularities. Our insights act as a guide for practitioners to decide whether or not they need to run a neural net to reach top performance on their dataset. Our codebase and all raw results are available at https://github.com/naszilla/tabzilla.

When Do Neural Nets Outperform Boosted Trees on Tabular Data?

Building general-purpose robots that can operate seamlessly, in any environment, with any object, and utilizing various skills to complete diverse tasks has been a long-standing goal in Artificial Intelligence. Unfortunately, however, most existing robotic systems have been constrained - having been designed for specific tasks, trained on specific datasets, and deployed within specific environments. These systems usually require extensively-labeled data, rely on task-specific models, have numerous generalization issues when deployed in real-world scenarios, and struggle to remain robust to distribution shifts. Motivated by the impressive open-set performance and content generation capabilities of web-scale, large-capacity pre-trained models (i.e., foundation models) in research fields such as Natural Language Processing (NLP) and Computer Vision (CV), we devote this survey to exploring (i) how these existing foundation models from NLP and CV can be applied to the field of robotics, and also exploring (ii) what a robotics-specific foundation model would look like. We begin by providing an overview of what constitutes a conventional robotic system and the fundamental barriers to making it universally applicable. Next, we establish a taxonomy to discuss current work exploring ways to leverage existing foundation models for robotics and develop ones catered to robotics. Finally, we discuss key challenges and promising future directions in using foundation models for enabling general-purpose robotic systems. We encourage readers to view our living GitHub repository of resources, including papers reviewed in this survey as well as related projects and repositories for developing foundation models for robotics.

Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis

Chemical modulation of proteins enables a mechanistic understanding of biology and represents the foundation of most therapeutics. However, despite decades of research, 80% of the human proteome lacks functional ligands. Chemical proteomics has advanced fragment-based ligand discovery toward cellular systems, but throughput limitations have stymied the scalable identification of fragment-protein interactions. We report proteome-wide maps of protein-binding propensity for 407 structurally diverse small-molecule fragments. We verified that identified interactions can be advanced to active chemical probes of E3 ubiquitin ligases, transporters, and kinases. Integrating machine learning binary classifiers further enabled interpretable predictions of fragment behavior in cells. The resulting resource of fragment-protein interactions and predictive models will help to elucidate principles of molecular recognition and expedite ligand discovery efforts for hitherto undrugged proteins.

Editor’s summary: Despite major advances in protein structure determination and prediction, it remains challenging to identify small-molecule ligands for most proteins. Offensperger et al. used a chemical proteomics approach to map protein-ligand interactions across the human proteome. With a library of around 400 ligand fragments attached to a photoactivatable cross-linker, the authors identified about 50,000 statistically significant interactions over about 2500 proteins, including a large fraction of targets for which there are no prior known ligands. They validated these results with biochemical experiments, identifying from their screen an E3 ligase binder and an inhibitor of a transmembrane transporter. This rich interaction dataset also provided the basis for developing a machine learning model to predict fragment properties and interaction profiles. The authors have made their work available as an online community resource. - Michael A. Funk

INTRODUCTION: Chemical modulation of protein function is an important experimental approach to illuminate biological mechanisms and represents the most frequently used strategy to treat human disease. Nevertheless, around 80% of the human proteome lacks annotated small-molecule ligands, thus leaving many proteins, including validated disease targets, outside the reach of mechanistic elucidation and therapeutic innovation.

RATIONALE: To close this gap, unbiased approaches to advance ligand discovery are urgently needed. We set out to determine the proteome-wide binding preferences of more than 400 small-molecule fragments through a chemoproteomics strategy that is based on treatment of intact cells. With these data at hand, we aimed to (i) identify hundreds of fragment-protein interactions and advance selected fragments toward cell-active ligands, (ii) leverage machine learning (ML) binary classifiers to develop models to predict small-molecule behavior in native biological systems, and (iii) build an interactive open-source interface to empower the broad exploration of the data and of all predictive models.

RESULTS: Through this quantitative chemoproteomics strategy, we experimentally determined the interactome of 407 small-molecule fragments. This led to the identification of 47,658 discrete fragment-protein interactions involving more than 2600 proteins, of which 86% previously lacked any annotated ligand. To provide evidence for the translational potential of these starting points, we advanced various hits toward elaborated fragments. With focused synthetic efforts, we developed ligands that (i) engage the E3 ligase adaptor protein DDB1, (ii) functionally block the human equilibrative nucleoside transporter SLC29A1 (hENT1), or (iii) selectively inhibit a subset of cyclin dependent kinases (CDKs), including the orphan CDK16.
In addition to advancing individual fragment-protein hits, we leveraged the depth of the global dataset to develop an ML framework to build models that can predict how fragments interact with native proteins on a proteome-wide scale. This framework included inference of quantitative fragment interactomes, which enabled us to predict to how many proteins a given fragment will bind and whether the bound proteins themselves are chemically broadly accessible or otherwise typically refractory to small-molecule ligands. Moreover, ML models allowed us to capture and predict qualitative interactome signatures. This made it possible for us to investigate and predict whether fragments tend to interact with subsets of proteins of coherent function, such as transporters or RNA-binding proteins. Likewise, ML models allowed us to analyze and predict whether fragments tend to interact with groups of proteins that reside in defined subcellular localizations or compartments, such as lysosomes or mitochondria, which can be indicative of intracellular fragment partitioning and accumulation. Last, we have also provided a platform to develop bespoke ML models that are based on a user-defined input of target proteins, and hence enable the prediction of fragment binding to a custom set of proteins.

CONCLUSION: Our large-scale chemical proteomics survey led to the identification of hundreds of fragment-protein interactions that are poised for future exploration and chemical optimization. Moreover, we found that the generated data is amenable to ML-based models that enable us to predict how chemical matter interacts with native proteomes in intact cells by using their chemical structure as input. To maximize the practical use for the scientific community, all interactomes, enrichment tools, and ML models have been made publicly available for exploration through a web-based application (https://ligand-discovery.ai). Collectively, these data and tools should form a resource to interpret fragment-binding data and expedite ligand discovery efforts.

Schematic representation of the ligand discovery approach. Chemoproteomics was used to assess 407 small-molecule fragments. Hundreds of fragment-protein interactions were identified as starting points for probe development. System-level analyses coupled to machine learning enabled prediction of fragment binding and behavior in living cells. An interactive web resource has been provided for data exploration, which also allows the generation and application of bespoke predictive models.

Large-scale chemoproteomics expedites ligand discovery and predicts ligand behavior in cells

We present TabPFN, a trained Transformer that can do supervised classification for small tabular datasets in less than a second, needs no hyperparameter tuning and is competitive with state-of-the-art classification methods. TabPFN is fully entailed in the weights of our network, which accepts training and test samples as a set-valued input and yields predictions for the entire test set in a single forward pass. TabPFN is a Prior-Data Fitted Network (PFN) and is trained offline once, to approximate Bayesian inference on synthetic datasets drawn from our prior. This prior incorporates ideas from causal reasoning: It entails a large space of structural causal models with a preference for simple structures. On the 18 datasets in the OpenML-CC18 suite that contain up to 1 000 training data points, up to 100 purely numerical features without missing values, and up to 10 classes, we show that our method clearly outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems with up to 70$\times$ speedup. This increases to a 3200$\times$ speedup when a GPU is available. We also validate these results on an additional 67 small numerical datasets from OpenML. We provide all our code, the trained TabPFN, an interactive browser demo and a Colab notebook at https://github.com/automl/TabPFN.

TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second

As the field of automated machine learning (AutoML) advances, it becomes increasingly important to incorporate domain knowledge into these systems. We present an approach for doing so by harnessing the power of large language models (LLMs). Specifically, we introduce Context-Aware Automated Feature Engineering (CAAFE), a feature engineering method for tabular datasets that utilizes an LLM to iteratively generate additional semantically meaningful features for tabular datasets based on the description of the dataset. The method produces both Python code for creating new features and explanations for the utility of the generated features. Despite being methodologically simple, CAAFE improves performance on 11 out of 14 datasets - boosting mean ROC AUC performance from 0.798 to 0.822 across all datasets - similar to the improvement achieved by using a random forest instead of logistic regression on our datasets. Furthermore, CAAFE is interpretable by providing a textual explanation for each generated feature. CAAFE paves the way for more extensive semi-automation in data science tasks and emphasizes the significance of context-aware solutions that can extend the scope of AutoML systems to semantic AutoML. We release our $\href{https://github.com/automl/CAAFE}{code}$, a simple $\href{https://colab.research.google.com/drive/1mCA8xOAJZ4MaB_alZvyARTMjhl6RZf0a}{demo}$ and a $\href{https://pypi.org/project/caafe/}{python\ package}$.

LLMs for Semi-Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering

We present TabPFN, an AutoML method that is competitive with the state of the art on small tabular datasets while being over 1 000× faster. Our method is very simple: it is fully entailed in the weights of a single neural network, and a single forward pass directly yields predictions for a new dataset. Our AutoML method is meta-learned using the Transformer-based Prior-Data Fitted Network (PFN) architecture and approximates Bayesian inference with a prior that is based on assumptions of simplicity and causal structures. The prior contains a large space of structural causal models and Bayesian neural networks with a bias for small architectures and thus low complexity. Furthermore, we extend the PFN approach to differentiably calibrate the prior’s hyperparameters on real data. By doing so, we separate our abstract prior assumptions from their heuristic calibration on real data. Afterwards, the calibrated hyperparameters are fixed and TabPFN can be applied to any new tabular dataset at the push of a button. Finally, on 30 datasets from the OpenML-CC18 suite we show that our method outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems with predictions produced in less than a second. We provide all our code and our final trained TabPFN in the supplementary materials.

Meta-Learning a Real-Time Tabular AutoML Method For Small Data

Neural sequence-to-sequence models provide a competitive approach to the task of mapping a question in natural language to an SQL query, also referred to as text-to-SQL generation. The Byte-Pair Encoding algorithm (BPE) has previously been used to improve machine translation (MT) between natural languages. In this work, we adapt BPE for text-to-SQL generation. As the datasets for this task are rather small compared to MT, we present a novel stopping criterion that prevents overfitting the BPE encoding to the training set. Additionally, we present AST BPE, which is a version of BPE that uses the Abstract Syntax Tree (AST) of the SQL statement to guide BPE merges and therefore produce BPE encodings that generalize better. We improved the accuracy of a strong attentive seq2seq baseline on five out of six English text-to-SQL tasks while reducing training time by more than 50% on four of them due to the shortened targets. Finally, on two of these tasks we exceeded previously reported accuracies.

Byte-Pair Encoding for Text-to-SQL Generation
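
The core BPE loop referred to in the abstract above can be sketched in a few lines of Python: count adjacent symbol pairs, merge the most frequent pair, and repeat. This is plain BPE over a toy corpus of tokenized SQL strings, assuming a fixed number of merges; it implements neither the paper's stopping criterion nor AST BPE, and the corpus, function names, and tie-breaking rule are hypothetical.

```python
from collections import Counter


def count_pairs(corpus):
    """Count adjacent symbol pairs, weighted by sequence frequency."""
    pairs = Counter()
    for symbols, freq in corpus.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(corpus, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in corpus.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + " " + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Greedily learn up to `num_merges` merges (fixed budget, an assumption)."""
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(corpus)
        if not pairs:
            break
        # Most frequent pair; ties broken by comparing the pairs themselves.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        corpus = merge_pair(corpus, best)
    return merges, corpus


# Hypothetical toy corpus: tokenized SQL targets mapped to their frequencies.
corpus = {
    ("SELECT", "name", "FROM", "users"): 3,
    ("SELECT", "age", "FROM", "users"): 2,
    ("SELECT", "COUNT", "(", "*", ")", "FROM", "orders"): 1,
}
merges, encoded = learn_bpe(corpus, num_merges=2)
print(merges)
print(encoded)
```

Each merge shortens the target sequences that contain the merged pair, which is the effect the abstract credits for the reduced training time.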

Bayesian Optimization (BO) is an effective approach to optimize black-box functions, relying on a probabilistic surrogate to model the response surface. In this work, we propose to use a Prior-data Fitted Network (PFN) as a cheap and flexible surrogate. PFNs are neural networks that approximate the Posterior Predictive Distribution (PPD) in a single forward-pass. Most importantly, they can approximate the PPD for any prior distribution that we can sample from efficiently. Additionally, we show what is required for PFNs to be used in a standard BO setting with common acquisition functions. We evaluated the performance of a PFN surrogate for Hyperparameter optimization (HPO), a major application of BO. While the method can still fail for some search spaces, we fare comparably to or better than the state-of-the-art on the HPO-B and PD1 benchmarks.

Bayesian Optimization with a Neural Network Meta-learned on Synthetic Data Only

Samuel Müller
