Samuel Müller
9 Papers
2 Citations
Samuel Müller is an academic researcher. The author has contributed to research in topics: Computer science & Byte pair encoding. The author has co-authored 1 publication.
Papers
Proceedings Article
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second
Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter +3 more
- 05 Jul 2022
TL;DR: TabPFN is a trained Transformer that can do supervised classification for small tabular datasets in less than a second, needs no hyperparameter tuning, and is competitive with state-of-the-art classification methods.
LLMs for Semi-Automated Data Science: Introducing CAAFE for Context-Aware Automated Feature Engineering
TL;DR: Context-Aware Automated Feature Engineering (CAAFE) is a feature engineering method for tabular datasets that uses an LLM to iteratively generate additional semantically meaningful features based on the description of the dataset.
Meta-Learning a Real-Time Tabular AutoML Method For Small Data
TL;DR: TabPFN is an AutoML method that is competitive with the state of the art on small tabular datasets while being over 1,000× faster; it performs on par with complex state-of-the-art AutoML systems, with predictions produced in less than a second.
2
Posted Content
Byte-Pair Encoding for Text-to-SQL Generation.
Samuel Müller, Andreas Vlachos +1 more
TL;DR: The paper presents a novel stopping criterion that prevents overfitting the BPE encoding to the training set, and AST BPE, a version of BPE that uses the Abstract Syntax Tree of the SQL statement to guide BPE merges and thereby produce encodings that generalize better.
2
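As a rough illustration of the BPE merge loop that the summary above refers to, the following Python sketch learns merges over tokenized SQL queries. It is not the paper's method: the helper names (`count_pairs`, `merge_pair`, `learn_bpe`) are hypothetical, a plain frequency threshold (`min_frequency`) stands in for the paper's stopping criterion, which the summary does not specify, and the AST guidance is not modeled at all.

```python
from collections import Counter


def count_pairs(sequences):
    """Count adjacent token pairs across all token sequences."""
    pairs = Counter()
    for seq in sequences:
        for left, right in zip(seq, seq[1:]):
            pairs[(left, right)] += 1
    return pairs


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + " " + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe(sequences, max_merges=10, min_frequency=2):
    """Greedily merge the most frequent adjacent pair.

    Assumption: merging stops once the best pair occurs fewer than
    `min_frequency` times. This is a placeholder threshold, not the
    stopping criterion proposed in the paper.
    """
    sequences = [list(seq) for seq in sequences]
    merges = []
    for _ in range(max_merges):
        pairs = count_pairs(sequences)
        if not pairs:
            break
        # Ties are broken by the pair itself so the result is deterministic.
        best, freq = max(pairs.items(), key=lambda kv: (kv[1], kv[0]))
        if freq < min_frequency:
            break
        merges.append(best)
        sequences = [merge_pair(seq, best) for seq in sequences]
    return merges, sequences


if __name__ == "__main__":
    queries = [
        ["SELECT", "name", "FROM", "users"],
        ["SELECT", "id", "FROM", "users"],
        ["SELECT", "name", "FROM", "orders"],
    ]
    merges, encoded = learn_bpe(queries)
    print(merges)
    print(encoded)
```

Raising `min_frequency` stops merging earlier, which gives fewer and more frequent merged tokens; that is the general kind of trade-off a stopping criterion is meant to control.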
Bayesian Optimization with a Neural Network Meta-learned on Synthetic Data Only
TL;DR: Prior-data fitted networks (PFNs) are neural networks that approximate the posterior predictive distribution (PPD) in a single forward pass; they can approximate the PPD for any prior distribution that can be sampled from efficiently.
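For reference, the posterior predictive distribution mentioned in the summary above is conventionally written as below. The notation is an assumption chosen here rather than taken from this page: D is the observed dataset, φ a latent hypothesis drawn from the prior, and q_θ the network's output distribution.

```latex
p(y \mid x, D) = \int p(y \mid x, \phi)\, p(\phi \mid D)\, d\phi,
\qquad
q_\theta(y \mid x, D) \approx p(y \mid x, D)
```

The left-hand identity is the standard definition of the PPD. The right-hand approximation restates the summary's claim that a single forward pass of the network stands in for the integral.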