Hyperparameter optimization

Papers published on a yearly basis

Papers

Proceedings Article

Practical Bayesian Optimization of Machine Learning Algorithms

Jasper Snoek¹, Hugo Larochelle², Ryan P. Adams³

University of Toronto¹, Université de Sherbrooke², Harvard University³

3 Dec 2012

TL;DR: This work describes new algorithms that take into account the variable cost of learning algorithm experiments and that can leverage multiple cores for parallel experimentation. It shows that these algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms.

Abstract: The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a "black art" requiring expert experience, rules of thumb, or sometimes brute-force search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expert-level performance. We describe new algorithms that take into account the variable cost (duration) of learning algorithm experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization for many algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
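
As a rough illustration of the Bayesian optimization loop the abstract describes, the sketch below fits a Gaussian process to past evaluations and picks the next point by expected improvement. It is a minimal one-dimensional toy, not the authors' implementation: the Matern 5/2 kernel with a fixed length scale, the grid of candidates, and the stand-in objective `validation_loss` are all assumptions made for brevity, and the cost-aware and parallel variants mentioned in the abstract are not shown.

```python
import math
import random

def matern52(a, b, length=0.2):
    """Matern 5/2 kernel between two scalars (fixed length scale is an assumption)."""
    r = math.sqrt(5.0) * abs(a - b) / length
    return (1.0 + r + r * r / 3.0) * math.exp(-r)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[matern52(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [matern52(a, x_new) for a in xs]
    alpha = solve(K, ys)       # K^{-1} y
    v = solve(K, k_star)       # K^{-1} k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = matern52(x_new, x_new) - sum(k * w for k, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 1e-12))

def expected_improvement(mean, std, best):
    """Expected improvement for minimization: E[max(best - f, 0)], f ~ N(mean, std^2)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def validation_loss(x):
    """Hypothetical stand-in for an expensive training run; x plays the hyperparameter."""
    return (x - 0.3) ** 2

def bayes_opt(objective, n_init=3, n_iter=10, seed=0):
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]   # random initial design on [0, 1]
    ys = [objective(x) for x in xs]
    grid = [i / 100 for i in range(101)]         # candidate hyperparameter values
    for _ in range(n_iter):
        best = min(ys)
        # Evaluate next at the candidate with the highest expected improvement.
        cand = max(grid, key=lambda g: expected_improvement(*gp_posterior(xs, ys, g), best))
        xs.append(cand)
        ys.append(objective(cand))
    i = min(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i]

best_x, best_y = bayes_opt(validation_loss)
```

In practice the kernel hyperparameters would be fitted or integrated over rather than fixed, which is one of the design choices the abstract highlights as important.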

7,983 citations

Posted Content

Optuna: A Next-generation Hyperparameter Optimization Framework

Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, Masanori Koyama

25 Jul 2019 - arXiv: Learning

TL;DR: New design criteria for next-generation hyperparameter optimization software are introduced, including a define-by-run API that allows users to construct the parameter search space dynamically, and an easy-to-setup, versatile architecture that can be deployed for various purposes.

Abstract: The purpose of this study is to introduce new design-criteria for next-generation hyperparameter optimization software. The criteria we propose include (1) define-by-run API that allows users to construct the parameter search space dynamically, (2) efficient implementation of both searching and pruning strategies, and (3) easy-to-setup, versatile architecture that can be deployed for various purposes, ranging from scalable distributed computing to light-weight experiment conducted via interactive interface. In order to prove our point, we will introduce Optuna, an optimization software which is a culmination of our effort in the development of a next generation optimization software. As an optimization software designed with define-by-run principle, Optuna is particularly the first of its kind. We will present the design-techniques that became necessary in the development of the software that meets the above criteria, and demonstrate the power of our new design through experimental results and real world applications. Our software is available under the MIT license (this https URL).
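
To make the define-by-run idea in criterion (1) concrete, here is a small sketch in which the search space is never declared up front: parameters come into existence only when the objective function asks for them. `ToyTrial` and `random_search` are hypothetical helpers written for this example. The `suggest_*` method names are chosen to resemble Optuna's trial interface, but this is a plain random sampler, not Optuna's implementation, and pruning is not shown.

```python
import random

class ToyTrial:
    """Hypothetical trial object: records each parameter at the moment it is requested."""

    def __init__(self, rng):
        self._rng = rng
        self.params = {}

    def suggest_float(self, name, low, high):
        self.params[name] = self._rng.uniform(low, high)
        return self.params[name]

    def suggest_int(self, name, low, high):
        self.params[name] = self._rng.randint(low, high)  # inclusive bounds
        return self.params[name]

    def suggest_categorical(self, name, choices):
        self.params[name] = self._rng.choice(choices)
        return self.params[name]

def objective(trial):
    """Toy objective whose search space depends on the code path taken."""
    model = trial.suggest_categorical("model", ["linear", "mlp"])
    lr = trial.suggest_float("lr", 1e-4, 1e-1)
    if model == "mlp":
        # These parameters exist only when the "mlp" branch runs.
        n_layers = trial.suggest_int("n_layers", 1, 3)
        widths = [trial.suggest_int(f"width_{i}", 4, 64) for i in range(n_layers)]
        return abs(lr - 0.01) + 0.001 * sum(widths)   # made-up score, lower is better
    return abs(lr - 0.05) + 0.1                       # made-up score, lower is better

def random_search(objective, n_trials=50, seed=0):
    """Run the objective n_trials times and keep the best (value, params) pair."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        trial = ToyTrial(rng)
        value = objective(trial)
        if best is None or value < best[0]:
            best = (value, trial.params)
    return best

best_value, best_params = random_search(objective)
```

Because the number of `width_i` parameters depends on the sampled `n_layers`, the space cannot be written down as a fixed list in advance, which is the situation a define-by-run API is meant to handle.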

3,936 citations

Proceedings Article • DOI: 10.1145/3292500.3330701

Optuna: A Next-generation Hyperparameter Optimization Framework

Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, Masanori Koyama

25 Jul 2019

TL;DR: Optuna is next-generation hyperparameter optimization software with a define-by-run API that allows users to construct the parameter search space dynamically.

3,462 citations

Proceedings Article

Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures

James Bergstra¹, Daniel L. K. Yamins², David D. Cox¹

Harvard University¹, Massachusetts Institute of Technology²

16 Jun 2013

TL;DR: This work proposes a meta-modeling approach to support automated hyperparameter optimization, with the goal of providing practical tools that replace hand-tuning with a reproducible and unbiased optimization process.

Abstract: Many computer vision algorithms depend on configuration settings that are typically hand-tuned in the course of evaluating the algorithm for a particular data set. While such parameter tuning is often presented as being incidental to the algorithm, correctly setting these parameter choices is frequently critical to realizing a method's full potential. Compounding matters, these parameters often must be re-tuned when the algorithm is applied to a new problem domain, and the tuning process itself often depends on personal experience and intuition in ways that are hard to quantify or describe. Since the performance of a given technique depends on both the fundamental quality of the algorithm and the details of its tuning, it is sometimes difficult to know whether a given technique is genuinely better, or simply better tuned. In this work, we propose a meta-modeling approach to support automated hyperparameter optimization, with the goal of providing practical tools that replace hand-tuning with a reproducible and unbiased optimization process. Our approach is to expose the underlying expression graph of how a performance metric (e.g. classification accuracy on validation examples) is computed from hyperparameters that govern not only how individual processing steps are applied, but even which processing steps are included. A hyperparameter optimization algorithm transforms this graph into a program for optimizing that performance metric. Our approach yields state of the art results on three disparate computer vision problems: a face-matching verification task (LFW), a face identification task (PubFig83) and an object recognition task (CIFAR-10), using a single broad class of feed-forward vision architectures.
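
The idea of hyperparameters that decide which processing steps are included can be pictured as a nested configuration space with choice nodes. The sketch below is only an assumption-laden toy: the helpers `choice`, `uniform`, and `sample`, and every name in `space`, are invented for this example and are not taken from the paper or from any library. It draws one random configuration from the space; an optimizer would replace the random draws with informed ones.

```python
import random

def choice(*options):
    """Hypothetical node: pick exactly one of the options."""
    return ("choice", options)

def uniform(low, high):
    """Hypothetical node: draw a float between low and high."""
    return ("uniform", low, high)

# Illustrative space: some hyperparameters exist only if a given step is selected.
space = {
    "preprocess": choice(
        {"kind": "none"},
        {"kind": "whiten", "epsilon": uniform(1e-4, 1e-1)},
    ),
    "n_filters": choice(16, 32, 64),
    "pool": choice(
        {"kind": "max"},
        {"kind": "lp", "order": uniform(1.0, 10.0)},
    ),
}

def sample(node, rng):
    """Recursively draw one concrete configuration from the space."""
    if isinstance(node, tuple) and node and node[0] == "choice":
        return sample(rng.choice(node[1]), rng)
    if isinstance(node, tuple) and node and node[0] == "uniform":
        return rng.uniform(node[1], node[2])
    if isinstance(node, dict):
        return {key: sample(value, rng) for key, value in node.items()}
    return node  # plain constant

config = sample(space, random.Random(0))
```

Only the branch that is actually chosen contributes hyperparameters to `config`, so a "whiten" step brings its `epsilon` with it while a "none" step brings nothing.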

2,335 citations

Journal Article

Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory

Sumio Watanabe¹

Tokyo Institute of Technology¹

01 Mar 2010 - Journal of Machine Learning Research

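TL;DR: In this article, the authors theoretically compare the Bayes cross-validation loss and the widely applicable information criterion and prove two theorems: 1) the Bayes cross-validation loss is asymptotically equivalent to the widely applicable information criterion as a random variable; 2) the sum of the Bayes generalization error and the Bayes cross-validation error is asymptotically equal to 2λ/n, where λ is the real log canonical threshold and n is the number of training samples.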

Abstract: In regular statistical models, the leave-one-out cross-validation is asymptotically equivalent to the Akaike information criterion. However, since many learning machines are singular statistical models, the asymptotic behavior of the cross-validation remains unknown. In previous studies, we established the singular learning theory and proposed a widely applicable information criterion, the expectation value of which is asymptotically equal to the average Bayes generalization loss. In the present paper, we theoretically compare the Bayes cross-validation loss and the widely applicable information criterion and prove two theorems. First, the Bayes cross-validation loss is asymptotically equivalent to the widely applicable information criterion as a random variable. Therefore, model selection and hyperparameter optimization using these two values are asymptotically equivalent. Second, the sum of the Bayes generalization error and the Bayes cross-validation error is asymptotically equal to 2λ/n, where λ is the real log canonical threshold and n is the number of training samples. Therefore the relation between the cross-validation error and the generalization error is determined by the algebraic geometrical structure of a learning machine. We also clarify that the deviance information criteria are different from the Bayes cross-validation and the widely applicable information criterion.
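
The two quantities compared in the abstract can both be estimated from the same set of posterior draws, which makes their closeness easy to check numerically. The sketch below is an illustration under stated assumptions, not the paper's procedure: it uses inverse temperature 1, the importance-sampling form of leave-one-out cross-validation, and a conjugate normal-mean model with a N(0, 10²) prior. That toy model is regular, not singular, so it only shows how the two values are computed, not the singular case the theorems address.

```python
import math
import random

def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def waic_and_cv(log_lik):
    """Estimate WAIC and the Bayes cross-validation loss per sample.

    log_lik[i][s] is log p(x_i | w_s) for posterior draw w_s.
    """
    n = len(log_lik)
    train_loss, func_var, cv_loss = 0.0, 0.0, 0.0
    for row in log_lik:
        # Training loss term: -log E_w[p(x_i | w)].
        train_loss -= log_mean_exp(row)
        # Functional variance term: Var_w[log p(x_i | w)].
        mean = sum(row) / len(row)
        func_var += sum((v - mean) ** 2 for v in row) / len(row)
        # Importance-sampling leave-one-out term: log E_w[1 / p(x_i | w)].
        cv_loss += log_mean_exp([-v for v in row])
    return (train_loss + func_var) / n, cv_loss / n

# Toy data: x_i ~ N(w, 1) with unknown mean w (all constants are illustrative).
rng = random.Random(0)
n, n_draws = 40, 1000
xs = [rng.gauss(0.5, 1.0) for _ in range(n)]

# Conjugate posterior for w under a N(0, 10^2) prior and unit noise variance.
post_prec = n + 1.0 / 100.0
post_mean = sum(xs) / post_prec
draws = [rng.gauss(post_mean, post_prec ** -0.5) for _ in range(n_draws)]

log_lik = [[-0.5 * math.log(2.0 * math.pi) - 0.5 * (x - w) ** 2 for w in draws]
           for x in xs]
waic, cv = waic_and_cv(log_lik)
```

Comparing `waic` and `cv` for growing `n` gives a quick numerical feel for the first theorem's claim that the two values agree asymptotically.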

2,145 citations

Year	Papers
2026	1
2025	81
2024	108
2023	323
2022	443
2021	519

Related Topics (5)

Performance Metrics