Optuna: A Next-generation Hyperparameter Optimization Framework

Open Access • Posted Content

- 25 Jul 2019

3.9K

TL;DR: New design criteria for next-generation hyperparameter optimization software are introduced, including a define-by-run API that lets users construct the parameter search space dynamically and an easy-to-set-up, versatile architecture that can be deployed for various purposes.
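The define-by-run idea in the summary above can be illustrated with a small standard-library sketch. The `ToyTrial` class, its `suggest_int` and `suggest_float` methods, the parameter names, and the toy score are all assumptions made for this illustration; they are not taken from the paper and are not a statement of Optuna's actual API. The point is only that the search space is never declared up front: it emerges from whichever `suggest_*` calls the objective function happens to make.

```python
import random


class ToyTrial:
    """Hypothetical stand-in for a trial object (illustration only)."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.params = {}  # filled in as the objective asks for values

    def suggest_int(self, name, low, high):
        # Sample an integer in [low, high] and record it under `name`.
        value = self._rng.randint(low, high)
        self.params[name] = value
        return value

    def suggest_float(self, name, low, high):
        # Sample a float in [low, high] and record it under `name`.
        value = self._rng.uniform(low, high)
        self.params[name] = value
        return value


def objective(trial):
    # The search space is built while the code runs: how many per-layer
    # parameters exist depends on the sampled value of n_layers.
    n_layers = trial.suggest_int("n_layers", 1, 4)
    total_units = 0
    for i in range(n_layers):
        total_units += trial.suggest_int(f"n_units_l{i}", 16, 128)
    lr = trial.suggest_float("lr", 1e-4, 1e-1)
    # Toy score standing in for a validation error (lower is better).
    return abs(total_units - 200) / 200 + lr


def random_search(n_trials, seed=0):
    # Plain random search: run the objective on fresh trials, keep the best.
    best = None
    for i in range(n_trials):
        trial = ToyTrial(seed + i)
        value = objective(trial)
        if best is None or value < best[0]:
            best = (value, trial.params)
    return best
```

Calling `random_search(20)` returns the lowest score found together with the parameters recorded for that trial. Two trials can record different sets of parameter names, which is what a statically declared (define-and-run) search space cannot express as directly.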

Figures

Figure 8: Optuna dashboard. This example shows the online transition of objective values, the parallel coordinates plot of sampled parameters, the learning curves, and the tabular descriptions of investigated trials.

Figure 7: Distributed optimization in Optuna. Figure (a) is the optimization script executed by one worker. Figure (b) is an example shell for the optimization with multiple workers in a distributed environment.

Table 1: Software frameworks for deep learning and hyperparameter optimization, sorted by their API styles: define-and-run and define-by-run.

Table 2: Comparison of previous hyperparameter optimization frameworks and Optuna. There is a checkmark for lightweight if the setup for the framework is easy and it can be easily used for lightweight purposes.

Figure 12: Distributed hyperparameter optimization process for the minimization of average test errors of simplified AlexNet on the SVHN dataset. The optimization was done with ASHA pruning.

Figure 11: The transition of average test errors of simplified AlexNet on the SVHN dataset. Figure (a) illustrates the effect of pruning mechanisms on TPE and random search. Figure (b) illustrates the effect of the number of workers on the performance. Figure (c) plots the test errors against the number of trials for different numbers of workers. Note that the number of workers has no effect on the relation between the number of executed trials and the test error. The result also shows the superiority of ASHA pruning over median pruning.
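The captions above compare pruning mechanisms without stating the rules themselves. As a rough illustration, here is a minimal sketch of one common formulation of median pruning: stop a running trial when its best error so far is worse than the median error that earlier trials had reached at the same step. The function name `should_prune`, the `n_warmup_steps` parameter, and the list-of-curves data layout are assumptions for this sketch, and the rule shown is not claimed to match the exact implementation evaluated in the paper.

```python
from statistics import median


def should_prune(current_curve, completed_curves, n_warmup_steps=1):
    """Return True if the running trial should be stopped early.

    current_curve: errors reported so far by the running trial
        (lower is better), one value per step.
    completed_curves: error curves of earlier trials, same layout.
    n_warmup_steps: number of initial steps during which no pruning occurs.
    """
    step = len(current_curve) - 1  # index of the latest report
    if step < n_warmup_steps:
        return False
    # Errors that earlier trials had at this same step.
    peers = [curve[step] for curve in completed_curves if len(curve) > step]
    if not peers:
        return False  # nothing to compare against yet
    # Prune when even the best error seen so far is above the peer median.
    return min(current_curve) > median(peers)
```

For example, with earlier curves `[[0.9, 0.6, 0.4], [0.8, 0.5, 0.3], [0.95, 0.7, 0.5]]`, a running trial that has reported `[0.99, 0.9]` is pruned at the second step, because its best error of 0.9 exceeds the peer median of 0.6 at that step.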

Citations

•Journal Article•10.1145/355598.362773

Transition network grammars for natural language analysis

William A. Woods

- 01 Oct 1970

- Communications of The ACM

TL;DR: The use of augmented transition network grammars for the analysis of natural language sentences is described, and structure-building actions associated with the arcs of the grammar network allow for a powerful selectivity which can rule out meaningless analyses and take advantage of semantic information to guide the parsing.

1.4K

•Journal Article•10.1038/s41586-021-04223-6

Deep physical neural networks trained with backpropagation

Logan G. Wright, +6 more

- 01 Jan 2022

- Nature

TL;DR: Physical neural networks train the functionality of sequences of real physical systems directly with backpropagation, the same technique used for modern deep neural networks, and are demonstrated on three diverse physical systems: optical, mechanical, and electrical.

459

•Posted Content•10.1101/2020.05.10.20097469

Covasim: an agent-based model of COVID-19 dynamics and interventions

Cliff C. Kerr, +23 more

- 15 May 2020

- medRxiv

TL;DR: The methodology of Covasim (COVID-19 Agent-based Simulator), an open-source model developed to help address the urgent need for models that can project epidemic trends, explore intervention scenarios, and estimate resource needs, is described.

444

•Posted Content

Data Augmentation for Graph Neural Networks

Tong Zhao, +5 more

- 11 Jun 2020

- arXiv: Learning

TL;DR: This work shows that neural edge predictors can effectively encode class-homophilic structure to promote intra-class edges and demote inter-class edges in a given graph structure, and introduces the GAug graph data augmentation framework, which leverages these insights to improve performance in GNN-based node classification via edge prediction.

421

•Posted Content

A System for Massively Parallel Hyperparameter Tuning

Liam Li, +6 more

- 13 Oct 2018

- arXiv: Learning

TL;DR: This work introduces a simple and robust hyperparameter optimization algorithm called ASHA, which exploits parallelism and aggressive early-stopping to tackle large-scale hyperparameter optimization problems, and shows that ASHA outperforms existing state-of-the-art hyperparameter optimization methods.

300

References

•Journal Article•10.1145/3065386

ImageNet classification with deep convolutional neural networks

Alex Krizhevsky, +2 more

- 24 May 2017

- Communications of The ACM

TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

98.2K

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art classification performance.

88.4K

Automatic differentiation in PyTorch

Adam Paszke, +9 more

- 28 Oct 2017

TL;DR: The automatic differentiation module of PyTorch, a library designed to enable rapid research on machine learning models, is described; the module differentiates purely imperative programs, with a focus on extensibility and low overhead.

17.1K