TL;DR: This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid, and shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
Abstract: Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven data sets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different data sets. This phenomenon makes grid search a poor choice for configuring algorithms for new data sets. Our analysis casts some light on why recent "High Throughput" methods achieve surprising success--they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.
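The claim that only a few hyper-parameters really matter can be illustrated with a minimal sketch. It assumes a two-dimensional unit-square search space and a budget of nine trials; the function names (`grid_trials`, `random_trials`, `distinct_values_on_axis`) are hypothetical and not taken from the paper. The sketch counts how many distinct values of a single hyper-parameter each strategy actually tests.

```python
import itertools
import random


def grid_trials(points_per_axis):
    """All combinations of evenly spaced values in [0, 1] on two axes."""
    axis = [i / (points_per_axis - 1) for i in range(points_per_axis)]
    return list(itertools.product(axis, axis))


def random_trials(n_trials, seed=0):
    """Independent uniform draws from the unit square (seeded for repeatability)."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_trials)]


def distinct_values_on_axis(trials, axis_index):
    """Number of different settings tried for one hyper-parameter."""
    return len({trial[axis_index] for trial in trials})


grid = grid_trials(3)    # 3 x 3 grid: nine trials
rand = random_trials(9)  # nine random trials, same budget

print("grid  :", distinct_values_on_axis(grid, 0), "distinct values on axis 0")
print("random:", distinct_values_on_axis(rand, 0), "distinct values on axis 0")
```

With the same budget, the grid tests only three settings of each hyper-parameter, while the random draws test a different setting on every trial. If only one of the two hyper-parameters affects validation performance, the grid spends two thirds of its trials repeating settings it has already tried on the axis that matters.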
TL;DR: It is shown that all algorithms that search for an extremum of a cost function perform exactly the same, when averaged over all possible cost functions, which allows for mathematical benchmarks for assessing a particular search algorithm's performance.
Abstract: We show that all algorithms that search for an extremum of a cost function perform exactly the same, when averaged over all possible cost functions. In particular, if algorithm A outperforms algorithm B on some cost functions, then loosely speaking there must exist exactly as many other functions where B outperforms A. Starting from this we analyze a number of the other a priori characteristics of the search problem, like its geometry and its information-theoretic aspects. This analysis allows us to derive mathematical benchmarks for assessing a particular search algorithm's performance. We also investigate minimax aspects of the search problem, the validity of using characteristics of a partial search over a cost function to predict future behavior of the search algorithm on that cost function, and time-varying cost functions. We conclude with some discussion of the justifiability of biologically-inspired search methods.
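The averaging claim can be checked by brute force on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's construction: the search space has three points, costs are 0 or 1, both algorithms are deterministic and never revisit a point, and performance is measured as the lowest cost seen after m evaluations. The two algorithms (`ascending`, `adaptive`) are hypothetical examples chosen for this sketch.

```python
from collections import Counter
from itertools import product

POINTS = range(3)   # toy search space (assumed for illustration)
VALUES = (0, 1)     # possible cost values


def ascending(history):
    """Visit unvisited points in increasing order, ignoring observed costs."""
    visited = {x for x, _ in history}
    return min(x for x in POINTS if x not in visited)


def adaptive(history):
    """Pick the next point based on the most recently observed cost."""
    visited = {x for x, _ in history}
    remaining = [x for x in POINTS if x not in visited]
    if history and history[-1][1] == 1:
        return min(remaining)
    return max(remaining)


def best_after(algorithm, cost, m):
    """Lowest cost found after m distinct evaluations of one cost function."""
    history = []
    for _ in range(m):
        x = algorithm(history)
        history.append((x, cost[x]))
    return min(y for _, y in history)


def performance_histogram(algorithm, m):
    """Tally the result over every possible cost function on POINTS."""
    all_costs = product(VALUES, repeat=len(POINTS))
    return Counter(best_after(algorithm, cost, m) for cost in all_costs)


for m in (1, 2, 3):
    print(m, performance_histogram(ascending, m), performance_histogram(adaptive, m))
```

For each m, the two histograms are identical even though the algorithms visit points in different orders and one of them reacts to what it observes. The two differ on individual cost functions; the differences cancel once all eight functions are counted.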
TL;DR: Analysis shows that a speed improvement rate of the hexagon-based search (HEXBS) algorithm over the diamond search (DS) algorithm can be over 80% for locating some motion vectors in certain scenarios.
Abstract: In block motion estimation, a search pattern with a different shape or size has a very important impact on search speed and distortion performance. A square-shaped search pattern is adopted in many popular fast algorithms. Recently, a diamond-shaped search pattern was introduced in fast block motion estimation and has exhibited a faster search speed. Based on an in-depth examination of the influence of the search pattern on speed performance, we propose a novel algorithm using a hexagon-based search pattern to achieve further improvement. The hexagon-based search pattern is investigated in comparison with the diamond search pattern and demonstrates significant speedup gain over the diamond-based search. Analysis shows that a speed improvement rate of the hexagon-based search (HEXBS) algorithm over the diamond search (DS) algorithm can be over 80% for locating some motion vectors in certain scenarios. In short, the proposed HEXBS algorithm can find the same motion vector with fewer search points than the DS algorithm. Generally speaking, the larger the motion vector, the more search points the HEXBS algorithm can save, which is further justified by experimental results.
TL;DR: A new suboptimal search strategy for feature selection that represents a more sophisticated version of “classical” floating search algorithms and facilitates finding a solution even closer to the optimal one.