James Bergstra of Harvard - Grid Search is a Bad Hyper-parameter Optimization Algorithm

Date
Location

ICCSX836

Abstract

Grid search and manual search are the most widely used strategies for hyper-parameter optimization. Manual search is well known to produce results that are difficult to reproduce. In this talk, I will argue that grid and manual search are inefficient and ineffective compared with alternatives based on Bayesian optimization, and even with random search. I draw empirical support from a large previous study that used grid search and manual search to configure neural networks and Deep Belief Networks. Analysis of the response surface function from hyper-parameters to validation set performance reveals that for most data sets only a few of the hyper-parameters really matter, but critically, different hyper-parameters are important on different data sets. This property makes grid search a poor choice for configuring these algorithms for new data sets, and casts some light on why recent "High Throughput" methods based on random search achieve surprising success: they are not bogged down by irrelevant hyper-parameters. In cases where brute force methods are not sufficiently efficient, Bayesian optimization offers a principled, practical, and effective framework for search. I will present recent and ongoing work on improved model selection of Deep Belief Networks and multi-layer HT-L3 visual system models by Bayesian optimization.
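
The point about irrelevant hyper-parameters can be illustrated with a small sketch. It is not taken from the study discussed in the talk: the response surface `validation_score`, the parameter names `lr` and `momentum`, and the optimum at 0.37 are all hypothetical, chosen so that only one of two hyper-parameters affects the score. With the same budget of nine trials, a 3 x 3 grid tests only three distinct values of the parameter that matters, while random search tests nine.

```python
import itertools
import random


def validation_score(lr, momentum):
    # Hypothetical response surface: only `lr` matters, `momentum` is ignored.
    return -(lr - 0.37) ** 2


def grid_search(points_per_axis):
    # Evenly spaced values in (0, 1) on each axis, combined into a full grid.
    axis = [(i + 0.5) / points_per_axis for i in range(points_per_axis)]
    return list(itertools.product(axis, axis))


def random_search(n_trials, seed=0):
    # Each trial draws both hyper-parameters independently and uniformly from [0, 1).
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_trials)]


def distinct_values_tried(trials, axis_index):
    # Number of different values explored along one hyper-parameter.
    return len({trial[axis_index] for trial in trials})


def best_score(trials):
    return max(validation_score(lr, momentum) for lr, momentum in trials)


grid_trials = grid_search(3)       # 3 x 3 = 9 trials
random_trials = random_search(9)   # same budget of 9 trials

print("distinct lr values, grid:  ", distinct_values_tried(grid_trials, 0))
print("distinct lr values, random:", distinct_values_tried(random_trials, 0))
print("best score, grid:  ", best_score(grid_trials))
print("best score, random:", best_score(random_trials))
```

The gap widens as irrelevant dimensions are added: a grid spends its budget on repeated values of the important parameter, whereas every random trial contributes a new one. Which method finds the better score on a given run still depends on the seed and on where the optimum happens to lie.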

Bio

James Bergstra is a Research Scholar at Harvard University working in David Cox's biological and computer vision group at the Rowland Institute. His research has focused by turns on visual system models and learning algorithms, hyper-parameter optimization, high-performance computing, and music information retrieval. He completed doctoral studies at the University of Montreal in July 2011 under the direction of Professor Yoshua Bengio with a dissertation on how to incorporate complex cells into deep learning models. In the course of his graduate work he co-developed Theano, an open-source optimizing compiler that can make use of Graphics Processing Units (GPUs) for high-performance computation. He completed a master's degree in 2006 under the direction of Douglas Eck on algorithms for classifying recorded music by genre.