University of Texas at Austin

Past Event: Oden Institute Seminar

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

Michael Mahoney, Professor, ICSI and Department of Statistics, UC Berkeley

3:30 – 5PM
Tuesday Dec 8, 2020

Zoom Meeting

Abstract

Second-order optimization algorithms have a long history in scientific computing, but they tend not to be used much in machine learning, despite the fact that they gracefully handle step-size selection, poor conditioning, communication-computation tradeoffs, and other problems that are increasingly important in large-scale and high-performance machine learning. A large part of the reason is that implementing them requires care: a good implementation cannot be written in a few lines of Python after a data science boot camp, and a naive implementation typically performs worse than heavily hyperparameterized stochastic first-order methods. We describe ADAHESSIAN, a second-order stochastic optimization algorithm that dynamically incorporates the curvature of the loss function via ADAptive estimates of the Hessian. ADAHESSIAN includes several novel performance-improving features: (i) a fast Hutchinson-based method to approximate the curvature matrix with low computational overhead; (ii) spatial averaging to reduce the variance of the second-derivative estimates; and (iii) a root-mean-square exponential moving average to smooth out variations of the second derivative across iterations. Extensive tests on natural language processing, computer vision, and recommendation system tasks demonstrate that ADAHESSIAN achieves state-of-the-art results. Its cost per iteration is comparable to that of first-order methods, and it exhibits improved robustness to variations in hyperparameter values.

Bio

Michael W. Mahoney is a Professor at the University of California, Berkeley, in the Department of Statistics and at the International Computer Science Institute (ICSI). He works on algorithmic and statistical aspects of modern large-scale data analysis. Much of his recent research has focused on large-scale machine learning, including randomized matrix algorithms and randomized numerical linear algebra, geometric network analysis tools for structure extraction in large informatics graphs, scalable implicit regularization methods, and applications in genetics, astronomy, medical imaging, social network analysis, and internet data analysis. He received his PhD from Yale University with a dissertation in computational statistical mechanics, and he has worked and taught in the mathematics departments of Yale University and Stanford University and at Yahoo Research. Among other things, he is on the national advisory committee of the Statistical and Applied Mathematical Sciences Institute (SAMSI), he was on the National Research Council's Committee on the Analysis of Massive Data, he co-organized the Simons Institute's fall 2013 and 2018 programs on the foundations of data science, and he runs the biennial MMDS Workshops on Algorithms for Modern Massive Data Sets. He is currently the Director of the NSF/TRIPODS-funded FODA (Foundations of Data Analysis) Institute at UC Berkeley. He holds several patents for work done at Yahoo Research and as Lead Data Scientist for Vieu Labs, Inc., a startup reimagining consumer video for billions of users. More information is available at https://www.stat.berkeley.edu/~mmahoney/.
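To make the three ingredients in the abstract concrete, here is a minimal PyTorch sketch of a Hutchinson diagonal-Hessian estimate combined with an Adam-style update. This is an illustrative reconstruction under simplifying assumptions (a single Hutchinson probe per step, no spatial averaging, no Hessian-power hyperparameter), not the authors' released implementation; the names hutchinson_diag and adahessian_step are hypothetical.

```python
import torch

def hutchinson_diag(loss, params):
    # One Hutchinson probe of the Hessian diagonal: for z with i.i.d.
    # Rademacher (+/-1) entries, E[z * (H z)] equals diag(H).
    grads = torch.autograd.grad(loss, params, create_graph=True)
    zs = [torch.randint_like(p, 0, 2) * 2.0 - 1.0 for p in params]
    hzs = torch.autograd.grad(grads, params, grad_outputs=zs)  # H z via backprop
    hdiag = [(z * hz).detach() for z, hz in zip(zs, hzs)]
    return [g.detach() for g in grads], hdiag

@torch.no_grad()
def adahessian_step(params, grads, hdiag, state, lr=0.1,
                    betas=(0.9, 0.999), eps=1e-8):
    # Adam-style step in which the squared-gradient EMA is replaced by an
    # EMA of the squared Hessian-diagonal estimate (the root-mean-square
    # smoothing of the curvature across iterations from item (iii)).
    b1, b2 = betas
    state["t"] = state.get("t", 0) + 1
    t = state["t"]
    for i, (p, g, d) in enumerate(zip(params, grads, hdiag)):
        m = state.setdefault(("m", i), torch.zeros_like(p))
        v = state.setdefault(("v", i), torch.zeros_like(p))
        m.mul_(b1).add_(g, alpha=1 - b1)          # EMA of gradients
        v.mul_(b2).addcmul_(d, d, value=1 - b2)   # EMA of squared Hessian diagonal
        m_hat = m / (1 - b1 ** t)                 # Adam-style bias corrections
        v_hat = v / (1 - b2 ** t)
        p.addcdiv_(m_hat, v_hat.sqrt() + eps, value=-lr)
```

A training step would compute the loss, call hutchinson_diag(loss, params), and pass the results to adahessian_step; the optimizer described in the talk additionally applies the spatial averaging of item (ii) to the diagonal estimate before it enters the moving average.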

Event information

Hosted by George Biros