Seminar 217, Risk Management: Towards theoretical understanding of large batch training in stochastic gradient descent

Risk Seminar
Sep 10, 2019, 11:00 AM - 12:30 PM | 1011 Evans Hall | Happening As Scheduled
Speaker: Xiaowu Dai, UC Berkeley
ABSTRACT: Stochastic gradient descent (SGD) is used almost ubiquitously in non-convex optimization tasks. Recently, a hypothesis by Keskar et al. (2017) that large-batch SGD tends to converge to sharp minima has received increasing attention. We justify this hypothesis by providing new properties of SGD in both finite-time and asymptotic regimes, using tools from Partial Differential...