BriefGPT.xyz
Jun, 2020
A Spherical Perspective on Learning with Normalization Layers
Spherical Perspective on Learning with Batch Norm
Simon Roburin, Yann de Mont-Marin, Andrei Bursuc, Renaud Marlet, Patrick Pérez...
TL;DR
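This paper introduces a spherical framework for studying, from a geometric perspective, the optimization of neural networks with normalization layers (NLs). It first derives the first effective learning rate expression for Adam. It then shows that, in the presence of NLs, performing SGD alone is actually equivalent to a variant of Adam constrained to the unit hypersphere. Finally, experiments confirm the effect that earlier variants of Adam have on the optimization process.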
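The geometric idea behind the spherical view can be illustrated with a minimal sketch, for the plain SGD case only (it does not reproduce the paper's Adam derivation). It assumes a toy loss that depends on the weights only through their direction, which is the scale invariance that a normalization layer induces. Under that assumption the gradient is tangent to the sphere, and one SGD step with learning rate `lr` moves the direction exactly as a step of size `lr / ||w||^2` taken on the unit sphere. The names `loss_grad`, `target`, and `lr` are hypothetical and chosen for illustration.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    n = norm(v)
    return [x / n for x in v]

def loss_grad(w, target):
    """Gradient w.r.t. w of the toy loss L(w) = ||w/||w|| - target||^2.

    The loss is scale invariant (L(c*w) == L(w) for c > 0), so its gradient
    is the tangential part of the gradient in u = w/||w||, divided by ||w||.
    """
    r = norm(w)
    u = [x / r for x in w]
    g_u = [2.0 * (a - b) for a, b in zip(u, target)]
    radial = sum(a * b for a, b in zip(g_u, u))
    return [(g - radial * a) / r for g, a in zip(g_u, u)]

# Illustrative values (assumptions, not taken from the paper).
w = [3.0, 4.0, 0.0]                   # ||w|| = 5
target = normalize([1.0, 2.0, 2.0])
lr = 0.1

g = loss_grad(w, target)

# 1) The gradient has no component along w.
radial_component = sum(a * b for a, b in zip(w, g))

# 2) SGD step in the original weights, then project onto the unit sphere.
u_from_w_step = normalize([a - lr * b for a, b in zip(w, g)])

# 3) Same step taken from the unit sphere with the rescaled learning rate.
u = normalize(w)
effective_lr = lr / norm(w) ** 2
g_on_sphere = loss_grad(u, target)
u_from_sphere_step = normalize(
    [a - effective_lr * b for a, b in zip(u, g_on_sphere)]
)
```

The two resulting directions agree up to floating-point error, which is why the weight norm acts as a hidden learning-rate schedule for scale-invariant weights.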
Abstract
Batch Normalization (BN) is a prominent deep learning technique. In spite of its apparent simplicity, its implications over optimization are yet to be fully understood. In this paper, we study the optimization of
→