Jun, 2020
When Does Preconditioning Help or Hurt Generalization?
Shun-ichi Amari, Jimmy Ba, Roger Grosse, Xuechen Li, Atsushi Nitanda...
TL;DR
This work studies how the implicit bias of several optimization methods (including first-order gradient descent and second-order natural gradient descent used for neural networks) affects their generalization performance, and proposes several ways to manage the bias-variance trade-off, with applications to regression problems.
Abstract
While second order optimizers such as natural gradient descent (NGD) often speed up optimization, their effect on generalization remains c…
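
The TL;DR above refers to the "implicit bias" of first-order versus preconditioned (second-order) updates. A minimal sketch of this idea, not taken from the paper, is shown below for overparameterized linear regression: started from zero, plain gradient descent converges to the minimum-L2-norm interpolant, while the preconditioned update w ← w − lr · P⁻¹∇L converges to the interpolant minimizing wᵀPw. The random diagonal preconditioner P (a stand-in for a curvature matrix such as the Fisher used by NGD), the problem sizes, and the step sizes are all illustrative assumptions.

```python
# Sketch (illustrative assumptions, not the paper's code): implicit bias of
# plain vs. preconditioned gradient descent on overparameterized regression.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 100                                   # fewer samples than parameters
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

def grad(w):
    """Gradient of the squared-error loss 0.5/n * ||X w - y||^2."""
    return X.T @ (X @ w - y) / n

# --- Plain gradient descent from a zero initialization ----------------------
lr = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)       # 1 / smoothness constant
w_gd = np.zeros(d)
for _ in range(50_000):
    w_gd -= lr * grad(w_gd)

# --- Preconditioned gradient descent with a fixed diagonal P ----------------
p = rng.uniform(0.1, 10.0, size=d)               # diagonal of P (assumed)
lr_p = 1.0 / (np.linalg.norm(X / np.sqrt(p), 2) ** 2 / n)
w_pgd = np.zeros(d)
for _ in range(50_000):
    w_pgd -= lr_p * grad(w_pgd) / p              # elementwise P^{-1} * grad

# --- Closed-form interpolants for comparison --------------------------------
w_min_l2 = X.T @ np.linalg.solve(X @ X.T, y)               # argmin ||w||_2 s.t. Xw = y
w_min_P = (X / p).T @ np.linalg.solve((X / p) @ X.T, y)    # argmin w^T P w s.t. Xw = y

print("GD  vs min-L2-norm interpolant:", np.linalg.norm(w_gd - w_min_l2))
print("PGD vs min-P-norm interpolant: ", np.linalg.norm(w_pgd - w_min_P))
print("GD  vs PGD (different biases): ", np.linalg.norm(w_gd - w_pgd))
```

Both runs fit the training data exactly, but they land on different interpolating solutions; which of the two generalizes better depends on the data distribution, which is the kind of bias-variance question the paper analyzes.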