BriefGPT.xyz
Mar, 2018
Averaging Weights Leads to Wider Optima and Better Generalization
Pavel Izmailov, Dmitrii Podoprikhin, Timur Garipov, Dmitry Vetrov, Andrew Gordon Wilson
TL;DR
By simply averaging multiple points sampled along the SGD trajectory, the Stochastic Weight Averaging (SWA) procedure achieves better generalization than conventional training. SWA yields notable test-accuracy improvements for several state-of-the-art networks on CIFAR-10, CIFAR-100, and ImageNet, and it is simple to implement with almost no additional computational cost.
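The core of the averaging step described above can be sketched as a running mean over weight snapshots. This is a minimal illustration, not the paper's reference implementation; the function name `swa_average` and the toy snapshots are hypothetical.

```python
import numpy as np

def swa_average(weight_snapshots):
    """Running average of weight snapshots sampled along an SGD trajectory.

    Uses an incremental mean, so all snapshots need not be held in memory
    at once (only the running average and the current snapshot).
    """
    w_swa = np.zeros_like(weight_snapshots[0], dtype=float)
    for n, w in enumerate(weight_snapshots, start=1):
        w_swa += (w - w_swa) / n  # incremental update of the mean
    return w_swa

# Toy example: three "checkpoints" of a 4-parameter model.
snapshots = [
    np.array([1.0, 2.0, 3.0, 4.0]),
    np.array([2.0, 3.0, 4.0, 5.0]),
    np.array([3.0, 4.0, 5.0, 6.0]),
]
print(swa_average(snapshots))  # → [2. 3. 4. 5.]
```

In practice the snapshots would be model weights saved at the end of each learning-rate cycle or epoch; the same incremental-mean update is what keeps SWA's overhead negligible.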
Abstract
Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence.