BriefGPT.xyz
Jan, 2019
Augment your batch: better training with larger batches
Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler...
TL;DR
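This paper proposes an optimization algorithm based on batch augmentation that can be applied to large-batch SGD training in deep learning. It reduces the number of SGD updates required and improves both training speed and generalization.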
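As a rough illustration of the idea: batch augmentation, as commonly described, enlarges a mini-batch by including several differently augmented copies of each sample rather than by drawing more samples. The sketch below is a minimal pure-Python version under that assumption. The names `augment`, `batch_augment`, and `m`, and the choice of a random horizontal flip as the augmentation, are hypothetical and are not taken from the paper.

```python
import random


def augment(image, rng):
    """Hypothetical augmentation: flip a 2-D list of pixels horizontally half the time."""
    if rng.random() < 0.5:
        return [row[::-1] for row in image]
    return [row[:] for row in image]


def batch_augment(images, labels, m, rng):
    """Build a batch holding m independently augmented copies of every sample.

    The output batch is m times larger than the input batch, and each copy
    keeps the label of the sample it was derived from.
    """
    aug_images, aug_labels = [], []
    for image, label in zip(images, labels):
        for _ in range(m):
            aug_images.append(augment(image, rng))
            aug_labels.append(label)
    return aug_images, aug_labels


rng = random.Random(0)
# Four toy 2x3 "images" with distinct pixel values, one label each.
images = [[[i, i + 1, i + 2], [i + 3, i + 4, i + 5]] for i in range(0, 40, 10)]
labels = [0, 1, 2, 3]
big_images, big_labels = batch_augment(images, labels, m=3, rng=rng)
```

Here a batch of 4 samples becomes a batch of 12, which would then be fed to an ordinary SGD step. In a real training pipeline the augmentation would be whatever transform the task already uses.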
Abstract
Large-batch SGD is important for scaling training of deep neural networks. However, without fine-tuning hyperparameter schedules, the generalizat…