January 2022
ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise
Minjia Zhang, Niranjan Uma Naresh, Yuxiong He
TL;DR: By injecting lightweight adversarial noise into large-batch optimization, we propose ScaLA, a method that accelerates the adaptation of pre-trained transformer networks while preserving model generalization, achieving accuracy comparable to or higher than state-of-the-art large-batch optimization methods.
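To make the core idea concrete, here is a minimal sketch of adversarial-noise training on a toy linear model: inputs in a large batch are perturbed in the direction that increases the loss (an FGSM-style sign perturbation) before the weight update. All names and hyperparameters here are illustrative assumptions; this is not ScaLA's actual algorithm, which operates on transformer embeddings inside a large-batch optimizer.

```python
import numpy as np

def loss_and_grads(w, x, y):
    """Squared loss for a linear model; returns loss, dL/dw, dL/dx."""
    err = x @ w - y
    loss = 0.5 * np.mean(err ** 2)
    grad_w = x.T @ err / len(y)
    grad_x = np.outer(err, w) / len(y)
    return loss, grad_w, grad_x

def adversarial_step(w, x, y, lr=0.1, eps=0.05):
    """One training step on adversarially perturbed inputs (FGSM-style)."""
    _, _, grad_x = loss_and_grads(w, x, y)
    # Perturb inputs toward higher loss via the sign of the input gradient.
    x_adv = x + eps * np.sign(grad_x)
    loss_adv, grad_w, _ = loss_and_grads(w, x_adv, y)
    return w - lr * grad_w, loss_adv

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8))   # a "large batch" of inputs
true_w = rng.normal(size=8)
y = x @ true_w                  # noiseless targets for the toy problem
w = np.zeros(8)
for _ in range(50):
    w, loss = adversarial_step(w, x, y)
```

Training on the perturbed batch makes each step account for the worst-case local input shift, which is the regularization effect that helps preserve generalization at large batch sizes.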