BriefGPT.xyz
Feb, 2022
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
Chen Liang, Haoming Jiang, Simiao Zuo, Pengcheng He, Xiaodong Liu...
TL;DR
A new training strategy is proposed that adaptively adjusts each parameter's learning rate according to its sensitivity, reducing redundancy and improving generalization. The method proves highly effective on natural language understanding, neural machine translation, and image classification.
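The idea in the TL;DR can be illustrated with a minimal NumPy sketch. The |gradient × parameter| sensitivity estimate follows the paper's general notion of sensitivity (the approximate loss change if a parameter were zeroed out), but the specific scaling rule below, which boosts low-sensitivity parameters up to roughly 2× the base learning rate, is an illustrative assumption, not the authors' exact schedule.

```python
import numpy as np

def sensitivity(params: np.ndarray, grads: np.ndarray) -> np.ndarray:
    """First-order estimate of the loss change if a parameter were
    zeroed out: s_j ~ |g_j * theta_j|."""
    return np.abs(grads * params)

def sensitivity_scaled_update(params, grads, lr=0.1, eps=1e-8):
    """One SGD-style step where low-sensitivity parameters receive a
    larger learning rate (illustrative rule, not the paper's exact one)."""
    s = sensitivity(params, grads)
    # Scale lies in [1, 2): the least sensitive parameter gets ~2x the
    # base lr, the most sensitive keeps the base lr unchanged.
    scale = 1.0 + (1.0 - s / (s.max() + eps))
    return params - lr * scale * grads
```

With `params = [1.0, 1.0]` and `grads = [0.1, 1.0]`, the first parameter is far less sensitive, so it receives a larger effective learning rate instead of being left undertrained.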
Abstract
Recent research has shown the existence of significant redundancy in large transformer models. One can prune the redundant parameters without significantly sacrificing the generalization performance. However, we