Sep, 2021
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar...
TL;DR
This paper presents scaling insights from pre-training and fine-tuning Transformers and demonstrates an improved scaling protocol, under which a redesigned model achieves similar downstream fine-tuning quality with 50% fewer parameters while training 40% faster than the widely adopted T5-Base model.
Abstract
There remain many open questions pertaining to the scaling behaviour of Transformer architectures. These scaling decisions and findings can be critical, as training runs often come with an associated computational cost which have both financial and environmental impact. …