David R. So, Wojciech Mańke, Hanxiao Liu, Zihang Dai, Noam Shazeer...
TL;DR: By searching for a more efficient variant, namely Primer, we aim to reduce the training and inference costs of Transformer models, and we show that Primer delivers significant training speedups without any additional tuning.
Abstract
Large transformer models have been central to recent advances in natural language processing. The training and inference costs of these models, however, have grown rapidly and become prohibitively expensive. Here