BriefGPT.xyz
Nov, 2022
TorchScale: Transformers at Scale
Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi...
TL;DR
This paper introduces TorchScale, a toolkit that adopts a range of modeling techniques to improve modeling capability, training stability, and training efficiency. In experiments on language modeling and neural machine translation, it is shown to scale up Transformers effectively.
Abstract
Large transformers have achieved state-of-the-art performance across many tasks. Most open-source libraries on scaling transformers focus on improving training or inference with better parallelization. In this wo…