Sep 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper...
TL;DR
This paper presents a simple, efficient intra-layer model parallel approach for training transformer models with billions of parameters, and shows that large language models can further advance the state of the art, achieving state-of-the-art results on the WikiText103, LAMBADA, and RACE datasets.
Abstract
Recent work in unsupervised language modeling demonstrates that training large neural language models advances the state of the art in Natural Language Processing applications. However, for very large models, memory constraints limit the size of models that can practically be trained. […]
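As a rough illustration of the intra-layer model parallelism described in the TL;DR, the sketch below simulates in NumPy how a transformer MLP block can be split across workers: the first weight matrix is partitioned by columns and the second by rows, so the nonlinearity applies independently per partition and a single sum of partial outputs (standing in for an all-reduce across GPUs) recovers the full result. The function names and the single-process NumPy simulation are illustrative assumptions, not the paper's actual implementation.

import numpy as np

def gelu(x):
    # tanh approximation of GeLU (elementwise)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp_serial(X, A, B):
    # Reference single-device transformer MLP block: Z = GeLU(X A) B
    return gelu(X @ A) @ B

def mlp_tensor_parallel(X, A, B, num_partitions):
    # Intra-layer (tensor) model parallelism: split A column-wise and B row-wise,
    # so each partition computes GeLU(X A_i) B_i locally; summing the partial
    # outputs stands in for the all-reduce that would run across GPUs.
    A_cols = np.split(A, num_partitions, axis=1)  # worker i holds a column block of A
    B_rows = np.split(B, num_partitions, axis=0)  # worker i holds the matching row block of B
    partials = [gelu(X @ A_i) @ B_i for A_i, B_i in zip(A_cols, B_rows)]
    return sum(partials)  # simulated all-reduce (sum over workers)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, d_model, d_ff, parts = 4, 8, 32, 4
    X = rng.standard_normal((batch, d_model))
    A = rng.standard_normal((d_model, d_ff))
    B = rng.standard_normal((d_ff, d_model))
    assert np.allclose(mlp_serial(X, A, B), mlp_tensor_parallel(X, A, B, parts))
    print("parallel MLP matches the serial computation")

The column-then-row split is what makes only one synchronization point necessary per MLP block: the GeLU acts elementwise on each column block, so no communication is needed between the two matrix multiplies.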