TL;DR: This paper demonstrates the successful pre-training of state-of-the-art large language models such as HLAT, with high performance and efficiency, using AWS Trainium and the Neuron Distributed Training Library.
Abstract
Getting large language models (LLMs) to perform well on downstream tasks requires pre-training over trillions of tokens. This typically demands a large number of powerful computational devices, as well as a stable distributed training framework to accelerate training. The gr