Joonhyung Lee, Jeongin Bae, Byeongwook Kim, Se Jung Kwon, Dongsoo Lee
TL;DR: An investigation and analysis of the stability and cost-effectiveness of reduced-precision floating-point representations for large language model (LLM) training.
Abstract
The massive computational costs associated with large language model (LLM) pretraining have spurred great interest in reduced-precision floating-point representations to accelerate the process. As a result, the BrainFloat16 (BF16) format has become the de facto standard for LLM training, with hardware support included in recent generations of accelerators.