BriefGPT.xyz
Nov, 2021
Mesa: A Memory-saving Training Framework for Transformers
Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai...
TL;DR
Mesa is a memory-saving training framework for Transformer networks. During training it combines exact activations with low-precision activations: activations are quantized using head-wise activation statistics, and the quantization parameters are learned from estimated running statistics to improve training efficiency, achieving strong performance under limited computational resources.
Abstract
There has been an explosion of interest in designing high-performance transformers. While transformers have delivered significant performance improvements, training such networks is extremely memory intensive owing to…
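The TL;DR above describes quantizing activations with quantization parameters derived from running statistics. A minimal NumPy sketch of that general idea, assuming a simple uniform quantizer with an exponential moving average over per-batch min/max (the class and parameter names here are hypothetical illustrations, not Mesa's actual API):

```python
import numpy as np

class RunningQuantizer:
    """Uniform activation quantizer whose range is tracked as a
    running estimate over batches (hypothetical simplification)."""

    def __init__(self, bits=8, momentum=0.9):
        self.levels = 2 ** bits - 1   # number of quantization steps
        self.momentum = momentum
        self.min = None
        self.max = None

    def _update(self, x):
        # Running (EMA) estimate of the activation range.
        lo, hi = float(x.min()), float(x.max())
        if self.min is None:
            self.min, self.max = lo, hi
        else:
            self.min = self.momentum * self.min + (1 - self.momentum) * lo
            self.max = self.momentum * self.max + (1 - self.momentum) * hi

    def quantize(self, x):
        # Store activations in low precision (uint8) for the backward pass.
        self._update(x)
        scale = (self.max - self.min) / self.levels
        q = np.clip(np.round((x - self.min) / scale), 0, self.levels)
        return q.astype(np.uint8), scale

    def dequantize(self, q, scale):
        # Recover an approximate full-precision activation.
        return q.astype(np.float32) * scale + self.min

rng = np.random.default_rng(0)
act = rng.standard_normal((4, 16)).astype(np.float32)
quant = RunningQuantizer()
q, scale = quant.quantize(act)
recon = quant.dequantize(q, scale)
max_err = float(np.abs(recon - act).max())
print(q.dtype, max_err <= scale)
```

The memory saving comes from keeping `q` (one byte per value) instead of the float32 activation between forward and backward; the reconstruction error is bounded by one quantization step.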