Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson
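TL;DR: Dense linear layers, structured matrices, initialization scale, learning rate, and compute-efficient models are the key topics of this paper.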
Abstract
Dense linear layers are the dominant computational bottleneck in foundation models. Identifying more efficient alternatives to dense matrices has enormous potential for building more compute-efficient models, as