BriefGPT.xyz
Oct, 2023
Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture
Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Sabri Eyuboglu...
TL;DR
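Using the Monarch Mixer (M2) architecture, machine learning models scale sub-quadratically in both sequence length and model dimension, which allows longer contexts and better performance. M2 performs well in three domains: non-causal BERT-style language modeling, ViT-style image classification, and causal GPT-style language modeling.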
Abstract
Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance. However, existing architectures such as Transformers scale quadra...