BriefGPT.xyz
May, 2024
DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling
Shikhar Tuli, Chi-Heng Lin, Yen-Chang Hsu, Niraj K. Jha, Yilin Shen...
TL;DR
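DynaMo is a suite of multi-token prediction language models that reduces net inference time by dynamically predicting multiple tokens based on the predicted joint probability distribution. It generates text of the same quality as the baseline (Pythia-6.9B) with a 2.57x speedup, at parameter and training-time overheads of only 5.87% and 2.67%, respectively.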
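To make the idea of dynamic multi-token prediction concrete, here is a minimal sketch, not the authors' implementation. It assumes the model has already produced one probability vector per future position, approximates the joint probability as the product of the chosen per-position probabilities, and keeps emitting tokens while that product stays above a cutoff. The function name `accept_tokens`, the `threshold` parameter, and the product approximation are all hypothetical choices made for illustration.

```python
from typing import List, Sequence


def accept_tokens(
    step_probs: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> List[int]:
    """Pick the most likely token at each future position and keep it
    while the running joint probability stays at or above `threshold`.

    Assumption (illustrative only): the joint probability is approximated
    by the product of the per-position probabilities of the chosen tokens.
    The first token is always emitted, so the loop never returns nothing
    for a non-empty input.
    """
    accepted: List[int] = []
    joint = 1.0
    for position, probs in enumerate(step_probs):
        # Greedy choice: index of the highest-probability token.
        token = max(range(len(probs)), key=probs.__getitem__)
        joint *= probs[token]
        # Stop extending once confidence in the whole sequence drops.
        if position > 0 and joint < threshold:
            break
        accepted.append(token)
    return accepted


# Three candidate positions over a toy two-token vocabulary.
example = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.6]]
print(accept_tokens(example, threshold=0.5))
```

With these toy numbers the running product is 0.9, then 0.72, then 0.432, so two tokens are emitted in a single step and the third is left for the next forward pass. The number of tokens produced per step therefore varies with the model's confidence, which is the sense in which the prediction is dynamic.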
Abstract
Traditional language models operate autoregressively, i.e., they predict one token at a time. Rapid explosion in model sizes has resulted in high inference times. In this work, we propose DynaMo, a suite of