Oct, 2023
Online Speculative Decoding
Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng...
TL;DR
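We propose online speculative decoding, a technique that trains the draft model online while serving inference, to accelerate large language model inference and reduce latency.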
Abstract
Speculative decoding is a pivotal technique to accelerate the inference of large language models (LLMs) by employing a smaller draft model …
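To make the draft-and-verify idea concrete, here is a minimal sketch of greedy speculative decoding in Python. It is an illustration under stated assumptions, not the paper's implementation: the names `target_model`, `draft_model`, `greedy_decode`, and `speculative_decode` are hypothetical, the "models" are toy deterministic functions over integer tokens, verification uses exact greedy matching rather than a sampling-based acceptance rule, and the online updating of the draft model described in the TL;DR is not modeled.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a language model: maps a token prefix to the
# next token under greedy decoding.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    """Toy 'large' model (assumption): a fixed deterministic rule."""
    return (prefix[-1] * 2 + 1) % 10


def draft_model(prefix: List[int]) -> int:
    """Toy 'small' model (assumption): agrees with the target except after token 7."""
    return 0 if prefix[-1] == 7 else target_model(prefix)


def greedy_decode(model: Model, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], max_new_tokens: int, k: int = 4
) -> Tuple[List[int], int]:
    """Draft k tokens, then keep the longest prefix the target agrees with.

    Returns the generated sequence and the number of draft tokens accepted.
    """
    tokens = list(prompt)
    accepted = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks the proposal position by position. A real
        #    system would score all k positions in one batched forward pass;
        #    here the target is simply called once per position.
        for guess in proposal:
            if len(tokens) - len(prompt) >= max_new_tokens:
                break
            expected = target(tokens)
            tokens.append(expected)  # always emit the target's own token
            if guess == expected:
                accepted += 1
            else:
                break  # first mismatch: discard the rest of the proposal
    return tokens, accepted


if __name__ == "__main__":
    out, n_accepted = speculative_decode(target_model, draft_model, [1], max_new_tokens=8)
    print(out, n_accepted)
```

Because every emitted token is the target's own choice, the output matches plain greedy decoding with the target model; the potential speedup comes from how many drafted tokens are accepted per verification round, which is why a more accurate draft model matters.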