BriefGPT.xyz
Jan, 2024
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
Feng Lin, Hanling Yi, Hongbin Li, Yifan Yang, Xiaotian Yu...
TL;DR
The proposed bi-directional tuning method (BiTA) accelerates large language models (LLMs) through a streamlined pipeline of semi-autoregressive generation and draft verification, yielding a significant improvement in inference efficiency.
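To make the draft-verification idea concrete, here is a minimal toy sketch of the general draft-and-verify pattern: a draft step proposes several future tokens at once, and the target model accepts the longest matching prefix, correcting the first mismatch. All names and the toy "models" below are hypothetical illustrations, not the actual BiTA implementation.

```python
# Toy sketch of draft-and-verify acceleration (hypothetical, not BiTA's code).

def target_next(prefix):
    # Toy "target model": deterministically maps a prefix to its next token.
    return (sum(prefix) + len(prefix)) % 7

def draft_tokens(prefix, k=4):
    # Toy "draft step": proposes k future tokens in one shot.
    # It guesses correctly except at position 2, to demonstrate rejection.
    out, p = [], list(prefix)
    for i in range(k):
        t = target_next(p)
        if i == 2:
            t = (t + 1) % 7  # deliberate wrong guess
        out.append(t)
        p.append(t)
    return out

def verify_and_accept(prefix, drafts):
    # Accept drafted tokens while they match the target model's own choice;
    # on the first mismatch, keep the target's token instead and stop.
    accepted, p = [], list(prefix)
    for d in drafts:
        t = target_next(p)
        if d == t:
            accepted.append(d)
            p.append(d)
        else:
            accepted.append(t)  # correction from the target model
            break
    return accepted

prefix = [1, 2, 3]
drafts = draft_tokens(prefix)
print(verify_and_accept(prefix, drafts))  # → [2, 5, 4]
```

One verification pass here yields three tokens (two accepted drafts plus one correction), which is the source of the speedup: output remains identical to pure autoregressive decoding while fewer sequential target-model steps are needed.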
Abstract
Large language models (LLMs) commonly employ autoregressive generation during inference, leading to high memory bandwidth demand and consequently extended latency. To mitigate this inefficiency, we present …