BriefGPT.xyz
May, 2024
Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection
Yun Zhu, Jia-Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu...
TL;DR
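By introducing sparsity, Sparse RAG proposes a novel paradigm that reduces computational cost while improving generation quality. It encodes the retrieved documents in parallel and decodes the output selectively, which lowers latency and sharpens the model's focus.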
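The idea in the TL;DR (handle each retrieved document independently, then decode using only a selected subset) can be illustrated with a toy sketch. This is not the authors' implementation: the summary does not say how documents are assessed, so the word-overlap scorer, the function names `score_document` and `select_contexts`, and the `threshold` parameter are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def score_document(query: str, doc: str) -> float:
    """Hypothetical relevance score: fraction of query terms found in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def select_contexts(query: str, docs: list[str], threshold: float = 0.5) -> list[str]:
    """Score documents independently, then keep only those that clear the threshold."""
    # Each document is scored without reference to the others, so the work
    # can be dispatched in parallel (a stand-in for parallel encoding).
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda d: score_document(query, d), docs))
    # Sparse selection: only the surviving documents would be used during decoding.
    return [d for d, s in zip(docs, scores) if s >= threshold]


docs = [
    "sparse attention reduces decoding cost",
    "a recipe for sourdough bread",
    "retrieval augmented generation with sparse context",
]
kept = select_contexts("sparse context generation", docs)
print(kept)
```

In this sketch the cost of the selection step grows with the number of retrieved documents, while the decoding step only sees the documents that pass the threshold; that is the trade-off the summary describes, under the assumptions stated above.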
Abstract
Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly in the number of retrieved document...