Retrieval-augmented generation (RAG) greatly benefits language models (LMs) by providing additional context for tasks such as document-based question answering (DBQA). Despite its potential, the effectiveness of RAG is highly dependent on its configuration, raising the question: What is the optimal RAG configuration? To answer this, we introduce the RAGGED framework to analyze and optimize RAG systems. On a set of representative DBQA tasks, we study two classic sparse and dense retrievers, and four top-performing LMs in encoder-decoder and decoder-only architectures. Through RAGGED, we find that different models suit substantially different RAG setups. While encoder-decoder models monotonically improve with more documents, decoder-only models can effectively use only fewer than 5 documents, despite often having a longer context window. RAGGED offers further insights into LMs' context utilization habits: encoder-decoder models rely more on contexts and are thus more sensitive to retrieval quality, while decoder-only models tend to rely on knowledge memorized during training.

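Retrieval-augmented generation (RAG) greatly enhances language models (LMs) by supplying additional context for tasks such as document-based question answering. Under the RAGGED framework, we study representative document-based question answering tasks, examining two classic sparse and dense retrievers and four top-performing LMs across encoder-decoder and decoder-only architectures. The results show that different models suit different RAG configurations: encoder-decoder models improve monotonically as more documents are used, whereas decoder-only models can effectively use only fewer than 5 documents, even though their context windows are usually longer. RAGGED also reveals LMs' context utilization habits: encoder-decoder models rely more on context and are more sensitive to retrieval quality, while decoder-only models tend to rely on knowledge memorized during training.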

RAGGED: Towards Informed Design of Retrieval Augmented Generation Systems