Retrieval-Augmented Generation (RAG) systems represent a significant advancement over traditional Large Language Models (LLMs). RAG systems enhance their generation ability by incorporating external data retrieved through an Information Retrieval (IR) phase, overcoming the limitations of standard LLMs, which are restricted to their pre-trained knowledge and limited context window. Most research in this area has predominantly concentrated on the generative aspect of LLMs within RAG systems. Our study fills this gap by thoroughly and critically analyzing the influence of IR components on RAG systems. This paper analyzes which characteristics a retriever should possess for an effective RAG's prompt formulation, focusing on the type of documents that should be retrieved. We evaluate various elements, such as the relevance of the documents to the prompt, their position, and the number included in the context. Our findings reveal, among other insights, that including irrelevant documents can unexpectedly enhance performance by more than 30% in accuracy, contradicting our initial assumption of diminished quality. These findings call for developing specialized approaches tailored to the specific demands of integrating retrieval with language generation models and pave the way for future research. These results underscore the need for developing specialized strategies to integrate retrieval with language generation models, thereby laying the groundwork for future research in this field.

通过详细和批判性地分析信息检索组件对检索辅助生成系统的影响，本研究揭示了特定类型的文档对于有效的检索辅助生成提示公式所需的特征，发现包括不相关的文档能出乎意料地提高准确性超过30％。这些结果强调了需要开发专门策略来将检索与语言生成模型相结合，为未来研究奠定了基础。

噪声的力量：为RAG系统重新定义检索