In Retrieval-Augmented Generation (RAG) tasks using Large Language Models (LLMs), the quality of retrieved information is critical to the final output. This paper introduces the IRSC benchmark for evaluating the performance of embedding models in multilingual RAG tasks. The benchmark encompasses five retrieval tasks: query retrieval, title retrieval, part-of-paragraph retrieval, keyword retrieval, and summary retrieval. Our research addresses the current lack of comprehensive testing and effective comparison methods for embedding models in RAG scenarios. We introduce two new metrics, the Similarity of Semantic Comprehension Index (SSCI) and the Retrieval Capability Contest Index (RCCI), and evaluate models such as Snowflake-Arctic, BGE, GTE, and M3E. Our contributions include: 1) the IRSC benchmark, 2) the SSCI and RCCI metrics, and 3) insights into the cross-lingual limitations of embedding models. The IRSC benchmark aims to enhance the understanding and development of accurate retrieval systems in RAG tasks. All code and datasets are available at: https://github.com/Jasaxion/IRSC_Benchmark

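This study addresses the current lack of comprehensive testing and effective comparison methods in retrieval-augmented generation tasks. We propose the IRSC benchmark for evaluating the performance of embedding models in multilingual RAG tasks and introduce two new evaluation metrics, the Similarity of Semantic Comprehension Index (SSCI) and the Retrieval Capability Contest Index (RCCI), which provide important insights and tools for improving the accuracy of retrieval systems.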

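IRSC: A Zero-shot Evaluation Benchmark for Information Retrieval through Semantic Comprehension in Retrieval-Augmented Generation Scenarios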