Despite the substantial success of Information Retrieval (IR) in various NLP tasks, most IR systems predominantly handle queries and corpora in natural language, neglecting the domain of code retrieval. Code retrieval is critically important yet remains under-explored, with existing methods and benchmarks inadequately representing the diversity of code across domains and tasks. To address this gap, we present \textbf{\name} (\textbf{Co}de \textbf{I}nformation \textbf{R}etrieval Benchmark), a robust and comprehensive benchmark specifically designed to assess code retrieval capabilities. \name comprises \textbf{ten} meticulously curated code datasets, spanning \textbf{eight} distinctive retrieval tasks across \textbf{seven} diverse domains. We first discuss the construction of \name and its diverse dataset composition. We then evaluate nine widely used retrieval models on \name, uncovering significant difficulties in performing code retrieval tasks even with state-of-the-art systems. To facilitate adoption and integration within existing research workflows, \name has been developed as a user-friendly Python framework, readily installable via pip. It shares the same data schema as other popular benchmarks such as MTEB and BEIR, enabling seamless cross-benchmark evaluations. Through \name, we aim to invigorate research in the code retrieval domain, providing a versatile benchmarking tool that encourages further development and exploration of code retrieval systems\footnote{\url{https://github.com/CoIR-team/coir}}.

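By designing \name (\textbf{Co}de \textbf{I}nformation \textbf{R}etrieval Benchmark), a robust and comprehensive benchmark, we study the requirements of code retrieval in depth and evaluate nine widely used retrieval models, finding that code retrieval tasks remain significantly difficult even for state-of-the-art systems. To ease adoption and integration with existing research workflows, \name has been developed as a user-friendly Python framework that can be installed quickly via pip. It shares the same data schema as other popular benchmarks such as MTEB and BEIR, enabling seamless cross-benchmark evaluation. Through \name, we aim to advance research in code retrieval by providing a versatile benchmarking tool that encourages further development and exploration of code retrieval systems.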
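
To make the shared data schema concrete, the sketch below shows the BEIR-style layout of three dictionaries (corpus, queries, and relevance judgments) on a toy code-retrieval example. It is an illustration only and does not use the \name package: the document IDs, texts, the token-overlap retriever, and the helper names `tokenize`, `retrieve`, and `ndcg_at_k` are all assumptions introduced here, and the metric is a plain linear-gain nDCG written from scratch.

```python
import math
import re

# BEIR-style layout (toy data; all IDs and texts are made up for illustration).
# corpus:  doc_id -> {"title": str, "text": str}
# queries: query_id -> str
# qrels:   query_id -> {doc_id: relevance}
corpus = {
    "d1": {"title": "", "text": "def add(a, b): return a + b"},
    "d2": {"title": "", "text": "def read_file(path): return open(path).read()"},
    "d3": {"title": "", "text": "def multiply(a, b): return a * b"},
}
queries = {
    "q1": "add two numbers a and b",
    "q2": "read a file from a path",
}
qrels = {"q1": {"d1": 1}, "q2": {"d2": 1}}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric/underscore tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(corpus, queries):
    """Hypothetical stand-in retriever: score = number of shared tokens."""
    results = {}
    for qid, query in queries.items():
        q_tokens = tokenize(query)
        results[qid] = {
            did: float(len(q_tokens & tokenize(doc["title"] + " " + doc["text"])))
            for did, doc in corpus.items()
        }
    return results


def ndcg_at_k(qrels, results, k=10):
    """Mean linear-gain nDCG@k over all queries that have judgments."""
    scores = []
    for qid, rels in qrels.items():
        ranked = sorted(results[qid].items(), key=lambda kv: kv[1], reverse=True)[:k]
        dcg = sum(
            rels.get(did, 0) / math.log2(rank + 2)
            for rank, (did, _) in enumerate(ranked)
        )
        ideal = sorted(rels.values(), reverse=True)[:k]
        idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
        scores.append(dcg / idcg if idcg > 0 else 0.0)
    return sum(scores) / len(scores)


results = retrieve(corpus, queries)
score = ndcg_at_k(qrels, results, k=10)
```

Because the retriever output uses the same `query_id -> {doc_id: score}` shape as the relevance judgments, any model that produces such a mapping can in principle be scored the same way across benchmarks that share this layout.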

CoIR: A Comprehensive Benchmark for Code Information Retrieval Models