Jun, 2024
RepoQA: Evaluating Long Context Code Understanding
Jiawei Liu, Jia Le Tian, Vijay Daita, Yuxiang Wei, Yifeng Ding...
TL;DR
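RepoQA is a multilingual, comprehensive benchmark that evaluates how well LLMs understand code in long contexts. It shows that a small gap remains between open-source and proprietary models, that different models perform well on different programming languages, and that models may understand code better when it has no comments.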
Abstract
Recent advances have been improving the context windows of large language models (LLMs). To quantify the real long-context capabilities of LLMs, evaluators such as the popular Needle in a Haystack have been developed…