May, 2024
DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories
Jia Li, Ge Li, Yunfei Zhao, Yongmin Li, Huanyu Liu...
TL;DR
With the new benchmark DevEval, we evaluate the coding abilities of 8 popular large language models on real-world code repositories and find that these models' coding abilities fall short in real-world repositories.
Abstract
How to evaluate the coding abilities of Large Language Models (LLMs) remains an open question. We find that existing benchmarks are poorly aligned with real-world code repositories and are insufficient to evaluate …