Jan 2024
DevEval: Evaluating Code Generation in Practical Software Projects
Jia Li, Ge Li, Yunfei Zhao, Yongmin Li, Zhi Jin, et al.
TL;DR
By proposing DevEval, a new benchmark aligned with developers' experience in practical projects, we evaluate the real code-generation capabilities of five popular large language models, reveal their actual performance, and discuss the challenges and future directions of code generation in practical projects.
Abstract
How to evaluate large language models (LLMs) in code generation is an open question. Many benchmarks have been proposed but are inconsistent with practical software projects, e.g., unreal program distributions, i…
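
For context on how such benchmarks typically score models, below is a minimal sketch of the unbiased pass@k estimator widely used in code-generation evaluation (Chen et al., 2021). Whether DevEval reports exactly this metric is not stated in this excerpt; the function and example values are illustrative only.

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator: probability that at least one of k
    # sampled solutions passes, given n generated samples per problem
    # of which c pass the tests.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example (hypothetical numbers): 20 samples per problem, 5 passing, k = 3
print(round(pass_at_k(20, 5, 3), 4))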