BriefGPT.xyz
Jun, 2024
Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review
Debalina Ghosh Paul, Hong Zhu, Ian Bayley
TL;DR
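A critical review of work on evaluating large language models on programming tasks, focusing on the benchmarks and metrics used to evaluate existing tools and proposing directions for further research.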
Abstract
With the rapid development of Large Language Models (LLMs), a large number of machine learning models have been developed to assist programming tasks including the generation of program code from natural language