BriefGPT.xyz
Mar, 2024
Quantifying Contamination in Evaluating Code Generation Capabilities of Language Models
Martin Riddell, Ansong Ni, Arman Cohan
TL;DR
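This study comprehensively examines data contamination in large language models on code generation tasks. It analyzes the degree of overlap between popular code generation benchmarks and pretraining corpora, shows that models perform significantly better on problems for which similar solutions appear in the training data, and analyzes how factors such as model size, problem difficulty, and problem length affect model memorization and generalization.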
Abstract
While large language models have achieved remarkable performance on various code generation benchmarks, there have been growing concerns regarding potential contamination of these benchmarks as they may be leaked …