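TL;DR: We propose a new method for evaluating the code comprehension performance of large language models (LLMs): we introduce code mutations to test whether LLMs can detect subtle inconsistencies between code and its natural-language description. In a case study of two popular LLMs across a variety of code mutations and programming languages, we find that their code comprehension performance varies significantly.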
Abstract
Large language models (LLMs) have shown remarkable capabilities in processing both natural and programming languages, which have enabled various applications in software engineering, such as requirement engineeri