Large language models (LLMs) have experienced exponential growth and demonstrate remarkable performance across a wide range of tasks. Nevertheless, contemporary research focuses mainly on increasing the size and quality of pretraining data while still relying on the next-token prediction task with an autoregressive transformer architecture. Whether this task truly helps a model comprehend code logic remains questionable: we speculate that the model still interprets code as mere text, whereas humans emphasize the underlying logic. To test this hypothesis, we introduce a new task, "Logically Equivalent Code Selection," which requires selecting, from a candidate set, the code that is logically equivalent to a given query code. Our experimental findings indicate that current LLMs underperform on this task because they understand code as an unordered bag of keywords. To improve their performance, we propose an advanced pretraining task, "Next Token Prediction+," which aims to modify the sentence embedding distribution of the LLM without sacrificing its generative capabilities. Our experimental results show that after this pretraining, both Code Llama and StarCoder, two prevalent code-domain pretrained models, display significant improvements on our logically equivalent code selection task and on the code completion task.
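To make the selection task and the "unordered bag of keywords" failure mode concrete, here is a minimal sketch. It assumes (our assumption, not the protocol stated in this abstract) that a model picks the candidate whose embedding has the highest cosine similarity to the query's embedding. The hypothetical `bag_of_keywords` function stands in for a learned sentence embedding: it counts tokens and discards their order. The names `bag_of_keywords`, `cosine`, and `select_equivalent` are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers/keywords, integers, and single punctuation characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def bag_of_keywords(code: str) -> Counter:
    """Hypothetical stand-in for an LLM sentence embedding: token counts with order discarded."""
    return Counter(TOKEN_RE.findall(code))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_equivalent(query: str, candidates: list[str], embed=bag_of_keywords) -> int:
    """Return the index of the candidate most similar to the query (first one wins on ties)."""
    q = embed(query)
    scores = [cosine(q, embed(c)) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


query = "def f(a, b):\n    return a - b"
candidates = [
    "def f(a, b):\n    return b - a",                   # same tokens, operands swapped: NOT equivalent
    "def f(x, y):\n    diff = x - y\n    return diff",  # renamed and restructured: equivalent
]
print(select_equivalent(query, candidates))  # → 0, the wrong choice
```

Because swapping the operands leaves the token counts unchanged, the first candidate scores a perfect 1.0 and is selected even though it computes `b - a`. A real evaluation would replace `embed` with the model's own sentence representation; how the paper derives that representation is not specified in this abstract.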

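Research on large language models has concentrated on increasing the scale and quality of pretraining data, yet it remains questionable whether the current training task enables them to truly understand code logic. This paper proposes a new task, "Logically Equivalent Code Selection," shows that current large language models perform poorly on it, and proposes the pretraining task "Next Token Prediction+" to improve their performance. Experiments show that the method yields significant improvements on both logically equivalent code selection and code completion.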

GPT: Is Next-Token Prediction Sufficient? An Exploration of Code Logic Comprehension