Automatic code generation models built on state-of-the-art Large Language Models (LLMs) play a pivotal role in improving the productivity and efficiency of software development. As LLMs are adopted more widely in software coding ecosystems, a pressing question has emerged: does the generated code contain social biases, such as those related to age, gender, and race? This question concerns the integrity, fairness, and ethical foundation of software applications that depend on code generated by these models, yet it remains under-explored in the literature. This paper presents a novel bias assessment framework designed specifically for code generation tasks. Using this framework, we conduct an extensive evaluation of bias in nine state-of-the-art LLM-based code generation models. Our findings reveal that 31.45\% to 79.93\% of the code functions generated by the evaluated models are biased, and that the functionality of 9.68\% to 37.37\% of the code functions is affected by the bias. Biases therefore not only exist in code generation models but, in some cases, directly affect the functionality of the generated code, posing risks of unintended and possibly harmful software behavior. To mitigate bias in code generation models, we propose three mitigation strategies, which reduce the ratio of biased code to between 0.4\% and 4.57\%.

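Building on the latest Large Language Models (LLMs), this study proposes a novel bias assessment framework for code generation tasks and conducts an extensive evaluation of nine state-of-the-art LLM-based code generation models. The study finds that 31.45% to 79.93% of the code functions generated by the evaluated models are biased, with 9.68% to 37.37% of code functions affected by the bias. This means that bias not only exists in code generation models but, in some cases, directly affects the functionality of the generated code, creating a risk of unintended and possibly harmful software behavior. To mitigate bias in code generation models, we propose three mitigation strategies that reduce the ratio of biased code to a very low level of 0.4% to 4.57%.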

Bias Assessment and Mitigation in LLM-based Code Generation