Large Language Models (LLMs) embed complex biases and stereotypes that can lead to detrimental user experiences and societal consequences, often without conscious awareness from the models themselves. This paper emphasizes the importance of equipping LLMs with mechanisms for better self-reflection and bias recognition. Our experiments demonstrate that by informing LLMs that their generated content does not represent their own views and questioning them about bias, their capability to identify and address biases improves. This enhancement is attributed to the internal attention mechanisms and potential internal sensitivity policies of LLMs. Building upon these findings, we propose a novel method to diminish bias in LLM outputs. This involves engaging LLMs in multi-role scenarios acting as different roles where they are tasked for bias exposure, with a role of an impartial referee in the end of each loop of debate. A ranking scoring mechanism is employed to quantify bias levels, enabling more refined reflections and superior output quality. Comparative experimental results confirm that our method outperforms existing approaches in reducing bias, making it a valuable contribution to efforts towards more ethical AI systems.

大型语言模型（LLMs）嵌入了复杂的偏见和刻板印象，可能导致有害的用户体验和社会后果，而模型本身通常没有意识到这一点。本文强调了为LLMs配备更好的自我反思和偏见识别机制的重要性。我们的实验表明，通过告知LLMs它们生成的内容不代表自己的观点，并对其偏见进行质疑，可以提高LLMs识别和解决偏见的能力。这种改进归因于LLMs的内部注意力机制和潜在的内部敏感性政策。基于这些发现，我们提出了一个减少LLMs输出偏见的新方法。该方法涉及将LLMs置于多角色情景中，扮演不同角色，在每个辩论循环的最后担任公正裁判的角色，以暴露偏见。采用排名评分机制来量化偏见水平，从而实现更精细的反思和更优质的输出。比较实验结果证实我们的方法在减少偏见方面优于现有方法，为追求更具伦理AI系统的努力作出了有价值的贡献。

欺骗以启蒙：诱导LLMs自省以增强偏见检测和缓解