Large Language Models (LLMs) embed complex biases and stereotypes that can
lead to harmful user experiences and societal consequences, often without
the models themselves being aware of it. This paper emphasizes the
importance of equipping LLMs with mechanisms for better self-reflection and
bias recognition. Our experiments show that informing LLMs that their
generated content does not represent their own views, and then questioning them
about bias, improves their ability to identify and address biases. We attribute
this improvement to the internal attention mechanisms and potential internal
sensitivity policies of LLMs. Building upon these findings, we propose
a novel method to reduce bias in LLM outputs. The method engages LLMs in
multi-role debates in which they take on different roles tasked with exposing
bias, and then act as an impartial referee at the end of each debate loop.
A ranking-based scoring mechanism quantifies bias levels,
enabling more refined reflection and higher-quality output. Comparative
experimental results confirm that our method outperforms existing approaches in
reducing bias, making it a valuable contribution to efforts towards more
ethical AI systems.
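
As an illustration only, the multi-role debate loop described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation evaluated in the paper: the `LLM` callable, the prompt wording, the 0 to 10 score scale, the stopping threshold, and all function names are hypothetical.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical interface: any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]

# Assumed wording for telling the model the text is not its own view.
DISCLAIMER = "The text below does not represent your own views."


def debate_round(llm: LLM, text: str, roles: List[str]) -> List[str]:
    """Ask the model, once per role, to expose bias in the text."""
    critiques = []
    for role in roles:
        prompt = f"{DISCLAIMER}\nActing as {role}, point out any bias in:\n{text}"
        critiques.append(llm(prompt))
    return critiques


def referee_score(llm: LLM, text: str, critiques: List[str]) -> int:
    """Ask the model, as an impartial referee, for a bias score (assumed 0-10)."""
    prompt = (
        "As an impartial referee, rate the bias of the text from 0 (none) "
        "to 10 (severe). Reply with one integer.\n"
        f"TEXT:\n{text}\nCRITIQUES:\n" + "\n".join(critiques)
    )
    match = re.search(r"\d+", llm(prompt))
    # An unparsable reply is treated as the worst score.
    return min(int(match.group()), 10) if match else 10


def debias(llm: LLM, text: str, roles: List[str],
           max_loops: int = 3, threshold: int = 2) -> Tuple[int, str]:
    """Run debate loops and return the (score, text) pair ranked least biased."""
    history: List[Tuple[int, str]] = []
    for i in range(max_loops):
        critiques = debate_round(llm, text, roles)
        score = referee_score(llm, text, critiques)
        history.append((score, text))
        if score <= threshold or i == max_loops - 1:
            break
        text = llm(
            "Rewrite the text to remove the biases listed.\n"
            f"TEXT:\n{text}\nCRITIQUES:\n" + "\n".join(critiques)
        )
    # Rank all candidates by referee score and keep the lowest.
    return min(history, key=lambda item: item[0])
```

In this sketch the referee's numeric score doubles as the ranking signal: every candidate output is kept with its score, and the lowest-scoring one is returned. Any real chat-completion client could be wrapped to match the `LLM` signature.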
