Recent studies show that self-feedback improves large language models (LLMs) on certain tasks while worsens other tasks. We discovered that such a contrary is due to LLM's bias towards their own output. In this paper, we formally define LLM's self-bias -- the tendency to favor its own generation -- using two statistics. We analyze six LLMs on translation, constrained text generation, and mathematical reasoning tasks. We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. To mitigate such biases, we discover that larger model size and external feedback with accurate assessment can significantly reduce bias in the self-refine pipeline, leading to actual performance improvement in downstream tasks.

最近的研究表明，自我反馈可以改善大型语言模型在某些任务上的表现，但对其他任务而言则会恶化。我们发现这种矛盾是由于语言模型对自己的输出存在偏见所致。本文通过两个统计量正式定义了语言模型的自我偏见——偏爱其自身生成的内容。我们分析了六个语言模型在翻译、受限文本生成和数学推理任务上的表现。我们发现自我偏见在所有研究的语言模型中普遍存在，并且跨多种语言和任务。我们的分析揭示了自我优化流程虽然可以提高模型输出的流畅度和可理解性，但会进一步放大自我偏见。为了减轻这种偏见，我们发现更大的模型规模和准确评估的外部反馈可以显著减少自我优化流程中的偏见，从而在下游任务中实现实际性能的提升。

自我反馈的危险：自我偏见在大型语言模型中增强