Drawing parallels between human cognition and artificial intelligence, we explored how large language models (LLMs) internalize identities imposed by targeted prompts. In line with Social Identity Theory, these identity assignments lead LLMs to distinguish between "we" (the ingroup) and "they" (the outgroup). This self-categorization generates both ingroup favoritism and outgroup bias. Nonetheless, existing literature has predominantly focused on ingroup favoritism, often overlooking outgroup bias, which is a fundamental source of intergroup prejudice and discrimination. Our experiment addresses this gap by demonstrating that outgroup bias manifests as strongly as ingroup favoritism. Furthermore, we successfully mitigated the inherent pro-liberal, anti-conservative bias in LLMs by guiding them to adopt the perspectives of the initially disfavored group. These results were replicated in the context of gender bias. Our findings highlight the potential to develop more equitable and balanced language models.

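This study addresses the neglect of outgroup bias in research on how large language models (LLMs) internalize assigned identities. Drawing on Social Identity Theory, we show that outgroup bias is as strong as ingroup favoritism, and that guiding a language model to adopt the perspective of the previously disfavored group effectively mitigates its inherent bias. This finding has important implications for developing fairer and more balanced language models.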

Persona Setting Pitfall: Persistent Outgroup Biases in Large Language Models Arising from Social Identity Adoption