Implicit bias refers to automatic or spontaneous mental processes that shape perceptions, judgments, and behaviors. Previous research examining `implicit bias' in large language models (LLMs) has often approached the phenomenon differently from how it is studied in humans, focusing primarily on model outputs rather than on model processing. To examine model processing, we present the Reasoning Model Implicit Association Test (RM-IAT), a method for studying implicit bias-like patterns in reasoning models: LLMs that employ step-by-step reasoning to solve complex tasks. Using this method, we find that reasoning models require more tokens when processing association-incompatible information than when processing association-compatible information. These findings suggest that AI systems harbor patterns in processing information that are analogous to human implicit bias. We consider the implications of these implicit bias-like patterns for the deployment of such systems in real-world applications.

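This study addresses a gap in understanding how implicit bias manifests in the processing of large language models (LLMs). It proposes a new method, the Reasoning Model Implicit Association Test (RM-IAT), for studying implicit bias-like patterns in reasoning models. The study finds that reasoning models need more tokens to process association-incompatible information than association-compatible information, which suggests that AI systems process information in ways analogous to human implicit bias and points to potential implications for real-world applications.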
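
The comparison described above, reasoning-token counts in the association-incompatible condition against the association-compatible condition, can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function name `token_gap`, the choice of Cohen's d as an effect-size summary, and the token counts are all assumptions made for illustration, and the numbers are invented rather than taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def token_gap(compatible, incompatible):
    """Return (mean difference, Cohen's d) for two lists of reasoning-token counts.

    Hypothetical helper: a positive difference means the incompatible
    condition used more reasoning tokens on average.
    """
    diff = mean(incompatible) - mean(compatible)
    n1, n2 = len(compatible), len(incompatible)
    # Pooled variance across the two conditions (sample variances).
    pooled_var = (
        (n1 - 1) * stdev(compatible) ** 2 + (n2 - 1) * stdev(incompatible) ** 2
    ) / (n1 + n2 - 2)
    d = diff / sqrt(pooled_var) if pooled_var > 0 else float("nan")
    return diff, d


# Invented illustrative counts, NOT data from the study.
compatible_tokens = [100, 110, 120]
incompatible_tokens = [130, 140, 150]

diff, d = token_gap(compatible_tokens, incompatible_tokens)
print(f"mean token difference: {diff}, Cohen's d: {d:.2f}")
```

A standardized effect size is used here only because it makes gaps comparable across models with different baseline reasoning lengths; any other paired or unpaired comparison of token counts would fit the same description.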