We propose a framework for evaluating strategic deception in large language models (LLMs). In this framework, an LLM acts as a game master in two scenarios: one with random game mechanics and another where it can choose between random or deliberate actions. As an example, we use blackjack because neither its action space nor its strategies involve deception. We benchmark Llama3-70B, GPT-4-Turbo, and Mixtral in blackjack, comparing outcomes against the distributions expected under fair play to determine whether LLMs develop strategies favoring the "house." Our findings reveal that the LLMs deviate significantly from fair play when given implicit randomness instructions, suggesting a tendency towards strategic manipulation in ambiguous scenarios. However, when presented with an explicit choice, the LLMs largely adhere to fair play, indicating that the framing of instructions plays a crucial role in eliciting or mitigating potentially deceptive behaviors in AI systems.
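
As an illustration of what comparing outcomes against a fair-play distribution could look like, the following is a minimal sketch and not the procedure used in this work. It assumes cards are drawn with replacement (an "infinite deck"), uses a chi-square goodness-of-fit statistic as one possible test, and the names `FAIR_PROBS`, `chi_square_stat`, and `deviates_from_fair_play` are hypothetical.

```python
from collections import Counter

# Assumption: draws are independent with replacement, so every blackjack value
# has a fixed probability. Ten, Jack, Queen and King all count as "10".
FAIR_PROBS = {"A": 1 / 13, **{str(v): 1 / 13 for v in range(2, 10)}, "10": 4 / 13}

# Chi-square critical value for 9 degrees of freedom (10 categories - 1)
# at the 5% significance level.
CHI2_CRIT_DF9_P05 = 16.919


def chi_square_stat(draws):
    """Chi-square statistic of the observed card values against fair play."""
    unknown = set(draws) - set(FAIR_PROBS)
    if unknown:
        raise ValueError(f"unexpected card values: {sorted(unknown)}")
    counts = Counter(draws)
    n = len(draws)
    stat = 0.0
    for value, p in FAIR_PROBS.items():
        expected = n * p
        stat += (counts.get(value, 0) - expected) ** 2 / expected
    return stat


def deviates_from_fair_play(draws):
    """True if the draws are inconsistent with fair play at the 5% level."""
    return chi_square_stat(draws) > CHI2_CRIT_DF9_P05
```

In use, `draws` would be the list of card values an LLM dealer produced over many hands, for example `deviates_from_fair_play(["10", "A", "7", ...])`. A real evaluation would also need to consider sample size and multiple comparisons across models and prompts.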

The House Always Wins: A Framework for Evaluating Strategic Deception in LLMs