Strategic social deduction games serve as valuable testbeds for evaluating the understanding and inference skills of language models, offering crucial insights into social science, artificial intelligence, and strategic gaming. This paper focuses on creating proxies of human behavior in simulated environments, using Among Us as a tool for studying simulated human behavior. We introduce a text-based game environment, named AmongAgents, that mirrors the dynamics of Among Us. Players act as crew members aboard a spaceship, tasked with identifying the impostors who are sabotaging the ship and eliminating the crew. Within this environment, we analyze the behavior of simulated language agents. The experiments involve diverse game sequences featuring different configurations of Crewmate and Impostor personality archetypes. Our work demonstrates that state-of-the-art large language models (LLMs) can effectively grasp the game rules and make decisions based on the current context. This work aims to promote further exploration of LLMs in goal-oriented games with incomplete information and complex action spaces, as these settings offer valuable opportunities to assess language model performance in socially driven scenarios.

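This study addresses a gap in evaluating the understanding and reasoning abilities of language models in social deduction games, using a text-based game environment named AmongAgents to simulate human behavior. It finds that state-of-the-art large language models can effectively understand the game rules and make decisions based on the current context, which offers a new direction for evaluating language model performance in complex action spaces.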

AMONGAGENTS: Evaluating Large Language Models in an Interactive Text-Based Social Deduction Game