Sequential reasoning is a complex human ability. While extensive previous research on gaming AI has focused on a single continuous game, round-based decision making that extends across a sequence of games remains less explored. Counter-Strike: Global Offensive (CS:GO), a round-based game with abundant expert demonstrations, provides an excellent environment for studying multi-player round-based sequential reasoning. In this work, we propose a Sequence Reasoner with a Round Attribute Encoder and a Multi-Task Decoder to interpret the strategies behind round-based purchasing decisions. We adopt few-shot learning to sample multiple rounds in a match, and we use a modified version of Reptile, a model-agnostic meta-learning algorithm, for the meta-learning loop. We formulate each round as a multi-task sequence generation problem. Our state representation combines an action encoder, a team encoder, player features, a round attribute encoder, and economy encoders to help our agent learn to reason in this specific multi-player round-based setting. A complete ablation study and a comparison with the greedy approach demonstrate the effectiveness of our model. We expect our research to open doors for interpretable AI that helps explain episodic and long-term purchasing strategies beyond the gaming community.
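The few-shot meta-learning loop mentioned above can be illustrated with a minimal sketch of the standard Reptile update. This is not the authors' implementation: the abstract states that Reptile was modified but does not say how, so the sketch shows the unmodified algorithm. All names (`make_match`, `mse_grad`, `inner_sgd`, `reptile`) and the toy two-parameter linear model standing in for the Sequence Reasoner are hypothetical. Each "match" is treated as one task, and a few rounds are sampled from it as the few-shot support set for the inner loop.

```python
import random

# Hypothetical toy setup: each "match" is a task whose rounds are (feature, target) pairs.
# Assumption: a linear model y_hat = w * x + b stands in for the full Sequence Reasoner.

def make_match(slope, n_rounds=30, rng=None):
    """Generate toy rounds for one match, with target = slope * x + 1.0."""
    rng = rng or random.Random(0)
    rounds = []
    for _ in range(n_rounds):
        x = rng.uniform(-1.0, 1.0)
        rounds.append((x, slope * x + 1.0))
    return rounds

def mse_grad(params, batch):
    """Gradient of the mean squared error of the linear model on a batch."""
    w, b = params
    gw = gb = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    n = len(batch)
    return gw / n, gb / n

def inner_sgd(params, rounds, k_shot, inner_steps, inner_lr, rng):
    """Adapt a copy of the parameters on k_shot rounds sampled from one match."""
    w, b = params
    support = rng.sample(rounds, k_shot)  # few-shot sampling of rounds within the match
    for _ in range(inner_steps):
        gw, gb = mse_grad((w, b), support)
        w -= inner_lr * gw
        b -= inner_lr * gb
    return w, b

def reptile(matches, meta_iters=500, k_shot=5, inner_steps=10,
            inner_lr=0.1, meta_lr=0.1, seed=0):
    """Unmodified Reptile: move theta toward the task-adapted weights phi."""
    rng = random.Random(seed)
    theta = (0.0, 0.0)
    for _ in range(meta_iters):
        rounds = rng.choice(matches)  # sample one task (match)
        phi = inner_sgd(theta, rounds, k_shot, inner_steps, inner_lr, rng)
        # Meta-update: theta <- theta + meta_lr * (phi - theta)
        theta = tuple(t + meta_lr * (p - t) for t, p in zip(theta, phi))
    return theta

# Usage: four toy matches with different slopes but a shared intercept of 1.0.
matches = [make_match(s, rng=random.Random(i)) for i, s in enumerate([0.5, 1.0, 1.5, 2.0])]
theta = reptile(matches)
print(f"meta-learned initialization: w={theta[0]:.2f}, b={theta[1]:.2f}")
```

Because every toy match shares the same intercept, the meta-learned initialization should settle near b = 1, while w lands between the task slopes, giving a starting point from which a few inner steps adapt quickly to any one match. In the setting described here, the parameters would instead belong to the full encoder-decoder model.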

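This study adopts a multi-task learning approach, using round attribute encoding and multi-task decoding to understand the strategies behind round-based purchasing decisions. In Counter-Strike: Global Offensive, we treat each round as a multi-task sequence generation problem and combine an action encoder, a team encoder, player features, a round attribute encoder, and an economy encoder to help our agent learn to reason in this specific multi-player round-based scenario. Experiments show that our proposed model is more effective than the greedy approach and offers an interpretable AI method for understanding game strategies.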

Learning to Reason in Round-Based Games: Multi-Task Sequence Generation for Purchasing Decisions in First-Person Shooters