We present DPIQN, a deep policy inference Q-network that targets multi-agent
systems composed of controllable agents, collaborators, and opponents that
interact with each other. We focus on one challenging issue in such
systems---modeling agents with varying strategies---and propose to employ
"policy features" learned from raw observations (e.g., raw images) of
collaborators and opponents by inferring their policies. DPIQN incorporates the
learned policy features as a hidden vector into its own deep Q-network (DQN),
so that it can predict better Q-values for the controllable agents
than the state-of-the-art deep reinforcement learning models. We further
propose an enhanced version of DPIQN, called deep recurrent policy inference
Q-network (DRPIQN), for handling partial observability. Both DPIQN and DRPIQN
are trained with an adaptive training procedure that shifts the network's
focus between learning the policy features and learning its own Q-values at
different phases of training. We present a comprehensive analysis of DPIQN and
DRPIQN, and highlight their effectiveness and generalizability in various
multi-agent settings. Our models are evaluated in a classic soccer game
involving both competitive and collaborative scenarios. Experiments on 1 vs. 1
and 2 vs. 2 games show that DPIQN and DRPIQN outperform the baseline DQN and
deep recurrent Q-network (DRQN)
models. We also explore scenarios in which collaborators or opponents
dynamically change their policies, and show that DPIQN and DRPIQN lead to
better overall performance in terms of stability and mean scores.
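
The abstract does not say how the policy features enter the Q-network or how the adaptive procedure balances its two objectives. The following is a minimal sketch under stated assumptions: the policy features gate the Q-stream hidden vector by element-wise multiplication, the policy inference loss is a cross-entropy on the other agent's observed action, and the Q loss is scaled by the inverse square root of the policy loss. The function names (`elementwise_gate`, `cross_entropy`, `adaptive_total_loss`) and the `eps` parameter are hypothetical and chosen for illustration only.

```python
import math


def elementwise_gate(q_features, policy_features):
    """Fuse the Q-stream hidden vector with the inferred policy features.

    Assumption: fusion is an element-wise product of two equal-length vectors.
    """
    if len(q_features) != len(policy_features):
        raise ValueError("feature vectors must have the same length")
    return [q * p for q, p in zip(q_features, policy_features)]


def cross_entropy(predicted_probs, true_action):
    """Policy inference loss: negative log-likelihood of the observed action."""
    return -math.log(max(predicted_probs[true_action], 1e-12))


def adaptive_total_loss(q_loss, policy_loss, eps=1e-8):
    """Combine the two losses with a weight that depends on the policy loss.

    Assumption: weight = 1 / sqrt(policy_loss). While the other agent's policy
    is poorly inferred (large policy_loss), the Q loss is down-weighted; as the
    inference improves, the Q loss gains weight.
    """
    weight = 1.0 / math.sqrt(policy_loss + eps)
    return weight * q_loss + policy_loss, weight


# Early in training the policy loss is large, so the Q loss counts for less.
early_total, early_weight = adaptive_total_loss(q_loss=1.0, policy_loss=4.0)
# Later the policy loss is small, so the Q loss dominates.
late_total, late_weight = adaptive_total_loss(q_loss=1.0, policy_loss=0.25)
```

With these illustrative numbers, the weight on the Q loss rises from about 0.5 to about 2.0 as the policy loss falls from 4.0 to 0.25, which is one way the training emphasis could move from policy inference to Q-value learning.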

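This paper introduces DPIQN and DRPIQN, two deep reinforcement learning networks that improve Q-value prediction for controllable agents by using policy features inferred from raw observations of collaborators and opponents. The networks target multi-agent systems made up of controllable agents, collaborators, and opponents with varying strategies. The authors demonstrate the strong performance of both models through experiments in several multi-agent scenarios, including 1 vs. 1 and 2 vs. 2 matches of a classic soccer game.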