Deep reinforcement learning (DRL) has attracted considerable attention in recent years and has been shown to play Atari games and Go at or above human level. However, those games have a small, fixed number of actions, and agents for them can be trained with a simple CNN. In this paper, we study a special class of popular Asian card games called Dou Di Zhu, in which two adversarial groups of agents must consider numerous card combinations at each time step, leading to a huge number of actions. We propose a novel method for handling combinatorial actions, which we call combinational Q-learning (CQL). We employ a two-stage network to reduce the action space and leverage order-invariant max-pooling operations to extract relationships between primitive actions. Results show that our method outperforms state-of-the-art methods such as naive Q-learning and A3C. We develop an easy-to-use card game environment and train all agents adversarially from scratch, using only knowledge of the game rules, and verify that our agents are comparable to human players. Our code to reproduce all reported results will be available online.
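
The order-invariance property of max-pooling mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's network: the names `embed`, `max_pool`, and `encode_combination` are hypothetical, and the per-card embeddings are fixed pseudo-random vectors standing in for learned features. The sketch only shows why an element-wise maximum over card embeddings gives the same encoding regardless of the order in which the cards of a combination are listed.

```python
import random


def embed(card_rank, dim=4):
    """Hypothetical per-card embedding (a stand-in for learned features).

    The generator is seeded by the rank, so the same card always maps
    to the same vector.
    """
    rng = random.Random(card_rank)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def max_pool(vectors):
    """Element-wise maximum over a collection of equal-length vectors."""
    return [max(column) for column in zip(*vectors)]


def encode_combination(card_ranks):
    """Encode a card combination as the max-pool of its card embeddings."""
    return max_pool([embed(rank) for rank in card_ranks])


# The same combination listed in two different orders
# (for example, three of a kind plus a single card).
combo = [3, 3, 3, 7]
shuffled = [7, 3, 3, 3]

# Max-pooling ignores ordering, so both encodings are identical.
assert encode_combination(combo) == encode_combination(shuffled)
```

Because the maximum is taken independently in each dimension, permuting the input cards cannot change the result, which is the property that makes such pooling suitable for set-like inputs such as card combinations.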

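This paper studies Dou Di Zhu, a special class of Asian card games, and proposes a new method called combinational Q-learning to handle its huge action space. The method uses a two-stage network and pooling operations to extract relationships between primitive actions. Results show that it outperforms algorithms such as conventional Q-learning and A3C, and that adversarial training using only the game rules yields agents comparable to humans.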

Application of Combinational Q-Learning to Dou Di Zhu