Learning to play optimally against any mixture over a diverse set of strategies is of significant practical interest in competitive games. In this paper, we propose simplex-NeuPL, which satisfies two desiderata simultaneously: i) learning a population of strategically diverse basis policies, represented by a single conditional network; and ii) using the same network, learning best-responses to any mixture over the simplex of basis policies. We show that the resulting conditional policies incorporate prior information about their opponents effectively, enabling near-optimal returns against arbitrary mixture policies in a game with tractable best-responses. We verify that such policies behave Bayes-optimally under uncertainty and offer insights into using this flexibility at test time. Finally, we offer evidence that learning best-responses to any mixture policy is an effective auxiliary task for strategic exploration, which, by itself, can lead to more performant populations.
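To make "best-responses to any mixture over the simplex of basis policies" concrete, the following is a minimal sketch in a game with tractable best-responses. It assumes rock-paper-scissors and three hand-picked basis policies. The names `BASIS`, `predictive`, `posterior` and `bayes_response` are hypothetical illustrations, not from the paper. The sketch computes the exact best-response tabularly instead of learning it with a conditional network. It also assumes the opponent's policy is drawn once from the mixture σ and then held fixed, so the opponent's observed actions let the player update its belief over the basis policies.

```python
from typing import List, Sequence

# Row player's payoff in rock-paper-scissors, a symmetric zero-sum game with
# tractable best-responses: PAYOFF[i][j] is the return for playing action i
# against an opponent who plays action j.
ACTIONS = ["rock", "paper", "scissors"]
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

# Hypothetical basis policies (illustrative only, not learned by simplex-NeuPL):
# each is a fixed mixed strategy over ACTIONS.
BASIS = [
    [0.8, 0.1, 0.1],        # mostly rock
    [0.1, 0.8, 0.1],        # mostly paper
    [1 / 3, 1 / 3, 1 / 3],  # uniform, the Nash equilibrium of RPS
]


def predictive(sigma: Sequence[float], basis: Sequence[Sequence[float]]) -> List[float]:
    """Opponent's marginal action distribution when its policy is drawn from sigma."""
    n_actions = len(basis[0])
    return [sum(w * pi[a] for w, pi in zip(sigma, basis)) for a in range(n_actions)]


def best_response(opponent: Sequence[float]) -> int:
    """Index of the action with the highest expected return against `opponent`."""
    values = [sum(row[j] * opponent[j] for j in range(len(opponent))) for row in PAYOFF]
    return max(range(len(values)), key=values.__getitem__)


def posterior(
    sigma: Sequence[float], basis: Sequence[Sequence[float]], observed: Sequence[int]
) -> List[float]:
    """Bayes' rule over which basis policy the opponent drew, given its observed actions.

    Assumes every observed action has nonzero probability under some basis
    policy in the support of sigma (otherwise the normaliser is zero).
    """
    weights = list(sigma)
    for a in observed:
        weights = [w * pi[a] for w, pi in zip(weights, basis)]
    total = sum(weights)
    return [w / total for w in weights]


def bayes_response(
    sigma: Sequence[float], basis: Sequence[Sequence[float]], observed: Sequence[int]
) -> int:
    """Best-respond to the posterior-predictive opponent after observing `observed`."""
    return best_response(predictive(posterior(sigma, basis, observed), basis))


if __name__ == "__main__":
    sigma = [0.5, 0.5, 0.0]  # prior: equal mixture of the first two basis policies
    print(ACTIONS[bayes_response(sigma, BASIS, [])])         # paper
    print(ACTIONS[bayes_response(sigma, BASIS, [1, 1, 1])])  # scissors
```

Under the prior alone, the predictive opponent plays rock and paper with probability 0.45 each, so paper is the best-response. After three observed paper plays, the belief concentrates on the "mostly paper" basis policy and the best-response shifts to scissors. In this repeated matrix-game setting the opponent's actions are observed regardless of the player's own choice, so best-responding greedily to the posterior predictive each round maximises expected cumulative return. In general sequential games that shortcut does not hold, which is one reason to learn the σ-conditional policy directly instead.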

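This paper proposes the simplex-NeuPL algorithm, which uses a single conditional network to represent a strategically diverse population of basis policies while also learning best-responses to mixtures over them. Experimental results show that the algorithm handles uncertainty effectively and can be used flexibly at test time. In addition, learning best-responses to arbitrary mixture policies is an effective auxiliary task for strategic exploration and can lead to more performant populations.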

Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in Symmetric Zero-Sum Games