Yuan Tian, Klaus-Rudolf Kladny, Qin Wang, Zhiwu Huang, Olga Fink
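TL;DR: This paper proposes a new multi-agent reinforcement learning method, the Time Dynamical Opponent Model, which improves agents' effectiveness in cooperative and competitive environments.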
Abstract
In multi-agent reinforcement learning, multiple agents learn simultaneously
while interacting with a common environment and each other. Since the agents
adapt their policies during learning, not only does the behavior of a single
agent become non-stationary, but so does the environment as per