Jiachen Yang, Ang Li, Mehrdad Farajtabar, Peter Sunehag, Edward Hughes...
TL;DR本文提出了在多个智能体环境中,为每个RL 智能体提供直接向其它智能体给予奖励的能力,并通过学习后的激励函数影响其它智能体,从而达到协作的目的。实验结果显示,在 challenging general-sum Markov games 中,相对于标准RL和对手建模代理,这种方法在寻找最优的分工方面取得了巨大的成功。
Abstract
The challenge of developing powerful and general reinforcement learning (RL) agents has received increasing attention in recent years. Much of this effort has focused on the single-agent setting, in which an agent maximizes a predefined extrinsic reward function. However, a long-term q