Groups of humans are often able to find ways to cooperate with one another in
complex, temporally extended social dilemmas. Models based on behavioral
economics are only able to explain this phenomenon for unrealistic stateless
matrix games. Recently, multi-agent reinforcement learning
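This work presents a general online reinforcement learning algorithm that
displays reciprocal behavior towards its co-players and, when learning
alongside selfish agents, can induce reciprocal behavior in the wider group,
both in a $2$-player Markov game and in $5$-player intertemporal social
dilemmas. Our analysis shows that agents implementing reciprocity are strongly
influenced by the behavior of their co-players.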