Wei Qiu, Weixun Wang, Rundong Wang, Bo An, Yujing Hu...
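TL;DR: The paper proposes an algorithmic framework for model-free MARL under off-beat actions. Using a novel temporal reward redistribution scheme, it employs LeGEM to build agents' episodic memory and improve multi-agent coordination. Results show that the method significantly improves multi-agent coordination and sample efficiency.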
Abstract
We investigate model-free multi-agent reinforcement learning (MARL) in
environments where off-beat actions are prevalent, i.e., all actions have
pre-set execution durations. During execution durations, the enviro