BriefGPT.xyz
Oct, 2020
Multi-Agent Trust Region Policy Optimization
Hepeng Li, Haibo He
TL;DR
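This work extends trust region policy optimization (TRPO) to multi-agent reinforcement learning (MARL) and proposes MATRPO, a decentralized MARL algorithm formulated as a distributed consensus optimization problem. The algorithm optimizes distributed policies from local observations and private rewards, which makes it fully decentralized and privacy-preserving. Experiments show that MATRPO performs robustly on complex MARL tasks.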
Abstract
We extend trust region policy optimization (TRPO) to multi-agent reinforcement learning (MARL) problems. We show that the policy update of TRPO can be transformed into a
→