BriefGPT.xyz
Nov 2021
Coordinated Proximal Policy Optimization
Zifan Wu, Chao Yu, Deheng Ye, Junge Zhang, Haiyin Piao...
TL;DR
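This paper proposes CoPPO, an algorithm for policy optimization in multi-agent environments. The authors prove that, by optimizing a theoretically grounded joint objective, the algorithm achieves dynamic credit assignment among agents. This addresses the high variance that arises when agents' policies are updated simultaneously in a multi-agent system. Experiments show that CoPPO outperforms several strong baselines in typical multi-agent environments, including cooperative matrix games and StarCraft II micromanagement tasks, and is competitive with the latest multi-agent PPO method (MAPPO).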
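A small calculation makes the problem of simultaneous updates concrete. The Python sketch below assumes, as is common in cooperative multi-agent PPO, that the joint policy factorizes into independent per-agent policies. Under that assumption the joint probability ratio is the product of the agents' individual ratios. The clip range `EPS = 0.2` and the helper names are illustrative choices, not taken from the paper. The sketch shows that clipping each agent's ratio independently keeps every factor inside `[1 - eps, 1 + eps]`, while the joint ratio can still grow like `(1 + eps)**n`.

```python
import math

EPS = 0.2  # illustrative PPO clip range (assumption, not from the paper)

def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(x, hi))

def joint_ratio_after_independent_clipping(ratios, eps=EPS):
    """Clip every agent's ratio on its own, then multiply the results.

    Assumes a factorized joint policy, so the joint ratio is the product of
    per-agent ratios. Each factor stays in [1 - eps, 1 + eps], but the product
    is only bounded by (1 - eps)**n and (1 + eps)**n for n agents.
    """
    return math.prod(clip(r, 1 - eps, 1 + eps) for r in ratios)

if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        # Every agent moves right up to the edge of its own clip range.
        r = joint_ratio_after_independent_clipping([1 + EPS] * n)
        print(f"{n:2d} agents -> joint ratio {r:.3f}")
```

With five agents each at the edge of its own clip range, the joint ratio is already about 2.49, far outside the range a single PPO agent would be allowed. This compounding is one way to see why uncoordinated concurrent updates can destabilize training. It is a motivating illustration, not a reproduction of the paper's analysis.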
Abstract
We present coordinated proximal policy optimization (CoPPO), an algorithm that extends the original Proximal Policy Optimization (PPO) to the multi-agent setting. The key idea lies in the coordinated adaptation o…
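PPO's clipped surrogate can be sketched in NumPy, together with one way to couple it across agents. `clipped_surrogate` is the standard PPO objective. `coordinated_surrogate` is a hypothetical illustration of coordinated step sizes, not CoPPO's published objective. It rescales agent i's advantage by the product of the other agents' ratios, clipped to a second range `eps_others` (an assumed parameter). As a result, agent i's effective step depends on how far its teammates' policies have moved. The sketch only evaluates the objective. A gradient-based implementation would presumably treat the teammates' product as a constant when differentiating with respect to agent i's parameters, but that is also an assumption here.

```python
import numpy as np

def clipped_surrogate(ratio: np.ndarray, adv: np.ndarray, eps: float) -> float:
    """Standard PPO clipped surrogate, averaged over the batch."""
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return float(np.minimum(unclipped, clipped).mean())

def coordinated_surrogate(ratios: np.ndarray, adv: np.ndarray, i: int,
                          eps: float = 0.2, eps_others: float = 0.1) -> float:
    """HYPOTHETICAL coordinated surrogate for agent i (not the paper's exact form).

    ratios: shape (n_agents, batch), pi_new(a_j|s) / pi_old(a_j|s) for each agent j.
    adv:    shape (batch,), a shared team advantage estimate.
    eps_others: assumed clip range for the teammates' joint ratio.
    """
    # Product of every other agent's ratio per sample (all ones if n_agents == 1).
    others = np.prod(np.delete(ratios, i, axis=0), axis=0)
    # Clip the teammates' joint change so it cannot rescale the advantage without bound.
    weight = np.clip(others, 1.0 - eps_others, 1.0 + eps_others)
    # weight > 0, so scaling the advantage equals scaling both terms inside the min.
    return clipped_surrogate(ratios[i], weight * adv, eps)

if __name__ == "__main__":
    ratios = np.array([[1.1, 0.9],
                       [1.0, 1.0]])
    adv = np.array([1.0, -1.0])
    # Teammate unchanged: coordinated and independent objectives agree.
    print(coordinated_surrogate(ratios, adv, 0), clipped_surrogate(ratios[0], adv, 0.2))
```

If the other agents have not moved (all their ratios equal 1), the weight is 1 and the coordinated surrogate reduces to ordinary per-agent PPO. Clipping the teammates' product keeps a large joint move by the others from inflating agent i's update.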