Nov, 2023
AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation
Daiki E. Matsunaga, Jongmin Lee, Jaeseok Yoon, Stefanos Leonardos, Pieter Abbeel...
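TL;DR: AlberDICE is an offline multi-agent reinforcement learning algorithm that addresses distribution shift in offline multi-agent RL by alternating centralized training and avoiding the selection of joint actions that fall outside the distribution of the reference data.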