BriefGPT.xyz
Feb, 2024
Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints
Dan Qiao, Yu-Xiang Wang
TL;DR
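For multi-agent reinforcement learning under adaptivity constraints, we design an elimination-based algorithm that achieves near-optimal regret in Markov games with low batch complexity, and we prove a batch complexity lower bound that matches the upper bound. Together these give the first set of results toward understanding multi-agent reinforcement learning with low adaptivity.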
Abstract
We study the problem of multi-agent reinforcement learning (MARL) with adaptivity constraints -- a new problem motivated by real-world applications where deployments of new policies are costly and the number of p