May, 2023
DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm
Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos...
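TL;DR: Introduces doubly multi-step off-policy VI (DoMo-VI), a new value-iteration method, and its practical instantiation, doubly multi-step off-policy actor-critic (DoMo-AC). DoMo-AC combines multi-step policy improvement with multi-step policy evaluation to make training faster and more accurate, and it achieves better results than the baseline algorithm on the Atari-57 game benchmark.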