BriefGPT.xyz
Jul, 2020
Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning
Silviu Pitis, Harris Chan, Stephen Zhao, Bradly Stadie, Jimmy Ba
TL;DR
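This paper examines which goals a multi-goal reinforcement learning agent should pursue when the test-time goal distribution is too distant to learn from directly. It proposes an intrinsic objective that maximizes the entropy of the historical achieved-goal distribution. The agent explores by pursuing past achieved goals that lie in rarely explored regions of the goal space, which improves sample efficiency on long-horizon goal-reaching tasks.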
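The goal-selection idea in the TL;DR can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes the density of past achieved goals is estimated with an isotropic Gaussian kernel density estimate (KDE), and that the agent's next behavioral goal is the lowest-density candidate among a random sample of past achieved goals. The helper names `gaussian_kde_log_density` and `select_min_density_goal`, the bandwidth, and the candidate count are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kde_log_density(points: np.ndarray, data: np.ndarray, bandwidth: float) -> np.ndarray:
    """Log-density of each row of `points` under an isotropic Gaussian KDE fit to `data`.

    Hypothetical helper: the Gaussian kernel and fixed bandwidth are assumptions.
    """
    d = data.shape[1]
    # Pairwise squared distances, shape (num_points, num_data).
    sq_dists = ((points[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    log_kernels = -0.5 * sq_dists / bandwidth**2
    # Log-sum-exp over data points for numerical stability.
    m = log_kernels.max(axis=1, keepdims=True)
    log_sum = m[:, 0] + np.log(np.exp(log_kernels - m).sum(axis=1))
    # Normalizer: average over N kernels, each a Gaussian with variance bandwidth^2 per dimension.
    log_norm = np.log(len(data)) + 0.5 * d * np.log(2 * np.pi * bandwidth**2)
    return log_sum - log_norm


def select_min_density_goal(achieved_goals: np.ndarray, num_candidates: int,
                            bandwidth: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: return the sampled past achieved goal with the lowest estimated density.

    Pursuing low-density achieved goals pushes the agent toward the sparsely
    explored frontier of the goal space (the assumed selection rule here).
    """
    k = min(num_candidates, len(achieved_goals))
    idx = rng.choice(len(achieved_goals), size=k, replace=False)
    candidates = achieved_goals[idx]
    log_p = gaussian_kde_log_density(candidates, achieved_goals, bandwidth)
    return candidates[np.argmin(log_p)]


# Usage: a dense cluster of achieved goals near the origin plus one rarely reached goal.
rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.1, size=(200, 2))
frontier = np.array([[3.0, 3.0]])
achieved = np.vstack([dense, frontier])
goal = select_min_density_goal(achieved, num_candidates=len(achieved), bandwidth=0.2, rng=rng)
print(goal)
```

Because every achieved goal is a candidate in this example, the isolated goal at (3, 3) has the lowest estimated density and is selected. With a smaller `num_candidates`, selection becomes a cheaper, noisier approximation of the same rule.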
Abstract
What goals should a multi-goal reinforcement learning agent pursue during training in long-horizon tasks? When the desired (test time) goal distribution is too distant to offer a useful learning signal, we argue