BriefGPT.xyz
Nov, 2022
Redeeming Intrinsic Rewards via Constrained Optimization
Eric Chen, Zhang-Wei Hong, Joni Pajarinen, Pulkit Agrawal
TL;DR
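This work proposes an optimization strategy called EIPO that automatically adjusts the importance of the intrinsic reward, balancing the task reward against the intrinsic reward to obtain the best exploration results. It performs well in tests on 61 Atari games.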
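To make the idea of an automatically adjusted intrinsic-reward weight concrete, here is a minimal Python sketch. It is not the paper's algorithm: the function names, the additive mixing form, the update rule, and the step size are all assumptions chosen for illustration. The sketch lowers the weight when an exploring policy earns less task return than a task-only baseline, and raises it otherwise.

```python
def mix_rewards(task_reward, intrinsic_reward, weight):
    """Combine task and intrinsic reward (assumed additive form)."""
    return task_reward + weight * intrinsic_reward


def update_weight(weight, mixed_task_return, task_only_return, step_size=0.1):
    """Hypothetical update rule, not taken from the paper.

    mixed_task_return: task return of the policy trained on the mixed reward.
    task_only_return:  task return of a policy trained on task reward alone.
    The weight moves in the direction of the gap and is clipped at zero.
    """
    gap = mixed_task_return - task_only_return
    return max(0.0, weight + step_size * gap)


# Illustrative numbers only: the exploring policy lags the baseline,
# so the intrinsic-reward weight is reduced.
weight = 1.0
weight = update_weight(weight, mixed_task_return=3.0, task_only_return=5.0)
reward = mix_rewards(task_reward=1.0, intrinsic_reward=0.5, weight=weight)
```

Under these assumptions the weight acts as a feedback signal: the intrinsic bonus is trusted only as long as it does not cost task performance.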
Abstract
State-of-the-art reinforcement learning (RL) algorithms typically use random sampling (e.g., $\epsilon$-greedy) for exploration, but this method fails in hard