BriefGPT.xyz
Jan, 2024
Detecting Hidden Triggers: Mapping Non-Markov Reward Functions to Markov
Gregory Hyde, Eugene Santos Jr
TL;DR
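We map non-Markov reward functions to equivalent Markov ones by learning reward machines. We show the importance of reward machines over deterministic finite-state automata for modeling reward dependencies in a single automaton, and validate our approach by learning black-box non-Markov reward functions in the Officeworld domain and by showing its effectiveness at learning dependencies between rewards in the Breakfastworld domain.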
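To make the idea of a reward machine concrete, here is a minimal Python sketch. It is not the authors' learning method; it only illustrates how a hand-written finite-state machine over event labels can turn a history-dependent reward into one that depends only on the current machine state and label. The labels (`"coffee"`, `"office"`), the transition table, and the function names are all hypothetical choices made for this illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy reward machine (not taken from the paper).
# Keys are (machine_state, label); values are (next_state, reward).
# Any pair missing from the table self-loops with reward 0.
TRANSITIONS: Dict[Tuple[int, str], Tuple[int, float]] = {
    (0, "coffee"): (1, 0.0),  # picking up coffee arms the trigger
    (1, "office"): (2, 1.0),  # reaching the office afterwards pays out once
}


def step(u: int, label: str) -> Tuple[int, float]:
    """Advance the machine by one label and return (next_state, reward)."""
    return TRANSITIONS.get((u, label), (u, 0.0))


def rewards_for(labels: List[str]) -> List[float]:
    """Replay a label sequence from machine state 0 and collect the rewards."""
    u = 0
    out: List[float] = []
    for label in labels:
        u, r = step(u, label)
        out.append(r)
    return out
```

In this sketch the label `"office"` alone does not determine the reward, since the reward depends on whether `"coffee"` was seen earlier. Once the machine state `u` is tracked alongside the label, the reward becomes a function of the pair `(u, label)`, which is the sense in which the mapped reward is Markov.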
Abstract
Many reinforcement learning algorithms assume a Markov reward function to guarantee optimality. However, not all reward functions are known to be Markov. In this paper, we propose a framework for mapping non-Mark