Many Reinforcement Learning algorithms assume a Markov reward function to guarantee optimality. However, not all reward functions are known to be Markov. In this paper, we propose a framework for mapping non-Markov reward functions into equivalent Markov ones by learning a Reward Machine, a specialized reward automaton. Unlike the general practice of learning Reward Machines, we do not require a set of high-level propositional symbols from which to learn. Rather, we learn \emph{hidden triggers}, which encode these symbols, directly from data. For this task, we demonstrate the importance of learning Reward Machines rather than their Deterministic Finite-State Automata counterparts, given the ability of Reward Machines to model reward dependencies in a single automaton. We formalize this distinction in our learning objective. We formulate the mapping process as an Integer Linear Programming problem. We prove that our mappings provide consistent expectations for the underlying process. We empirically validate our approach by learning black-box non-Markov reward functions in the Officeworld domain. Additionally, we demonstrate the effectiveness of learning dependencies between rewards in a new domain, Breakfastworld.
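To illustrate why tracking a Reward Machine state makes a history-dependent reward Markov, the following is a minimal sketch and not the learning procedure of this paper. The machine is written by hand rather than learned through the Integer Linear Programming formulation, and the observation labels `key`, `goal`, and `hall`, the table `DELTA`, and the helpers `rm_step` and `rewards` are all hypothetical names chosen for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical two-state Reward Machine (hand-written, not learned).
# The reward for observing "goal" is paid only if "key" was seen earlier,
# so the reward is non-Markov in the observation alone.
# Transition table: (rm_state, observation) -> (next_rm_state, reward)
DELTA: Dict[Tuple[int, str], Tuple[int, float]] = {
    (0, "key"): (1, 0.0),   # seeing the key moves the machine to state 1
    (1, "goal"): (0, 1.0),  # reaching the goal afterwards pays 1 and resets
}


def rm_step(u: int, obs: str) -> Tuple[int, float]:
    """Advance the machine by one observation.

    Pairs missing from DELTA are treated as zero-reward self-loops.
    """
    return DELTA.get((u, obs), (u, 0.0))


def rewards(trajectory: List[str], u0: int = 0) -> List[float]:
    """Return the reward emitted at each step of an observation sequence."""
    u, out = u0, []
    for obs in trajectory:
        u, r = rm_step(u, obs)
        out.append(r)
    return out
```

In this sketch the same observation `goal` yields different rewards depending on the history, yet the reward is a function of the pair (machine state, observation) alone, which is the sense in which the augmented process is Markov.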

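By learning a Reward Machine, we map non-Markov reward functions into equivalent Markov ones, demonstrate the importance of Reward Machines over Deterministic Finite-State Automata for modeling reward dependencies in a single automaton, and validate our approach by learning black-box non-Markov reward functions in the Officeworld domain and by showing the effectiveness of learning dependencies between rewards in the Breakfastworld domain.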

Detecting Hidden Triggers: Mapping Non-Markov Reward Functions to Markov