Inverse reinforcement learning (IRL) is the problem of inferring a reward
function from expert behavior. There are several approaches to IRL, but most
are designed to learn a Markovian reward. However, a reward function may be
non-Markovian, depending on more than just the current state, as is the case
for a reward machine (RM). Although there has been recent work on inferring
RMs, it assumes access to the reward signal, which is absent in IRL. We
propose a Bayesian IRL (BIRL)
framework for inferring RMs directly from expert behavior, which requires
significant changes to the standard framework. We define a new reward space,
adapt the expert demonstration to include history, show how to compute the
reward posterior, and propose a novel modification to simulated annealing to
maximize this posterior. We demonstrate that our method performs well when
optimizing according to its inferred reward and compares favorably to an
existing method that learns exclusively binary non-Markovian rewards.

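Summary: This work presents a Bayesian inverse reinforcement learning (BIRL) framework that infers reward machines (RMs) directly from expert behavior. To handle non-Markovian reward functions, it makes significant changes to the standard framework: it defines a new reward space, adapts the expert demonstrations to include history, shows how to compute the reward posterior, and proposes a novel modification to simulated annealing to maximize that posterior. The method performs well when optimizing according to its inferred reward and compares favorably to an existing method that learns strictly binary non-Markovian rewards.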
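
To make the notion of a non-Markovian reward concrete, the following is a minimal sketch of a reward machine as a finite-state machine over high-level event labels. It is an illustration only, not the paper's implementation: the class `RewardMachine`, the labels `"key"`, `"goal"`, and `"empty"`, and the RM states `"u0"` and `"u1"` are all hypothetical names chosen for this example. The reward for observing `"goal"` depends on whether `"key"` was seen earlier, so it cannot be written as a function of the current observation alone.

```python
from typing import Dict, Hashable, Iterable, Tuple

class RewardMachine:
    """Hypothetical minimal reward machine: a finite-state machine whose
    transitions are triggered by event labels and emit a scalar reward."""

    def __init__(
        self,
        initial_state: Hashable,
        transitions: Dict[Tuple[Hashable, str], Tuple[Hashable, float]],
    ) -> None:
        self.initial_state = initial_state
        # Maps (rm_state, label) -> (next_rm_state, reward).
        self.transitions = transitions

    def step(self, rm_state: Hashable, label: str) -> Tuple[Hashable, float]:
        # Assumption for this sketch: unlisted (state, label) pairs
        # self-loop with zero reward.
        return self.transitions.get((rm_state, label), (rm_state, 0.0))

    def total_reward(self, labels: Iterable[str]) -> float:
        # Run the machine over a sequence of labels and sum the rewards.
        rm_state = self.initial_state
        total = 0.0
        for label in labels:
            rm_state, reward = self.step(rm_state, label)
            total += reward
        return total

# Reward 1 is given for "goal" only if "key" was observed beforehand.
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "key"): ("u1", 0.0),   # remember that the key was collected
        ("u1", "goal"): ("u0", 1.0),  # pay out, then reset
    },
)

reward_without_key = rm.total_reward(["goal"])
reward_with_key = rm.total_reward(["key", "empty", "goal"])
```

In this sketch the same label `"goal"` yields different rewards depending on the RM state, which is the history dependence that a Markovian reward over the current state alone cannot express.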