Non-Markovian Reinforcement Learning (RL) tasks are very hard to solve, because agents must consider the entire history of state-action pairs to act rationally in the environment. Most works use symbolic formalisms (such as Linear Temporal Logic or automata) to specify the temporally extended task. These approaches only work in finite and discrete state environments, or in continuous problems for which a mapping between the raw state and a symbolic interpretation, known as a symbol grounding (SG) function, is available. Here, we define Neural Reward Machines (NRMs), an automata-based neurosymbolic framework based on the probabilistic relaxation of Moore Machines, which can be used for both reasoning and learning in non-symbolic non-Markovian RL domains. We combine RL with semi-supervised symbol grounding (SSSG) and show that NRMs can exploit high-level symbolic knowledge in non-symbolic environments without any knowledge of the SG function, outperforming Deep RL methods that cannot incorporate prior knowledge. Moreover, we advance research in SSSG by proposing an algorithm for analysing the groundability of temporal specifications that is more efficient than baseline techniques by a factor of $10^3$.

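This study addresses the need to consider the history of state-action pairs in non-Markovian reinforcement learning tasks and proposes a novel Neural Reward Machine (NRM) framework that can reason and learn in non-symbolic, non-Markovian environments. NRMs integrate semi-supervised symbol grounding (SSSG) with reinforcement learning, and the study shows that they can exploit high-level symbolic knowledge without access to the symbol grounding function, outperforming conventional deep reinforcement learning methods. The study also proposes a new algorithm for analysing the groundability of temporal specifications, which is about 1000 times more efficient than baseline techniques.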
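To illustrate what a "probabilistic relaxation of Moore Machines" can look like, the sketch below propagates a probability distribution (belief) over machine states instead of a single state. At each step, a symbol grounding function, assumed here to output a probability for each symbol (for example, a hypothetical neural classifier over raw observations), weights the transitions, and the Moore output (reward) is taken in expectation. The task ("observe A, then B"), the transition table `DELTA`, the reward vector `REWARD`, and the helper names `step`, `expected_reward`, and `run` are all hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical 3-state Moore machine for the task "observe A, then B".
# State 0: start, state 1: A seen, state 2: A then B seen (task completed).
NUM_STATES = 3

# Deterministic transition table: DELTA[symbol][state] -> next state.
DELTA: Dict[str, List[int]] = {
    "A":    [1, 1, 2],
    "B":    [0, 2, 2],
    "none": [0, 1, 2],
}

# Moore output: the reward attached to each machine state.
REWARD: List[float] = [0.0, 0.0, 1.0]


def step(belief: Sequence[float], symbol_probs: Dict[str, float]) -> List[float]:
    """Relaxed transition: b'(q') = sum_sym p(sym) * sum_q b(q) * [DELTA[sym][q] == q']."""
    new_belief = [0.0] * NUM_STATES
    for symbol, p_sym in symbol_probs.items():
        for q, b_q in enumerate(belief):
            new_belief[DELTA[symbol][q]] += p_sym * b_q
    return new_belief


def expected_reward(belief: Sequence[float]) -> float:
    """Expected Moore output under the current belief over machine states."""
    return sum(b * r for b, r in zip(belief, REWARD))


def run(grounding_outputs: Sequence[Dict[str, float]]) -> Tuple[List[float], List[float]]:
    """Feed a sequence of symbol distributions (one per time step) through the machine."""
    belief = [1.0, 0.0, 0.0]  # start with certainty in the initial state
    rewards = []
    for symbol_probs in grounding_outputs:
        belief = step(belief, symbol_probs)
        rewards.append(expected_reward(belief))
    return belief, rewards


if __name__ == "__main__":
    # Perfectly confident grounding reduces to the ordinary Moore machine.
    sharp = [{"A": 1.0, "B": 0.0, "none": 0.0}, {"A": 0.0, "B": 1.0, "none": 0.0}]
    print(run(sharp)[1])  # [0.0, 1.0]

    # Uncertain grounding yields a soft reward.
    soft = [{"A": 0.8, "B": 0.0, "none": 0.2}, {"A": 0.0, "B": 0.9, "none": 0.1}]
    print(run(soft)[1][-1])  # roughly 0.72
```

Because each update is a sum of products of the symbol probabilities and the previous belief, the expected reward is differentiable with respect to the grounding outputs. This is one reason such a relaxation can, in principle, be combined with a learned (neural) grounding function rather than a known one.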