Unlike tasks in the standard Reinforcement Learning (RL) model, many real-world tasks are non-Markovian: their rewards depend on the state history rather than solely on the current state. Such tasks arise frequently in practical applications such as autonomous driving, financial trading, and medical diagnosis, and solving them can be quite challenging. We propose a novel RL approach for learning tasks with non-Markovian rewards expressed in the temporal logic LTL$_f$ (Linear Temporal Logic over Finite Traces). To this end, we introduce an encoding of linear complexity from LTL$_f$ into MDPs (Markov Decision Processes), which makes it possible to take advantage of advanced RL algorithms. We then use a prioritized experience replay technique based on the automaton structure (semantically equivalent to the LTL$_f$ specification) to improve the training process. We empirically evaluate our approach on several benchmark problems augmented with non-Markovian tasks to demonstrate its feasibility and effectiveness.
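To make the idea of turning a history-dependent reward into a Markovian one concrete, the following minimal sketch tracks an automaton state alongside the environment. It is an illustration under stated assumptions, not the linear-complexity encoding proposed here: the task ("observe `a`, then at a strictly later step observe `b`", roughly $F(a \wedge X F b)$), the hand-written automaton, and the names `dfa_step`, `product_step`, and `run_trace` are all hypothetical.

```python
# Hypothetical example: a hand-written DFA for the task
# "observe a, then at a strictly later step observe b".
# This is an explicit automaton, not the paper's linear encoding.

INITIAL = 0
ACCEPTING = {2}


def dfa_step(q, labels):
    """Advance the automaton state q on the set of propositions true at this step."""
    if q == 0 and "a" in labels:
        return 1
    if q == 1 and "b" in labels:
        return 2
    return q  # no progress; state 2 is absorbing


def product_step(q, labels):
    """One step of the product process.

    The reward depends only on (q, labels), so it is Markovian over the
    augmented state (environment state, automaton state).
    """
    q_next = dfa_step(q, labels)
    # Pay the reward once, on the step where the automaton first accepts.
    reward = 1.0 if q_next in ACCEPTING and q not in ACCEPTING else 0.0
    return q_next, reward


def run_trace(trace):
    """Replay a finite trace of label sets; return final automaton state and total reward."""
    q, total = INITIAL, 0.0
    for labels in trace:
        q, r = product_step(q, labels)
        total += r
    return q, total


if __name__ == "__main__":
    print(run_trace([{"a"}, set(), {"b"}]))  # a first, b later: task satisfied
    print(run_trace([{"b"}, {"a"}]))         # wrong order: task not satisfied
```

In this sketch an agent would observe the pair (environment state, automaton state), so any standard RL algorithm for MDPs could in principle be applied to the augmented problem. The automaton state recorded with each transition is also the kind of structural information a replay buffer could use to prioritize experiences.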

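We propose a novel reinforcement learning method for achieving non-Markovian rewards expressed in LTL$_f$ (Linear Temporal Logic over Finite Traces). It uses a linear-complexity encoding from LTL$_f$ into MDPs (Markov Decision Processes) and a prioritized experience replay technique based on the automaton structure (semantically equivalent to the LTL$_f$ specification) to improve the training process. Experiments on several benchmark problems augmented with non-Markovian tasks demonstrate the feasibility and effectiveness of our method.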

Training Non-Markovian Tasks with Experience Classification