Linear temporal logic (LTL) and, more generally, $\omega$-regular objectives are alternatives to the traditional discount-sum and average-reward objectives in reinforcement learning (RL), offering the advantage of greater comprehensibility and hence explainability. In this work, we study the relationship between these objectives. Our main result is that each RL problem for $\omega$-regular objectives can be reduced to a limit-average reward problem in an optimality-preserving fashion, via (finite-memory) reward machines. Furthermore, we demonstrate the efficacy of this approach by showing that optimal policies for limit-average problems can be found asymptotically by approximately solving a sequence of discount-sum problems. Consequently, we resolve an open problem: optimal policies for LTL and $\omega$-regular objectives can be learned asymptotically.
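
To make the link between discount-sum and limit-average objectives concrete, the following minimal sketch checks the classical fact that, for a fixed policy on a finite Markov chain, the normalized discounted value $(1-\gamma)V_\gamma$ approaches the limit-average reward as $\gamma \to 1$. It is an illustration only, not the algorithm of this work: the two-state chain `P`, the reward vector `r`, and both function names are hypothetical choices made for the example.

```python
# Illustration only: a hypothetical two-state Markov chain under a fixed policy.
# Rewards are attached to states: V(s) = r(s) + gamma * sum_t P[s][t] * V(t).

def normalized_discounted_value(gamma, P, r, start=0):
    """Return (1 - gamma) * V_gamma(start) for a two-state chain.

    Solves (I - gamma * P) V = r exactly with Cramer's rule.
    """
    a = 1.0 - gamma * P[0][0]
    b = -gamma * P[0][1]
    c = -gamma * P[1][0]
    d = 1.0 - gamma * P[1][1]
    det = a * d - b * c
    v0 = (r[0] * d - b * r[1]) / det
    v1 = (a * r[1] - c * r[0]) / det
    return (1.0 - gamma) * (v0, v1)[start]


def limit_average(P, r):
    """Limit-average reward of an ergodic two-state chain.

    Uses the stationary distribution, which satisfies pi0 * p01 = pi1 * p10.
    """
    p01, p10 = P[0][1], P[1][0]
    pi0 = p10 / (p01 + p10)
    pi1 = p01 / (p01 + p10)
    return pi0 * r[0] + pi1 * r[1]


P = [[0.9, 0.1], [0.5, 0.5]]  # hypothetical transition probabilities
r = [0.0, 1.0]                # hypothetical state rewards

target = limit_average(P, r)
gammas = (0.9, 0.99, 0.999)
gaps = [abs(normalized_discounted_value(g, P, r) - target) for g in gammas]

for g, gap in zip(gammas, gaps):
    print(f"gamma={g}: |(1 - gamma) * V_gamma - limit average| = {gap:.6f}")
```

On this chain the gap shrinks as the discount factor increases, which is the behaviour that motivates approximating a limit-average problem by a sequence of discount-sum problems with discount factors tending to 1.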

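This work addresses the relationship between linear temporal logic (LTL) and $\omega$-regular objectives on the one hand and the traditional discount-sum and average-reward objectives in reinforcement learning on the other. It proposes a new method that translates $\omega$-regular objectives into limit-average reward problems in an optimality-preserving fashion, via finite-memory reward machines. The study shows that optimal policies for LTL and $\omega$-regular objectives can be learned asymptotically, thereby closing a gap in the field.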

Reinforcement Learning with LTL and $\omega$-Regular Objectives via Optimality-Preserving Translation to Average Rewards