In standard Reinforcement Learning settings, agents typically assume immediate feedback about the effects of their actions after taking them. However, in practice, this assumption may not hold true due to physical constraints and can significantly impact the performance of RL algorithms. In this paper, we focus on addressing observation delays in partially observable environments. We propose leveraging world models, which have shown success in integrating past observations and learning dynamics, to handle observation delays. By reducing delayed POMDPs to delayed MDPs with world models, our methods can effectively handle partial observability, where existing approaches achieve sub-optimal performance or even degrade quickly as observability decreases. Experiments suggest that one of our methods can outperform a naive model-based approach by up to %30. Moreover, we evaluate our methods on visual input based delayed environment, for the first time showcasing delay-aware reinforcement learning on visual observations.

在标准强化学习设置中，通过立即获得行为后效果的反馈是常见的假设；然而，由于物理限制，在实践中这种假设可能并不成立，可能严重影响强化学习算法的性能。本文关注部分可观测环境中观察延迟的处理。我们提出利用过去观测和学习动态的世界模型来处理观察延迟。通过将延迟型POMDP降低为具有世界模型的延迟型MDP，我们的方法可以有效处理部分可观察性，在现有方法在可观察性降低时实现次优性能甚至迅速降级的情况下表现出更好的性能。实验证明，我们的方法之一可以比天真的基于模型的方法的表现高出30%。此外，我们首次在基于视觉输入的延迟环境上评估了我们的方法，展示了延迟感知的视觉观察强化学习。

通过世界模型进行延迟观察的强化学习