In standard reinforcement learning (RL) settings, agents typically assume
immediate feedback about the effects of their actions. In practice, however,
this assumption may not hold due to physical constraints, and its violation
can significantly degrade the performance of RL algorithms. In
this paper, we focus on addressing observation delays in partially observable
environments. We propose leveraging world models, which have shown success in
integrating past observations and learning dynamics, to handle observation
delays. By reducing delayed POMDPs to delayed MDPs with world models, our
methods can effectively handle partial observability, where existing approaches
achieve sub-optimal performance or even degrade quickly as observability
decreases. Experiments suggest that one of our methods can outperform a naive
model-based approach by up to 30%. Moreover, we evaluate our methods on a
visual-input-based delayed environment, showcasing delay-aware reinforcement
learning on visual observations for the first time.
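
To make the notion of observation delay concrete, the following is a minimal sketch of a fixed delay of d steps. It is not the method proposed here. The class name `DelayedObservationBuffer` and its `step` method are hypothetical, introduced only for illustration. The agent receives the observation from d steps ago together with the actions taken since then, which is the usual augmented input in delayed decision processes.

```python
from collections import deque


class DelayedObservationBuffer:
    """Hypothetical helper simulating a constant observation delay of `delay` steps."""

    def __init__(self, delay, initial_obs):
        if delay < 0:
            raise ValueError("delay must be non-negative")
        self.delay = delay
        # Keep the last delay + 1 observations, pre-filled with the initial one,
        # so the oldest entry is always the observation the agent may see.
        self.observations = deque([initial_obs] * (delay + 1), maxlen=delay + 1)
        # Keep the actions taken since the visible (delayed) observation.
        self.actions = deque(maxlen=delay)

    def step(self, action, new_obs):
        """Record the action and the true new observation.

        Returns the delayed observation and the tuple of actions taken since it.
        """
        self.actions.append(action)
        self.observations.append(new_obs)
        return self.observations[0], tuple(self.actions)


buf = DelayedObservationBuffer(delay=2, initial_obs="o0")
buf.step("a0", "o1")
buf.step("a1", "o2")
delayed_obs, pending_actions = buf.step("a2", "o3")
print(delayed_obs, pending_actions)
```

In this sketch, after three steps with a delay of 2 the agent sees the observation produced two steps earlier plus the two actions it has taken since. A world model, under this assumption, would take such a pair as input and estimate the current latent state.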
