Visual model-based RL methods typically encode image observations into
low-dimensional representations in a manner that does not eliminate redundant
information. This leaves them susceptible to spurious variations -- changes in
task-irrelevant components such as background distractors or lighting
conditions. In this paper, we propose a visual model-based RL method that
learns a latent representation resilient to such spurious variations. Our
training objective encourages the representation to be maximally predictive of
dynamics and reward, while constraining the information flow from the
observation to the latent representation. We demonstrate that this objective
significantly bolsters the resilience of visual model-based RL methods to
visual distractors, allowing them to operate in dynamic environments. We then
show that while the learned encoder is resilient to spurious variations, it is
not invariant under significant distribution shift. To address this, we propose
a simple reward-free alignment procedure that enables test-time adaptation of
the encoder. This allows for quick adaptation to widely differing environments
without having to relearn the dynamics and policy. Our effort is a step towards
making model-based RL a practical and useful tool for dynamic, diverse domains.
We show its effectiveness in simulation benchmarks with significant spurious
variations as well as a real-world egocentric navigation task with noisy TVs in
the background. Videos and code at this https URL

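This paper proposes a visual model-based reinforcement learning method that learns a latent representation resilient to noise and distractors. It does so by encouraging the representation to be maximally predictive of dynamics and reward while restricting the information flow from the observation to the latent representation. The method is markedly more robust to visual distractors and operates effectively in dynamic environments. In addition, the authors propose a simple reward-free alignment procedure that lets the encoder adapt quickly at test time without relearning the dynamics and policy. This work is a step toward making model-based reinforcement learning a practical and useful tool for dynamic, diverse domains; the authors demonstrate its effectiveness on simulation benchmarks and in a real-world environment with noisy TVs in the background.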
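
The training objective described above trades predictive accuracy against the amount of information the latent takes from the observation. The sketch below is one common way to write such a trade-off and is not taken from the paper. It assumes diagonal-Gaussian latents and uses the KL divergence between an encoder posterior and a dynamics prior as a stand-in for the information constraint. The names `gaussian_kl`, `bottleneck_loss`, and the weight `beta` are hypothetical and chosen for illustration only.

```python
import math

def gaussian_kl(mu_q, std_q, mu_p, std_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, std_q, mu_p, std_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total

def bottleneck_loss(reward_pred, reward_true, posterior, prior, beta=1.0):
    """Reward prediction error plus a beta-weighted information penalty.

    posterior and prior are (mean, std) pairs of equal-length lists.
    Assumption: the KL term approximates the information the observation
    adds beyond what the latent dynamics already predict.
    """
    prediction_error = 0.5 * (reward_pred - reward_true) ** 2
    info_penalty = gaussian_kl(posterior[0], posterior[1], prior[0], prior[1])
    return prediction_error + beta * info_penalty

# Example: a posterior that matches the prior incurs no information penalty.
loss = bottleneck_loss(1.0, 1.0, ([0.0], [1.0]), ([0.0], [1.0]))
```

A larger `beta` pushes the encoder to discard more of the observation, which is the mechanism by which a bottleneck of this kind would suppress background distractors. How the paper actually enforces the constraint may differ from this fixed-weight form.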