Deep reinforcement learning (DRL) agents are often sensitive to visual
changes that were not seen in their training environments. To address this
problem, we leverage the sequential nature of RL, within the unsupervised
multi-view setting, to learn robust representations that encode only the
task-relevant information in observations. Specifically, we introduce a
novel contrastive version of the Multi-View Information Bottleneck (MIB)
objective for temporal data. We train RL agents from pixels with this auxiliary
objective to learn robust representations that can compress away
task-irrelevant information and are predictive of task-relevant dynamics. This
approach enables us to train high-performance policies that are robust to
visual distractions and can generalize well to unseen environments. We
demonstrate that our approach achieves state-of-the-art performance on a diverse set of
visual control tasks in the DeepMind Control Suite when the background is
replaced with natural videos. In addition, we show that our approach
outperforms well-established baselines for generalization to unseen
environments on the Procgen benchmark. Our code is open-sourced and available
in the BU-DEPEND-Lab/DRIBO repository.

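This work uses the multi-view setting to introduce a contrastive Multi-View Information Bottleneck objective for training deep reinforcement learning agents. The agents learn robust representations that retain task-relevant information while compressing away task-irrelevant information, which in turn yields high-performance policies that are robust and generalize well.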