Learning to control an agent from data collected offline in a rich pixel-based visual observation space is vital for real-world applications of reinforcement learning (RL). A major challenge in this setting is the presence of input information that is hard to model and irrelevant to controlling the agent. The theoretical RL community has approached this problem through the lens of exogenous information, i.e., any control-irrelevant information contained in observations. For example, a robot navigating busy streets needs to ignore irrelevant information, such as other people walking in the background, textures of objects, or birds in the sky. In this paper, we focus on the setting with visually detailed exogenous information and introduce new offline RL benchmarks for studying this problem. We find that contemporary representation learning techniques can fail on datasets where the noise is a complex, time-dependent process, which is prevalent in practical applications. To address this, we propose to use multi-step inverse models, which have seen a great deal of interest in the RL theory community, to learn Agent-Controller Representations for Offline-RL (ACRO). Despite being simple and requiring no reward, the representation learned by this objective greatly outperforms baselines, as we show both theoretically and empirically.
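The multi-step inverse objective mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ACRO reference implementation. It assumes the common formulation in which an encoder φ is trained jointly with a classifier f to minimize -log f(a_t | φ(x_t), φ(x_{t+k}), k), where the gap k is drawn uniformly from 1 to a horizon `max_k`. The helper names `multistep_inverse_examples` and `cross_entropy` are hypothetical. Only the reward-free data construction and the per-example loss are shown; the encoder and classifier themselves are omitted.

```python
import math
import random

def multistep_inverse_examples(observations, actions, max_k, rng):
    """Build (x_t, x_{t+k}, k, a_t) tuples from one offline trajectory.

    Assumption: observations has length T + 1 and actions has length T,
    with actions[t] taking observations[t] to observations[t + 1].
    No reward signal is used anywhere.
    """
    if len(observations) != len(actions) + 1:
        raise ValueError("need exactly one more observation than actions")
    T = len(actions)
    examples = []
    for t in range(T):
        # Sample the gap k in [1, max_k], clipped so x_{t+k} stays inside the trajectory.
        k = rng.randint(1, min(max_k, T - t))
        # The prediction target is the FIRST action a_t, not the whole action sequence.
        examples.append((observations[t], observations[t + k], k, actions[t]))
    return examples

def cross_entropy(logits, target):
    """Negative log-likelihood of the true action index under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]
```

In a full training loop, the logits would come from the hypothetical classifier applied to φ(x_t), φ(x_{t+k}) and k, and the averaged loss would be minimized with respect to both φ and f. The intuition for why this can discard exogenous information is that the target is the agent's own action: features such as background pedestrians, which the action does not influence, carry no signal for predicting a_t.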

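This paper introduces new offline RL benchmarks and proposes the ACRO method for control in the presence of visually detailed exogenous information. It finds that current representation learning techniques readily fail in practical applications where the noise is a complex, time-varying process. ACRO shows, theoretically and empirically, that multi-step inverse models can learn agent-controller representations that significantly outperform baselines.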

Agent-Controller Representations: Principled Offline RL with Rich Exogenous Information