Two desiderata of reinforcement learning (RL) algorithms are the ability to
learn from relatively little experience and the ability to learn policies that
generalize to a range of problem specifications. In factored state spaces, one
approach to achieving both goals is to learn state abstractions that retain
only the variables needed to learn the tasks at hand. This paper
introduces Causal Bisimulation Modeling (CBM), a method that learns the causal
relationships in the dynamics and reward functions for each task to derive a
minimal, task-specific abstraction. CBM leverages and improves implicit
modeling to train a high-fidelity causal dynamics model that can be reused for
all tasks in the same environment. Empirical validation on manipulation
environments and the DeepMind Control Suite reveals that CBM's learned implicit
dynamics models identify the underlying causal relationships and state
abstractions more accurately than explicit ones. Furthermore, the derived state
abstractions allow a task learner to achieve near-oracle levels of sample
efficiency and outperform baselines on all tasks.
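
The sketch below makes the idea of a minimal, task-specific abstraction concrete. It assumes the learned causal graph is already available as parent sets over state variables, with action edges omitted because the action is always observed. It starts from the variables the reward depends on and then adds every state variable that influences them through the dynamics, directly or transitively. The function name minimal_abstraction and the variable names gripper, block, distractor, and light are hypothetical illustrations. This is not CBM's implementation, which learns the graph itself with implicit dynamics and reward models.

```python
from collections import deque

def minimal_abstraction(dynamics_parents, reward_parents):
    """Return the state variables a task learner needs to keep.

    dynamics_parents: dict mapping each state variable to the set of state
        variables (at the previous step) it causally depends on. This is a
        hypothetical input format; action edges are omitted.
    reward_parents: set of state variables the task reward depends on.
    """
    keep = set(reward_parents)
    frontier = deque(reward_parents)
    # Breadth-first closure: anything that causally influences a kept
    # variable must also be kept to predict it.
    while frontier:
        var = frontier.popleft()
        for parent in dynamics_parents.get(var, set()):
            if parent not in keep:
                keep.add(parent)
                frontier.append(parent)
    return keep

# Hypothetical shared dynamics graph for one environment.
DYNAMICS = {
    "gripper": {"gripper"},
    "block": {"block", "gripper"},
    "distractor": {"distractor"},
    "light": {"light", "distractor"},
}

print(sorted(minimal_abstraction(DYNAMICS, {"block"})))  # ['block', 'gripper']
print(sorted(minimal_abstraction(DYNAMICS, {"light"})))  # ['distractor', 'light']
```

Because the dynamics graph is shared by all tasks in the environment, only the reward parents change from task to task. In this toy example, the same graph yields a different abstraction for a block-related reward than for a light-related one.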

用因果对等建模（CBM）方法在有因子的状态空间中学习动力学和奖励函数的因果关系，以得出最小的，任务特定的抽象。CBM 的隐式动力学模型可以在相同环境中重复使用，实验验证表明 CBM 的学习到的隐式动力学模型比显式模型更准确地识别了底层因果关系和状态抽象。此外，得出的状态抽象能够使任务学习者在所有任务上实现接近理想的样本效率，并在所有任务中优于基线模型。