In many reinforcement learning tasks, the agent has to learn to interact with many objects of different types and to generalize to unseen combinations and numbers of objects. Often a task is a composition of previously learned tasks (e.g. block stacking). These are examples of compositional generalization, in which we compose object-centric representations to solve complex tasks. Recent works have shown the benefits of object-factored representations and hierarchical abstractions for improving sample efficiency in these settings. However, these methods do not fully exploit the benefits of factorization in terms of object attributes. In this paper, we pursue this opportunity and introduce the Dynamic Attribute FacTored RL (DAFT-RL) framework. In DAFT-RL, we leverage object-centric representation learning to extract objects from visual inputs. We learn to classify the objects into classes and infer their latent parameters. For each class of object, we learn a class template graph that describes how the dynamics and reward of an object of this class factorize according to its attributes. We also learn an interaction pattern graph that describes how objects of different classes interact with each other at the attribute level. Through these graphs and a dynamic interaction graph that models the interactions between objects, we can learn a policy that can then be directly applied in a new environment by just estimating the interactions and latent parameters. We evaluate DAFT-RL on three benchmark datasets and show that our framework outperforms the state of the art in generalizing across unseen objects with varying attributes and latent parameters, as well as in the composition of previously learned tasks.
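
As a rough illustration of the attribute-level factorization described above, the sketch below encodes a class template graph and an interaction pattern graph as binary masks over attributes, and uses a per-step dynamic interaction graph to decide which objects affect each other. Everything in it is an assumption made for illustration: the number of attributes, the mask values, the linear transition, and the names `template`, `pattern`, and `step` do not come from the paper, and in DAFT-RL the graphs are learned rather than written by hand.

```python
import numpy as np

# Hypothetical setup: every object has 3 attributes (say position, velocity, mass).
N_ATTR = 3

# Class template graph for one hypothetical class (assumed values):
# template[i, j] = 1 means attribute j at time t influences attribute i at t+1
# within the same object.
template = np.array([[1, 1, 0],
                     [0, 1, 0],
                     [0, 0, 1]])

# Interaction pattern graph between two objects (assumed values):
# pattern[i, j] = 1 means attribute j of the other object influences
# attribute i of this object whenever the two objects interact.
pattern = np.array([[0, 0, 0],
                    [0, 1, 1],
                    [0, 0, 0]])


def step(states, interacting, w_self, w_inter):
    """One masked linear transition for all objects (illustrative only).

    states:      (n_objects, N_ATTR) attribute values at time t
    interacting: (n_objects, n_objects) 0/1 dynamic interaction graph at time t;
                 interacting[a, b] = 1 means object b affects object a
    w_self, w_inter: (N_ATTR, N_ATTR) weights; entries outside the masks are ignored
    """
    # Within-object dynamics, restricted to edges of the class template graph.
    nxt = states @ (w_self * template).T
    # Sum the states of all interacting partners, then apply the
    # attribute-level interaction pattern.
    partner_sum = interacting @ states
    return nxt + partner_sum @ (w_inter * pattern).T


states = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
interacting = np.array([[0, 1],   # object 1 affects object 0 at this step
                        [0, 0]])
weights = np.ones((N_ATTR, N_ATTR))
next_states = step(states, interacting, weights, weights)
```

Because the masks zero out every weight that has no edge in the corresponding graph, moving to a new environment in this toy setting only requires a new `interacting` matrix (and, in a fuller model, new latent parameters), while the template and pattern masks are reused unchanged.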

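In this paper, we introduce the Dynamic Attribute FacTored RL (DAFT-RL) framework, which uses object-centric representation learning to extract objects from visual inputs and learns to classify them and infer their latent parameters. By learning class template graphs, an interaction pattern graph that captures attribute-level interactions between objects, and a dynamic interaction graph that describes the interactions between objects, we can learn a policy that can be directly applied in a new environment by estimating the interactions and latent parameters. We evaluate DAFT-RL on three benchmark datasets and show that our framework outperforms the state of the art in generalizing across unseen objects with varying attributes and latent parameters, as well as in composing previously learned tasks.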

Learning Dynamic Attribute-Factored World Models for Efficient Multi-Object Reinforcement Learning