There have recently been significant advances in the problem of unsupervised object-centric representation learning and its application to downstream tasks. The latest works support the argument that employing disentangled object representations in image-based object-centric reinforcement learning tasks facilitates policy learning. We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches to utilize these representations effectively. In our approach, we use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment. The proposed method fills a research gap in developing efficient object-centric world models for reinforcement learning settings that can be used for environments with discrete or continuous action spaces. Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm built upon transformer architecture and the state-of-the-art monolithic model-based algorithm.

最近在无监督的物体中心表示学习问题和其在下游任务中的应用方面取得了重大进展。最新研究支持这样一个论点：在基于图像的物体中心强化学习任务中使用解耦的物体表示有助于策略学习。我们提出了一种新颖的物体中心强化学习算法，结合了演员-评论家和基于模型的方法，有效地利用这些表示。我们的方法使用转换编码器提取物体表示，并使用图神经网络来近似环境动力学。所提出的方法填补了开发用于离散或连续动作空间环境的高效物体中心世界模型的研究空白。与基于转换器架构的最先进的无模型演员-评论家算法和最先进的整合模型为基础的算法相比，我们的算法在视觉复杂的三维机器人环境和具有组合结构的二维环境中表现更好。

图形对象中心的演员-评论家算法