TL;DR通过引入图神经网络,该论文提出的价值迭代算法执行图神经网络,跨越任意环境模型,并在 VI 的中间步骤上受到直接监督,证明了具有强监督的 GNN 执行者是深度强化学习系统中可行的组成部分。
Abstract
Many reinforcement learning tasks can benefit from explicit planning based on
an internal model of the environment. Previously, such planning components have
been incorporated through a neural network that partia