Diffusion models are a powerful class of generative models capable of mapping random noise in high-dimensional spaces to a target manifold through iterative denoising. In this work, we present a novel perspective on goal-conditioned reinforcement learning by framing it within the context of diffusion modeling. Analogous to the diffusion process, where Gaussian noise is used to create random trajectories that walk away from the data manifold, we construct trajectories that move away from potential goal states. We then learn a goal-conditioned policy analogous to the score function. This approach, which we call Merlin, can reach predefined or novel goals from an arbitrary initial state without learning a separate value function. We consider three choices for the noise model to replace Gaussian noise in diffusion: reverse play from the buffer, a reverse dynamics model, and a novel non-parametric approach. We theoretically justify our approach and validate it on offline goal-reaching tasks. Empirical results are competitive with state-of-the-art methods, which suggests this perspective on diffusion for RL is a simple, scalable, and effective direction for sequential decision-making.

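Diffusion models map random noise in a high-dimensional space to a target manifold through iterative denoising, and this paper applies that idea to goal-conditioned reinforcement learning. The proposed method, Merlin, borrows from the diffusion process: it constructs trajectories that diffuse away from potential goal states and learns a goal-conditioned policy analogous to the score function, so that it can reach predefined or novel goals from an arbitrary initial state. The paper gives a theoretical justification for the approach and evaluates it empirically on offline goal-reaching tasks. The results suggest that this diffusion-based view of sequential decision-making is a simple, scalable, and effective direction.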
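To make the "reverse play from the buffer" idea more concrete, the following is a minimal sketch of how training tuples might be built from a single offline trajectory by walking backward from each candidate goal state. It is an illustration under stated assumptions, not the authors' implementation: the function name `reverse_play_tuples`, the tuple layout `(state, goal, horizon, action)`, the `max_horizon` parameter, and the idea of conditioning the policy on the number of steps to the goal (by analogy with the noise level in diffusion) are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

# (state, goal, horizon, action): hypothetical layout chosen for this sketch
Sample = Tuple[Sequence[float], Sequence[float], int, Sequence[float]]


def reverse_play_tuples(
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    max_horizon: int,
) -> List[Sample]:
    """Build goal-conditioned training tuples from one stored trajectory.

    states  holds s_0 ... s_T      (length T + 1)
    actions holds a_0 ... a_{T-1}  (length T), where a_t leads from s_t to s_{t+1}

    Every state s_g with g >= 1 is treated as a potential goal. Stepping
    backward k steps from it plays the role of adding k steps of "noise":
    the earlier state s_{g-k} is a point that has moved away from the goal,
    and the stored action a_{g-k} is one step back toward it.
    """
    if len(states) != len(actions) + 1:
        raise ValueError("expected one more state than actions")
    if max_horizon < 1:
        raise ValueError("max_horizon must be at least 1")

    samples: List[Sample] = []
    num_steps = len(actions)
    for g in range(1, num_steps + 1):                # index of the goal state
        for k in range(1, min(max_horizon, g) + 1):  # steps walked backward
            t = g - k
            samples.append((states[t], states[g], k, actions[t]))
    return samples


if __name__ == "__main__":
    # Toy 1-D trajectory with three transitions.
    toy_states = [[0.0], [1.0], [2.0], [3.0]]
    toy_actions = [[10.0], [11.0], [12.0]]
    for sample in reverse_play_tuples(toy_states, toy_actions, max_horizon=2):
        print(sample)
```

A goal-conditioned policy could then be fit by supervised regression from `(state, goal, horizon)` to `action` on these tuples, which is consistent with the abstract's statement that no separate value function is learned. The other two noise models mentioned in the abstract (a reverse dynamics model and the non-parametric approach) would replace the backward walk through stored trajectories and are not covered by this sketch.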

Learning to Reach Goals via Diffusion