With the great success of diffusion models (DMs) in generating realistic
synthetic vision data, many researchers have investigated their potential in
decision-making and control. Most of these works utilized DMs to sample
directly from the trajectory space, where DMs can be viewed as a combination of
dynamics models and policies. In this work, we explore how to decouple DMs'
ability as dynamics models in fully offline settings, allowing the learning
policy to roll out trajectories. As DMs learn the data distribution from the
dataset, their intrinsic policy is actually the behavior policy induced from
the dataset, which results in a mismatch between the behavior policy and the
learning policy. We propose Dynamics Diffusion (DyDiff for short), which
iteratively injects information from the learning policy into DMs. DyDiff ensures
long-horizon rollout accuracy while maintaining policy consistency and can be
easily deployed on model-free algorithms. We provide theoretical analysis to
show the advantage of DMs over models in long-horizon rollouts and demonstrate
the effectiveness of DyDiff in the context of offline reinforcement learning,
where a rollout dataset is provided but no online environment is available for
interaction. Our code is at this https URL

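This work explores how to decouple the ability of diffusion models (DMs) as dynamics models in fully offline settings, so that the learning policy can roll out trajectories, and demonstrates the effectiveness of DyDiff in offline reinforcement learning.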