Many robotic systems, such as mobile manipulators or quadrotors, cannot be
equipped with high-end GPUs due to space, weight, and power constraints. These
constraints prevent these systems from leveraging recent developments in
visuomotor policy architectures that require high-end GPUs to achieve fast
policy inference. In this paper, we propose Consistency Policy, a faster and
similarly powerful alternative to Diffusion Policy for learning visuomotor
robot control. By virtue of its fast inference speed, Consistency Policy can
enable low-latency decision making in resource-constrained robotic setups. A
Consistency Policy is distilled from a pretrained Diffusion Policy by enforcing
self-consistency along the Diffusion Policy's learned trajectories. We compare
Consistency Policy with Diffusion Policy and other related speed-up methods
across six simulation tasks as well as two real-world tasks, where we demonstrate
inference on a laptop GPU. For all these tasks, Consistency Policy speeds up
inference by an order of magnitude compared to the fastest alternative method
and maintains competitive success rates. We also show that the Consistency
Policy training procedure is robust to the pretrained Diffusion Policy's
quality, a useful result that helps practitioners avoid extensive testing of the
pretrained model. Key design decisions that enabled this performance are the
choice of consistency objective, reduced initial sample variance, and the
choice of preset chaining steps. Code and training details will be released
publicly.
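The distillation step described above can be illustrated with a small sketch. It is a hypothetical 1-D toy that assumes the plain consistency-distillation recipe and EDM-style parameterization of Song et al. (2023): the constants `SIGMA_DATA`, `EPS`, `T_MAX` and the scalings `c_skip`/`c_out` come from that recipe, not from this paper, and it is not necessarily the consistency objective the paper settles on. The pretrained Diffusion Policy is replaced by `teacher_denoiser`, the exact denoiser for Gaussian toy actions, and `student` is a hypothetical untrained network. `cd_loss` takes one Euler step along the teacher's probability-flow ODE and penalizes disagreement between consistency outputs at the two neighboring noise levels; `chained_sample` shows multistep sampling over a preset list of noise levels, with `init_scale` as an assumed way of reducing the initial sample variance.

```python
import math
import random

# EDM-style constants as in Song et al. (2023); assumed, not taken from the paper above.
SIGMA_DATA = 0.5
EPS = 0.002   # smallest noise level: f(x, EPS) must return x unchanged
T_MAX = 80.0  # largest noise level


def c_skip(t: float) -> float:
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)


def c_out(t: float) -> float:
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA**2 + t**2)


def consistency_fn(net, x: float, t: float) -> float:
    """f(x, t) = c_skip(t) * x + c_out(t) * net(x, t); reduces to x at t = EPS."""
    return c_skip(t) * x + c_out(t) * net(x, t)


def teacher_denoiser(x: float, t: float, s: float = SIGMA_DATA) -> float:
    """Hypothetical stand-in for the pretrained Diffusion Policy: the exact
    denoiser E[x0 | x_t] for 1-D actions x0 ~ N(0, s^2)."""
    return s**2 / (s**2 + t**2) * x


def cd_loss(student, target_net, x0: float, t_hi: float, t_lo: float,
            rng: random.Random) -> float:
    """Consistency-distillation loss for one action sample and levels t_lo <= t_hi."""
    x_hi = x0 + t_hi * rng.gauss(0.0, 1.0)               # noise the clean action
    slope = (x_hi - teacher_denoiser(x_hi, t_hi)) / t_hi  # teacher's PF-ODE dx/dt
    x_lo = x_hi + (t_lo - t_hi) * slope                   # one Euler step along the teacher trajectory
    pred = consistency_fn(student, x_hi, t_hi)
    target = consistency_fn(target_net, x_lo, t_lo)       # fixed target (stop-gradient / EMA copy in practice)
    return (pred - target) ** 2


def chained_sample(net, chain, rng: random.Random, init_scale: float = 1.0) -> float:
    """Multistep sampling over a preset, decreasing list of noise levels.
    init_scale < 1 shrinks the initial noise (an assumed reading of reduced initial variance)."""
    x = consistency_fn(net, init_scale * chain[0] * rng.gauss(0.0, 1.0), chain[0])
    for t in chain[1:]:
        x_t = x + math.sqrt(t**2 - EPS**2) * rng.gauss(0.0, 1.0)  # re-noise to level t
        x = consistency_fn(net, x_t, t)
    return x


def student(x: float, t: float) -> float:
    """Hypothetical, untrained student network."""
    return 0.1 * x


rng = random.Random(0)
loss = cd_loss(student, student, x0=0.3, t_hi=1.0, t_lo=0.8, rng=rng)
action = chained_sample(student, [T_MAX, 1.0], rng, init_scale=0.5)
print(loss, action)
```

A real implementation would typically condition the student and teacher on the robot's observation, operate on action sequences rather than scalars, and minimize the loss by gradient descent while keeping `target_net` as a slowly moving (EMA) copy of the student; these parts are omitted to keep the sketch short.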

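With Consistency Policy, a fast-inference method, this work proposes an effective alternative to Diffusion Policy for learning visuomotor control that enables low-latency decision making in resource-constrained robotic systems. Consistency Policy is obtained by enforcing self-consistency on an already-trained Diffusion Policy, and it is compared with Diffusion Policy and other related speed-up methods on six simulation tasks and two real-world tasks. The results show that Consistency Policy speeds up inference by an order of magnitude compared with the other methods while maintaining competitive success rates.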

Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation

Score-based generative models such as diffusion models have been shown to be
effective in modeling multi-modal data in domains ranging from image generation
to reinforcement learning (RL). However, inference with diffusion models can be
slow because sampling is iterative, which hinders their use in RL. We propose
using the consistency model as an efficient yet expressive policy
representation, called the consistency policy, within an actor-critic style
algorithm for three typical RL settings: offline, offline-to-online, and online.
For offline RL, we demonstrate the expressiveness of generative models as
policies learned from multi-modal data. For offline-to-online RL, the
consistency policy is shown to be more computationally efficient than the
diffusion policy, with comparable performance. For online RL, the consistency
policy achieves a significant speedup and even higher average performance than
the diffusion policy.
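The efficiency argument for actor-critic training can be illustrated with a toy actor update. In the hypothetical 1-D sketch below (not the authors' algorithm), the policy produces an action with a single call to the consistency function, so the gradient of Q with respect to the policy parameters passes through one network evaluation, whereas a diffusion policy would have to backpropagate through every denoising step. The linear network `F_theta`, the fixed quadratic `critic`, the state list, and the EDM-style constants are all assumptions; a full agent would also learn the critic by temporal-difference updates and, in the offline setting, would typically add a behavior-regularization term such as a consistency loss on dataset actions.

```python
import math
import random

# Assumed EDM-style constants (as in Song et al., 2023), not taken from the paper above.
SIGMA_DATA, EPS, T_MAX = 0.5, 0.002, 80.0


def c_skip(t: float) -> float:
    return SIGMA_DATA**2 / ((t - EPS) ** 2 + SIGMA_DATA**2)


def c_out(t: float) -> float:
    return SIGMA_DATA * (t - EPS) / math.sqrt(SIGMA_DATA**2 + t**2)


def act(theta, s: float, noise: float) -> float:
    """One-step consistency policy a = f_theta(x_T, T; s) with a hypothetical
    linear network F_theta(s) = theta[0] + theta[1] * s."""
    x_T = T_MAX * noise
    return c_skip(T_MAX) * x_T + c_out(T_MAX) * (theta[0] + theta[1] * s)


def critic(s: float, a: float) -> float:
    """Hypothetical fixed critic Q(s, a) = -(a - 2s)^2; a real agent learns Q by TD."""
    return -(a - 2.0 * s) ** 2


def actor_step(theta, states, rng: random.Random, lr: float = 0.5):
    """One gradient-ascent step on mean Q(s, a_theta), differentiating through
    the single consistency-function call that produced each action."""
    g0 = g1 = 0.0
    for s in states:
        a = act(theta, s, rng.gauss(0.0, 1.0))
        dq_da = -2.0 * (a - 2.0 * s)   # analytic dQ/da of the quadratic critic
        g0 += dq_da * c_out(T_MAX)      # da/dtheta[0]
        g1 += dq_da * c_out(T_MAX) * s  # da/dtheta[1]
    n = len(states)
    return [theta[0] + lr * g0 / n, theta[1] + lr * g1 / n]


rng = random.Random(0)
states = [-1.0, -0.5, 0.0, 0.5, 1.0]
theta = [0.0, 0.0]
for _ in range(300):
    theta = actor_step(theta, states, rng)
print(theta, critic(1.0, act(theta, 1.0, 0.0)))
```

With `states` symmetric around zero, gradient ascent drives `c_out(T_MAX) * theta[1]` toward 2 and `theta[0]` toward 0, so the one-step policy learns to output roughly `a = 2s`, the maximizer of the toy critic.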

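We propose an efficient and expressive policy representation, called the consistency policy, for three typical reinforcement learning settings: offline, offline-to-online, and online. It applies the consistency model within an actor-critic style algorithm and shows advantages in modeling multi-modal data, computational efficiency, and performance.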