We present Skill Transformer, an approach for solving long-horizon robotic
tasks by combining conditional sequence modeling and skill modularity.
Conditioned on egocentric and proprioceptive observations of a robot, Skill
Transformer is trained end-to-end to predict both a high-level skill (e.g.,
navigation, picking, placing), and a whole-body low-level action (e.g., base
and arm motion), using a transformer architecture and demonstration
trajectories that solve the full task. It retains the composability and
modularity of the overall task through a skill predictor module while reasoning
about low-level actions and avoiding hand-off errors, common in modular
approaches. We test Skill Transformer on an embodied rearrangement benchmark
and find it performs robust task planning and low-level control in new
scenarios, achieving a 2.5x higher success rate than baselines in hard
rearrangement problems.
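
As a rough illustration of the two-level prediction described above, the following minimal sketch infers a skill from recent observations and then produces an action conditioned on that skill. It is not the authors' model: `step_policy`, `toy_skill_scorer`, `toy_action_head`, the two-element observation layout (distance to target, holding flag), and the `context_len` parameter are all hypothetical, and simple hand-written rules stand in for the learned transformer modules.

```python
from typing import Callable, List, Sequence, Tuple

# Skill names follow the examples given in the abstract.
SKILLS = ("navigate", "pick", "place")

Observation = Sequence[float]
Action = Tuple[float, float]  # hypothetical layout: (base velocity, arm command)


def step_policy(
    history: List[Observation],
    skill_scorer: Callable[[List[Observation]], Sequence[float]],
    action_head: Callable[[List[Observation], int], Action],
    context_len: int = 4,
) -> Tuple[str, Action]:
    """One control step: infer a skill, then a skill-conditioned action."""
    context = history[-context_len:]  # most recent observations only
    scores = skill_scorer(context)
    skill_id = max(range(len(scores)), key=lambda i: scores[i])  # argmax
    action = action_head(context, skill_id)
    return SKILLS[skill_id], action


def toy_skill_scorer(context: List[Observation]) -> Sequence[float]:
    """Hand-written stand-in for a learned skill predictor (assumption)."""
    dist, holding = context[-1][0], context[-1][1]
    if dist > 1.0:
        return (1.0, 0.0, 0.0)  # far from the target: navigate
    return (0.0, 0.0, 1.0) if holding else (0.0, 1.0, 0.0)


def toy_action_head(context: List[Observation], skill_id: int) -> Action:
    """Hand-written stand-in for a learned, skill-conditioned action predictor."""
    dist = context[-1][0]
    if skill_id == 0:
        return (min(dist, 1.0), 0.0)  # move the base, keep the arm still
    return (0.0, 1.0 if skill_id == 1 else -1.0)  # arm only: close or open


skill, action = step_policy([[3.0, 0.0]], toy_skill_scorer, toy_action_head)
print(skill, action)
```

Because the action is computed in the same step as the skill, there is no separate controller to hand control over to, which is one way to read the abstract's point about avoiding hand-off errors.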

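We propose Skill Transformer, which combines conditional sequence modeling with skill modularity to solve long-horizon robotic tasks. It is trained end-to-end on demonstration trajectories, using a transformer architecture, to predict both high-level skills and low-level actions. A skill predictor module preserves the composability and modularity of the overall task, while the model reasons about low-level actions and avoids the hand-off errors common in modular approaches. Tested on challenging rearrangement problems, Skill Transformer performs robust task planning and low-level control in new scenarios and achieves a 2.5x higher success rate than baselines.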

Skill Transformer: A Monolithic Policy for Mobile Manipulation

We present relay policy learning, a method for imitation and reinforcement
learning that can solve multi-stage, long-horizon robotic tasks. This general,
two-phase approach consists of an imitation learning phase that produces
goal-conditioned hierarchical policies and a reinforcement learning phase that
fine-tunes these policies for task performance. Our method, while not
necessarily perfect at imitation learning,
is very amenable to further improvement via environment interaction, allowing
it to scale to challenging long-horizon tasks. We simplify the long-horizon
policy learning problem by using a novel data-relabeling algorithm for learning
goal-conditioned hierarchical policies, where the low-level policy acts for only
a fixed number of steps, regardless of whether the goal has been achieved. While
we rely on demonstration data to bootstrap policy learning, we do not assume
access to demonstrations of every specific task being solved. Instead, we
leverage unstructured and unsegmented demonstrations of semantically meaningful
behaviors, which are not only less burdensome to provide but can also greatly
facilitate further improvement using reinforcement learning. We demonstrate the
effectiveness of our method on a number of multi-stage, long-horizon
manipulation tasks in a challenging kitchen simulation environment. Videos are
available at this https URL
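
To make the data-relabeling idea more concrete, the sketch below shows one plausible way to turn a single unsegmented demonstration into goal-conditioned training tuples for a two-level policy. The abstract states only that relabeling is used and that the low level acts for a fixed number of steps; the window-based scheme, the function name `relay_relabel`, the parameters `low_window` and `high_window`, and the tuple layouts are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
LowTuple = Tuple[State, Action, State]   # (state, action, goal)
HighTuple = Tuple[State, State, State]   # (state, subgoal, goal)


def relay_relabel(
    states: Sequence[State],
    actions: Sequence[Action],
    low_window: int,
    high_window: int,
) -> Tuple[List[LowTuple], List[HighTuple]]:
    """Relabel one demonstration with goals taken from its own future states.

    Assumes states[t] and actions[t] are aligned and of equal length.
    """
    low: List[LowTuple] = []
    high: List[HighTuple] = []
    horizon = len(states)
    for t in range(horizon):
        # Low level: any state reached within low_window steps is a valid goal.
        for w in range(1, low_window + 1):
            if t + w >= horizon:
                break
            low.append((states[t], actions[t], states[t + w]))
        # High level: the subgoal is at most low_window steps ahead, so the
        # low level is only ever asked to act for a fixed number of steps.
        for w in range(1, high_window + 1):
            if t + w >= horizon:
                break
            subgoal = states[t + min(w, low_window)]
            high.append((states[t], subgoal, states[t + w]))
    return low, high


# Toy 1-D demonstration with four states (values are illustrative only).
demo_states = [(0.0,), (1.0,), (2.0,), (3.0,)]
demo_actions = [(1.0,)] * 4
low_data, high_data = relay_relabel(demo_states, demo_actions, 1, 2)
print(len(low_data), len(high_data))
```

Under this reading, no task labels or segment boundaries are needed: every goal and subgoal is a state the demonstrator actually visited, which is consistent with the abstract's use of unstructured, unsegmented demonstrations.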

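This paper proposes relay policy learning, a method for imitation and reinforcement learning aimed at multi-stage, long-horizon robotic tasks. It consists of an imitation learning phase and a reinforcement learning phase, simplifies the policy learning problem by learning goal-conditioned hierarchical policies with a novel data-relabeling algorithm, and is shown to be effective on multi-stage, long-horizon manipulation tasks in a challenging kitchen simulation environment.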