Reinforcement learning (RL) algorithms face significant challenges when
dealing with long-horizon robot manipulation tasks in real-world environments
due to sample inefficiency and safety issues. To overcome these challenges, we
propose a novel framework, SEED, which leverages two approaches: reinforcement
learning from human feedback (RLHF) and primitive skill-based reinforcement
learning. Both approaches are particularly effective in addressing sparse
reward issues and the complexities involved in long-horizon tasks. By combining
them, SEED reduces the human effort required in RLHF and increases safety in
training robot manipulation with RL in real-world settings. Additionally,
parameterized skills provide a clear view of the agent's high-level intentions,
allowing humans to evaluate skill choices before they are executed. This
feature makes the training process even safer and more efficient. To evaluate
the performance of SEED, we conducted extensive experiments on five
manipulation tasks with varying levels of complexity. Our results show that
\algoName significantly outperforms state-of-the-art RL algorithms in sample
efficiency and safety. In addition, SEED also exhibits a substantial reduction
of human effort compared to other RLHF methods. Further details and video
results can be found at this https URL

SEED 是一个结合了人类反馈的强化学习和基于原始技能的强化学习的新框架，通过减少人类的工作量和增加训练过程的安全性，有效地解决了长期任务中的样本低效性和安全性问题。SEED 在五个具有不同复杂度的操作任务上表现出了比其他强化学习算法更高的样本效率和安全性，并且与其他 RLHF 方法相比，也大大减少了人类的工作量。