The success of many RL techniques relies heavily on human-engineered dense rewards, which typically demand substantial domain expertise and extensive trial and error. In this work, we propose DrS (Dense reward learning from Stages), a novel approach for learning reusable dense rewards for multi-stage tasks in a data-driven manner. By leveraging the stage structure of a task, DrS learns a high-quality dense reward from sparse rewards and, if available, demonstrations. The learned rewards can be reused in unseen tasks, thus reducing the human effort for reward engineering. Extensive experiments on three physical robot manipulation task families with 1000+ task variants demonstrate that our learned rewards can be reused in unseen tasks, resulting in improved performance and sample efficiency of RL algorithms. On some tasks, the learned rewards even achieve performance comparable to that of human-engineered rewards. See our project page (https://sites.google.com/view/iclr24drs) for more details.
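The abstract does not spell out how the stage structure enters the reward. The minimal Python sketch below illustrates one way it could, under stated assumptions. First, the current stage index k is known from the task's sparse stage signals. Second, each stage has a learned scoring function, such as a discriminator. Third, the dense reward is the stage index plus a bounded, scaled score, k + alpha * tanh(score). The names `stage_dense_reward`, `StageScorer`, and `discriminators`, along with the default `alpha = 1/3`, are hypothetical choices for illustration. They are not taken from the authors' code.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical type: a learned per-stage scoring function (e.g. a discriminator
# assumed to score how close a state is to completing the current stage).
StageScorer = Callable[[Sequence[float]], float]


def stage_dense_reward(
    obs: Sequence[float],
    stage_index: int,
    discriminators: List[StageScorer],
    alpha: float = 1 / 3,
) -> float:
    """Assumed stage-aware dense reward: k + alpha * tanh(D_k(obs)).

    stage_index: number of stages already completed (0..K), assumed to be
        available from the task's sparse stage indicators.
    discriminators: one scorer per stage, so len(discriminators) == K.
    alpha: should be < 0.5 so that the reward ranges of neighbouring stages,
        [k - alpha, k + alpha], never overlap.
    """
    num_stages = len(discriminators)
    if not 0 <= stage_index <= num_stages:
        raise ValueError(f"stage_index must be in [0, {num_stages}]")
    if stage_index == num_stages:
        # All stages completed: no scorer left, return the top reward level.
        return float(num_stages)
    # Bounded within-stage shaping term from the current stage's scorer.
    score = discriminators[stage_index](obs)
    return stage_index + alpha * math.tanh(score)


if __name__ == "__main__":
    # Toy two-stage task with hand-written stand-ins for learned scorers.
    scorers = [lambda o: -abs(o[0] - 1.0), lambda o: -abs(o[1] - 2.0)]
    for k, obs in [(0, [0.0, 0.0]), (0, [1.0, 0.0]), (1, [1.0, 1.5]), (2, [1.0, 2.0])]:
        print(k, round(stage_dense_reward(obs, k, scorers), 3))
```

Keeping `alpha` below 0.5 makes the reward ranges of consecutive stages disjoint. As a result, any state in a later stage earns more reward than any state in an earlier one, while the tanh term still gives a dense signal within each stage.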

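We propose DrS, a new method for learning reusable dense rewards. By exploiting the stage structure of a task, DrS learns high-quality dense rewards from sparse rewards and demonstrations. These rewards can be reused in unseen tasks, which reduces the human effort spent on reward design. Experiments show that our learned rewards can be reused in unseen tasks and that they improve the performance and sample efficiency of reinforcement learning algorithms. On some tasks, their performance is even comparable to that of human-engineered rewards.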

DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks