Reinforcement learning often suffers from the sparse reward issue in
real-world robotics problems. Learning from demonstration (LfD) is an effective
way to alleviate this problem by leveraging collected expert data to aid
online learning. Prior works often assume that the learning agent and the
expert aim to accomplish the same task, which requires collecting new data for
every new task. In this paper, we consider the case where the target task is
mismatched with, but similar to, that of the expert. Such a setting can be
challenging, and we find that existing LfD methods cannot effectively guide
learning in mismatched new tasks with sparse rewards. We propose conservative
reward shaping from demonstration (CRSfD), which shapes the sparse rewards
using an estimated expert value function. To accelerate learning, CRSfD
guides the agent to conservatively explore around demonstrations. Experimental
results on robot manipulation tasks show that our approach outperforms baseline
LfD methods when transferring demonstrations collected in a single task to
other different but similar tasks.

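This paper proposes a learning method called conservative reward shaping to address the sparse reward problem in reinforcement learning, and shows on robot manipulation tasks that skills learned from demonstrations can be applied to other similar but different tasks.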