We study the problem of cross-embodiment inverse reinforcement learning, where we wish to learn a reward function from video demonstrations in one or more embodiments and then transfer the learned reward to a different embodiment (e.g., one with a different action space, dynamics, size, or shape). Learning reward functions that transfer across embodiments is important in settings such as teaching a robot a policy from human video demonstrations, or teaching a robot to imitate a policy from another robot with a different embodiment. However, prior work has focused only on cases where near-optimal demonstrations are available, which is often difficult to ensure. By contrast, we study cross-embodiment reward learning from mixed-quality demonstrations. We show that prior methods struggle to learn generalizable reward representations from mixed-quality data. We then analyze several techniques that leverage human feedback for representation learning and alignment to enable effective cross-embodiment learning. Our results give insight into how different representation learning techniques lead to qualitatively different reward-shaping behaviors, and into the importance of human feedback when learning from mixed-quality, mixed-embodiment data.
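The abstract does not specify the form of the human feedback or the reward model. As one illustration only, assume a human compares pairs of video segments, and that the reward is the negative distance between a frame's embedding and a goal embedding in a learned representation space. Under those assumptions (not necessarily the paper's model), feedback can shape the representation through a Bradley-Terry preference loss, sketched below. The linear encoder `encoder_w` and the `goal` embedding are hypothetical placeholders for a learned encoder and target.

```python
import numpy as np


def segment_return(encoder_w, frames, goal):
    """Sum of per-frame rewards r(s) = -||phi(s) - goal|| over one segment.

    Assumptions (illustrative only): phi is a hypothetical linear encoder,
    phi(s) = s @ encoder_w; `frames` has shape (T, d_in); `goal` has shape (d_emb,).
    """
    emb = frames @ encoder_w                      # (T, d_emb) frame embeddings
    return -np.linalg.norm(emb - goal, axis=1).sum()


def preference_loss(encoder_w, seg_a, seg_b, goal, label):
    """Bradley-Terry negative log-likelihood of a human preference.

    label = 1 if the human preferred seg_a, 0 if they preferred seg_b.
    P(a preferred over b) = sigmoid(R(a) - R(b)).
    """
    logit = segment_return(encoder_w, seg_a, goal) - segment_return(encoder_w, seg_b, goal)
    # Numerically stable log-sigmoid: log sigmoid(x) = -log(1 + exp(-x)).
    log_p_a = -np.logaddexp(0.0, -logit)
    log_p_b = -np.logaddexp(0.0, logit)
    return -(label * log_p_a + (1 - label) * log_p_b)


# Usage: a segment that stays near the goal versus one that stays far away.
w = np.eye(2)
goal = np.zeros(2)
near = np.array([[0.1, 0.0], [0.0, 0.1]])
far = np.array([[3.0, 0.0], [0.0, 3.0]])
print(preference_loss(w, near, far, goal, 1))  # small: label agrees with the reward
print(preference_loss(w, near, far, goal, 0))  # large: label contradicts the reward
```

Minimizing this loss over `encoder_w` (for example with an autodiff library) would push the representation toward ranking segments the way the human does. That is the general idea of using feedback to align a reward representation; the specific encoder and distance-based reward here are simplifying assumptions.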

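This work addresses the problem of learning reward functions from mixed-quality demonstrations in cross-embodiment inverse reinforcement learning. We propose using human feedback to improve representation learning and alignment, enabling more effective cross-embodiment learning. The results show that different representation learning techniques lead to markedly different reward-shaping behaviors, and that human feedback is crucial when handling mixed-quality, mixed-embodiment data.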

Representation Alignment from Human Feedback for Cross-Embodiment Reward Learning