Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.

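This paper explores how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, applied specifically to torque control of high-dimensional robotic systems. The authors propose an algorithm that can learn arbitrary nonlinear cost functions (such as neural networks), together with an efficient sample-based approximation for MaxEnt IOC. Evaluated on a series of simulated tasks and real-world robotic manipulation problems, the method shows substantial improvement in both task complexity and sample efficiency.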
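To make "sample-based approximation for MaxEnt IOC" concrete, here is a minimal sketch of the standard importance-sampling estimate of the MaxEnt negative log-likelihood. Under the MaxEnt model, a trajectory has probability proportional to exp(-cost), and the intractable partition function is estimated from trajectories drawn from a proposal distribution q. The function names `logsumexp` and `maxent_ioc_loss` and their arguments are hypothetical, chosen for illustration. The sketch assumes the trajectory costs and proposal log-densities have already been computed, and it omits the neural-network cost and the policy optimization loop, so it should not be read as the authors' implementation.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def maxent_ioc_loss(demo_costs, sample_costs, sample_log_q):
    """Importance-sampled MaxEnt IOC negative log-likelihood (illustrative).

    demo_costs:   cost of each demonstrated trajectory under the current cost
    sample_costs: cost of each trajectory drawn from a proposal distribution q
    sample_log_q: log q(tau) for each of those sampled trajectories
    """
    mean_demo_cost = sum(demo_costs) / len(demo_costs)
    # log Z is approximated by log( (1/M) * sum_j exp(-c(tau_j)) / q(tau_j) )
    log_weights = [-c - lq for c, lq in zip(sample_costs, sample_log_q)]
    log_z = logsumexp(log_weights) - math.log(len(sample_costs))
    return mean_demo_cost + log_z
```

Minimizing this quantity with respect to the cost parameters lowers the cost of demonstrated trajectories while raising the cost of sampled ones. The estimate is only as good as the proposal q, which is why the choice of sampling distribution matters for sample efficiency.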

Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization