In this paper, we introduce Path Integral Networks (PI-Net), a recurrent network representation of the Path Integral optimal control algorithm. The network includes both system dynamics and cost models, which are used for optimal-control-based planning. PI-Net is fully differentiable, so both the dynamics and cost models are learned end-to-end by back-propagation and stochastic gradient descent. As a result, PI-Net can learn to plan. PI-Net has several advantages: it can generalize to unseen states thanks to planning, it can be applied to continuous control tasks, and it allows for a wide variety of learning schemes, including imitation and reinforcement learning. Preliminary experimental results show that PI-Net, trained by imitation learning, can mimic control demonstrations for two simulated problems: a linear system and a pendulum swing-up problem. We also show that PI-Net is able to learn the dynamics and cost models latent in the demonstrations.
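To make the planning step more concrete, the block below is a minimal plain-Python sketch of a simplified path-integral (MPPI-style) control update, the kind of iteration PI-Net unrolls as a recurrent network. It is not the authors' implementation. The scalar system x_{t+1} = a·x_t + b·u_t, the quadratic cost, the function names (`rollout_cost`, `path_integral_update`, `plan`) and all parameter values (`sigma`, `lam`, sample count, horizon) are hypothetical; the dynamics and cost are fixed rather than learned; and the control-cost correction term of the full path integral algorithm is omitted. Each iteration samples perturbed control sequences, weights each rollout by exp(-cost/λ), and shifts the nominal controls by the weighted average perturbation. PI-Net instead uses learned dynamics and cost models and back-propagates through the unrolled iterations, which this sketch does not do.

```python
import math
import random


def rollout_cost(x0, controls, a=1.0, b=0.1, r=0.01):
    """Total cost of a control sequence on a hypothetical scalar system
    x_{t+1} = a * x_t + b * u_t with running cost x^2 + r * u^2."""
    x, cost = x0, 0.0
    for u in controls:
        x = a * x + b * u
        cost += x * x + r * u * u
    return cost


def path_integral_update(x0, controls, rng, num_samples=64, sigma=1.0, lam=1.0):
    """One simplified path-integral iteration: perturb the nominal controls,
    weight each sampled rollout by exp(-cost / lam), and shift the controls
    by the weighted average of the sampled perturbations."""
    horizon = len(controls)
    noises, costs = [], []
    for _ in range(num_samples):
        eps = [rng.gauss(0.0, sigma) for _ in range(horizon)]
        perturbed = [u + e for u, e in zip(controls, eps)]
        noises.append(eps)
        costs.append(rollout_cost(x0, perturbed))
    # Subtract the minimum cost before exponentiating for numerical stability.
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Add the weighted average of the sampled noise to each time step's control.
    return [
        u + sum(w * eps[t] for w, eps in zip(weights, noises))
        for t, u in enumerate(controls)
    ]


def plan(x0, horizon=20, iterations=10, seed=0):
    """Run a fixed number of update iterations from all-zero controls,
    analogous to unrolling a recurrent planner for a fixed number of steps."""
    rng = random.Random(seed)
    controls = [0.0] * horizon
    for _ in range(iterations):
        controls = path_integral_update(x0, controls, rng)
    return controls


if __name__ == "__main__":
    x0 = 1.0
    initial = [0.0] * 20
    planned = plan(x0)
    print("initial cost:", rollout_cost(x0, initial))
    print("planned cost:", rollout_cost(x0, planned))
```

With all-zero controls the state stays at 1.0, so the initial cost is exactly 20.0; because the hypothetical cost is convex in the controls, the planned sequence is expected to reach a much lower cost.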

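This paper proposes PI-Net, a recurrent neural network that represents the path integral optimal control algorithm and contains system dynamics and cost models. Both models are learned end-to-end by back-propagation and stochastic gradient descent, which lets the network learn to plan. When trained by imitation learning, PI-Net can mimic control demonstrations on two simulated problems and can learn the dynamics and cost models underlying those demonstrations.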

Path Integral Networks: End-to-End Differentiable Optimal Control