There is a widespread intuition that model-based control methods should be
able to surpass the data efficiency of model-free approaches. In this paper we
attempt to evaluate this intuition on various challenging locomotion tasks. We
take a hybrid approach, combining model predictive control (MPC) with a learned
model and model-free policy learning; the learned policy serves as a proposal
for MPC. We find that well-tuned model-free agents are strong baselines even
for high-DoF control problems, but MPC with learned proposals and models
(trained on the fly or transferred from related tasks) can significantly
improve performance and data efficiency in hard multi-task/multi-goal settings.
Finally, we show that it is possible to distil a model-based planner into a
policy that amortizes the planning computation without any loss of performance.
Videos of agents performing different tasks can be seen at
this https URL
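
The hybrid scheme described above, in which a learned policy proposes actions that MPC refines under a learned model, can be illustrated with a minimal sketch. The abstract does not specify the planner, so this sketch substitutes plain random shooting. The function `plan_with_proposal`, its parameters, and the 1-D toy model, reward and proposal are all hypothetical stand-ins, not the authors' implementation.

```python
import random


def plan_with_proposal(state, model, reward, proposal,
                       horizon=5, num_samples=64, noise=0.3, seed=0):
    """Random-shooting MPC around a proposal policy (illustrative only).

    Returns (first_action, predicted_return) of the best sampled rollout.
    """
    rng = random.Random(seed)
    best_action, best_return = None, float("-inf")
    for i in range(num_samples):
        s, total, first = state, 0.0, None
        for t in range(horizon):
            # Sample 0 follows the proposal exactly; the rest perturb it.
            eps = 0.0 if i == 0 else rng.gauss(0.0, noise)
            a = proposal(s) + eps
            if t == 0:
                first = a
            total += reward(s, a)
            s = model(s, a)  # imagined step under the (learned) model
        if total > best_return:
            best_action, best_return = first, total
    return best_action, best_return


# Hypothetical 1-D toy problem: drive the state towards zero.
toy_model = lambda s, a: s + a            # stand-in for a learned dynamics model
toy_reward = lambda s, a: -(s + a) ** 2   # penalize distance from 0 after acting
toy_proposal = lambda s: -0.5 * s         # deliberately suboptimal policy

# Rolling out the proposal alone versus planning around it.
proposal_action, proposal_return = plan_with_proposal(
    2.0, toy_model, toy_reward, toy_proposal, num_samples=1)
planned_action, planned_return = plan_with_proposal(
    2.0, toy_model, toy_reward, toy_proposal, num_samples=64)
```

Because the unperturbed proposal rollout is always among the candidates, the predicted return of the planned sequence in this sketch can never fall below that of the proposal alone. Any gain on the real system still depends on how accurate the model is.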

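This paper examines whether model-based control methods can surpass the data efficiency of model-free approaches. The authors combine model predictive control (MPC) with a learned model and model-free policy learning, and evaluate the combination on several challenging locomotion tasks. They find that well-tuned model-free agents are strong baselines for high-DoF control problems. In hard multi-task/multi-goal settings, however, MPC with learned dynamics models and a learned policy as its proposal can significantly improve performance and data efficiency. Finally, the work shows that a model-based planner can be distilled into a policy without any loss of performance, so that the policy amortizes the planning computation.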

Evaluating model-based planning and planner amortization for continuous control

We study a novel architecture and training procedure for locomotion tasks. A
high-frequency, low-level "spinal" network with access to proprioceptive
sensors learns sensorimotor primitives by training on simple tasks. This
pre-trained module is fixed and connected to a low-frequency, high-level
"cortical" network, with access to all sensors, which drives behavior by
modulating the inputs to the spinal network. Where a monolithic end-to-end
architecture fails completely, learning with a pre-trained spinal module
succeeds at multiple high-level tasks, and enables the effective exploration
required to learn from sparse rewards. We test our proposed architecture on
three simulated bodies: a 16-dimensional swimming snake, a 20-dimensional
quadruped, and a 54-dimensional humanoid. Our results are illustrated in the
accompanying video at this https URL
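
The two-timescale arrangement described above can be sketched as a simple control loop: a high-level controller reads all sensors every few steps and emits a modulation signal, while a low-level controller reads only proprioception at every step. This is a minimal sketch under stated assumptions, not the authors' architecture. The names `run_hierarchy`, `cortical_period`, the toy controllers and the 1-D toy environment are hypothetical, and the networks are replaced by hand-written functions.

```python
def run_hierarchy(env_step, cortical, spinal, obs, num_steps, cortical_period=10):
    """Roll out a two-timescale controller and return the actions taken."""
    actions = []
    modulation = None
    for t in range(num_steps):
        if t % cortical_period == 0:
            # Low-frequency update: the high-level controller sees every sensor.
            modulation = cortical(obs)
        # High-frequency update: the low-level controller sees proprioception
        # plus the most recent modulation signal.
        action = spinal(obs["proprio"], modulation)
        obs = env_step(action)
        actions.append(action)
    return actions


# Hypothetical stand-ins for the two networks; call counts are recorded
# to make the two update rates visible.
calls = {"cortical": 0, "spinal": 0}


def toy_cortical(obs):
    calls["cortical"] += 1
    return 1.0 if obs["target"] > obs["proprio"] else -1.0


def toy_spinal(proprio, modulation):
    calls["spinal"] += 1
    return 0.1 * modulation


def make_toy_env(target=1.0):
    """1-D toy body whose position is its only proprioceptive signal."""
    state = {"x": 0.0}

    def step(action):
        state["x"] += action
        return {"proprio": state["x"], "target": target}

    return step, {"proprio": 0.0, "target": target}


env_step, first_obs = make_toy_env()
actions = run_hierarchy(env_step, toy_cortical, toy_spinal, first_obs,
                        num_steps=25, cortical_period=10)
```

With 25 steps and a period of 10, the high-level controller runs at steps 0, 10 and 20, while the low-level controller runs at every step.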

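This work studies a novel architecture and training procedure for locomotion tasks. A high-frequency, low-level "spinal" network with access to proprioceptive sensors learns sensorimotor primitives by training on simple tasks. The pre-trained module is then fixed, and a high-level "cortical" network drives behavior by modulating the inputs to the spinal network, which enables the effective exploration needed to learn from sparse rewards. The proposed architecture is tested on three simulated bodies (a 16-dimensional swimming snake, a 20-dimensional quadruped, and a 54-dimensional humanoid); the results are shown in the accompanying video.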