Learning models of the environment from pure interaction is often considered
an essential component of building lifelong reinforcement learning agents.
However, the common practice in model-based reinforcement learning
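This paper proposes a new model-based reinforcement learning algorithm, MPPVE (Model-based Planning Policy Learning with Multi-step Plan Value Estimation). It replaces multi-step actions with multi-step plans and updates the policy using multi-step plan value estimation. This makes better use of the learned model and achieves better sample efficiency than existing model-based reinforcement learning methods.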
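To make the idea of a multi-step plan value concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plan is a fixed sequence of actions, that the learned model is a deterministic function returning the next state and reward, and that a bootstrapped value is available for the state reached after the plan. The names `plan_value`, `toy_model`, and `terminal_value` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

State = float
Action = float


def plan_value(
    state: State,
    plan: Sequence[Action],
    model: Callable[[State, Action], Tuple[State, float]],
    terminal_value: Callable[[State], float],
    gamma: float = 0.99,
) -> float:
    """Discounted return of executing `plan` inside `model`,
    plus a discounted bootstrap value for the final state."""
    total, discount = 0.0, 1.0
    for action in plan:
        # Assumption: the model is deterministic and returns (next_state, reward).
        state, reward = model(state, action)
        total += discount * reward
        discount *= gamma
    return total + discount * terminal_value(state)


def toy_model(state: State, action: Action) -> Tuple[State, float]:
    """Hypothetical 1-D dynamics: move by `action`, pay the distance to the origin."""
    next_state = state + action
    return next_state, -abs(next_state)


# Value of the two-step plan (-1, -1) from state 2 with no bootstrap term.
value = plan_value(2.0, [-1.0, -1.0], toy_model, lambda s: 0.0, gamma=0.5)
```

In this sketch the whole plan is scored as one unit, which is the sense in which a plan value differs from a single-action value. How MPPVE learns and uses such a value to update the policy is not specified in this section.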