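TL;DR: We introduce Policy learning with large World Models (PWM), a novel model-based reinforcement learning algorithm that uses large multi-task world models to learn continuous control policies across many tasks with different embodiments.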
Abstract
Reinforcement learning (RL) has achieved impressive results on complex tasks but struggles in multi-task settings with different embodiments. World models offer scalability by learning a simulation of the environ