PWM: Policy Learning with Large World Models

July 2024

PWM: Policy Learning with Large World Models

Ignat Georgiev, Varun Giridhar, Nicklas Hansen, Animesh Garg

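TL;DR: We introduce Policy learning with large World Models (PWM), a new model-based reinforcement learning algorithm that uses large multi-task world models to learn continuous control policies across many tasks with different embodiments.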

Abstract

Reinforcement learning (RL) has achieved impressive results on complex tasks but struggles in multi-task settings with different embodiments. World models offer scalability by learning a simulation of the environ