Learning robust and generalizable world models is crucial for enabling efficient and scalable robotic control in real-world environments. In this work, we introduce a novel framework for learning world models that accurately capture complex, partially observable, and stochastic dynamics. The proposed method employs a dual-autoregressive mechanism and self-supervised training to achieve reliable long-horizon predictions without relying on domain-specific inductive biases, ensuring adaptability across diverse robotic tasks. We further propose a policy optimization framework that leverages world models for efficient training in imagined environments and seamless deployment in real-world systems. Through extensive experiments, our approach consistently outperforms state-of-the-art methods, demonstrating superior autoregressive prediction accuracy, robustness to noise, and generalization across manipulation and locomotion tasks. Notably, policies trained with our method are successfully deployed on ANYmal D hardware in a zero-shot transfer, achieving robust performance with minimal sim-to-real performance loss. This work advances model-based reinforcement learning by addressing the challenges of long-horizon prediction, error accumulation, and sim-to-real transfer. By providing a scalable and robust framework, the introduced methods pave the way for adaptive and efficient robotic systems in real-world applications.

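This work addresses the need to learn robust and generalizable world models for efficient and scalable robotic control in real-world environments. It proposes a novel dual-autoregressive mechanism and a self-supervised training framework that accurately capture complex, partially observable, and stochastic dynamics and adapt across diverse robotic tasks. Extensive experiments show that the method outperforms existing approaches in prediction accuracy, robustness to noise, and cross-task generalization, and that it achieves zero-shot transfer to robotic systems.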

Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics