Existing model-based value expansion methods typically leverage a world model for value estimation with a fixed rollout horizon to assist policy learning. However, the fixed rollout with an inaccurate model has a potential to harm the learning process. In this paper, we investigate the idea of using the model knowledge for value expansion adaptively. We propose a novel method called Dynamic-horizon Model-based Value Expansion (DMVE) to adjust the world model usage with different rollout horizons. Inspired by reconstruction-based techniques that can be applied for visual data novelty detection, we utilize a world model with a reconstruction module for image feature extraction, in order to acquire more precise value estimation. The raw and the reconstructed images are both used to determine the appropriate horizon for adaptive value expansion. On several benchmark visual control tasks, experimental results show that DMVE outperforms all baselines in sample efficiency and final performance, indicating that DMVE can achieve more effective and accurate value estimation than state-of-the-art model-based methods.

本文提出了一种名为DMVE的新方法，通过利用重建模块的世界模型来进行图像特征提取和价值估计，以确定模型知识的有效性，实现了自适应价值扩展的目的，并在基准视觉控制任务中表现出更高效和更准确的价值估计，优于同类方法。

基于模型的强化学习动态视野价值估计