BriefGPT.xyz
Jun, 2024
Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning
Abdullah Akgül, Manuel Haußmann, Melih Kandemir
TL;DR
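Moment Matching Offline Model-Based Policy Optimization (MOMBO) propagates uncertainty deterministically rather than by sampling. It targets a known weakness of model-based offline reinforcement learning: excessive reward penalization that leads to suboptimal policies. Empirical studies across a range of environments show that MOMBO is more stable and more efficient.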
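To make "deterministic uncertainty propagation" concrete, the sketch below pushes a mean and a variance through a small ReLU network in closed form instead of drawing samples. It is a generic moment-matching pass, not the authors' implementation. It assumes that every pre-activation is Gaussian and that units within a layer are independent (diagonal covariance). The function names `relu_moments`, `linear_moments`, and `propagate` are hypothetical and chosen only for illustration.

```python
import math


def relu_moments(mu, var):
    """Mean and variance of max(0, x) for x ~ N(mu, var) (closed form)."""
    if var <= 0.0:
        # Degenerate input: no uncertainty to propagate.
        return max(mu, 0.0), 0.0
    sigma = math.sqrt(var)
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    mean = mu * cdf + sigma * pdf
    second = (mu * mu + var) * cdf + mu * sigma * pdf
    # Clamp tiny negative values caused by floating-point rounding.
    return mean, max(second - mean * mean, 0.0)


def linear_moments(weights, bias, means, variances):
    """Moments of w.x + b, assuming the inputs are independent."""
    mean = bias + sum(w * m for w, m in zip(weights, means))
    var = sum(w * w * v for w, v in zip(weights, variances))
    return mean, var


def propagate(layers, means, variances):
    """Propagate (mean, variance) through linear layers with ReLU in between.

    `layers` is a list of (W, b) pairs, where W is a list of weight rows
    and b is a list of biases. The last layer is left linear.
    """
    for i, (W, b) in enumerate(layers):
        out = [linear_moments(row, bi, means, variances) for row, bi in zip(W, b)]
        if i < len(layers) - 1:
            out = [relu_moments(m, v) for m, v in out]
        means = [m for m, _ in out]
        variances = [v for _, v in out]
    return means, variances
```

Calling `propagate` with the weights of a value network and the predicted mean and variance of a next state returns an output mean and variance in a single pass. The same inputs always yield the same result, which is the property that separates this kind of propagation from a sampling-based estimate.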
Abstract
Current approaches to model-based offline reinforcement learning (RL) often incorporate uncertainty-based reward penalization to address the distributional shift problem. While these approaches have achieved some …