Oct, 2022
A Unified Framework for Alternating Offline Model Training and Policy Learning
Shentao Yang, Shujian Zhang, Yihao Feng, Mingyuan Zhou
TL;DR
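This paper proposes an iterative offline model-based reinforcement learning (MBRL) framework that alternates between dynamics model training and policy learning in order to maximize a lower bound on the true expected return. This addresses the objective mismatch between dynamics model learning and policy learning, and the method achieves competitive performance on a wide range of continuous-control offline reinforcement learning datasets.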
Abstract
In offline model-based reinforcement learning (offline MBRL), we learn a dynamic model from historically collected data, and subsequently utilize the learned model and fixed datasets for policy learning, without …