BriefGPT.xyz
Oct, 2022
A Policy-Guided Imitation Approach for Offline Reinforcement Learning
Haoran Xu, Li Jiang, Jianxiong Li, Xianyuan Zhan
TL;DR
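This work proposes a Policy-guided Offline RL algorithm that, during training, splits the approach into a guide-policy and an execute-policy, and uses the guide-policy to direct the execute-policy so as to achieve state compositionality. The algorithm demonstrates state-of-the-art performance on D4RL, the standard benchmark for offline RL, and can be adapted to new tasks easily by changing the guide-policy.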
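The guide/execute split can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D state, the additive dynamics `s' = s + a`, and the names `make_guide`, `execute`, `act`, and `rollout` are all assumptions made for illustration, and both policies are hand-written here rather than learned from an offline dataset. The sketch only shows how an action can be composed from a guide-policy that proposes a target next state and an execute-policy that picks an action to reach it.

```python
from typing import Callable

State = float
Action = float


def make_guide(goal: State, max_step: float = 1.0) -> Callable[[State], State]:
    """Hypothetical guide-policy: proposes the next state to aim for."""
    def guide(state: State) -> State:
        # Move toward the goal, but by at most max_step per decision.
        delta = max(-max_step, min(max_step, goal - state))
        return state + delta
    return guide


def execute(state: State, target: State) -> Action:
    """Hypothetical execute-policy: action that moves `state` to `target`.

    Under the assumed toy dynamics s' = s + a, that action is target - state.
    """
    return target - state


def act(state: State,
        guide: Callable[[State], State],
        execute_fn: Callable[[State, State], Action]) -> Action:
    # Composition: the guide picks where to go, the executor picks how.
    return execute_fn(state, guide(state))


def rollout(start: State, guide: Callable[[State], State], steps: int) -> State:
    state = start
    for _ in range(steps):
        state = state + act(state, guide, execute)  # assumed dynamics
    return state


# Same execute-policy, two different guide-policies (two "tasks").
final_a = rollout(0.0, make_guide(3.0), steps=5)
final_b = rollout(0.0, make_guide(-2.0), steps=5)
```

In this toy setting, switching tasks means passing a different guide to `rollout` while `execute` stays untouched, which mirrors the TL;DR's claim that adaptation happens by changing the guide-policy.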
Abstract
Offline reinforcement learning (RL) methods can generally be categorized into two types: RL-based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution generalization but suffer from