BriefGPT.xyz
Jun, 2021
A Minimalist Approach to Offline Reinforcement Learning
Scott Fujimoto, Shixiang Shane Gu
TL;DR
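By adding a behavior cloning term to the policy update of an online reinforcement learning algorithm and normalizing the data, the method stays simple while maximizing runtime efficiency, and achieves performance comparable to existing offline RL algorithms.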
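The two changes named above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weighting `alpha=2.5`, the `eps` constant, and the scaling of the Q term by the mean absolute Q-value are assumptions not stated in this summary. It only shows the shape of the idea: standardize the states over the dataset, and add a mean-squared behavior cloning penalty to the usual "maximize Q" policy objective.

```python
from statistics import fmean, pstdev


def normalize_states(states, eps=1e-3):
    """Standardize each state dimension with dataset-wide mean and std.

    `eps` (an assumed constant) keeps the division safe for constant features.
    """
    columns = list(zip(*states))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) + eps for col in columns]
    normalized = [
        [(x - m) / s for x, m, s in zip(row, means, stds)] for row in states
    ]
    return normalized, means, stds


def policy_loss_with_bc(q_values, policy_actions, dataset_actions, alpha=2.5):
    """Policy loss = -lambda * mean(Q) + behavior cloning MSE.

    q_values:        Q(s, pi(s)) for each sample in the batch
    policy_actions:  actions pi(s) proposed by the policy
    dataset_actions: actions actually stored in the offline dataset
    alpha:           assumed trade-off weight between the RL and BC terms
    """
    # Assumed scaling: lambda shrinks as Q-values grow, so the two terms
    # stay on a comparable scale. In a gradient-based implementation this
    # factor would be treated as a constant.
    lam = alpha / (fmean([abs(q) for q in q_values]) + 1e-8)
    rl_term = -lam * fmean(q_values)
    squared_errors = [
        (p - a) ** 2
        for pa, da in zip(policy_actions, dataset_actions)
        for p, a in zip(pa, da)
    ]
    bc_term = fmean(squared_errors)
    return rl_term + bc_term
```

In a real training loop the Q-values would come from a learned critic and the loss would be minimized by gradient descent on the policy parameters; here the inputs are plain lists so the arithmetic can be checked by hand.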
Abstract
Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data. Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms take the approach of con…