A Minimalist Approach to Offline Reinforcement Learning

June 2021

Scott Fujimoto, Shixiang Shane Gu

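TL;DR: By adding a behavior cloning term to the policy update of an online reinforcement learning algorithm and normalizing the data, the method achieves performance comparable to existing offline RL algorithms while remaining simple and maximizing run-time efficiency.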
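The two changes named in the TL;DR can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `bc_weight` parameter, the `eps` constant, and the choice of per-dimension state standardization are assumptions made for illustration, and the critic values are passed in as plain numbers rather than computed by a network.

```python
from statistics import fmean, pstdev


def normalize_states(states, eps=1e-3):
    """Standardize each state dimension across the dataset.

    `eps` is a hypothetical small constant that avoids division by zero
    for dimensions with no variance.
    """
    columns = list(zip(*states))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) + eps for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in states
    ]


def policy_loss_with_bc(q_values, policy_actions, dataset_actions, bc_weight=1.0):
    """Policy loss with an added behavior cloning term (to be minimized).

    q_values:        critic estimates Q(s, pi(s)), one per sample
    policy_actions:  actions pi(s) proposed by the policy
    dataset_actions: actions stored in the fixed dataset
    bc_weight:       hypothetical trade-off between the two terms
    """
    # RL term: minimizing the negative mean Q-value maximizes value.
    rl_term = -fmean(q_values)
    # BC term: mean squared distance between policy and dataset actions,
    # averaged over action dimensions and then over the batch.
    bc_term = fmean([
        fmean([(p - a) ** 2 for p, a in zip(pi_a, data_a)])
        for pi_a, data_a in zip(policy_actions, dataset_actions)
    ])
    return rl_term + bc_weight * bc_term
```

Setting `bc_weight` to zero recovers the plain policy objective of the online algorithm, while a large value pushes the policy toward pure imitation of the dataset actions.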

Abstract

Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data. Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms take the approach of con