Jun, 2020
Critic Regularized Regression
Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed...
TL;DR
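This paper proposes a new offline reinforcement learning algorithm based on critic regularized regression (CRR). It addresses offline learning from fixed datasets in high-dimensional state and action spaces and shows superior performance across a wide range of benchmark tasks.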
Abstract
Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets w…