BriefGPT.xyz
Mar, 2021
Offline Reinforcement Learning with Fisher Divergence Critic Regularization
Ilya Kostrikov, Jonathan Tompson, Rob Fergus, Ofir Nachum
TL;DR
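This work proposes Fisher-BRC, a new offline reinforcement learning algorithm. It parameterizes the critic as the log of the behavior policy that generated the offline data plus an offset term learned by a neural network, and it reports faster convergence and better performance.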
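As a rough illustration of the idea in the TL;DR, the critic can be sketched as the sum of a behavior-policy log-density and a learned offset, Q(s, a) = log μ(a|s) + O(s, a). The snippet below is a minimal sketch under stated assumptions, not the authors' implementation: `behavior_log_prob`, `offset`, and `critic` are hypothetical names, the behavior policy is a fixed Gaussian stand-in rather than one fitted to data, and the offset is a linear function where the method would use a neural network.

```python
import numpy as np


def gaussian_log_prob(action, mean, std):
    """Log-density of a diagonal Gaussian, summed over action dimensions."""
    z = (action - mean) / std
    return float(np.sum(-0.5 * z**2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)))


def behavior_log_prob(state, action):
    # Hypothetical stand-in for a behavior policy mu(a | s) fitted to the
    # offline data: a unit-variance Gaussian centred on 0.5 * state.
    mean = 0.5 * state
    std = np.ones_like(action)
    return gaussian_log_prob(action, mean, std)


def offset(state, action, weights):
    # Hypothetical offset term O(s, a). Linear here for brevity; the method
    # described above would learn this term with a neural network.
    features = np.concatenate([state, action])
    return float(features @ weights)


def critic(state, action, weights):
    # Q(s, a) = log mu(a | s) + O(s, a)
    return behavior_log_prob(state, action) + offset(state, action, weights)


state = np.array([1.0, -2.0])
weights = np.zeros(4)                 # untrained offset: O(s, a) = 0 everywhere
in_data_action = 0.5 * state          # the action the stand-in policy prefers
far_action = in_data_action + 3.0     # an action far from the data

q_in = critic(state, in_data_action, weights)
q_far = critic(state, far_action, weights)
```

With a zero offset the critic reduces to the behavior log-density, so actions far from the data start out with lower values than in-distribution ones. This is one way to see how the parameterization keeps the learned policy close to the data.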
Abstract
Many modern approaches to offline reinforcement learning (RL) utilize behavior regularization, typically augmenting a model-free actor critic algorithm with a penalty measuring divergence of the policy from the o…