BriefGPT.xyz
Jun, 2021
Offline Reinforcement Learning as Anti-Exploration
Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot, Olivier Bachem...
TL;DR
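This work proposes a new offline reinforcement learning agent that takes the bonus used by bonus-based exploration methods and subtracts it from the reward instead of adding it, so that the policy stays within the support of the dataset. The authors connect this approach to a regularization that constrains the learned policy toward the dataset, instantiate it with a bonus based on the prediction error of a variational autoencoder, and show that the resulting agent is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks.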
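The reward modification described above can be illustrated with a minimal sketch. It assumes the bonus is the squared error between an action and its reconstruction by some already-trained model; the function names, the `alpha` scale, and the squared-error form are hypothetical choices for illustration and are not taken from the paper. Where exactly the penalty enters the learning update is not stated in this summary.

```python
def prediction_error_bonus(action, reconstructed_action):
    # Assumed bonus: squared error between the action and its reconstruction.
    # A large error suggests the action is unlike those seen in the dataset.
    return sum((a - r) ** 2 for a, r in zip(action, reconstructed_action))


def anti_exploration_reward(reward, action, reconstructed_action, alpha=1.0):
    # Exploration methods add the bonus to the reward; here it is subtracted,
    # which penalizes actions the model reconstructs poorly.
    # `alpha` is a hypothetical scale for the penalty.
    return reward - alpha * prediction_error_bonus(action, reconstructed_action)


# An action reconstructed exactly keeps its reward unchanged.
in_support = anti_exploration_reward(1.0, [0.5, -0.5], [0.5, -0.5], alpha=0.5)

# A poorly reconstructed action has its reward reduced.
out_of_support = anti_exploration_reward(1.0, [2.0, 2.0], [1.0, 1.0], alpha=0.5)
```

In this toy example the first action keeps its full reward, while the second loses reward in proportion to its reconstruction error, which is the sense in which the agent is discouraged from leaving the data.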
Abstract
Offline reinforcement learning (RL) aims at learning an optimal control from a fixed dataset, without interactions with the system. An agent in this setting should avoid selecting actions whose consequences canno