BriefGPT.xyz
Feb, 2021
Continuous Doubly Constrained Batch Reinforcement Learning
Rasool Fakoor, Jonas Mueller, Pratik Chaudhari, Alexander J. Smola
TL;DR
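This work proposes a batch reinforcement learning algorithm that learns effective policies from a fixed offline dataset alone, without online interaction with the environment. It applies a policy constraint and a value constraint to counteract insufficient data coverage and keep candidate policies in check, and it compares favorably with recent state-of-the-art methods on a range of continuous-action batch RL benchmarks.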
Abstract
Reliant on too many experiments to learn good actions, current reinforcement learning (RL) algorithms have limited applicability in real-world settings, which can be too expensive to allow exploration. We propose an algorithm for