BriefGPT.xyz
Dec, 2018
Off-Policy Deep Reinforcement Learning without Exploration
Scott Fujimoto, David Meger, Doina Precup
TL;DR
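This paper proposes a new batch-constrained reinforcement learning algorithm that can learn effectively from arbitrary fixed batches of data, opening up a way to address some key problems in reinforcement learning.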
Abstract
Reinforcement learning traditionally considers the task of balancing exploration and exploitation. This work examines batch reinforcement learning, the task of maximally exploiting a given batch of …