Off-Policy Deep Reinforcement Learning without Exploration

Dec, 2018

Scott Fujimoto, David Meger, Doina Precup

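TL;DR: This paper proposes a new batch-constrained reinforcement learning algorithm that can learn effectively from an arbitrary fixed batch of data, opening a path toward solving some key problems in reinforcement learning.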

Abstract

Reinforcement learning traditionally considers the task of balancing exploration and exploitation. This work examines batch reinforcement learning, the task of maximally exploiting a given batch of →