Batch Value-function Approximation with Only Realizability

Aug, 2020

Tengyang Xie, Nan Jiang

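TL;DR: This work proposes BVFT, a batch reinforcement learning algorithm that learns efficiently from data by comparing candidate functions in pairs and resolving each comparison with a partition of the state-action space. The approach also carries over to other problems and extensions.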
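The comparison-and-partition idea in the TL;DR can be illustrated with a small sketch. It assumes a finite list of candidate Q-functions, transitions given as `(s, a, r, s_next)` tuples with no terminal states, a finite action set, and a grid of width `eps` over the pair `(f(s, a), g(s, a))` to define the partition. The functions `partition_key`, `projected_bellman_error`, and `bvft_select` are hypothetical. Scoring each candidate by its worst projected Bellman error across opponents is likewise an illustrative assumption, not the authors' code.

```python
import math
from collections import defaultdict

# Hypothetical sketch of a BVFT-style tournament; not the authors' implementation.
# Assumes transitions (s, a, r, s_next) with no terminal states and a finite action set.


def partition_key(f, g, s, a, eps):
    # Cell of (s, a) in a grid of width eps over the pair (f(s, a), g(s, a)).
    return (math.floor(f(s, a) / eps), math.floor(g(s, a) / eps))


def projected_bellman_error(f, g, data, actions, gamma, eps):
    # Project the Bellman target r + gamma * max_b f(s', b) onto functions that
    # are constant on each cell (i.e. average it per cell), then return the
    # empirical L2 distance between f and that projection.
    sums, counts, keys = defaultdict(float), defaultdict(int), []
    for s, a, r, s_next in data:
        target = r + gamma * max(f(s_next, b) for b in actions)
        k = partition_key(f, g, s, a, eps)
        sums[k] += target
        counts[k] += 1
        keys.append(k)
    sq_err = sum(
        (f(s, a) - sums[k] / counts[k]) ** 2
        for (s, a, _, _), k in zip(data, keys)
    )
    return math.sqrt(sq_err / len(data))


def bvft_select(candidates, data, actions, gamma, eps):
    # Tournament: each candidate's score is its worst projected Bellman error
    # over the partitions it induces together with every candidate.
    best_idx, best_score = None, math.inf
    for i, f in enumerate(candidates):
        score = max(
            projected_bellman_error(f, g, data, actions, gamma, eps)
            for g in candidates
        )
        if score < best_score:
            best_idx, best_score = i, score
    return best_idx, best_score


# Toy usage: one state, two actions, deterministic self-loop, gamma = 0.5.
# Q*(0, 0) = 2 and Q*(0, 1) = 1; the second candidate swaps these values.
q_true = {(0, 0): 2.0, (0, 1): 1.0}
q_swapped = {(0, 0): 1.0, (0, 1): 2.0}
candidates = [lambda s, a: q_true[(s, a)], lambda s, a: q_swapped[(s, a)]]
data = [(0, 0, 1.0, 0), (0, 1, 0.0, 0)]
print(bvft_select(candidates, data, actions=[0, 1], gamma=0.5, eps=0.1))  # → (0, 0.0)
```

In the toy check, the realizable candidate matches its projected Bellman target exactly and scores 0. The swapped candidate is off by 1 on both state-action pairs and scores 1, so the tournament picks the first candidate.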

Abstract

We solve a long-standing problem in batch reinforcement learning (RL): learning $Q^\star$ from an exploratory and polynomial-sized dataset, using a realizable and otherwise arbitrary function class. In fact, all