BriefGPT.xyz
Aug, 2020
Batch Value-function Approximation with Only Realizability
Tengyang Xie, Nan Jiang
TL;DR
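This work proposes BVFT, a learning algorithm for batch reinforcement learning. Its mechanism, based on comparison and partitioning, makes learning more efficient and also applies to other problems and extensions.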
Abstract
We solve a long-standing problem in batch reinforcement learning (RL): learning $Q^\star$ from an exploratory and polynomial-sized dataset, using a realizable and otherwise arbitrary function class. In fact, all