Nov, 2020
A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting
Philip Amortila, Nan Jiang, Tengyang Xie
TL;DR
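This paper studies the hardness of batch reinforcement learning with respect to value functions and feature coverage in the finite-horizon and discounted settings, and shows that learning can be impossible even with an infinite amount of data.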
Abstract
Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with linearly realizable value function and good …