BriefGPT.xyz
Mar, 2021
An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap
Yuanhao Wang, Ruosong Wang, Sham M. Kakade
TL;DR
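This work studies online reinforcement learning and asks whether adding a constant suboptimality gap assumption makes sample-efficient learning possible. It finds that even when the optimal $Q$-function is linearly realizable, exponentially many samples are still required, which establishes an exponential separation between the online setting and the generative-model setting.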
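For reference, the two assumptions named in the summary are commonly formalized as below. This is a sketch in standard finite-horizon notation; the symbols $\phi$, $\theta_h$, $H$, and $\Delta_{\min}$ are assumed here for illustration and do not appear on this page.

$$
Q^*_h(s,a) = \langle \theta_h, \phi(s,a) \rangle \quad \text{for some } \theta_h \in \mathbb{R}^d, \text{ for all } h \in [H] \text{ and all } (s,a) \qquad \text{(linear realizability)}
$$

$$
V^*_h(s) - Q^*_h(s,a) > 0 \;\Longrightarrow\; V^*_h(s) - Q^*_h(s,a) \ge \Delta_{\min} \qquad \text{(constant suboptimality gap)}
$$

Under this reading, every suboptimal action is worse than the optimal one by at least a fixed margin $\Delta_{\min} > 0$, and the lower bound says that this margin alone does not make online learning sample-efficient.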
Abstract
A fundamental question in the theory of reinforcement learning is: suppose the optimal $Q$-function lies in the linear span of a given $d$-dimensional feature mapping, is sample-efficient reinforcement learning (…