BriefGPT.xyz
Feb, 2020
Learning Near Optimal Policies with Low Inherent Bellman Error
Andrea Zanette, Alessandro Lazaric, Mykel Kochenderfer, Emma Brunskill
TL;DR
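This paper studies the exploration problem under low inherent Bellman error when action-value functions are approximately linear. It gives an algorithm whose high-probability regret bound depends on the feature dimension and the inherent Bellman error, compares the result with prior work, and shows that the algorithm is statistically efficient in the linear MDP setting.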
Abstract
We study the exploration problem with approximate linear action-value functions in episodic reinforcement learning under the notion of low inherent Bellman error ...