Nov, 2020
Logarithmic Regret for Reinforcement Learning with Linear Function Approximation
Jiafan He, Dongruo Zhou, Quanquan Gu
TL;DR
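This work studies reinforcement learning with linear function approximation and shows that, under recently proposed linear MDP assumptions, logarithmic regret bounds are attainable when the optimal action-value function has a positive sub-optimality gap.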
Abstract
Reinforcement learning (RL) with linear function approximation has received increasing attention recently. However, existing work has focused on obtaining $\sqrt{T}$-type regret bound, where $T$ is the number of