Dec, 2022
VO$Q$L: Towards Optimal Regret in Model-free RL with Nonlinear Function Approximation
Alekh Agarwal, Yujia Jin, Tong Zhang
TL;DR
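This work introduces a new algorithm, VOQL, that improves on existing theoretical bounds and shows that a computationally efficient, statistically optimal approach is feasible for regression over function classes such as linear MDPs.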
Abstract
We study time-inhomogeneous episodic reinforcement learning (RL) under general function approximation and sparse rewards. We design a new ...