BriefGPT.xyz
Nov, 2020
Online Sparse Reinforcement Learning
Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang
TL;DR
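In sparse linear Markov decision processes, a new algorithm, Lasso fitted Q-iteration, combined with a data-collection policy that satisfies certain conditions, achieves a nearly dimension-free regret for online reinforcement learning; in the general case, however, linear regret remains unavoidable.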
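To make the idea of combining Lasso regression with fitted Q-iteration more concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm: it is a generic, discounted batch variant, and the function names (`lasso`, `lasso_fqi`), the batch layout `(phi, reward, next_phis)`, the discount `gamma`, and the penalty `lam` are all assumptions made for illustration. Each iteration builds Bellman targets from the current weights and refits a sparse linear Q-function with an L1 penalty.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def soft_threshold(x, lam):
    """Shrink x toward zero by lam (the proximal map of the L1 penalty)."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def lasso(X, y, lam, sweeps=200):
    """Coordinate descent for (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    resid = list(y)  # residual y - Xw, with w initialised to zero
    for _ in range(sweeps):
        for j in range(d):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # an all-zero feature keeps weight zero
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new_wj = soft_threshold(rho, lam) / z
            delta = new_wj - w[j]
            if delta != 0.0:
                resid = [r - c * delta for c, r in zip(col, resid)]
            w[j] = new_wj
    return w


def lasso_fqi(batch, gamma, lam, iters):
    """Fitted Q-iteration with a Lasso regression step (illustrative sketch).

    batch: list of (phi, reward, next_phis), where phi is the feature vector
    of the visited state-action pair and next_phis lists the feature vectors
    of every action in the next state (empty if the transition is terminal).
    """
    X = [phi for phi, _, _ in batch]
    w = [0.0] * len(X[0])
    for _ in range(iters):
        # Bellman targets: reward plus discounted greedy value under current w.
        y = [
            r + gamma * max((dot(p, w) for p in nxt), default=0.0)
            for _, r, nxt in batch
        ]
        w = lasso(X, y, lam)
    return w


if __name__ == "__main__":
    # Hypothetical one-step data: only the second feature drives the reward.
    feats = [[1.0, 1.0, 1.0], [1.0, -1.0, 1.0], [1.0, 1.0, -1.0], [1.0, -1.0, -1.0]]
    batch = [(f, 2.0 * f[1], []) for f in feats]
    print(lasso_fqi(batch, gamma=0.9, lam=0.1, iters=3))
```

The L1 penalty is what lets the fit ignore irrelevant features: in the toy data above, the weights on the first and third features are driven to zero, while the weight on the second feature is shrunk slightly by `lam`.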
Abstract
We investigate the hardness of online reinforcement learning in sparse linear Markov decision process (MDP), with a special focus on the high-dimensional regime where the ambient dimension is larger than the numb…