November 2020

Online Sparse Reinforcement Learning

Botao Hao, Tor Lattimore, Csaba Szepesvári, Mengdi Wang

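TL;DR: In sparse linear Markov decision processes, a new algorithm, Lasso fitted Q-iteration, achieves nearly dimension-free regret for online reinforcement learning when it is given a data-collection policy that satisfies certain conditions. Linear regret nevertheless remains unavoidable in general.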

Abstract

We investigate the hardness of online reinforcement learning in sparse linear Markov decision processes (MDPs), with a special focus on the high-dimensional regime where the ambient dimension is larger than the numb