BriefGPT.xyz
Oct, 2023
Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore
Gellért Weisz, András György, Csaba Szepesvári
TL;DR
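For online reinforcement learning in linearly $q^\pi$-realizable Markov decision processes (MDPs), the paper proposes a computationally inefficient learning algorithm that turns the problem into a linear MDP by skipping certain states, and proves that the algorithm has polynomial sample complexity in this setting.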
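For orientation, the two assumptions being compared are commonly written as below. This is a sketch in assumed notation: the feature map $\phi$, the stage index $h$, and the parameters $\theta_h^\pi$, $\mu_h$, and $w_h$ are not taken from this summary, and the paper's exact definitions may differ. Linear $q^\pi$-realizability asks only that every policy's action-value function be linear in the features:

$$q_h^\pi(s,a) = \langle \phi(s,a), \theta_h^\pi \rangle \quad \text{for all policies } \pi,$$

whereas a linear MDP requires the transition kernel and the rewards themselves to be linear in the features:

$$P_h(s' \mid s,a) = \langle \phi(s,a), \mu_h(s') \rangle, \qquad r_h(s,a) = \langle \phi(s,a), w_h \rangle.$$

Every linear MDP is linearly $q^\pi$-realizable, but the converse does not hold in general, which is the gap that skipping states is meant to close.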
Abstract
We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be exp