BriefGPT.xyz
May, 2017
Convergent Tree-Backup and Retrace with Function Approximation
Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent
TL;DR
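This paper analyzes the instability of the Tree Backup and Retrace algorithms under linear function approximation and proposes a stable, efficient gradient-based algorithm derived from a quadratic convex-concave saddle-point formulation. It proves convergence of the proposed algorithm, gives a finite-sample bound, and also provides a new proof of the convergence rate of other algorithms.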
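For readers unfamiliar with the two algorithms named above, the sketch below shows the per-step trace coefficients as they are commonly stated: Tree Backup scales the trace by the target policy's probability of the action taken, while Retrace uses the importance ratio truncated at 1. The function names, the `lam` parameter, and the example probabilities are illustrative assumptions and do not come from this summary.

```python
def tree_backup_coefficient(pi_prob, lam=1.0):
    """Tree Backup trace coefficient: lam * pi(a|s).

    pi_prob is the target policy's probability of the action taken.
    """
    return lam * pi_prob


def retrace_coefficient(pi_prob, mu_prob, lam=1.0):
    """Retrace trace coefficient: lam * min(1, pi(a|s) / mu(a|s)).

    mu_prob is the behavior policy's probability of the action taken;
    it must be positive because the action was sampled from it.
    """
    if mu_prob <= 0.0:
        raise ValueError("behavior probability must be positive")
    return lam * min(1.0, pi_prob / mu_prob)


# Hypothetical two-action example: the target policy prefers action 0,
# the behavior policy is uniform.
target = [0.9, 0.1]
behavior = [0.5, 0.5]
for action in (0, 1):
    tb = tree_backup_coefficient(target[action])
    rt = retrace_coefficient(target[action], behavior[action])
    print(f"action={action} tree_backup={tb:.2f} retrace={rt:.2f}")
```

Both coefficients lie in [0, 1] when `lam` does, which is why neither method needs the behavior policy's ratio to be bounded; Tree Backup does not need the behavior probabilities at all.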
Abstract
Off-policy learning is key to scaling up reinforcement learning, as it allows learning about a target policy from experience generated by a different behavior policy. Unfortunately, it has been challenging to combine …