May, 2017

Convergent Tree-Backup and Retrace with Function Approximation

Ahmed Touati, Pierre-Luc Bacon, Doina Precup, Pascal Vincent

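TL;DR: This paper analyzes the instability of the Tree Backup and Retrace algorithms under linear function approximation, proposes a stable and efficient gradient-based algorithm built on a quadratic convex-concave saddle-point formulation, proves its convergence together with finite-sample bounds, and also gives new proofs of convergence rates for other algorithms.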

Abstract

Off-policy learning is key to scaling up reinforcement learning, as it allows learning about a target policy from experience generated by a different behavior policy. Unfortunately, it has been challenging to combine