Provably Convergent Two-Timescale Off-Policy Actor-Critic Algorithm with Function Approximation

Nov, 2019

Provably Convergent Off-Policy Actor-Critic with Function Approximation

Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson

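TL;DR: This paper presents COF-PAC, the first provably convergent two-timescale off-policy actor-critic algorithm. It also introduces a new critic, the emphasis critic, which is trained with Gradient Emphasis Learning. Using the emphasis critic together with a conventional value-function critic, the paper proves that COF-PAC converges when the critics are linear, while the actor may be nonlinear.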
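The TL;DR relies on a two-timescale structure. The critics and the actor use step sizes whose ratio vanishes, so from the critics' point of view the actor looks almost fixed. The sketch below is a generic two-timescale stochastic approximation on a hypothetical scalar toy problem. It is not the COF-PAC update rule. The step-size exponents, the critic target `2 * theta`, and the names `step_sizes` and `two_timescale` are illustrative assumptions.

```python
import random


def step_sizes(t):
    """Hypothetical step-size schedules for iteration t (0-based).

    Both schedules satisfy the Robbins-Monro conditions: the sum diverges
    and the sum of squares converges. Their ratio is
    alpha/beta = (t+1)**-0.4, which goes to 0, so the critic runs on the
    fast timescale and the actor on the slow one.
    """
    alpha = 1.0 / (t + 1)          # actor (slow timescale)
    beta = 1.0 / (t + 1) ** 0.6    # critic (fast timescale)
    return alpha, beta


def two_timescale(num_steps=100_000, noise=0.1, seed=0):
    """Toy coupled stochastic approximation (an illustration, NOT COF-PAC).

    The fast iterate w (the "critic") tracks the target 2*theta for the
    current theta. The slow iterate theta (the "actor") follows the noisy
    drift 1 - w. Once w has caught up with 2*theta, that drift is
    1 - 2*theta, whose root is theta = 0.5.
    """
    rng = random.Random(seed)
    theta, w = 0.0, 0.0
    for t in range(num_steps):
        alpha, beta = step_sizes(t)
        # Critic update on the fast timescale, using the current actor.
        w_new = w + beta * (2.0 * theta - w + noise * rng.gauss(0.0, 1.0))
        # Actor update on the slow timescale, using the current critic.
        theta = theta + alpha * (1.0 - w + noise * rng.gauss(0.0, 1.0))
        w = w_new
    return theta, w


if __name__ == "__main__":
    theta, w = two_timescale()
    print(f"theta = {theta:.3f} (target 0.5), w = {w:.3f} (target 1.0)")
```

Convergence arguments of this kind depend on alpha_t/beta_t going to 0. If the two schedules were swapped, the actor would move on the fast timescale and the critic could no longer be treated as having converged for the current actor.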

Abstract

We present the first provably convergent off-policy actor-critic algorithm (COF-PAC) with function approximation in a two-timescale form.