July 2015

Emphatic Temporal-Difference Learning

A. Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton

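TL;DR: This work summarizes two recent studies on the stability and convergence of emphatic algorithms in reinforcement learning. It also shows empirically the advantages that the flexibility of emphatic algorithms brings in state-dependent discounting, state-dependent bootstrapping, and resource allocation.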
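As a rough illustration of how emphasis reweights updates, here is a minimal sketch of linear ETD(λ) as it is commonly written, with a followon trace F, an emphasis M and an emphatic eligibility trace e. The state-dependent functions `gamma`, `lam` and `interest` stand in for the state-dependent discounting, bootstrapping and resource allocation mentioned above; treating `interest` as the knob for resource allocation is our assumption. The class and method names (`EmphaticTDLambda`, `step`) are hypothetical and not taken from the paper.

```python
from typing import Callable, Hashable, List, Sequence

StateFn = Callable[[Hashable], float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


class EmphaticTDLambda:
    """Illustrative linear ETD(lambda) learner (hypothetical class name).

    gamma(s), lam(s) and interest(s) are user-supplied, state-dependent
    discount, bootstrapping and interest functions (assumed interface).
    """

    def __init__(self, n_features: int, alpha: float,
                 gamma: StateFn, lam: StateFn, interest: StateFn) -> None:
        self.alpha = alpha
        self.gamma, self.lam, self.interest = gamma, lam, interest
        self.theta: List[float] = [0.0] * n_features  # weight vector
        self.e: List[float] = [0.0] * n_features      # eligibility trace, e_{-1} = 0
        self.F = 0.0         # followon trace
        self.prev_rho = 0.0  # rho_{t-1}; starting at 0 gives F_0 = i(S_0)

    def step(self, s: Hashable, phi: Sequence[float], rho: float,
             r: float, s_next: Hashable, phi_next: Sequence[float]) -> float:
        """Apply one update for the transition (S_t, R_{t+1}, S_{t+1}).

        rho is the importance-sampling ratio pi(A_t|S_t) / mu(A_t|S_t).
        Returns the TD error delta_t.
        """
        i_t, g_t, lam_t = self.interest(s), self.gamma(s), self.lam(s)
        # Followon trace: F_t = rho_{t-1} * gamma_t * F_{t-1} + i(S_t)
        self.F = self.prev_rho * g_t * self.F + i_t
        # Emphasis: M_t = lambda_t * i(S_t) + (1 - lambda_t) * F_t
        m_t = lam_t * i_t + (1.0 - lam_t) * self.F
        # Emphatic trace: e_t = rho_t * (gamma_t * lambda_t * e_{t-1} + M_t * phi_t)
        self.e = [rho * (g_t * lam_t * e_j + m_t * p_j)
                  for e_j, p_j in zip(self.e, phi)]
        # TD error, bootstrapping with the discount of the next state
        delta = r + self.gamma(s_next) * dot(self.theta, phi_next) - dot(self.theta, phi)
        # Weight update: theta_{t+1} = theta_t + alpha * delta_t * e_t
        self.theta = [w + self.alpha * delta * e_j
                      for w, e_j in zip(self.theta, self.e)]
        self.prev_rho = rho
        return delta
```

Setting `rho` to 1 on every step corresponds to on-policy learning. A state with zero interest contributes nothing directly to the followon trace, although emphasis carried over from earlier states can still reach it through F.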

Abstract

Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps. Recent works by