Jul, 2015
Emphatic Temporal-Difference Learning
A. Rupam Mahmood, Huizhen Yu, Martha White, Richard S. Sutton
TL;DR
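This paper summarizes two recent studies on the stability and convergence of emphatic algorithms in reinforcement learning, and demonstrates the empirical benefits of their flexibility in state-dependent discounting, state-dependent bootstrapping, and resource allocation.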
Abstract
Emphatic algorithms are temporal-difference learning algorithms that change their effective state distribution by selectively emphasizing and de-emphasizing their updates on different time steps. Recent works by …
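To make "selectively emphasizing and de-emphasizing updates" concrete, here is a minimal sketch of one emphatic TD(λ) step with linear function approximation. The update rules are not given in this excerpt; they follow the commonly cited emphatic TD(λ) formulation (followon trace, emphasis, eligibility trace) and should be checked against the paper. The function name `emphatic_td_step` and all argument names are hypothetical.

```python
def emphatic_td_step(theta, e, F, phi, phi_next, reward,
                     rho, rho_prev, gamma, gamma_next, lam, interest, alpha):
    """One emphatic TD(lambda) update (sketch; names are illustrative).

    Assumed update rules, not taken from this page:
      F_t = rho_{t-1} * gamma_t * F_{t-1} + I_t        (followon trace)
      M_t = lam_t * I_t + (1 - lam_t) * F_t            (emphasis)
      e_t = rho_t * (gamma_t * lam_t * e_{t-1} + M_t * phi_t)
      theta_{t+1} = theta_t + alpha * delta_t * e_t
    Start with F = 0.0 and e = [0.0, ...] so that the first followon equals
    the first interest.
    """
    # Followon trace: importance-weighted, discounted accumulation of interest.
    F = rho_prev * gamma * F + interest
    # Emphasis: how strongly the update at this time step is weighted.
    M = lam * interest + (1.0 - lam) * F
    # Eligibility trace, with the current features scaled by the emphasis.
    e = [rho * (gamma * lam * e_i + M * phi_i) for e_i, phi_i in zip(e, phi)]
    # TD error under the linear value estimate theta . phi.
    v = sum(t * p for t, p in zip(theta, phi))
    v_next = sum(t * p for t, p in zip(theta, phi_next))
    delta = reward + gamma_next * v_next - v
    # Parameter update along the emphasized eligibility trace.
    theta = [t + alpha * delta * e_i for t, e_i in zip(theta, e)]
    return theta, e, F, M


# Two on-policy steps (rho = 1) in a two-feature problem with lam = 0.
theta, e, F = [0.0, 0.0], [0.0, 0.0], 0.0
theta, e, F, M = emphatic_td_step(theta, e, F, [1.0, 0.0], [0.0, 1.0], 1.0,
                                  1.0, 1.0, 0.9, 0.9, 0.0, 1.0, 0.1)
theta, e, F, M = emphatic_td_step(theta, e, F, [0.0, 1.0], [0.0, 0.0], 0.0,
                                  1.0, 1.0, 0.9, 0.9, 0.0, 1.0, 0.1)
```

Setting `interest` to zero on a time step removes that state's direct contribution to the emphasis, which is one way the user can steer where function approximation effort is spent; the per-step `gamma` and `lam` arguments are what allow state-dependent discounting and bootstrapping in this sketch.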