BriefGPT.xyz
Feb, 2020
Value-driven Hindsight Modelling
Arthur Guez, Fabio Viola, Théophane Weber, Lars Buesing, Steven Kapturowski...
TL;DR
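This paper proposes an approach to representation learning that directly supports value-function prediction. Combining the strengths of model-based and model-free methods, it identifies which features of the future trajectory are informative, which yields tractable, task-relevant prediction targets and accelerates learning of the value function.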
Abstract
Value estimation is a critical component of the reinforcement learning (RL) paradigm. The question of how to effectively learn predictors for value from data is one of the major problems studied by the RL communi