BriefGPT.xyz
May, 2019
The Journey is the Reward: Unsupervised Learning of Influential Trajectories
Jonathan Binas, Sherjil Ozair, Yoshua Bengio
TL;DR
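This work proposes a new method for unsupervised exploration and representation learning in complex environments with large action spaces. It formalizes the unsupervised exploration objective as maximizing the influence an agent has on the future states of its environment, taking the entire trajectory into account.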
Abstract
Unsupervised exploration and representation learning become increasingly important when learning in diverse and sparse environments. The informat