The Journey is the Reward: Unsupervised Learning of Influential Trajectories

May, 2019

Jonathan Binas, Sherjil Ozair, Yoshua Bengio

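TL;DR: This work proposes a new method for unsupervised exploration and representation learning in complex environments with large action spaces. It formalizes the unsupervised exploration objective over entire trajectories, as maximizing a trajectory's influence on the future state of the environment.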

Abstract

Unsupervised exploration and representation learning become increasingly important when learning in diverse and sparse environments. The informat