Víctor Campos, Alexander Trott, Caiming Xiong, Richard Socher, Xavier Giro-i-Nieto...
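TL;DR This paper proposes a method called "Explore, Discover and Learn" (EDL) for acquiring skills in the absence of a task-oriented reward function. EDL addresses the coverage problem of existing information-theoretic skill discovery algorithms and is evaluated thoroughly in controlled environments.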
Abstract
Acquiring abilities in the absence of a task-oriented reward function is at the frontier of reinforcement learning research. This problem has been studied through the lens of empowerment, which draws a connection