Provably Efficient Maximum Entropy Exploration

December 2018

Elad Hazan, Sham M. Kakade, Karan Singh, Abby Van Soest

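TL;DR: This work uses the conditional gradient method, with an approximate MDP solver as a subroutine, to give an efficient algorithm for optimizing a class of intrinsic objectives in the absence of a reward signal.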
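To make the conditional gradient idea concrete, here is a minimal toy sketch, not the paper's implementation. It assumes a hypothetical set of three candidate state distributions (`CANDIDATES`), each standing in for the distribution one policy would induce. The function `linear_oracle` stands in for the approximate MDP solver by picking the candidate that maximizes a linear reward. The smoothing constant `sigma` and the step size `2 / (t + 2)` are illustrative assumptions.

```python
import numpy as np


def smoothed_entropy(d, sigma=1e-3):
    """Smoothed entropy -sum_s d(s) * log(d(s) + sigma); sigma is an assumed constant."""
    return float(-np.sum(d * np.log(d + sigma)))


def smoothed_entropy_grad(d, sigma=1e-3):
    """Gradient of the smoothed entropy with respect to d."""
    return -(np.log(d + sigma) + d / (d + sigma))


def linear_oracle(candidates, reward):
    """Hypothetical stand-in for an approximate MDP solver.

    Returns the candidate state distribution with the largest expected reward.
    """
    return candidates[np.argmax(candidates @ reward)]


def conditional_gradient(candidates, num_iters=200, sigma=1e-3):
    """Conditional gradient (Frank-Wolfe) ascent over the convex hull of `candidates`."""
    d = candidates[0].copy()
    for t in range(num_iters):
        # The gradient of the objective acts as a reward for the oracle.
        reward = smoothed_entropy_grad(d, sigma)
        vertex = linear_oracle(candidates, reward)
        # Standard diminishing step size; the iterate stays a convex combination.
        step = 2.0 / (t + 2.0)
        d = (1.0 - step) * d + step * vertex
    return d


# Hypothetical state distributions over three states, one per "policy".
CANDIDATES = np.array([
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
])

if __name__ == "__main__":
    mixed = conditional_gradient(CANDIDATES)
    print("mixed state distribution:", mixed)
    print("smoothed entropy:", smoothed_entropy(mixed))
```

In this toy setting the iterate drifts toward the uniform mixture of the three candidates, which has higher entropy than any single candidate. In a real MDP the oracle would be a planner or reinforcement learning routine run with the gradient as the reward.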

Abstract

Suppose an agent is in a (possibly unknown) Markov decision process (MDP) in the absence of a reward signal. What might we hope that the agent can efficiently learn to do? One natural, intrinsically defined objective is for the agent to learn a policy which induces a distributi