TL;DR提出了一种用于探索的通用框架 Masked Input Modeling for Exploration (MIMEx),它能够通过灵活调整掩码分布来控制条件预测任务的难度,并在一系列挑战性的稀疏奖励视觉运动任务中取得了优异的结果。
Abstract
Exploring in environments with high-dimensional observations is hard. One
promising approach for exploration is to use intrinsic rewards, which often
boils down to estimating "novelty" of states, transitions, or