In recent years, domains such as natural language processing and image
recognition have popularized the paradigm of using large datasets to pretrain
representations that can be effectively transferred to downstream tasks. In
this work we evaluate how such a paradigm should be applied in imitation learning.
This paper proposes a self-supervised representation learning method that combines contrastive learning with a dynamics model to jointly pursue three objectives: inducing linearly predictive embeddings by maximizing the InfoNCE bound; further improving the Markov property of the learned embeddings by explicitly learning a nonlinear transition model; and maximizing the mutual information with the next embedding, which is predicted from the current action and two independently augmented embeddings of the current state. Experiments show that, compared with existing methods based on contrastive learning or reconstruction, our approach achieves better sample efficiency and generalization performance.
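To make the core objective concrete, the following is a minimal NumPy sketch of an InfoNCE-style contrastive loss paired with a transition model that predicts the next embedding from the current embedding and action. All names (`info_nce_loss`, `W_z`, `W_a`) and the linear transition form are illustrative assumptions, not the paper's actual architecture; in practice the transition model would be nonlinear and trained by gradient descent.

```python
import numpy as np

def info_nce_loss(pred, pos, temperature=0.1):
    """InfoNCE: each predicted embedding should match its own positive;
    the other entries in the batch act as negatives."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    pos = pos / np.linalg.norm(pos, axis=1, keepdims=True)
    logits = pred @ pos.T / temperature           # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives lie on the diagonal

rng = np.random.default_rng(0)
B, D, A = 32, 16, 4                   # batch size, embedding dim, action dim

# Hypothetical linear stand-in for the transition model: z_{t+1} ~ z_t W_z + a_t W_a
z_t = rng.normal(size=(B, D))         # embeddings of an augmented view of s_t
a_t = rng.normal(size=(B, A))         # current actions
W_z = 0.1 * rng.normal(size=(D, D))
W_a = 0.1 * rng.normal(size=(A, D))
pred_next = z_t @ W_z + a_t @ W_a     # predicted next-step embeddings

z_next = rng.normal(size=(B, D))      # embeddings of the observed s_{t+1}
loss = info_nce_loss(pred_next, z_next)
print(loss)
```

Maximizing the InfoNCE bound corresponds to minimizing this loss; with an untrained random model the loss sits near log(B), the chance level for picking the true next embedding out of the batch, and it decreases as the predicted and observed next embeddings align.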