Exploration remains a significant challenge in reinforcement learning, especially in environments where extrinsic rewards are sparse or non-existent. The recent rise of foundation models, such as CLIP, offers an opportunity to leverage pretrained, semantically rich embeddings that encapsulate broad and reusable knowledge. In this work we explore the potential of these foundation models to drive exploration, and we analyze the critical role of the episodic novelty term in enhancing the agent's exploration effectiveness. We also investigate whether providing the intrinsic module with complete state information, rather than just partial observations, can improve exploration, despite the difficulty of handling small variations within large state spaces. Our experiments in the MiniGrid domain reveal that intrinsic modules can effectively utilize full state information, significantly increasing sample efficiency while still learning an optimal policy. Moreover, we show that the embeddings provided by foundation models are sometimes even better than those constructed by the agent during training, further accelerating learning, especially when coupled with the episodic novelty term.

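This work addresses the problem of insufficient exploration in reinforcement learning, particularly in environments where extrinsic rewards are sparse or absent. Leveraging pretrained foundation models, it proposes a novel approach and highlights the important role of the episodic novelty term in improving the agent's exploration effectiveness. The experiments show that intrinsic modules given full state information significantly improve sample efficiency and accelerate learning, and that embeddings from foundation models can outperform those the agent constructs during training.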

Fostering Intrinsic Motivation in Reinforcement Learning with Pretrained Foundation Models