In Reinforcement Learning (RL), artificial agents are trained to maximize
numerical rewards by performing tasks. Exploration is essential in RL because
agents must discover information before exploiting it. Two rewards that encourage
efficient exploration are the entropy of the action policy and curiosity for
information gain. Entropy is well established in the literature and promotes
randomized action selection. Curiosity is defined in a broad variety of ways in
the literature and promotes discovery of novel experiences. One example, prediction
error curiosity, rewards agents for discovering observations they cannot
accurately predict. However, such agents may be distracted by unpredictable
observational noise, known as curiosity traps. Based on the Free Energy
Principle (FEP), this paper proposes hidden state curiosity, which rewards
agents by the KL divergence between the predictive prior and posterior
probabilities of latent variables. We trained six types of agents to navigate
mazes: baseline agents without rewards for entropy or curiosity, and agents
rewarded for entropy and/or either prediction error curiosity or hidden state
curiosity. We find that entropy and curiosity result in efficient exploration,
especially when both are employed together. Notably, agents with hidden state curiosity
demonstrate resilience against curiosity traps, which hinder agents with
prediction error curiosity. This suggests that implementing the FEP may enhance the
robustness and generalization of RL models, potentially aligning the learning
processes of artificial and biological agents.
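
To make the hidden state curiosity reward concrete, the following is a minimal sketch, not the paper's implementation. It assumes the prior and posterior over latent variables are diagonal Gaussians and that the divergence is taken as KL(posterior || prior); the abstract specifies neither. The function name `hidden_state_curiosity` and the scaling weight `eta` are hypothetical.

```python
import math

def hidden_state_curiosity(post_mu, post_sigma, prior_mu, prior_sigma, eta=1.0):
    """Return eta * KL(posterior || prior) for diagonal Gaussians.

    Assumption: both distributions are diagonal Gaussians given as lists of
    per-dimension means and standard deviations. `eta` is a hypothetical
    weight on the curiosity reward.
    """
    kl = 0.0
    for mq, sq, mp, sp in zip(post_mu, post_sigma, prior_mu, prior_sigma):
        # Closed-form KL divergence between two univariate Gaussians,
        # summed over the independent latent dimensions.
        kl += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2.0 * sp ** 2) - 0.5
    return eta * kl

# Posterior identical to prior: the observation taught the agent nothing.
print(hidden_state_curiosity([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # 0.0

# Posterior mean shifted by one standard deviation in the first dimension.
print(hidden_state_curiosity([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # 0.5
```

Under these assumptions the reward is zero whenever an observation leaves the latent belief unchanged, however hard that observation was to predict. This is one plausible reading of why such a reward could be less sensitive to unpredictable noise than a prediction error reward.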

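In reinforcement learning, artificial agents maximize numerical rewards by performing tasks, and exploration is essential because agents must discover information before exploiting it. Entropy and curiosity are two rewards that promote efficient exploration. Based on the Free Energy Principle (FEP), this paper proposes hidden state curiosity and finds that entropy and curiosity yield efficient exploration, especially when combined. In particular, agents with hidden state curiosity are resilient to curiosity traps, whereas agents with prediction error curiosity are distracted by them. This suggests that implementing the FEP may enhance the robustness and generalization of reinforcement learning models and potentially align the learning processes of artificial and biological agents.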