Many practical environments contain catastrophic states that an optimal agent
would visit infrequently or never. Even on toy problems, Deep Reinforcement
Learning (DRL) agents tend to periodically revisit these states upon forgetting
their existence under a new policy. We introduce intrinsic fear (IF), a learned
form of reward shaping that guards DRL agents against periodic catastrophes. IF agents
possess a fear model trained to predict the probability of imminent
catastrophe. This score is then used to penalize the Q-learning objective. Our
theoretical analysis bounds the reduction in average return due to learning on
the perturbed objective. We also prove robustness to classification errors. As
a bonus, IF models tend to learn faster, owing to reward shaping. Experiments
demonstrate that intrinsic-fear DQNs solve otherwise pathological environments
and improve on several Atari games.
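The abstract describes the mechanism only at a high level: a fear model scores the probability of imminent catastrophe, and that score penalizes the Q-learning objective. The Python sketch below shows one way to wire this up, assuming a DQN-style bootstrap target. The labeling rule (states within `fear_radius` steps of a catastrophe are positives), the names `fear_labels`, `intrinsic_fear_target`, `fear_radius` and `fear_factor`, and the treatment of terminal transitions are illustrative assumptions, not details stated in the abstract.

```python
from typing import List, Sequence


def fear_labels(trajectory_len: int, catastrophe_at_end: bool, fear_radius: int) -> List[int]:
    """Label each state of one episode for training a fear classifier.

    Assumed (hypothetical) rule: a state is "dangerous" (label 1) if it lies
    within `fear_radius` steps of a catastrophe that ended the episode;
    all other states are "safe" (label 0).
    """
    labels = [0] * trajectory_len
    if catastrophe_at_end:
        start = max(0, trajectory_len - fear_radius)
        for t in range(start, trajectory_len):
            labels[t] = 1
    return labels


def intrinsic_fear_target(
    reward: float,
    next_q_values: Sequence[float],
    fear_prob_next: float,
    gamma: float = 0.99,
    fear_factor: float = 1.0,
    terminal: bool = False,
) -> float:
    """DQN-style bootstrap target penalized by the fear model's score.

    y = r + gamma * max_a' Q(s', a') - fear_factor * F(s'),
    where F(s') is the predicted probability of imminent catastrophe.
    Assumption: terminal transitions drop the bootstrap term but still
    receive the fear penalty.
    """
    bootstrap = 0.0 if terminal else gamma * max(next_q_values)
    return reward + bootstrap - fear_factor * fear_prob_next
```

For example, with `gamma=0.5` and `fear_factor=2.0`, a reward of 1.0, next-state Q-values `[0.5, 2.0]` and a fear score of 0.25 give a target of 1.0 + 0.5 · 2.0 − 2.0 · 0.25 = 1.5. A larger `fear_factor` pushes the agent further from states the classifier flags, at the cost of a larger perturbation of the original objective.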

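Using learned reward shaping, this work introduces an intrinsic fear mechanism that protects deep reinforcement learning agents from periodically revisiting catastrophic states. It demonstrates the method's robustness and faster learning, and in experiments it solves a range of otherwise difficult problems.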

Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear

Advances in deep reinforcement learning have allowed autonomous agents to
perform well on Atari games, often outperforming humans, using only raw pixels
to make their decisions. However, most of these games take place in 2D
environments that are fully observable to the agent. In this paper, we present
the first architecture to tackle 3D environments in first-person shooter games,
which involve partially observable states. Typically, deep reinforcement
learning methods use only visual input for training. We present a method to
augment these models to exploit game feature information, such as the presence
of enemies or items, during the training phase. Our model is trained to learn
these features while simultaneously minimizing a Q-learning objective; this is
shown to dramatically improve the training speed and performance of our agent.
Our architecture is also modular, allowing different models to be trained
independently for different phases of the game. We show that the proposed
architecture substantially outperforms the game's built-in AI agents as well as
human players in deathmatch scenarios.
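The joint objective described above can be sketched as the sum of a Q-learning loss and a feature-prediction loss. This Python sketch assumes a squared TD error for the Q-learning term, a mean binary cross-entropy over binary game features (for example "enemy visible"), and a weighting coefficient `feature_weight`. It also shows a hypothetical dispatch between two independently trained phase models that switches on whether an enemy is detected. All function and parameter names are illustrative, and the exact losses, weighting and phase criterion are assumptions rather than details given in the abstract.

```python
import math
from typing import Callable, Sequence, TypeVar

Obs = TypeVar("Obs")


def q_learning_loss(q_pred: float, td_target: float) -> float:
    """Squared temporal-difference error for the action actually taken."""
    return (q_pred - td_target) ** 2


def feature_loss(feature_logits: Sequence[float], feature_labels: Sequence[int]) -> float:
    """Mean binary cross-entropy over binary game features, computed from logits.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    if len(feature_logits) != len(feature_labels) or not feature_logits:
        raise ValueError("need one label per logit and at least one feature")
    total = 0.0
    for z, y in zip(feature_logits, feature_labels):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(feature_logits)


def joint_loss(
    q_pred: float,
    td_target: float,
    feature_logits: Sequence[float],
    feature_labels: Sequence[int],
    feature_weight: float = 1.0,
) -> float:
    """Q-learning loss plus a weighted feature-prediction loss (assumed weighting)."""
    return q_learning_loss(q_pred, td_target) + feature_weight * feature_loss(
        feature_logits, feature_labels
    )


def choose_action(
    enemy_detected: bool,
    navigation_policy: Callable[[Obs], str],
    action_policy: Callable[[Obs], str],
    observation: Obs,
) -> str:
    """Hypothetical dispatch between two independently trained phase models."""
    policy = action_policy if enemy_detected else navigation_policy
    return policy(observation)
```

Because the feature labels are only needed to compute `feature_loss`, they are required at training time alone. Under this sketch, the `enemy_detected` flag used at play time would come from the model's own feature predictions rather than from game-provided labels.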

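This work proposes a deep reinforcement learning neural network model that incorporates game feature information and substantially improves training efficiency and performance in the partially observable states of 3D FPS games.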