Deep reinforcement learning suffers from catastrophic forgetting and sample
inefficiency, making it less applicable to the ever-changing real world.
However, the ability to use previously learned knowledge is essential for AI
agents to quickly adapt to novelties. Often, spatial information observed by
the agent in previous interactions can be leveraged to infer task-specific
rules. The inferred rules can then help the agent avoid potentially dangerous
situations in previously unseen states and guide the learning process,
increasing the agent's speed of adaptation to novelty. In this work, we
propose a general framework that is applicable to deep reinforcement learning
agents. Our framework provides the agent with an autonomous way to discover
task-specific rules in novel environments and to self-supervise its learning.
We provide a rule-driven deep Q-learning agent (RDQ) as one possible
implementation of that framework. We show that RDQ successfully extracts
task-specific rules as it interacts with the world and uses them to drastically
increase its learning efficiency. In our experiments, the RDQ agent is
significantly more resilient to novelties than the baseline agents, and it
detects and adapts to novel situations faster.

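Summary: The key problems of deep reinforcement learning include forgetting and low sample efficiency. This work proposes a general framework that discovers and uses spatial information to infer task-specific rules, helping agents learn autonomously in new environments and adapt faster. One implementation of the framework is a rule-driven deep Q-learning agent, which in experiments shows markedly greater resilience to novelties and a stronger ability to adapt to new situations.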
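
To make the idea of rules steering a Q-learning agent away from dangerous actions more concrete, the following is a minimal sketch, not the RDQ implementation itself. It assumes that an inferred rule can be represented as a predicate over a state and an action, and that rules act by masking actions during epsilon-greedy selection. The names `Rule`, `allowed_actions`, and `select_action` are hypothetical and do not come from the paper, and the way rules are inferred from spatial information is not shown.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical rule type (an assumption for illustration): a rule returns
# True when taking `action` in `state` is predicted to be unsafe.
Rule = Callable[[object, int], bool]


def allowed_actions(state, n_actions: int, rules: Sequence[Rule]) -> List[int]:
    """Return the actions that no rule flags as unsafe in this state."""
    safe = [a for a in range(n_actions)
            if not any(rule(state, a) for rule in rules)]
    # If every action is flagged, fall back to the full action set so the
    # agent can still act.
    return safe if safe else list(range(n_actions))


def select_action(q_values: Sequence[float], state, rules: Sequence[Rule],
                  epsilon: float = 0.1, rng=random) -> int:
    """Epsilon-greedy selection restricted to rule-compliant actions."""
    candidates = allowed_actions(state, len(q_values), rules)
    if rng.random() < epsilon:
        # Explore, but only among actions the rules permit.
        return rng.choice(candidates)
    # Exploit: pick the permitted action with the highest Q-value.
    return max(candidates, key=lambda a: q_values[a])


# Usage: a toy rule that forbids action 2 whenever the state is "cliff".
def no_step_off_cliff(state, action):
    return state == "cliff" and action == 2


q = [0.1, 0.5, 0.9]
greedy_near_cliff = select_action(q, "cliff", [no_step_off_cliff], epsilon=0.0)
greedy_elsewhere = select_action(q, "plain", [no_step_off_cliff], epsilon=0.0)
```

In this sketch the rule changes only which actions are eligible, so the Q-values themselves are left untouched; whether RDQ applies its rules this way or through another mechanism is not stated in the abstract.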