In this work we present ISA, a novel approach for learning and exploiting subgoals in reinforcement learning (RL). Our method relies on inducing an automaton whose transitions are subgoals expressed as propositional formulas over a set of observable events. A state-of-the-art inductive logic programming system is used to learn the automaton from observation traces perceived by the RL agent. The reinforcement learning and automaton learning processes are interleaved: a new, refined automaton is learned whenever the RL agent generates a trace not recognized by the current automaton. We evaluate ISA on several gridworld problems and show that it performs similarly to a method for which automata are given in advance. We also show that the learned automata can be exploited to speed up convergence through reward shaping and transfer learning across multiple tasks. Finally, we analyze the running time and the number of traces that ISA needs to learn an automaton, and the impact that the number of observable events has on the learner's performance.
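To make the interleaving concrete, the following is a minimal Python sketch of a subgoal automaton and of the check that would trigger relearning. It is an illustration under stated assumptions, not the authors' implementation: the names `Transition`, `SubgoalAutomaton` and `is_counterexample` are hypothetical, each transition formula is simplified to a conjunction of required and forbidden events, and the inductive logic programming learner itself is left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    """Hypothetical edge labeled by a conjunction over observable events."""
    source: str
    target: str
    required: frozenset                 # events that must be observed
    forbidden: frozenset = frozenset()  # events that must not be observed

    def holds(self, observation):
        # The formula is satisfied if all required events are present
        # and none of the forbidden events are present.
        return self.required <= observation and not (self.forbidden & observation)


@dataclass
class SubgoalAutomaton:
    initial: str
    accepting: str
    transitions: list = field(default_factory=list)

    def step(self, state, observation):
        # Take the first transition whose formula holds (assumes the
        # automaton is deterministic); otherwise stay in the same state.
        for t in self.transitions:
            if t.source == state and t.holds(observation):
                return t.target
        return state

    def accepts(self, trace):
        state = self.initial
        for observation in trace:
            state = self.step(state, observation)
        return state == self.accepting


def is_counterexample(automaton, trace, reached_goal):
    # A trace is a counterexample when the automaton's verdict disagrees
    # with what the agent actually experienced in the environment.
    return automaton.accepts(trace) != reached_goal


# Hypothetical task: observe "coffee", then observe "office".
learned = SubgoalAutomaton(
    initial="u0",
    accepting="uA",
    transitions=[
        Transition("u0", "u1", frozenset({"coffee"})),
        Transition("u1", "uA", frozenset({"office"})),
    ],
)
goal_trace = [set(), {"coffee"}, set(), {"office"}]
empty = SubgoalAutomaton(initial="u0", accepting="uA")  # nothing learned yet

needs_relearning = is_counterexample(empty, goal_trace, reached_goal=True)
still_consistent = not is_counterexample(learned, goal_trace, reached_goal=True)
```

In this sketch, an RL loop would call `is_counterexample` at the end of each episode and, when it returns `True`, pass the accumulated counterexample traces to the automaton learner before resuming RL with the refined automaton.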

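This paper proposes ISA, which uses inductive logic programming to learn subgoals in reinforcement learning and builds an automaton model that is refined as learning proceeds. Experiments show that ISA performs comparably well across several gridworld problems, and that the learned automata can be further exploited for reward shaping and multi-task transfer learning. The paper also analyzes the impact of the number of observable events.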

Introducing Subgoal Automata for Reinforcement Learning