Temporal abstraction is key to scaling up learning and planning in
reinforcement learning. While planning with temporally extended actions is well
understood, creating such abstractions autonomously from data has
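This paper studies how to learn a factored belief state representation with the Context-Specific Representation Abstraction for Deep Option Learning (CRADOL) framework, so that each option learns over only a subset of the state space. This reduces the size of the search over the policy space and improves the efficiency of learning options and actions in hierarchical reinforcement learning.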