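TL;DR: This paper proposes an algorithm called CURIOUS, which uses a modular Universal Value Function Approximator and an automated curriculum learning mechanism to let a learning agent set its own goals and self-organize its learning curriculum, so that it can rapidly optimize its learning goals.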
Abstract
In open-ended and changing environments, agents face a wide range of potential tasks that may or may not come with associated reward functions. Such autonomous learning agents must be able to generate their own tasks through a process of →