Yuxiang Yang, Ken Caluwaerts, Atil Iscen, Jie Tan, Chelsea Finn
TL;DR: The paper proposes an adaptation method called No-Reward Meta Learning (NoRML), which updates model parameters using observed environment dynamics rather than an explicit reward function, allowing adaptation to the changing dynamics of a target task. Experiments show that NoRML outperforms the standard Model-Agnostic Meta-Learning (MAML) approach when environment dynamics change.
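The core idea, that the adaptation (inner-loop) update can be driven by a learned function of observed transitions (s, a, s') instead of an environment reward, can be sketched on a toy 1-D point-mass task. This is a minimal illustration, not the paper's implementation: the policy, the form of the learned advantage, and the "meta-learned" initial parameters here are all hypothetical placeholders (in NoRML they are meta-learned jointly across tasks).

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_mean(theta, s):
    """Linear-Gaussian policy: action ~ N(theta * s, 1)."""
    return theta * s

def learned_advantage(psi, s, a, s_next):
    """NoRML-style adaptation signal: a function of the observed
    transition (s, a, s') only -- no environment reward is queried.
    Hypothetical form: prefers transitions that move the state toward 0."""
    return psi * (abs(s) - abs(s_next))

def inner_update(theta, psi, transitions, lr=0.1):
    """One adaptation step: policy-gradient ascent on the learned
    advantage, computed from observed dynamics instead of rewards."""
    grad = 0.0
    for s, a, s_next in transitions:
        # Score function of the Gaussian policy:
        # d/dtheta log N(a; theta*s, 1) = (a - theta*s) * s
        score = (a - policy_mean(theta, s)) * s
        grad += score * learned_advantage(psi, s, a, s_next)
    return theta + lr * grad / len(transitions)

def rollout(theta, dynamics, n=256):
    """Collect transitions s' = s + dynamics * a in the target task."""
    out = []
    for _ in range(n):
        s = rng.normal()
        a = policy_mean(theta, s) + rng.normal()
        out.append((s, a, s + dynamics * a))
    return out

theta, psi = -1.0, 1.0                       # stand-ins for meta-learned values
transitions = rollout(theta, dynamics=-1.0)  # target task: flipped dynamics
theta_adapted = inner_update(theta, psi, transitions)
print(theta_adapted)
```

In this toy setting the nominal policy (theta = -1) is wrong for the flipped dynamics, and the single reward-free update pushes theta toward the corrected value (+1) using only the observed transitions, which is the flavor of adaptation the paper describes.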
Abstract
Efficiently adapting to new environments and changes in dynamics is critical for agents to successfully operate in the real world. Reinforcement learning (RL) based approaches typically rely on external reward feedback for →