Caleb Chuck, Kevin Black, Aditya Arjun, Yuke Zhu, Scott Niekum
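TL;DR: Introduces a new algorithm, Hierarchy of Interaction Skills (HIntS), which uses Granger causality to discover interaction detectors without supervision and uses them to train a hierarchy of skills, addressing low sample efficiency and poor generalization in reinforcement learning. In a robotic pushing task with obstacles, the learned skills transfer to other related tasks and significantly improve efficiency and performance.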
Abstract
Reinforcement learning (RL) has shown promising results in learning policies for complex tasks, but can often suffer from low sample efficiency and limited →