Yijun Yang, Tianyi Zhou, Jing Jiang, Guodong Long, Yuhui Shi
TL;DR: This paper proposes Continual Task Allocation via Sparse Prompting (CoTASP), which learns over-complete dictionaries during training to generate sparse masks and alternately optimizes the two to update the meta-policy. This addresses the low training efficiency on new tasks in reinforcement learning and yields strong performance across tasks as well as in generalization.
Abstract
How can we train a generalizable meta-policy by continually learning a sequence of tasks? This is a natural human skill, yet it remains challenging for current reinforcement learning: the agent is expected to quickly ada