BriefGPT.xyz
Mar, 2019
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
Kate Rakelly, Aurick Zhou, Deirdre Quillen, Chelsea Finn, Sergey Levine
TL;DR
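This paper proposes an off-policy meta-reinforcement learning algorithm that performs online probabilistic filtering of latent task variables to infer how to solve a new task from a small amount of experience, which enables structured and efficient exploration. On several meta-RL benchmarks, the method outperforms prior algorithms by 20-100x in sample efficiency, and also in asymptotic performance.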
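The filtering step is not spelled out in this summary. The sketch below is illustrative only and assumes a one-dimensional latent task variable with a Gaussian belief, where each observed transition contributes an independent Gaussian factor. The names `gaussian_product` and `filter_task_belief` and the numeric factors are hypothetical, not taken from the paper or its code.

```python
import math
import random


def gaussian_product(factors):
    """Combine independent 1-D Gaussian factors given as (mean, variance) pairs.

    The product of Gaussian densities is Gaussian up to normalization:
    precisions add, and the mean is the precision-weighted average.
    """
    precision = sum(1.0 / var for _, var in factors)
    mean = sum(mu / var for mu, var in factors) / precision
    return mean, 1.0 / precision


def filter_task_belief(prior, evidence):
    """Yield the belief over the latent task variable after each new factor."""
    belief = prior
    for factor in evidence:
        belief = gaussian_product([belief, factor])
        yield belief


# Assumption: in a real system these factors would come from a learned
# encoder applied to each transition; here they are made-up numbers.
prior = (0.0, 1.0)
evidence = [(2.0, 1.0), (2.0, 1.0), (2.0, 1.0)]
beliefs = list(filter_task_belief(prior, evidence))

# Sampling a task hypothesis from the current belief and acting on it is one
# way to get temporally consistent, structured exploration.
rng = random.Random(0)
mean, var = beliefs[-1]
z = rng.gauss(mean, math.sqrt(var))
```

With these made-up factors the belief mean moves from the prior toward 2.0 and the variance shrinks with every transition, which is the sense in which a small amount of experience narrows down the task.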
Abstract
Deep reinforcement learning algorithms require large amounts of experience to learn an individual task. While in principle meta-reinforcement learning (meta-RL) algorithms enable agents to learn new skills from s…