BriefGPT.xyz
Dec, 2019
Reward-Conditioned Policies
Aviral Kumar, Xue Bin Peng, Sergey Levine
TL;DR
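This paper investigates how supervised learning on data collected from non-expert trajectories can produce policies that generalize, develops a policy search method based on this principle, and compares it with several reinforcement learning methods on standard benchmarks.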
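The summary above does not spell out the algorithm, so the following is only an illustrative sketch of the general idea suggested by the title: non-expert trajectories are relabeled with the return they actually achieved, and a policy is then fit by ordinary supervised learning on (state, return) → action pairs. The helper names (`returns_to_go`, `relabel`, `fit_tabular_policy`), the toy trajectories, and the nearest-return lookup are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def returns_to_go(rewards):
    """Undiscounted sum of rewards from each step to the end of the episode."""
    out, total = [], 0.0
    for r in reversed(rewards):
        total += r
        out.append(total)
    return out[::-1]


def relabel(trajectories):
    """Convert (state, action, reward) trajectories into supervised examples
    mapping (state, observed return-to-go) -> action."""
    data = []
    for traj in trajectories:
        rtg = returns_to_go([r for _, _, r in traj])
        for (s, a, _), g in zip(traj, rtg):
            data.append(((s, g), a))
    return data


def fit_tabular_policy(data):
    """Hypothetical stand-in for a learned model: store examples per state and
    return the action whose recorded return is closest to the requested one."""
    table = defaultdict(list)
    for (s, g), a in data:
        table[s].append((g, a))

    def policy(state, target_return):
        return min(table[state], key=lambda ga: abs(ga[0] - target_return))[1]

    return policy


# Toy, made-up data: two non-expert episodes with different outcomes.
trajectories = [
    [("s0", "left", 0.0), ("s1", "left", 1.0)],
    [("s0", "right", 0.0), ("s2", "right", 5.0)],
]
policy = fit_tabular_policy(relabel(trajectories))
```

Asking the fitted policy for a high return in `s0` selects the action from the better episode, which is the sense in which sub-optimal data can still serve as supervision. A real system would replace the lookup table with a function approximator.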
Abstract
Reinforcement learning offers the promise of automating the acquisition of complex behavioral skills. However, compared to commonly used and well-understood supervised learning methods,