BriefGPT.xyz
May, 2016
Model-Free Imitation Learning with Policy Optimization
Jonathan Ho, Jayesh K. Gupta, Stefano Ermon
TL;DR
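For imitation learning, we develop a sample-based policy gradient algorithm that learns from an expert's sample trajectories to find a parameterized stochastic policy performing at least as well as the expert's policy. The algorithm applies to high-dimensional environments and is guaranteed to converge to a local minimum.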
Abstract
In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence …