BriefGPT.xyz
Sep, 2018
Policy Optimization via Importance Sampling
Alberto Maria Metelli, Matteo Papini, Francesco Faccio, Marcello Restelli
TL;DR
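This paper proposes POIS, a novel model-free policy search algorithm that applies to both action-based and parameter-based settings and effectively solves reinforcement learning problems in continuous control tasks. It defines a surrogate objective function that is optimized offline whenever a new batch of trajectories is collected, and it uses a high-confidence bound to account for the variance of the estimated objective.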
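To make the idea of a variance-aware surrogate concrete, here is a minimal sketch of an importance-sampling estimate with a confidence-style penalty. It is an illustration under stated assumptions, not the paper's exact objective: the one-dimensional Gaussian policies, the function names, the `penalty` coefficient, and the use of the empirical second moment of the weights as the variance proxy are all hypothetical choices made for this example.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_weights(samples, target_mean, behavior_mean, std):
    """Ratio target density / behavioral density for each sample."""
    return [
        gaussian_pdf(x, target_mean, std) / gaussian_pdf(x, behavior_mean, std)
        for x in samples
    ]


def penalized_surrogate(samples, returns, target_mean, behavior_mean, std,
                        penalty=1.0):
    """Importance-sampling return estimate minus a variance-style penalty.

    Assumption for illustration: the penalty is
    penalty * sqrt(mean(w^2) / n), which grows as the target policy
    moves away from the policy that generated the samples.
    """
    n = len(samples)
    weights = importance_weights(samples, target_mean, behavior_mean, std)
    # Off-policy estimate of the expected return under the target policy.
    is_estimate = sum(w * r for w, r in zip(weights, returns)) / n
    # Empirical second moment of the weights (equals 1 when policies match).
    second_moment = sum(w * w for w in weights) / n
    return is_estimate - penalty * math.sqrt(second_moment / n)


# Hypothetical batch: samples drawn under the behavioral policy, with returns.
samples = [0.0, 1.0, -1.0, 0.5]
returns = [1.0, 2.0, 3.0, 4.0]
on_policy_value = penalized_surrogate(samples, returns, 0.0, 0.0, 1.0)
off_policy_value = penalized_surrogate(samples, returns, 0.5, 0.0, 1.0)
```

When the target and behavioral policies coincide, every weight is 1 and the surrogate reduces to the sample mean of the returns minus `penalty / sqrt(n)`. Maximizing such a penalized estimate over `target_mean` discourages moving to policies for which the collected batch gives an unreliable estimate.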
Abstract
Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. Recent achievements have shown that ...