Sep, 2018
Boosting Trust Region Policy Optimization by Normalizing Flows Policy
Yunhao Tang, Shipra Agrawal
TL;DR
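This paper improves trust region policy search by using a normalizing flows policy. With the trust region constructed by a KL divergence constraint, the normalizing flows policy draws new samples that depart from the center of the previous policy, which helps exploration and helps avoid local optima. The method achieves strong comparative results on tasks with high dimensionality and complex dynamics.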
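To make the summary above concrete, here is a minimal sketch of what a normalizing flows policy and a sampled KL estimate could look like. It is an illustration under stated assumptions, not the authors' architecture: it assumes a 2-D action, a single affine coupling layer with linear conditioners, and plain NumPy. All names (`init_params`, `coupling_forward`, `coupling_inverse`, `log_prob`, `sample`, `mc_kl`) are hypothetical.

```python
import numpy as np


def init_params(state_dim, rng, scale=0.1):
    # Hypothetical linear conditioners mapping (z1, state) to a log-scale s and a shift t.
    return {
        "Ws": scale * rng.standard_normal((1 + state_dim, 1)),
        "Wt": scale * rng.standard_normal((1 + state_dim, 1)),
    }


def _scale_shift(first, state, params):
    # Both the forward and inverse passes condition on the untouched first coordinate.
    h = np.concatenate([first, state], axis=-1)
    return np.tanh(h @ params["Ws"]), h @ params["Wt"]


def coupling_forward(z, state, params):
    # a1 = z1, a2 = z2 * exp(s) + t; returns the action and log|det J|.
    z1, z2 = z[..., :1], z[..., 1:]
    s, t = _scale_shift(z1, state, params)
    a = np.concatenate([z1, z2 * np.exp(s) + t], axis=-1)
    return a, s.sum(axis=-1)


def coupling_inverse(a, state, params):
    # Exact inverse of coupling_forward; returns the noise and the forward log|det J|.
    a1, a2 = a[..., :1], a[..., 1:]
    s, t = _scale_shift(a1, state, params)
    z = np.concatenate([a1, (a2 - t) * np.exp(-s)], axis=-1)
    return z, s.sum(axis=-1)


def log_prob(a, state, params):
    # Change of variables: log pi(a|s) = log N(z; 0, I) - log|det J|.
    z, log_det = coupling_inverse(a, state, params)
    base = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * z.shape[-1] * np.log(2 * np.pi)
    return base - log_det


def sample(state, params, rng):
    # Push standard Gaussian noise through the flow to obtain actions.
    z = rng.standard_normal((state.shape[0], 2))
    a, _ = coupling_forward(z, state, params)
    return a


def mc_kl(state, old_params, new_params, rng):
    # Monte Carlo estimate of KL(pi_old || pi_new) using actions drawn from pi_old.
    a = sample(state, old_params, rng)
    return np.mean(log_prob(a, state, old_params) - log_prob(a, state, new_params))


rng = np.random.default_rng(0)
states = rng.standard_normal((256, 3))
old = init_params(3, rng)
new = init_params(3, rng)
kl_estimate = mc_kl(states, old, new, rng)
```

In a trust region method, an estimate like `kl_estimate` would be compared against a threshold before accepting a policy update. The estimator is unbiased but noisy, so with few samples it can even come out slightly negative.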
Abstract
We propose to improve trust region policy search with normalizing flows policy. We illustrate that when the trust region is constructed by KL divergence constraint,