Improving Trust Region Policy Optimization with Normalizing Flows Policies

Sep, 2018

Boosting Trust Region Policy Optimization by Normalizing Flows Policy

Yunhao Tang, Shipra Agrawal

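TL;DR: This paper improves trust region policy search by using a policy based on normalizing flows. When the trust region is constructed with a KL divergence constraint, the normalizing flows policy can introduce new samples that depart from the center of the previous policy, which helps exploration and helps avoid local optima. The method achieves strong comparative results on tasks with high dimensionality and complex dynamics.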
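As a rough illustration of what a normalizing flows policy means in practice, the sketch below draws actions by pushing a Gaussian base sample through an invertible map and computes their log-probabilities with the change-of-variables formula. The layer `a = z + alpha * tanh(z)`, the names `flow_forward` and `sample_action`, and the parameter `alpha` are assumptions chosen for brevity; they are not the architecture used in the paper, and the KL-constrained trust region update itself is not shown.

```python
import numpy as np


def flow_forward(z, alpha):
    """Hypothetical elementwise flow layer: a = z + alpha * tanh(z).

    Returns the transformed sample and log|det da/dz| summed over the
    action dimensions. The map is invertible when alpha > -1, because
    its derivative 1 + alpha * (1 - tanh(z)^2) then stays positive.
    """
    t = np.tanh(z)
    a = z + alpha * t
    log_det = np.sum(np.log1p(alpha * (1.0 - t ** 2)), axis=-1)
    return a, log_det


def sample_action(mean, log_std, alpha, rng, n):
    """Draw n actions and their log-probabilities under the flow policy."""
    dim = mean.shape[0]
    eps = rng.standard_normal((n, dim))
    z = mean + np.exp(log_std) * eps  # sample from the diagonal Gaussian base
    # Log-density of z under the diagonal Gaussian base distribution.
    base_logp = np.sum(
        -0.5 * eps ** 2 - log_std - 0.5 * np.log(2.0 * np.pi), axis=-1
    )
    a, log_det = flow_forward(z, alpha)
    # Change of variables: log p(a) = log p(z) - log|det da/dz|.
    return a, base_logp - log_det


rng = np.random.default_rng(0)
actions, logp = sample_action(np.zeros(3), np.zeros(3), alpha=0.8, rng=rng, n=5)
```

With `alpha = 0` the layer is the identity and the policy reduces to a factorized Gaussian; a nonzero `alpha` reshapes the density while keeping the log-probability exact, which is the quantity a KL-based trust region method needs.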

Abstract

We propose to improve trust region policy search with normalizing flows policy. We illustrate that when the trust region is constructed by KL divergence constraint, →