BriefGPT.xyz
Feb, 2015
Trust Region Policy Optimization
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel
TL;DR
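This paper proposes a practical algorithm called TRPO that optimizes policies with guaranteed monotonic improvement. A series of experiments demonstrates its strong performance across a range of tasks, including learning simulated robotic swimming, hopping, and walking gaits and playing Atari games from screen images.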
Abstract
We propose a family of trust region policy optimization (TRPO) algorithms for learning control policies. We first develop a policy update scheme with guaranteed monotonic improvement, and then we describe a finit