BriefGPT.xyz
Jan, 2022
On the Hidden Biases of Policy Mirror Ascent in Continuous Action Spaces
Amrit Singh Bedi, Souradip Chakraborty, Anjaly Parayil, Brian Sadler, Pratap Tokekar...
TL;DR
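This paper addresses reinforcement learning in continuous action spaces. It proposes a policy gradient algorithm built on a heavy-tailed policy parameterization, studies the algorithm both theoretically and experimentally, and shows that it accumulates more reward than standard baselines across a range of settings.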
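To make the role of the tail behavior concrete, the following is a minimal sketch, not the authors' implementation. It compares the score function (the gradient of the log-density with respect to the location parameter) of a Gaussian policy with that of a Cauchy policy. The Cauchy distribution is used here only as one familiar heavy-tailed example; the function names and the scalar-action setup are assumptions made for illustration.

```python
import math


def gaussian_score_mu(a: float, mu: float, sigma: float) -> float:
    """d/d(mu) of log N(a; mu, sigma^2). Grows linearly in (a - mu)."""
    return (a - mu) / sigma**2


def cauchy_score_mu(a: float, mu: float, sigma: float) -> float:
    """d/d(mu) of log Cauchy(a; location mu, scale sigma).

    The log-density is -log(pi * sigma) - log(1 + ((a - mu) / sigma)^2),
    so the derivative is 2 (a - mu) / (sigma^2 + (a - mu)^2),
    whose magnitude never exceeds 1 / sigma.
    """
    d = a - mu
    return 2.0 * d / (sigma**2 + d**2)


if __name__ == "__main__":
    # Illustrative actions at increasing distance from the location mu = 0.
    for a in (1.0, 10.0, 1000.0):
        g = gaussian_score_mu(a, 0.0, 1.0)
        c = cauchy_score_mu(a, 0.0, 1.0)
        print(f"a={a:8.1f}  gaussian score={g:10.3f}  cauchy score={c:8.5f}")
```

In this sketch the Gaussian score grows without bound as the sampled action moves away from the mean, while the Cauchy score stays within `1 / sigma` in magnitude. This is one property that distinguishes a heavy-tailed parameterization from a Gaussian one in a policy gradient estimate.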
Abstract
We focus on parameterized policy search for reinforcement learning over continuous action spaces. Typically, one assumes the score function ...