BriefGPT.xyz
May, 2024
Entropy annealing for policy mirror descent in continuous time and space
Deven Sethi, David Šiška, Yufei Zhang
TL;DR
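Entropy regularization is widely used in policy optimization and helps the optimization converge. By analyzing continuous-time policy mirror descent dynamics, this paper proves that, at a fixed entropy level, the dynamics converge exponentially to the optimal solution of the regularized problem. By tuning the decay rate of the entropy regularization, it also derives convergence rates for discrete and general action spaces.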
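As a rough illustration of the fixed-entropy claim, the sketch below runs a discrete-time, single-state (bandit) analogue of entropy-regularized mirror descent. It is not the paper's continuous-time setting or its algorithm. The reward vector, the temperature `tau`, the step size `eta`, and all function names are assumptions chosen for the example. In this toy case the KL-proximal update has the closed form `z <- (z + eta * r) / (1 + eta * tau)` in logit space, so the distance to the regularized optimum `softmax(r / tau)` shrinks by a factor `1 / (1 + eta * tau)` per step.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mirror_descent_step(logits, rewards, tau, eta):
    """One KL-proximal step for max_pi <pi, r> + tau * H(pi).

    Hypothetical toy update (single state, known rewards): solving the
    proximal problem in closed form gives z <- (z + eta * r) / (1 + eta * tau).
    """
    return [(z + eta * r) / (1.0 + eta * tau) for z, r in zip(logits, rewards)]


def run(rewards, tau, eta, steps):
    """Iterate from the uniform policy (all logits zero) and return the policy."""
    logits = [0.0] * len(rewards)
    for _ in range(steps):
        logits = mirror_descent_step(logits, rewards, tau, eta)
    return softmax(logits)


def regularized_optimum(rewards, tau):
    """Maximizer of the entropy-regularized objective: pi* proportional to exp(r / tau)."""
    return softmax([r / tau for r in rewards])


if __name__ == "__main__":
    rewards = [1.0, 0.5, 0.0]  # assumed toy rewards
    policy = run(rewards, tau=0.5, eta=0.1, steps=500)
    target = regularized_optimum(rewards, tau=0.5)
    print(max(abs(p - q) for p, q in zip(policy, target)))
```

The limit here is the optimum of the regularized problem, not of the original one, which is why the entropy level has to be decayed to recover the unregularized solution. This sketch keeps `tau` fixed and does not attempt to reproduce the annealing schedules or rates from the paper.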
Abstract
Entropy regularization has been extensively used in policy optimization algorithms to regularize the optimization landscape and accelerate conver…