May 2017
A unified view of entropy-regularized Markov decision processes
Gergely Neu, Anders Jonsson, Vicenç Gómez
TL;DR
We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes, which regularizes the joint state-action distribution by its conditional entropy. Within this framework, several state-of-the-art entropy-regularized reinforcement learning algorithms are formalized as approximate variants of Mirror Descent or Dual Averaging, and simple reinforcement learning experiments illustrate the effects of various regularization techniques on learning performance.
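A minimal sketch of what such a conditional-entropy-regularized objective looks like, assuming the standard linear-programming view of average-reward MDPs (the symbols \(\mu\), \(\Delta\), \(r\), and \(\eta\) are illustrative notation, not necessarily the paper's): the joint state-action distribution \(\mu\) is optimized over the polytope \(\Delta\) of stationary distributions, with the reward traded off against the conditional entropy of actions given states at inverse temperature \(\eta > 0\):

\[
\max_{\mu \in \Delta} \; \sum_{x,a} \mu(x,a)\, r(x,a) \;-\; \frac{1}{\eta} \sum_{x,a} \mu(x,a) \log \frac{\mu(x,a)}{\sum_{b} \mu(x,b)} .
\]

The second sum equals the negative conditional entropy \(-H(A \mid X)\) under \(\mu\), so subtracting it favors stochastic policies; as \(\eta \to \infty\) the problem reduces to the unregularized linear program.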
Abstract
We propose a general framework for entropy-regularized average-reward reinforcement learning in Markov decision processes (MDPs). Our approach is based on extending the linear-programming formulation of policy optimization …
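To illustrate the Mirror Descent connection mentioned in the TL;DR, here is the textbook form of a Mirror Descent iteration on a linear reward objective over \(\Delta\), written with the Bregman divergence \(D_R\) induced by a convex regularizer \(R\) (a generic sketch in illustrative notation, not the paper's exact statement):

\[
\mu_{k+1} = \arg\max_{\mu \in \Delta} \left\{ \eta_k \sum_{x,a} \mu(x,a)\, r(x,a) \;-\; D_R(\mu \,\|\, \mu_k) \right\},
\qquad
D_R(\mu \,\|\, \mu') = R(\mu) - R(\mu') - \langle \nabla R(\mu'),\, \mu - \mu' \rangle .
\]

Dual Averaging differs in that each iterate is regularized by \(R\) itself rather than by the divergence from the previous iterate.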