A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization

October 2021

A Dual Approach to Constrained Markov Decision Processes with Entropy Regularization

Donghao Ying, Yuhao Ding, Javad Lavaei

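TL;DR: This paper studies entropy-regularized constrained Markov decision processes under the soft-max parameterization, including the associated Lagrangian dual function and constraint violation, and proposes an accelerated dual-descent method that achieves global convergence.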

Abstract

We study entropy-regularized constrained Markov decision processes (CMDPs) under the soft-max parameterization, in which an agent aims to maximize the →