BriefGPT.xyz
Sep, 2023
通过频率正规化解决非矩形奖励鲁棒MDPs
Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization
HTML
PDF
Uri Gadot, Esther Derman, Navdeep Kumar, Maxence Mohamed Elfatihi, Kfir Levy...
TL;DR
研究强健的马尔可夫决策过程中的关键问题,如不确定性集合、计算可行性以及策略访问频率正则化方法,并引入一种收敛的策略梯度方法进行分析。
Abstract
In
robust markov decision processes
(RMDPs), it is assumed that the reward and the transition dynamics lie in a given
uncertainty set
. By targeting maximal return under the most adversarial model from that set, R
→