BriefGPT.xyz
Feb, 2023
Modified Policy Iteration for Exponential Cost Risk Sensitive MDPs
Yashaswini Murthy, Mehrdad Moharrami, R. Srikant
TL;DR
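This paper studies risk-sensitive MDPs with exponential cost and gives the first proof that MPI converges when the state and action spaces are finite. The convergence proof differs from the existing proofs for the discounted and risk-neutral average-cost problems. The paper also provides a proof for approximate MPI in the risk-sensitive MDP setting.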
Abstract
Modified policy iteration (MPI), also known as optimistic policy iteration, is at the core of many reinforcement learning algorithms. It works by combining elements of policy iteration and value iteration. The conv
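To illustrate how MPI combines policy iteration and value iteration, here is a minimal sketch for the standard discounted-cost setting. It is not the risk-sensitive exponential-cost algorithm analyzed in the paper. The function name, the argument layout (`P[a][s][t]`, `c[s][a]`), and the parameters `gamma`, `m`, and `iters` are all assumptions made for illustration.

```python
def modified_policy_iteration(P, c, gamma=0.9, m=5, iters=200):
    """Sketch of MPI for a discounted-cost MDP (illustrative, hypothetical API).

    P[a][s][t] -- probability of moving from state s to state t under action a
    c[s][a]    -- one-step cost of taking action a in state s
    m          -- number of partial evaluation sweeps per iteration
    """
    n_states = len(c)
    n_actions = len(c[0])
    V = [0.0] * n_states
    policy = [0] * n_states

    def q_value(s, a, values):
        # One-step cost plus discounted expected cost-to-go.
        expected = sum(P[a][s][t] * values[t] for t in range(n_states))
        return c[s][a] + gamma * expected

    for _ in range(iters):
        # Policy improvement: pick the cost-minimizing action for the current V.
        policy = [
            min(range(n_actions), key=lambda a: q_value(s, a, V))
            for s in range(n_states)
        ]
        # Partial policy evaluation: apply the fixed-policy backup m times.
        for _ in range(m):
            V = [q_value(s, policy[s], V) for s in range(n_states)]
    return V, policy
```

With `m = 1` each iteration reduces to a value iteration step, while letting `m` grow large approaches full policy evaluation, as in policy iteration. Intermediate values of `m` are what make the scheme "modified".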