BriefGPT.xyz
Oct, 2023
Beyond Average Return in Markov Decision Processes
Alexandre Marthe, Aurélien Garivier, Claire Vernade
TL;DR
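Which functionals of the reward can be computed and optimized exactly in Markov decision processes? We summarize the characterization of the relevant classes for policy evaluation and give a new answer to the planning problem. We also prove that only generalized means can be optimized exactly, even in the more general framework of distributional reinforcement learning. These results contribute to the theory of Markov decision processes, with a focus on overall characteristics of the return and on risk-aware strategies.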
Abstract
What are the functionals of the reward that can be computed and optimized exactly in Markov decision processes? In the finite-horizon, undiscounted setting, dynamic programming (DP) can only handle these operations…
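To make "computed and optimized exactly by dynamic programming" concrete, here is a minimal Python sketch. It is not taken from the paper: the toy MDP and the names `P`, `R`, `expected_backup`, `make_exp_backup` and `backward_induction` are hypothetical, and rewards are assumed deterministic per (state, action). It runs finite-horizon, undiscounted backward induction for two objectives: the expected return, and the exponential utility (1/λ)·log E[exp(λG)], which is one example of a generalized (quasi-arithmetic) mean. The second backup stays exact because exp(λ(r + G')) factorizes as exp(λr)·exp(λG').

```python
import math

# Hypothetical toy MDP (not from the paper): 0 = start, 1 = jackpot, 2 = absorbing.
# P[s][a] lists (next_state, probability); R[s][a] is a deterministic reward.
P = {
    0: {"safe": [(2, 1.0)], "risky": [(1, 0.5), (2, 0.5)]},
    1: {"safe": [(2, 1.0)], "risky": [(2, 1.0)]},
    2: {"safe": [(2, 1.0)], "risky": [(2, 1.0)]},
}
R = {
    0: {"safe": 1.0, "risky": 0.0},
    1: {"safe": 3.0, "risky": 3.0},
    2: {"safe": 0.0, "risky": 0.0},
}


def expected_backup(reward, transitions, V_next):
    """Bellman backup for the mean: r + E[V_{h+1}(s')]."""
    return reward + sum(p * V_next[s2] for s2, p in transitions)


def make_exp_backup(lam):
    """Bellman backup for the exponential utility (1/lam) * log E[exp(lam * G)].

    Exact for lam != 0 with undiscounted returns, since
    exp(lam * (r + G')) = exp(lam * r) * exp(lam * G'). The backup is
    nondecreasing in V_next for either sign of lam, so greedy max is valid.
    """
    def backup(reward, transitions, V_next):
        m = sum(p * math.exp(lam * V_next[s2]) for s2, p in transitions)
        return reward + math.log(m) / lam
    return backup


def backward_induction(P, R, H, backup):
    """Return V_0 and one greedy policy per step, starting from V_H = 0."""
    V = {s: 0.0 for s in P}
    policy = []
    for _ in range(H):
        Q = {s: {a: backup(R[s][a], P[s][a], V) for a in P[s]} for s in P}
        pi = {s: max(Q[s], key=Q[s].get) for s in P}
        V = {s: Q[s][pi[s]] for s in P}
        policy.insert(0, pi)  # steps are computed backward, so prepend
    return V, policy


V_mean, pi_mean = backward_induction(P, R, 2, expected_backup)
V_averse, pi_averse = backward_induction(P, R, 2, make_exp_backup(-1.0))
print(pi_mean[0][0], round(V_mean[0], 3))      # risky 1.5
print(pi_averse[0][0], round(V_averse[0], 3))  # safe 1.0
```

With λ < 0 the exponential utility is risk-averse, so the first greedy action switches from the higher-mean "risky" choice to the "safe" one, while both objectives are still solved by a single backward pass.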