BriefGPT.xyz
Jul, 2024
MDP Geometry, Normalization and Value Free Solvers
Arsenii Mustafin, Aleksei Pakharev, Alex Olshevsky, Ioannis Ch. Paschalidis
TL;DR
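This paper presents a new geometric interpretation of Markov decision processes (MDPs) that helps analyze the dynamics of the main MDP algorithms. Based on this interpretation, we prove that MDPs can be partitioned into equivalence classes whose algorithm dynamics are indistinguishable. The associated normalization procedure enables a new class of MDP-solving algorithms that find the optimal policy without computing policy values.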
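The summary does not state how two MDPs end up in the same equivalence class. One standard way to produce MDPs that share an optimal policy is to shift rewards by a per-state offset φ: r'(s,a) = r(s,a) + φ(s) − γ Σ_{s'} P(s'|s,a) φ(s'). Under this shift every Q-value at state s moves by exactly φ(s), so advantages and greedy choices do not change. The sketch below *assumes* the paper's equivalence classes behave like this; that is an illustration, not the authors' stated definition. The toy MDP, `phi` and `shift_rewards` are hypothetical, and the solver is ordinary value iteration, not the value-free solver the paper proposes.

```python
GAMMA = 0.9

# Hypothetical two-state, two-action MDP (illustrative only).
# P[s][a] lists next-state probabilities; R[s][a] is the immediate reward.
P = [
    [[0.8, 0.2], [0.1, 0.9]],  # from state 0 under actions 0 and 1
    [[0.5, 0.5], [0.0, 1.0]],  # from state 1 under actions 0 and 1
]
R = [
    [1.0, 0.0],
    [0.0, 2.0],
]


def q_values(P, R, V, gamma):
    """Q[s][a] = R[s][a] + gamma * sum over s' of P[s][a][s'] * V[s']."""
    return [
        [R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
         for a in range(len(R[s]))]
        for s in range(len(R))
    ]


def value_iteration(P, R, gamma, tol=1e-12, max_iter=10_000):
    """Plain value iteration; returns (V, greedy policy)."""
    V = [0.0] * len(R)
    for _ in range(max_iter):
        V_new = [max(row) for row in q_values(P, R, V, gamma)]
        converged = max(abs(a - b) for a, b in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    Q = q_values(P, R, V, gamma)
    policy = [max(range(len(row)), key=row.__getitem__) for row in Q]
    return V, policy


def shift_rewards(P, R, phi, gamma):
    """Assumed equivalence transform: r'(s,a) = r(s,a) + phi(s) - gamma * E[phi(s')]."""
    return [
        [R[s][a] + phi[s] - gamma * sum(p * f for p, f in zip(P[s][a], phi))
         for a in range(len(R[s]))]
        for s in range(len(R))
    ]


phi = [3.0, -1.5]  # arbitrary per-state offsets (hypothetical)
V, pi = value_iteration(P, R, GAMMA)
V_shifted, pi_shifted = value_iteration(P, shift_rewards(P, R, phi, GAMMA), GAMMA)

print("policies match:", pi == pi_shifted)
print("value shift:", [round(b - a, 6) for a, b in zip(V, V_shifted)])
```

Because the shifted MDP's values equal V + φ state by state, the greedy policy is the same for any choice of φ. Different offsets therefore give distinct MDPs that a value-based algorithm cannot tell apart by their choice of policy, which is the kind of redundancy a normalization step could remove.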
Abstract
The Markov decision process (MDP) is a common mathematical model for sequential decision-making problems. In this paper, we present a new geometric interpretation of MDPs, which is useful for analyzing the dynamics of …