BriefGPT.xyz
Apr, 2023
Stopping Criteria for Value Iteration on Stochastic Games with Quantitative Objectives
Jan Křetínský, Tobias Meggendorfer, Maximilian Weininger
TL;DR
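This paper presents stopping criteria for value iteration on Markov decision processes and stochastic games, giving the first anytime algorithms in this area for computing total reward and mean payoff. The approach unifies earlier algorithms and introduces objective-independent concepts, both by reducing the problem to the Markov decision process setting and by applying it directly to stochastic games.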
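To illustrate what a stopping criterion for value iteration looks like, here is a minimal sketch in Python. It is not the paper's algorithm: it is the generic idea of iterating a lower and an upper bound at the same time and stopping once they are within `eps` of each other, applied to maximal reachability probability on a hypothetical toy MDP. The names `TOY_MDP` and `bounded_value_iteration` are made up for this example, and the sketch assumes the MDP has no end components apart from the absorbing goal and sink states (without that assumption the upper bound need not converge).

```python
# Hypothetical toy MDP: state -> action -> {successor: probability}.
# State 2 is the goal and state 3 is a sink; both are absorbing.
TOY_MDP = {
    0: {"a": {1: 1.0}, "b": {2: 0.4, 3: 0.6}},
    1: {"a": {0: 0.5, 2: 0.25, 3: 0.25}},
    2: {},
    3: {},
}


def bounded_value_iteration(mdp, goal, sink, eps=1e-6, max_iters=10_000):
    """Iterate lower and upper bounds on the maximal probability of
    reaching `goal`; stop when they differ by less than `eps` everywhere.

    Assumes no end components other than the absorbing goal/sink states.
    """
    # Lower bound starts at 0 outside the goal, upper bound at 1 outside the sink.
    lower = {s: 1.0 if s in goal else 0.0 for s in mdp}
    upper = {s: 0.0 if s in sink else 1.0 for s in mdp}

    def backup(values, actions):
        # Bellman update for a maximizing player: best expected successor value.
        return max(
            sum(p * values[t] for t, p in dist.items())
            for dist in actions.values()
        )

    for iteration in range(1, max_iters + 1):
        new_lower, new_upper = dict(lower), dict(upper)
        for s, actions in mdp.items():
            if s in goal or s in sink:
                continue  # absorbing states keep their fixed values
            new_lower[s] = backup(lower, actions)
            new_upper[s] = backup(upper, actions)
        lower, upper = new_lower, new_upper

        # Stopping criterion: the true value lies between the two bounds,
        # so a small gap certifies the precision of the result.
        if max(upper[s] - lower[s] for s in mdp) < eps:
            return lower, upper, iteration

    raise RuntimeError("bounds did not converge within max_iters")


if __name__ == "__main__":
    lo, up, n = bounded_value_iteration(TOY_MDP, goal={2}, sink={3})
    print(f"stopped after {n} iterations: value of state 0 in [{lo[0]}, {up[0]}]")
```

Because the true value always lies between the two bounds, the computation can be interrupted at any point and still report a guaranteed error margin, which is what "anytime" refers to here. Plain value iteration that only tracks the lower bound offers no such guarantee.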
Abstract
A classic solution technique for Markov decision processes (MDP) and stochastic games (SG) is value iteration (VI). Due to its good practical performance, this approximative approach is typically preferred over e