BriefGPT.xyz
Dec, 2022
Near Sample-Optimal Reduction-based Policy Learning for Average Reward MDP
Jinghan Wang, Mengdi Wang, Lin F. Yang
TL;DR
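This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward MDP, given access to a generative model (simulator).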
Abstract
This work considers the sample complexity of obtaining an $\varepsilon$-optimal policy in an average reward Markov decision process (AMDP)
→