BriefGPT.xyz
Feb, 2024
Off-Policy Evaluation in Markov Decision Processes under Weak Distributional Overlap
Mohammad Mehrabi, Stefan Wager
TL;DR
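Under sequential ignorability in Markov decision processes, doubly robust methods perform well for off-policy evaluation. By introducing a truncated doubly robust estimator, the approach achieves accurate off-policy evaluation even when the strong distributional overlap assumption does not hold.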
Abstract
Doubly robust methods hold considerable promise for off-policy evaluation in Markov decision processes (MDPs) under sequential ignorability ...