BriefGPT.xyz
Oct, 2022
Beyond the Return: Off-policy Function Estimation under User-specified Error-measuring Distributions
Audrey Huang, Nan Jiang
TL;DR
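This paper provides guarantees for off-policy function estimation under realizability alone, by imposing appropriate regularization on the marginalized importance sampling (MIS) objectives. It also gives an exact characterization of the optimal dual solution, which must be realized by the discriminator class and which determines the data-coverage assumption in the case of value-function learning.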
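To make the distinction between estimating a return and estimating a function concrete, the following is a minimal sketch on a tiny tabular MDP where everything can be computed exactly. It is not the paper's estimator: the MDP, the policy, and the helper names `evaluate_policy` and `weighted_error` are all hypothetical and chosen only for illustration. The sketch computes the value function, the normalized discounted occupancy, and the expected return, and then measures the error of a perturbed value estimate under two different error-measuring distributions.

```python
import numpy as np

def evaluate_policy(P, R, pi, mu0, gamma):
    """Exact Q^pi, normalized occupancy d^pi, and return J(pi) for a tabular MDP.

    P: (S, A, S) transition probabilities, R: (S, A) rewards,
    pi: (S, A) target policy, mu0: (S,) initial state distribution.
    """
    S, A = R.shape
    # State-action transition matrix: P_pi[(s,a),(s',a')] = P[s,a,s'] * pi[s',a']
    P_pi = (P[:, :, :, None] * pi[None, None, :, :]).reshape(S * A, S * A)
    I = np.eye(S * A)
    # Bellman equation in matrix form: q = R + gamma * P_pi q
    q = np.linalg.solve(I - gamma * P_pi, R.reshape(-1))
    # Initial state-action distribution under pi
    nu0 = (mu0[:, None] * pi).reshape(-1)
    # Normalized discounted occupancy: d = (1 - gamma) * sum_t gamma^t (P_pi^T)^t nu0
    d = (1 - gamma) * np.linalg.solve(I - gamma * P_pi.T, nu0)
    J = nu0 @ q  # expected discounted return
    return q.reshape(S, A), d.reshape(S, A), J

def weighted_error(f_hat, f_true, nu):
    """Root-mean-square error of f_hat measured under a distribution nu over (s, a)."""
    return np.sqrt(np.sum(nu * (f_hat - f_true) ** 2))

# Hypothetical 2-state, 2-action MDP (illustrative numbers only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
pi = np.array([[0.5, 0.5], [0.2, 0.8]])
mu0 = np.array([1.0, 0.0])
gamma = 0.9

q, d, J = evaluate_policy(P, R, pi, mu0, gamma)

# Return via the occupancy: J = E_{d^pi}[R] / (1 - gamma)
J_from_occupancy = np.sum(d * R) / (1 - gamma)

# Return via importance weights w = d^pi / d^D for an assumed data distribution d^D
d_data = np.full((2, 2), 0.25)
w = d / d_data
J_from_weights = np.sum(d_data * w * R) / (1 - gamma)

# A value estimate that is wrong at a single state-action pair
q_hat = q.copy()
q_hat[1, 1] += 1.0
nu_point = np.array([[1.0, 0.0], [0.0, 0.0]])  # all mass on (s=0, a=0)
nu_uniform = np.full((2, 2), 0.25)
err_point = weighted_error(q_hat, q, nu_point)
err_uniform = weighted_error(q_hat, q, nu_uniform)
```

The same perturbed estimate has zero error under `nu_point` and nonzero error under `nu_uniform`, which is why the choice of error-measuring distribution matters when the goal is the function itself rather than a single scalar return.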
Abstract
Off-policy evaluation often refers to two related tasks: estimating the expected return of a policy and estimating its value function (or …