BriefGPT.xyz
Jul, 2020
Off-Policy Evaluation via the Regularized Lagrangian
Mengjiao Yang, Ofir Nachum, Bo Dai, Lihong Li, Dale Schuurmans
TL;DR
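By unifying the estimators of the distribution correction estimation (DICE) family as regularized Lagrangians of the same linear program, we extend the space of DICE estimators to new alternatives. Analyzing this expanded space of estimators, we find that dual solutions offer greater flexibility in navigating the trade-off between optimization stability and estimation bias, and generally provide better estimates in practice.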
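In the standard linear-programming view of policy evaluation, the primal variables are Q-values and the dual variables are the discounted state-action occupancy d of the target policy. DICE-style methods estimate the correction ratio ζ = d / d_D, where d_D is the data distribution. The sketch below is only a tabular illustration of that primal/dual relationship, not the paper's estimator: it assumes a small random MDP with known transitions and solves the constraints exactly, whereas DICE works from sampled off-policy data with function approximation and regularization. The helper `tabular_policy_value`, the random MDP, and the uniform `d_D` are hypothetical choices for illustration.

```python
import numpy as np

def tabular_policy_value(P, R, pi, mu0, gamma):
    """Hypothetical helper: evaluate pi from the primal (Q) and dual (d) sides.

    P:   transitions, shape (S, A, S), P[s, a, s'] = Pr(s' | s, a)
    R:   rewards, shape (S, A)
    pi:  target policy, shape (S, A), each row sums to 1
    mu0: initial state distribution, shape (S,)
    Returns the normalized value (1 - gamma) * E[sum_t gamma^t r_t] computed
    both ways, plus the occupancy d over flattened (s, a) pairs.
    """
    S, A = R.shape
    # State-action transition under pi: T[(s,a), (s',a')] = P(s'|s,a) * pi(a'|s')
    T = (P[:, :, :, None] * pi[None, None, :, :]).reshape(S * A, S * A)
    r = R.reshape(S * A)
    init = (mu0[:, None] * pi).reshape(S * A)  # mu0(s) * pi(a|s)
    I = np.eye(S * A)

    # Primal side: Q satisfies the Bellman equation Q = r + gamma * T Q.
    Q = np.linalg.solve(I - gamma * T, r)
    primal = (1 - gamma) * init @ Q

    # Dual side: occupancy satisfies d = (1 - gamma) * init + gamma * T^T d.
    d = np.linalg.solve(I - gamma * T.T, (1 - gamma) * init)
    dual = d @ r
    return primal, dual, d

# Small random MDP (hypothetical example data).
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.random((S, A, S)); P /= P.sum(axis=2, keepdims=True)
R = rng.random((S, A))
pi = rng.random((S, A)); pi /= pi.sum(axis=1, keepdims=True)
mu0 = np.full(S, 1.0 / S)

primal, dual, d = tabular_policy_value(P, R, pi, mu0, gamma)
print(np.isclose(primal, dual))  # True: both sides of the LP give the same value

# DICE-style reweighting: with data drawn from some d_D, the ratio
# zeta = d / d_D turns an average over d_D into the value of pi.
d_D = np.full(S * A, 1.0 / (S * A))  # hypothetical uniform data distribution
zeta = d / d_D
weighted = np.sum(d_D * zeta * R.reshape(-1))
print(np.isclose(weighted, dual))  # True
```

Because the primal and dual solves are transposes of the same linear system, the two values agree exactly (up to floating point); the practical differences the paper studies only appear once the constraints are enforced from samples and regularized.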
Abstract
The recently proposed distribution correction estimation (DICE) family of estimators has advanced the state of the art in off-policy evaluation