BriefGPT.xyz
Mar, 2011
Doubly Robust Policy Evaluation and Learning
Miroslav Dudik, John Langford, Lihong Li
TL;DR
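In decision-making settings with an observed context and a reward objective, the authors apply the doubly robust technique to evaluate new policies. They show that it yields lower-variance value estimates and leads to better learned policies, making it an effective method for this setting.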
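To make the doubly robust idea in the TL;DR concrete, here is a minimal sketch of the estimator in its commonly stated form: a reward-model prediction for the action the new policy would take, plus an importance-weighted correction applied only when the logged action matches that action. The function name, argument names, and data layout are illustrative assumptions and are not taken from the paper.

```python
def doubly_robust_value(rewards, logged_actions, propensities,
                        target_actions, predicted_rewards):
    """Estimate the value of a new (target) policy from logged bandit data.

    All argument names are hypothetical, chosen for this sketch:
      rewards[i]            observed reward for the logged action
      logged_actions[i]     action index chosen by the logging policy
      propensities[i]       probability the logging policy gave that action
      target_actions[i]     action index the new policy would choose
      predicted_rewards[i]  list of reward-model predictions, one per action
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("need at least one logged sample")

    total = 0.0
    for r, a, p, a_new, r_hat in zip(rewards, logged_actions, propensities,
                                     target_actions, predicted_rewards):
        # Model-based term: predicted reward of the target policy's action.
        estimate = r_hat[a_new]
        # Correction term: only when the logged action matches the target
        # action, add the importance-weighted residual of the reward model.
        if a == a_new:
            estimate += (r - r_hat[a]) / p
        total += estimate
    return total / n


# Two logged samples, two actions, uniform logging policy.
value = doubly_robust_value(
    rewards=[1.0, 0.0],
    logged_actions=[0, 1],
    propensities=[0.5, 0.5],
    target_actions=[0, 0],
    predicted_rewards=[[0.5, 0.2], [0.5, 0.2]],
)
print(value)
```

With an all-zero reward model this sketch reduces to plain inverse propensity scoring, and with a perfect reward model the correction term vanishes; the estimate stays accurate if either the reward model or the propensities are accurate, which is the sense of "doubly robust".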
Abstract
We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as contextual bandits, encompasses …