带预测内容的在线强盗学习

Jul, 2023

Online learning in bandits with predicted context

Yongyi Guo, Susan Murphy

TL;DR我们考虑了上下文强盗问题，在每个时间点上，代理只能访问上下文的嘈杂版本和误差方差（或该方差的估计）。我们提出了第一个在线算法，与适当的基准相比，在此设置中具有亚线性遗憾，其关键思想是将经典统计中的测量误差模型延伸到在线决策情境中，这是一个非常复杂的问题，因为策略依赖于嘈杂的上下文观察。

Abstract

We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context and the error variance (or an estimator of this variance). This setting is motivated by a