BriefGPT.xyz
Jul, 2023
Online learning in bandits with predicted context
Yongyi Guo, Susan Murphy
TL;DR
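We consider the contextual bandit problem in which, at each time step, the agent only has access to a noisy version of the context and the error variance (or an estimate of that variance). We propose the first online algorithm with sublinear regret in this setting relative to an appropriate benchmark. The key idea is to extend the measurement error model from classical statistics to the online decision-making setting, which is nontrivial because the policy itself depends on the noisy context observations.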
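To make the measurement error idea concrete, here is a minimal offline sketch in Python. It is not the paper's algorithm: it only illustrates the classical errors-in-variables correction for a scalar linear reward, assuming the context noise variance is known. The function name `simulate`, its parameters, and the Gaussian data model are all assumptions made for illustration.

```python
import random

def simulate(n=20000, theta=2.0, noise_var=1.0, seed=0):
    """Return (naive, corrected) slope estimates from noisy-context data.

    Hypothetical setup: reward y = theta * x + eps, but only
    x_tilde = x + u is observed, where Var(u) = noise_var is known.
    """
    rng = random.Random(seed)
    sxy = 0.0  # running sum of x_tilde * y
    sxx = 0.0  # running sum of x_tilde ** 2
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                         # true context (never observed)
        x_tilde = x + rng.gauss(0.0, noise_var ** 0.5)  # noisy context the agent sees
        y = theta * x + rng.gauss(0.0, 0.5)             # reward depends on the true context
        sxy += x_tilde * y
        sxx += x_tilde * x_tilde
    naive = sxy / sxx                        # least squares on noisy context: biased toward zero
    corrected = sxy / (sxx - n * noise_var)  # subtract the known error variance from the second moment
    return naive, corrected

naive, corrected = simulate()
print(f"naive={naive:.3f} corrected={corrected:.3f}")
```

With unit-variance contexts and unit noise variance, the naive estimate should shrink toward roughly half of `theta`, while the corrected estimate should land near `theta`. In the online bandit setting the difficulty is greater, because the actions that generate the data are themselves chosen from the noisy contexts.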
Abstract
We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context and the error variance (or an estimator of this variance). This setting is motivated by a