BriefGPT.xyz
Oct, 2022
Partial Information as Full: Reward Imputation with Sketching in Bandits
Xiao Zhang, Ninglu Shao, Zihua Si, Jun Xu, Wenha Wang...
TL;DR
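This paper proposes a reward imputation approach that improves the use of feedback information in the contextual batched bandit setting. The approach uses randomized sketching to solve a regression problem that predicts the unobserved rewards, thereby approximating full-information feedback. It has controllable bias and smaller variance, and it outperforms existing methods on synthetic and real-world datasets.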
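To make the idea of imputing unobserved rewards with a sketched regression concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: it assumes a linear reward model per action, a Gaussian sketching matrix, and a ridge penalty, and every name in it (`gaussian_sketch`, `sketched_ridge`, `impute`, the toy data) is hypothetical.

```python
import random

def gaussian_sketch(rows, sketch_size, rng):
    """Return S @ rows, where S is sketch_size x n with N(0, 1/sketch_size) entries."""
    width = len(rows[0])
    scale = 1.0 / sketch_size ** 0.5
    out = [[0.0] * width for _ in range(sketch_size)]
    for i in range(sketch_size):
        for row in rows:
            s = rng.gauss(0.0, 1.0) * scale
            for k in range(width):
                out[i][k] += s * row[k]
    return out

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def sketched_ridge(contexts, rewards, sketch_size, lam, rng):
    """Minimize ||S (X w - y)||^2 + lam * ||w||^2 for a random sketch S."""
    d = len(contexts[0])
    # Sketch X and y together so the same S is applied to both.
    stacked = [x + [y] for x, y in zip(contexts, rewards)]
    sk = gaussian_sketch(stacked, sketch_size, rng)
    gram = [[sum(r[i] * r[j] for r in sk) + (lam if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [sum(r[i] * r[d] for r in sk) for i in range(d)]
    return solve(gram, rhs)

def impute(weights, context):
    """Predict the reward of an action from its fitted linear model."""
    return sum(w * x for w, x in zip(weights, context))

# Toy batch (assumed setup): two actions, a uniform logging policy,
# and a reward that is observed only for the executed action.
rng = random.Random(0)
true_w = {0: [1.0, -2.0], 1: [0.5, 3.0]}
data = {0: ([], []), 1: ([], [])}
for _ in range(400):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    a = rng.randrange(2)
    data[a][0].append(x)
    data[a][1].append(impute(true_w[a], x) + rng.gauss(0, 0.05))

# One sketched ridge model per action, fitted on that action's observed rewards.
models = {a: sketched_ridge(X, y, sketch_size=50, lam=1e-3, rng=rng)
          for a, (X, y) in data.items()}

# For a new context where action 0 is executed, impute the reward of action 1.
x_new, other = [0.3, -0.4], 1
imputed = impute(models[other], x_new)
print("imputed reward for the non-executed action:", imputed)
```

The sketch reduces each action's regression from all observed rows to `sketch_size` rows before solving, which is where the computational saving of sketching would come from on larger batches. The imputed value stands in for the missing reward of the non-executed action.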
Abstract
We focus on the setting of contextual batched bandit (CBB), where a batch of rewards is observed from the environment in each episode. But the rewards of the non-executed actions are unobserved (i.e., partial-information…