BriefGPT.xyz
Oct, 2019
Stochastic Bandits with Delayed Composite Anonymous Feedback
Siddhant Garg, Aditya Kumar Akash
TL;DR
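Explores a novel setting of the multi-armed bandit (MAB) problem that poses the challenge of stochastic delayed composite anonymous feedback (SDCAF). Two algorithms based on phase-based extensions of the UCB algorithm are proposed, and a regret analysis shows that both come with sublinear theoretical guarantees.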
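To make the "phase-based extension of UCB" idea concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' algorithm, and it does not use their exact reward model. The hypothetical `CompositeDelayedEnv` assumes that each pull draws a total reward in {0, 1} and spreads it evenly over the current step and the next `d - 1` steps. The player observes only the per-step sum, which is what makes the feedback anonymous and composite. The hypothetical `phase_ucb` plays one arm for `phase_len` consecutive steps and discards the first `d - 1` observations of each phase, because those may contain spill-over from the previous arm. It then feeds the phase average into a standard UCB1 index.

```python
import math
import random


class CompositeDelayedEnv:
    """Toy SDCAF-style environment (hypothetical; not the paper's exact model).

    Assumption: pulling arm i draws a total reward in {0, 1} with mean
    means[i] and spreads it evenly over the current step and the next d - 1
    steps. The player only sees the sum of all contributions landing on the
    current step, so it cannot tell which pull produced what.
    """

    def __init__(self, means, d, seed=0):
        self.means = list(means)
        self.d = d
        self.rng = random.Random(seed)
        self.pending = [0.0] * d  # pending[k]: reward arriving k steps from now

    def step(self, arm):
        total = 1.0 if self.rng.random() < self.means[arm] else 0.0
        for k in range(self.d):
            self.pending[k] += total / self.d
        observed = self.pending.pop(0)  # aggregate reward for this step
        self.pending.append(0.0)
        return observed


def phase_ucb(env, n_arms, horizon, phase_len):
    """Hypothetical phase-based UCB1 sketch for the toy environment above.

    Each phase plays one arm for phase_len consecutive steps. The first
    d - 1 observations of a phase may contain spill-over from the previous
    arm, so they are discarded; every later observation is an unbiased
    estimate of that arm's mean. The phase average is then treated as a
    single UCB1 sample in [0, 1].
    """
    d = env.d
    if phase_len < d:
        raise ValueError("phase_len must be >= d to yield a clean observation")
    est_sum = [0.0] * n_arms  # sum of per-phase averages for each arm
    n_phases = [0] * n_arms   # number of phases that produced a sample
    t = phase = 0
    while t < horizon:
        if phase < n_arms:
            arm = phase  # initialisation: one phase per arm
        else:
            arm = max(
                range(n_arms),
                key=lambda i: est_sum[i] / n_phases[i]
                + math.sqrt(2 * math.log(phase) / n_phases[i]),
            )
        clean = []
        for j in range(phase_len):
            if t >= horizon:
                break
            obs = env.step(arm)
            t += 1
            if j >= d - 1:  # window contains only this phase's pulls
                clean.append(obs)
        if clean:
            est_sum[arm] += sum(clean) / len(clean)
            n_phases[arm] += 1
        phase += 1
    return n_phases
```

For example, `phase_ucb(CompositeDelayedEnv([0.2, 0.5, 0.8], d=5, seed=1), n_arms=3, horizon=20000, phase_len=50)` returns the number of phases assigned to each arm. With these settings the best arm (index 2) should receive most of them. The design trades off phase length: discarding `d - 1` observations costs a fraction `(d - 1) / phase_len` of the samples, so longer phases waste less data but commit to one arm for longer.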
Abstract
We explore a novel setting of the multi-armed bandit (MAB) problem, inspired by real-world applications, which we call bandits with "stochastic delayed composite anonymous feedback (SDCAF)". In SDCAF, the rewards