Stochastic Bandits with Delayed Composite Anonymous Feedback

October 2019

Siddhant Garg, Aditya Kumar Akash

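TL;DR: The paper explores a new setting of the multi-armed bandit (MAB) problem, stochastic delayed composite anonymous feedback (SDCAF), and describes what makes it difficult. It proposes two algorithms based on phased extensions of the UCB algorithm and shows, through regret analysis, that both have sublinear theoretical guarantees.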

Abstract

We explore a novel setting of the multi-armed bandit (MAB) problem, inspired by real-world applications, which we call bandits with "stochastic delayed composite anonymous feedback (SDCAF)". In SDCAF, the rewards