Nicole Immorlica, Karthik Abinav Sankararaman, Robert Schapire, Aleksandrs Slivkins
TL;DR探究了一种带背包的 Bandits 模型,旨在在限制供应 / 预算情况下求解多臂赌博机问题。提出了一种新的算法,采用重复博弈中遗憾最小化的框架,相对于最佳固定动作分布具有 O (log T) 的竞争比率。
Abstract
We consider bandits with knapsacks (henceforth, BwK), a general model for
multi-armed bandits under supply/budget constraints. In particular, a bandit
algorithm needs to solve a well-known knapsack problem: find