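TL;DR: This study investigates the K-armed linear contextual bandit problem under adversarial corruption and proposes a strategy called Best-of-Both-Worlds (BoBW) RealFTRL, which has theoretical guarantees in both stochastic and adversarial environments.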
Abstract
This study investigates the problem of $K$-armed linear contextual bandits,
an instance of the multi-armed bandit problem, under adversarial corruption.
At each round, a decision-maker observes an independent and identically
distributed context and then selects an arm based on the c