BriefGPT.xyz
Jul, 2019
Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits
Yogev Bar-On, Yishay Mansour
TL;DR
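This work studies agents that communicate over an underlying network by exchanging messages in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret minimization algorithms that guarantee each agent $v$ an expected regret of order $\tilde{O}\big(\sqrt{(1 + K/|\mathcal{N}(v)|)\,T}\big)$, where $T$ is the number of time steps, $K$ the number of actions, and $\mathcal{N}(v)$ the set of neighbors of $v$ in the communication graph.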
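To see how the bound scales with connectivity, here is a small sketch that evaluates only its leading term $\sqrt{(1 + K/|\mathcal{N}(v)|)\,T}$, with constants and logarithmic factors dropped. The function name `regret_scale` is hypothetical, and we assume $|\mathcal{N}(v)| \ge 1$ (for example, counting $v$ itself) so that the expression is defined.

```python
import math


def regret_scale(T: int, K: int, num_neighbors: int) -> float:
    """Leading term sqrt((1 + K/|N(v)|) * T) of the individual-regret bound.

    Hypothetical helper: constants and log factors hidden in the O-tilde
    are dropped, so only relative comparisons are meaningful.
    """
    if num_neighbors < 1:
        raise ValueError("|N(v)| must be at least 1")
    return math.sqrt((1 + K / num_neighbors) * T)


# Compare a poorly connected agent with well-connected ones.
for n in (1, 10, 100):
    print(n, round(regret_scale(T=10_000, K=100, num_neighbors=n), 1))
```

With $T = 10{,}000$ and $K = 100$, the term falls from about 1005 for a single neighbor (close to the single-agent $\sqrt{KT} = 1000$ scale) to about 141, that is $\sqrt{2T}$, once $|\mathcal{N}(v)| = K$.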
Abstract
We study agents communicating over an underlying network by exchanging messages, in order to optimize their individual regret in a common nonstochastic multi-armed bandit problem. We derive regret minimization algorithms …
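As an illustration of why exchanging messages helps, the following is a minimal sketch of a generic cooperative Exp3 variant. It is not the algorithm from the paper and carries none of its individual-regret guarantees. Each agent runs Exp3 on losses, but builds importance-weighted loss estimates from the arms played by its whole closed neighborhood $\mathcal{N}(v)$ (itself included), dividing by the probability that at least one agent in $\mathcal{N}(v)$ played the arm. Assumptions: neighbors broadcast their action, observed loss, and distribution, and these messages arrive within the same round; losses lie in $[0, 1]$ and are fixed in advance (an oblivious adversary); the name `coop_exp3` and all parameters are hypothetical.

```python
import math
import random


def coop_exp3(graph, losses, eta, seed=0):
    """Hypothetical cooperative Exp3 sketch (not the paper's algorithm).

    graph:  dict agent -> set of neighbouring agents (undirected)
    losses: list of T rows, each a list of K losses in [0, 1]
    eta:    learning rate shared by all agents
    Returns the cumulative loss suffered by each agent.
    """
    rng = random.Random(seed)
    agents = sorted(graph)
    K = len(losses[0])
    log_w = {v: [0.0] * K for v in agents}  # log-weights avoid underflow
    total_loss = {v: 0.0 for v in agents}

    for loss_t in losses:
        # 1. Each agent turns its log-weights into a distribution (softmax).
        probs = {}
        for v in agents:
            m = max(log_w[v])
            ws = [math.exp(x - m) for x in log_w[v]]
            s = sum(ws)
            probs[v] = [x / s for x in ws]

        # 2. Agents sample independently and suffer the loss of their arm.
        actions = {v: rng.choices(range(K), weights=probs[v])[0] for v in agents}
        for v in agents:
            total_loss[v] += loss_t[actions[v]]

        # 3. Each agent updates on every arm played inside its neighbourhood.
        for v in agents:
            nbhd = graph[v] | {v}
            for i in {actions[u] for u in nbhd}:
                # P(at least one agent in N(v) plays arm i); it is never below
                # the largest single probability, which guards against
                # floating-point cancellation when probabilities are tiny.
                q = 1.0 - math.prod(1.0 - probs[u][i] for u in nbhd)
                q = max(q, max(probs[u][i] for u in nbhd))
                log_w[v][i] -= eta * loss_t[i] / q  # unbiased loss estimate

    return total_loss
```

For example, on a path of four agents where arm 0 always has loss 0 and every other arm has loss 1, each agent's cumulative loss stays far below $T$. Better-connected agents observe more arms per round, which lowers the variance of their estimates; the paper's contribution is to turn this effect into a per-agent guarantee.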