BriefGPT.xyz
Oct 2023
Nash Regret Guarantees for Linear Bandits
Ayush Sawarni, Soumyabrata Pal, Siddharth Barman
TL;DR
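In the stochastic linear bandits framework, we obtain tight upper bounds for a strengthened notion of regret. This strengthened notion, called Nash regret, is defined as the difference between the (a priori unknown) optimum and the geometric mean of the expected rewards accumulated by the linear bandit algorithm. We develop an algorithm that achieves upper bounds on Nash regret both when the set of arms is finite and when it is infinite.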
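To make the definition concrete, here is a minimal Python sketch that evaluates Nash regret over a fixed horizon next to the usual average (arithmetic-mean) regret. It is not the authors' algorithm; it only computes the metric from a list of per-round expected rewards. The function names and the example rewards are hypothetical, and the sketch assumes non-negative rewards so that the geometric mean is well defined.

```python
import math
from typing import Sequence


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean of non-negative values, computed in log space for stability."""
    if not values:
        raise ValueError("need at least one round")
    if any(v < 0 for v in values):
        raise ValueError("geometric mean assumes non-negative rewards")
    if any(v == 0 for v in values):
        return 0.0  # a single zero-reward round drives the whole product to zero
    return math.exp(math.fsum(math.log(v) for v in values) / len(values))


def nash_regret(mu_star: float, expected_rewards: Sequence[float]) -> float:
    """Optimal mean reward minus the geometric mean of per-round expected rewards."""
    return mu_star - geometric_mean(expected_rewards)


def average_regret(mu_star: float, expected_rewards: Sequence[float]) -> float:
    """Standard regret normalized by the horizon: optimum minus the arithmetic mean."""
    return mu_star - math.fsum(expected_rewards) / len(expected_rewards)


if __name__ == "__main__":
    mu_star = 1.0  # hypothetical optimal expected reward
    # Hypothetical per-round expected rewards over a horizon of T = 4 rounds.
    rewards = [0.25, 1.0, 1.0, 1.0]
    print(f"average regret: {average_regret(mu_star, rewards):.4f}")
    print(f"Nash regret:    {nash_regret(mu_star, rewards):.4f}")
```

With these inputs the average regret is 0.1875, while the Nash regret is about 0.2929: a single poor round is penalized more heavily under the geometric mean. By the AM-GM inequality, Nash regret is never smaller than average regret, which is one sense in which it strengthens the usual notion.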
Abstract
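We obtain essentially tight upper bounds for a strengthened notion of regret in the stochastic linear bandits framework. The strengthening, referred to as Nash regret, is defined as the difference between the (a priori unknown) optimum and the geometric mean of expected rewards accumulated by the linear bandit algorithm.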