BriefGPT.xyz
Jul, 2023
On Interpolating Experts and Multi-Armed Bandits
Houshuang Chen, Yuchen He, Chihao Zhang
TL;DR
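This paper studies an online decision problem, called $\mathbf{m}$-MAB, that interpolates between two different ways of observing information. It establishes tight minimax regret bounds for $\mathbf{m}$-MAB and designs an optimal PAC algorithm for its pure-exploration version, $\mathbf{m}$-BAI. It also extends the upper and lower bounds for $\mathbf{m}$-MAB to the more general setting of bandits with graph feedback, and obtains tight minimax regret bounds for several families of feedback graphs.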
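To make the "two ways of observing information" concrete, here is a minimal sketch, not taken from the paper, of what the learner sees in one round under the two classic extremes: full-information feedback (expert advice) and bandit feedback. The function names and the loss values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def observe_full_information(losses: List[float], chosen: int) -> Dict[int, float]:
    """Expert-advice feedback: every arm's loss is revealed, whatever was chosen."""
    return {arm: loss for arm, loss in enumerate(losses)}


def observe_bandit(losses: List[float], chosen: int) -> Dict[int, float]:
    """Bandit feedback: only the loss of the pulled arm is revealed."""
    return {chosen: losses[chosen]}


# Hypothetical losses for a single round with three arms.
round_losses = [0.2, 0.9, 0.5]
chosen_arm = 1

full_view = observe_full_information(round_losses, chosen_arm)
bandit_view = observe_bandit(round_losses, chosen_arm)
```

In both cases the learner incurs the loss of the arm it chose; the settings differ only in how much of the loss vector is revealed afterwards.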
Abstract
Learning with expert advice and multi-armed bandit are two classic online decision problems which differ on how the information is observed in each round of the game. We study a family of problems interpolating t