Regret Lower Bound and Optimal Algorithm in the Dueling Bandit Problem

Jun, 2015

Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem

Junpei Komiyama, Junya Honda, Hisashi Kashima, Hiroshi Nakagawa

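TL;DR: This paper studies the K-armed dueling bandit problem and proposes an algorithm inspired by the Deterministic Minimum Empirical Divergence algorithm. It derives a regret upper bound that matches the lower bound, and experiments show that the algorithm significantly outperforms existing algorithms.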
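To make the setting concrete, here is a minimal sketch of the feedback model in a K-armed dueling bandit: the learner picks a pair of arms and observes only which one wins a noisy comparison. The preference matrix `P`, the helper names `duel`, `condorcet_winner`, and `pair_regret`, and the numbers used are all hypothetical and chosen for illustration. The regret shown is the convention commonly used with a Condorcet winner, assumed here rather than taken from the text above. This sketch is not the algorithm proposed in the paper.

```python
import random

# Hypothetical 3-arm preference matrix (illustrative values only):
# P[i][j] is the probability that arm i wins a duel against arm j,
# so P[i][j] + P[j][i] = 1 and P[i][i] = 0.5.
P = [
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
]


def duel(i, j, rng):
    """Return True if arm i beats arm j in one noisy comparison."""
    return rng.random() < P[i][j]


def condorcet_winner(P):
    """Return the arm that beats every other arm with probability > 1/2, or None."""
    for i in range(len(P)):
        if all(P[i][j] > 0.5 for j in range(len(P)) if j != i):
            return i
    return None


def pair_regret(P, i, j):
    """Regret of comparing (i, j), measured against the Condorcet winner.

    Assumed convention: the average of the winner's margins over arms i and j.
    """
    w = condorcet_winner(P)
    return ((P[w][i] - 0.5) + (P[w][j] - 0.5)) / 2


rng = random.Random(0)
wins = sum(duel(0, 1, rng) for _ in range(10_000))
print("empirical win rate of arm 0 over arm 1:", wins / 10_000)
print("regret of comparing arms 1 and 2:", pair_regret(P, 1, 2))
```

Comparing the Condorcet winner with itself incurs zero regret under this convention, which is why an algorithm in this setting aims to identify that arm using as few comparisons of suboptimal pairs as possible.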

Abstract

We study the $K$-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the →