BriefGPT.xyz
Nov, 2017
Regret Analysis for Continuous Dueling Bandit
Wataru Kumagai
TL;DR
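This paper proposes a solution to the dueling bandit problem in which the cost function is defined over a continuous space. It introduces a stochastic mirror descent algorithm and shows that the algorithm achieves an O(sqrt(T log T)) regret bound under strong convexity and smoothness assumptions on the cost function. It also examines the equivalence between regret minimization in the dueling bandit problem and convex optimization of the cost function.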
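To make the setting concrete, here is a minimal Python sketch of learning from noisy pairwise comparisons over a continuous action space. It is an illustration under stated assumptions, not the paper's algorithm. The logistic link in `duel`, the one-point direction estimate, the Euclidean mirror map (which reduces the mirror step to a projected gradient step), and every name and parameter (`comparison_descent`, `delta`, `eta`, `radius`) are hypothetical choices made for this example. No regret guarantee is claimed for it.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def duel(cost, a, b, rng):
    """Return 1 if action a is judged better (lower cost) than b, else 0.

    Assumption: the win probability is a logistic function of the cost gap.
    """
    return 1 if rng.random() < sigmoid(cost(b) - cost(a)) else 0


def project_to_ball(x, radius):
    """Euclidean projection of x onto the origin-centred ball of given radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def comparison_descent(cost, dim, radius, steps, delta=0.25, eta=0.5, seed=0):
    """Descent driven only by duel outcomes (illustrative, hypothetical).

    Each round compares the current point x with a probe x + delta * u,
    where u is a random unit direction. The probe loses more often when it
    has higher cost, so (probe_lost * u) is, up to scaling, a noisy
    estimate of an ascent direction; we step against it.
    """
    rng = random.Random(seed)
    x = [0.0] * dim
    for t in range(1, steps + 1):
        # Draw a direction uniformly from the unit sphere.
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
        probe = [xi + delta * ui for xi, ui in zip(x, u)]
        probe_lost = 1 - duel(cost, probe, x, rng)
        # Decaying step size; dim / delta rescales the one-point estimate.
        scale = (eta / math.sqrt(t)) * (dim / delta) * probe_lost
        x = [xi - scale * ui for xi, ui in zip(x, u)]
        # Project onto a shrunk ball so the next probe stays in the domain.
        x = project_to_ball(x, radius - delta)
    return x
```

For example, `comparison_descent(lambda a: (a[0] - 0.3) ** 2 + (a[1] + 0.2) ** 2, dim=2, radius=1.0, steps=200)` returns a point inside the ball of radius 0.75 using only 200 comparison outcomes. The learner never observes a cost value, which is the defining restriction of the dueling bandit setting.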
Abstract
The dueling bandit is a learning framework wherein the feedback information in the learning process is restricted to a noisy comparison between a pair of actions. In this research, we address a
→