Regret Analysis for Continuous Dueling Bandit

Nov, 2017

Regret Analysis for Continuous Dueling Bandit

Wataru Kumagai

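TL;DR: This paper addresses the dueling bandit problem with a cost function defined over a continuous space. It introduces a stochastic mirror descent algorithm and shows that the algorithm achieves an O(sqrt(T log T)) regret bound when the cost function is strongly convex and smooth. It also examines the equivalence between regret minimization in the dueling bandit problem and convex optimization of the cost function.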

Abstract

The dueling bandit is a learning framework wherein the feedback information in the learning process is restricted to a noisy comparison between a pair of actions. In this research, we address a →