BriefGPT.xyz
Jan, 2016
A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits
Pratik Gajane, Tanguy Urvoy, Fabrice Clérot
TL;DR
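The paper proposes the REX3 algorithm for the dueling bandit problem, in which the learner selects a pair of arms and receives relative rather than absolute feedback. The algorithm has an expected finite-time regret upper bound of O(sqrt(K ln(K) T)), and the paper also reports experimental results on real data from information retrieval applications.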
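To give a feel for what a bound of order sqrt(K ln(K) T) implies, here is a minimal Python sketch that evaluates only the growth term of the stated bound. The function name `regret_rate` and the sample values of K and T are illustrative assumptions, and the constant factor hidden by the O(.) notation is deliberately left out, so the numbers are not actual regret values.

```python
import math


def regret_rate(num_arms: int, horizon: int) -> float:
    """Growth term sqrt(K ln(K) T) of the stated bound.

    Hypothetical helper for illustration: the constant hidden by O(.)
    is omitted, so this is a rate, not a regret guarantee.
    """
    if num_arms < 2 or horizon < 1:
        raise ValueError("need at least 2 arms and a horizon of at least 1")
    return math.sqrt(num_arms * math.log(num_arms) * horizon)


if __name__ == "__main__":
    # Assumed sample values: K = 10 arms, horizons growing by a factor of 4.
    for horizon in (1_000, 4_000, 16_000):
        rate = regret_rate(10, horizon)
        # The per-round term rate / T shrinks as the horizon grows.
        print(horizon, round(rate, 1), round(rate / horizon, 4))
```

Because the term grows with the square root of T, quadrupling the horizon only doubles it, so the average per-round term decreases as T increases.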
Abstract
We study the K-armed dueling bandit problem which is a variation of the classical multi-armed bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose ...