Dec, 2021
Gaussian Process Bandits with Aggregated Feedback
Mengyan Zhang, Russell Tsuchida, Cheng Soon Ong
TL;DR
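Proposes a Gaussian-process-based optimization algorithm for finding the optimum in continuum-armed bandit problems, suited to recommending the best arms under a fixed budget and obtaining average rewards. It applies when only aggregated feedback, such as the average over a set, is available because precise rewards are costly or impossible to obtain. The method constrains the set of reward functions with a Gaussian process and adaptively constructs a tree structure over nodes.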
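To make the idea of aggregated feedback concrete, the following is a minimal sketch of Gaussian process regression when only noisy averages over groups of points are observed. It is not the authors' algorithm and omits the adaptive tree construction entirely. The RBF kernel, its lengthscale, the equal-interval grouping, the stand-in reward `f`, and the names `rbf_kernel` and `aggregated_gp_posterior` are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D point sets a and b (assumed kernel)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def aggregated_gp_posterior(x_points, A, y_avg, x_test, noise_var=1e-2):
    """Posterior mean and covariance at x_test given noisy group averages.

    x_points: (n,) locations whose rewards are averaged.
    A:        (m, n) averaging matrix; row j holds the weights of group j.
    y_avg:    (m,) observed averages, modeled as A @ f(x_points) + noise.
    """
    K = rbf_kernel(x_points, x_points)
    K_star = rbf_kernel(x_test, x_points)
    # Averaging is linear, so the aggregated observations stay jointly
    # Gaussian with the function values: Cov(y) = A K A^T + noise_var * I.
    S = A @ K @ A.T + noise_var * np.eye(A.shape[0])
    cross = K_star @ A.T  # Cov(f(x_test), y)
    mean = cross @ np.linalg.solve(S, y_avg)
    cov = rbf_kernel(x_test, x_test) - cross @ np.linalg.solve(S, cross.T)
    return mean, cov

# Hypothetical setup: 20 points on [0, 1], split into 4 groups of 5.
x_points = np.linspace(0.0, 1.0, 20)
A = np.kron(np.eye(4), np.full((1, 5), 1.0 / 5.0))  # shape (4, 20)
f = np.sin(2.0 * np.pi * x_points)                  # stand-in reward function
y_avg = A @ f                                       # one average per group
mean, cov = aggregated_gp_posterior(x_points, A, y_avg, x_points, noise_var=1e-8)
```

With near-zero noise, the posterior mean reproduces the observed group averages (`A @ mean` is close to `y_avg`) without pinning down individual rewards inside each group, which is the information loss that aggregated feedback introduces.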
Abstract
We consider the continuum-armed bandits problem, under a novel setting of recommending the best arms within a fixed budget under aggregated feedback. This is motivated by applications where the precise rewards ar