BriefGPT.xyz
Aug 2024
Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits
Woojin Jeong, Seungki Min
TL;DR
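This paper studies the budgeted multi-armed bandit problem and proposes improved Thompson sampling methods that address the poor arm selection caused by resource budget constraints. Building on the Information Relaxation Sampling framework, it introduces a series of randomized algorithms that optimize their decisions more carefully, along with benchmarks that improve significantly on the conventional benchmark. Theoretical analysis and simulation results show that the proposed algorithms outperform Budgeted Thompson Sampling across multiple settings, suggesting good practical potential.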
Abstract
We consider a Bayesian budgeted multi-armed bandit problem, in which each arm consumes a different amount of resources when selected and there is a budget constraint on the total amount of resources that can be used. Budgeted
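To make the problem setup concrete, here is a minimal Python sketch of a Budgeted Thompson Sampling (BTS) style baseline, the method the TL;DR compares against. It is not the paper's information-relaxation algorithm. The setup is an assumption for illustration only: Bernoulli rewards with independent Beta(1, 1) priors, known deterministic per-arm costs, and a simplified ratio rule (sampled mean divided by cost). The function name `budgeted_thompson_sampling` and its parameters are hypothetical.

```python
import random


def budgeted_thompson_sampling(true_means, costs, budget, seed=0):
    """Simulate a simplified BTS-style policy on a Bernoulli budgeted bandit.

    Hypothetical setup (illustrative assumptions, not from the paper):
    - arm k pays a Bernoulli(true_means[k]) reward,
    - arm k consumes a known, deterministic cost costs[k] > 0,
    - each arm's mean reward has an independent Beta(1, 1) prior.
    Returns (total_reward, number_of_pulls, remaining_budget).
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # Beta posterior parameter: 1 + observed successes
    beta = [1] * k   # Beta posterior parameter: 1 + observed failures
    remaining = budget
    total_reward = 0
    pulls = 0
    while True:
        # Feasibility filter: keep only arms whose cost fits the remaining budget.
        affordable = [i for i in range(k) if costs[i] <= remaining]
        if not affordable:
            break
        # Thompson step: draw one posterior sample per affordable arm and pick
        # the arm with the best sampled reward-per-unit-cost ratio.
        arm = max(affordable,
                  key=lambda i: rng.betavariate(alpha[i], beta[i]) / costs[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate Beta-Bernoulli posterior update for the pulled arm.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        remaining -= costs[arm]
        total_reward += reward
        pulls += 1
    return total_reward, pulls, remaining


if __name__ == "__main__":
    # Three hypothetical arms: higher mean reward comes with higher cost.
    print(budgeted_thompson_sampling([0.3, 0.5, 0.7], [1.0, 2.0, 3.0], budget=100.0))
```

In this sketch the remaining budget is used only as a feasibility filter; the ratio rule itself does not otherwise account for how much budget is left.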