BriefGPT.xyz
May, 2023
Combinatorial Bandits for Maximum Value Reward Function under Max Value-Index Feedback
Yiliu Wang, Wei Chen, Milan Vojnović
TL;DR
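This work studies the combinatorial multi-armed bandit problem under maximum value and index feedback, and proposes an algorithm with a regret guarantee for stochastic arm outcomes whose probability distributions have finite support.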
Abstract
We consider a combinatorial multi-armed bandit problem for maximum value reward function under maximum value and