April 2012
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
Sébastien Bubeck, Nicolò Cesa-Bianchi
TL;DR
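This survey focuses on the analysis of two extreme cases of the multi-armed bandit problem, i.i.d. payoffs and adversarial payoffs, and also analyzes the finite-action setting and extensions such as the contextual bandit model.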
Abstract
Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in …