Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

April 2012

Sébastien Bubeck, Nicolò Cesa-Bianchi

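TL;DR: This survey focuses on the regret analysis of two extreme cases of the multi-armed bandit problem, i.i.d. payoffs and adversarial payoffs, and also analyzes the finite-action setting, the contextual bandit model, and related variants.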

Abstract

Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off. This is the balance between staying with the option that gave highest payoffs in