BriefGPT.xyz
Nov, 2011
Analysis of Thompson Sampling for the multi-armed bandit problem
Shipra Agrawal, Navin Goyal
TL;DR
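This paper studies Thompson Sampling, a Bayesian algorithm for the multi-armed bandit problem, a standard model of the exploration/exploitation trade-off in sequential decision problems. The algorithm has been shown empirically to be close to optimal and has several desirable properties, but theoretical understanding of it has been quite limited. This paper shows for the first time that Thompson Sampling achieves logarithmic expected regret for the multi-armed bandit problem.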
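To make the summary concrete, here is a minimal sketch of Thompson Sampling under assumptions that the text above does not state: rewards are Bernoulli (0 or 1), and each arm starts with a uniform Beta(1, 1) prior. The function name `thompson_sampling` and its parameters are hypothetical, chosen only for illustration. In each round the sketch draws one sample from every arm's posterior, plays the arm with the largest sample, and updates that arm's counts.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Simulate Beta-Bernoulli Thompson Sampling (illustrative sketch).

    true_means: hypothetical success probability of each arm (unknown to the learner).
    horizon: number of rounds to play.
    Returns (pulls per arm, cumulative expected regret).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # observed rewards of 1 per arm
    failures = [0] * n_arms   # observed rewards of 0 per arm
    pulls = [0] * n_arms
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        # Draw one sample per arm from its Beta posterior (assumed Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=lambda i: samples[i])

        # Simulate a Bernoulli reward and update the posterior counts.
        reward = 1 if rng.random() < true_means[arm] else 0
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

        pulls[arm] += 1
        regret += best_mean - true_means[arm]

    return pulls, regret


if __name__ == "__main__":
    pulls, regret = thompson_sampling([0.1, 0.9], horizon=2000)
    print("pulls per arm:", pulls)
    print("cumulative expected regret:", round(regret, 2))
```

Regret here is measured against always playing the best arm, so it grows only when a suboptimal arm is pulled. The logarithmic bound mentioned above says that the expected value of this quantity grows logarithmically in the number of rounds.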
Abstract
The multi-armed bandit problem is a popular model for studying exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earli