Thompson Sampling: An Asymptotically Optimal Finite-Time Analysis

May, 2012

Thompson Sampling: An Optimal Finite Time Analysis

Emilie Kaufmann, Nathaniel Korda, Rémi Munos

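TL;DR: For the case of Bernoulli rewards, this paper gives the first finite-time analysis whose cumulative regret rate matches the Lai and Robbins lower bound, showing that Thompson Sampling is an asymptotically optimal strategy for the stochastic multi-armed bandit problem. The result is supported by numerical comparisons and experiments.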
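To make the summary concrete, here is a minimal sketch of Thompson Sampling for Bernoulli rewards. It is not taken from the paper. It assumes a uniform Beta(1, 1) prior on each arm, and the names `thompson_sampling`, `bernoulli_kl`, and `lai_robbins_constant` are illustrative. The last helper computes the coefficient of ln(T) in the Lai and Robbins lower bound, i.e. the sum over suboptimal arms of (mu* - mu_a) / KL(mu_a, mu*), so that a simulated regret can be compared against it.

```python
import math
import random


def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def lai_robbins_constant(means):
    """Coefficient of ln(T) in the Lai and Robbins lower bound on regret."""
    best = max(means)
    return sum((best - m) / bernoulli_kl(m, best) for m in means if m < best)


def thompson_sampling(means, horizon, seed=0):
    """Simulate Thompson Sampling on Bernoulli arms with the given true means.

    Assumption: each arm starts with a uniform Beta(1, 1) prior.
    Returns (pseudo-regret, successes per arm, failures per arm).
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k
    failures = [0] * k
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Draw one sample from each arm's Beta posterior.
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1) for a in range(k)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(k), key=samples.__getitem__)
        # Observe a Bernoulli reward and update that arm's posterior counts.
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        # Accumulate the expected (pseudo-)regret of this choice.
        regret += best - means[arm]
    return regret, successes, failures
```

As a rough check, one could run `thompson_sampling([0.2, 0.25], 10_000)` and compare the returned regret with `lai_robbins_constant([0.2, 0.25]) * math.log(10_000)`. The lower bound is asymptotic, so a single finite run need not sit close to it.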

Abstract

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed bandit problem has been open since 1933. In this paper we answer it positively for the case of →