In this paper, we study the problem of estimating uniformly well the mean
values of several distributions given a finite budget of samples. If the
variances of the distributions were known, one could design an optimal sampling
strategy by collecting a number of independent samples per distribution that is
proportional to its variance. However, in the more realistic case where the
distributions are not known in advance, one needs to design adaptive sampling
strategies in order to select which distribution to sample from according to
the previously observed samples. We describe two strategies that pull each
distribution a number of times proportional to a high-probability
upper confidence bound on its variance (built from previously observed samples)
and report a finite-sample analysis of the excess estimation error
compared to the optimal allocation. We show that the performance of these
allocation strategies depends not only on the variances but also on the full
shape of the distributions.
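
As a rough illustration of the known-variance baseline mentioned above, the following sketch splits a sample budget across distributions in proportion to their variances. The function names, the rounding rule (largest remainder), and the choice of loss (the worst per-distribution variance of the empirical mean, which is one reading of "uniformly well") are assumptions made for this example and are not taken from the paper. It does not implement the adaptive strategies, which would replace the true variances with upper confidence bounds estimated from data.

```python
def proportional_allocation(variances, budget):
    """Split `budget` samples across distributions in proportion to variance.

    Hypothetical helper for illustration; assumes the variances are known.
    """
    total = sum(variances)
    if total <= 0:
        raise ValueError("at least one variance must be positive")
    # Ideal fractional share of the budget for each distribution.
    ideal = [budget * v / total for v in variances]
    counts = [int(x) for x in ideal]  # round down first
    # Give the samples lost to rounding to the largest fractional remainders.
    leftover = budget - sum(counts)
    order = sorted(range(len(variances)),
                   key=lambda k: ideal[k] - counts[k],
                   reverse=True)
    for k in order[:leftover]:
        counts[k] += 1
    return counts


def worst_case_mse(variances, counts):
    """Largest variance of the empirical means, max_k sigma_k^2 / n_k."""
    errors = [
        (v / n if n > 0 else float("inf"))
        for v, n in zip(variances, counts)
        if v > 0
    ]
    return max(errors, default=0.0)


if __name__ == "__main__":
    variances = [1.0, 4.0, 0.25]
    counts = proportional_allocation(variances, 105)
    print(counts, worst_case_mse(variances, counts))
    print(worst_case_mse(variances, [35, 35, 35]))  # uniform split
```

With variances 1, 4 and 0.25 and a budget of 105, the proportional split is 20, 80 and 5 samples, which equalizes the per-distribution error, whereas a uniform split leaves the high-variance distribution with a larger error.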

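This paper studies how to estimate the mean values of several distributions uniformly well under a limited sample budget. If the variances were known, an optimal sampling strategy would collect a number of samples from each distribution proportional to its variance; in the more realistic case, an adaptive sampling strategy is needed to choose which distribution to sample from based on previously observed samples. The paper describes two strategies that pull each distribution a number of times proportional to a high-probability upper confidence bound on its variance, and reports a finite-sample performance analysis of the excess estimation error relative to the optimal allocation. We show that the performance of these allocation strategies depends not only on the variances but also on the full shape of the distributions.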

Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits

The maximum a-posteriori (MAP) perturbation framework has emerged as a useful
approach for inference and learning in high-dimensional complex models. By
maximizing a randomly perturbed potential function, MAP perturbations generate
unbiased samples from the Gibbs distribution. Unfortunately, the computational
cost of generating so many high-dimensional random variables can be
prohibitive. More efficient algorithms use sequential sampling strategies based
on the expected value of low-dimensional MAP perturbations. This paper develops
new measure concentration inequalities that bound the number of samples needed
to estimate such expected values. Applying the general result to MAP
perturbations can yield a more efficient algorithm to approximate sampling from
the Gibbs distribution. The measure concentration result is of general interest
and may be applicable to other areas that involve estimating expected values.
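
To make the first claim of the abstract concrete, here is a minimal sketch of sampling by maximizing a randomly perturbed potential. It assumes i.i.d. standard Gumbel perturbations, one per configuration (the abstract does not name the noise distribution; this is the classical Gumbel-max construction), and a small state space that can be enumerated. All names are hypothetical. Drawing one perturbation per configuration is the step that becomes prohibitive in high dimensions; the low-dimensional perturbations and the concentration bounds developed in the paper are not implemented here.

```python
import math
import random


def gumbel(rng):
    """Draw a standard Gumbel variate by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() never returns 1.0
        u = rng.random()
    return -math.log(-math.log(u))


def sample_by_map_perturbation(potentials, rng):
    """Return the index maximizing potential + independent Gumbel noise.

    Assumption for this sketch: one perturbation per configuration, so the
    result is an exact sample from p(x) proportional to exp(potential(x)).
    """
    perturbed = [theta + gumbel(rng) for theta in potentials]
    return max(range(len(potentials)), key=perturbed.__getitem__)


def gibbs_probabilities(potentials):
    """Exact Gibbs probabilities, computed stably for comparison."""
    m = max(potentials)
    weights = [math.exp(theta - m) for theta in potentials]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    potentials = [0.0, 1.0, 2.0]
    draws = 20000
    hits = [0] * len(potentials)
    for _ in range(draws):
        hits[sample_by_map_perturbation(potentials, rng)] += 1
    print([h / draws for h in hits])
    print(gibbs_probabilities(potentials))
```

The empirical frequencies of the perturb-and-maximize sampler should be close to the exact Gibbs probabilities printed on the second line.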

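This paper develops new measure concentration inequalities that bound the number of samples needed to estimate the expected value of low-dimensional MAP perturbations. Applying this general result to MAP perturbations can yield a more efficient algorithm for approximate sampling from the Gibbs distribution.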

On Measure Concentration of Random Maximum A-Posteriori Perturbations

The stochastic multi-armed bandit problem is well understood when the reward
distributions are sub-Gaussian. In this paper we examine the bandit problem
under the weaker assumption that the distributions have moments of order
$1+\epsilon$, for some $\epsilon \in (0,1]$. Surprisingly, moments of order 2
(i.e., finite variance) are sufficient to obtain regret bounds of the same
order as under sub-Gaussian reward distributions. In order to achieve such
regret, we define sampling strategies based on refined estimators of the mean
such as the truncated empirical mean, Catoni's M-estimator, and the
median-of-means estimator. We also derive matching lower bounds, which show
that the best achievable regret deteriorates when $\epsilon < 1$.
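
Of the three estimators named above, the median-of-means is the simplest to sketch. The version below is a generic textbook form, not the paper's exact construction: the function name is hypothetical, the number of blocks is left as a free parameter (in a bandit strategy it would typically be tied to the desired confidence level), and samples that do not fit into equal-sized blocks are simply dropped.

```python
import statistics


def median_of_means(samples, num_blocks):
    """Median of the means of `num_blocks` equal-sized consecutive blocks.

    Generic sketch; leftover samples beyond num_blocks * block_size are
    ignored, which is a simplification made for this example.
    """
    n = len(samples)
    if not 1 <= num_blocks <= n:
        raise ValueError("num_blocks must be between 1 and len(samples)")
    block_size = n // num_blocks
    block_means = []
    for b in range(num_blocks):
        block = samples[b * block_size:(b + 1) * block_size]
        block_means.append(sum(block) / len(block))
    return statistics.median(block_means)


if __name__ == "__main__":
    # One extreme value, as a heavy-tailed reward distribution might produce.
    rewards = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0, 3.0, 2.0, 1.0, 1000.0, 2.0, 1.0]
    print(sum(rewards) / len(rewards))   # empirical mean, pulled by the outlier
    print(median_of_means(rewards, 4))   # only one block mean is affected
```

The single outlier shifts the empirical mean substantially but contaminates only one of the four block means, so the median over blocks is unaffected in this example.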

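This paper examines the multi-armed bandit problem when the reward distributions have moments of order 1+ε. By defining sampling strategies based on refined estimators of the mean, such as the truncated empirical mean, Catoni's M-estimator, and the median-of-means estimator, it shows that moments of order 2 (finite variance) suffice to obtain regret bounds of the same order as under sub-Gaussian reward distributions.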