We study the problem of Robust Outlier Arm Identification (ROAI), where the goal is to identify arms whose expected rewards deviate substantially from the majority by adaptively sampling from their reward distributions. We compute the outlier threshold from the median and the median absolute deviation of the expected rewards. This choice is more robust than a threshold based on the mean and standard deviation, since it can still identify outlier arms in the presence of extreme outlier values. Our setting differs from existing pure exploration problems, where the threshold is pre-specified as a given value or rank. It is useful in applications where the goal is to identify a set of promising items whose cardinality is unknown, such as finding promising drugs for a new disease or identifying items favored by a population. We propose two $\delta$-PAC algorithms for ROAI, including the first UCB-style algorithm for outlier detection, and derive upper bounds on their sample complexity. We also prove a worst-case lower bound for the problem that matches these upper bounds up to logarithmic factors, indicating that our upper bounds are generally unimprovable. Experimental results show that our algorithms are both robust and about $5\times$ more sample-efficient than the state of the art.
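To make the robustness claim concrete, the following is a minimal sketch that compares a median/MAD threshold with a mean/standard-deviation threshold on a fixed list of expected rewards. It is an illustration only: the function names, the multiplier `k = 3.0`, the one-sided form `median + k * MAD`, and the example reward values are all assumptions, not taken from the abstract. It also treats the expected rewards as known, whereas ROAI must estimate them through adaptive sampling.

```python
import statistics


def mad_threshold(means, k=3.0):
    """Robust threshold: median + k * median absolute deviation (k is assumed)."""
    med = statistics.median(means)
    mad = statistics.median([abs(x - med) for x in means])
    return med + k * mad


def mean_std_threshold(means, k=3.0):
    """Non-robust baseline: mean + k * population standard deviation."""
    return statistics.mean(means) + k * statistics.pstdev(means)


def outlier_arms(means, threshold):
    """Indices of arms whose expected reward exceeds the threshold."""
    return [i for i, mu in enumerate(means) if mu > threshold]


# Hypothetical expected rewards: seven ordinary arms, one moderate outlier,
# and one extreme outlier that inflates the mean and standard deviation.
means = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 3.0, 50.0]

robust = outlier_arms(means, mad_threshold(means))
baseline = outlier_arms(means, mean_std_threshold(means))
print("median/MAD outliers:", robust)
print("mean/std outliers:  ", baseline)
```

In this example the extreme value pulls the mean/standard-deviation threshold above every arm, so the baseline flags nothing, while the median/MAD threshold stays near the bulk of the arms and flags both outliers.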

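This paper studies the Robust Outlier Arm Identification (ROAI) problem, which aims to identify arms whose expected rewards differ markedly from the majority by adaptively sampling from their reward distributions. Computing the outlier threshold from the median and the median absolute deviation is a more robust choice than using the mean and standard deviation. We propose two $\delta$-PAC algorithms for ROAI, including the first UCB-style algorithm for outlier detection, and derive upper bounds on their sample complexity. We also prove a worst-case lower bound, indicating that our upper bounds are generally unimprovable. Experimental results show that, compared with state-of-the-art methods, our algorithms are both robust and more sample-efficient.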

Robust Outlier Arm Identification