In this paper, we consider the setting of piecewise i.i.d. bandits under a safety constraint. In this piecewise i.i.d. setting, there exists a finite number of changepoints where the mean of some or all arms change simultaneously. We introduce the safety constraint studied in \citet{wu2016conservative} to this setting such that at any round the cumulative reward is above a constant factor of the default action reward. We propose two actively adaptive algorithms for this setting that satisfy the safety constraint, detect changepoints, and restart without the knowledge of the number of changepoints or their locations. We provide regret bounds for our algorithms and show that the bounds are comparable to their counterparts from the safe bandit and piecewise i.i.d. bandit literature. We also provide the first matching lower bounds for this setting. Empirically, we show that our safety-aware algorithms perform similarly to the state-of-the-art actively adaptive algorithms that do not satisfy the safety constraint.

本文考虑在安全约束下，针对分段独立同分布赌博机的问题，引入了适应性算法，探测并重新开始实验，同时提供了相应的遗憾上界和匹配下界。实验表明，相较于不符合安全约束的算法，本文提出的带安全约束的算法性能相似。

基于安全性的分段独立同分布赌博机变点检测