In this study, we consider the infinitely many-armed bandit problem in
rotting environments, where the mean reward of an arm may decrease with each
pull and otherwise remains unchanged. We explore two scenarios capturing
problem-dependent characteristics regarding the decay of rewards: one in which
the cumulative amount of rotting is bounded by $V_T$, referred to as the
slow-rotting scenario, and the other in which the number of rotting instances
is bounded by $S_T$, referred to as the abrupt-rotting scenario. To address the
challenge posed by rotting rewards, we introduce an algorithm that uses UCB
with an adaptive sliding window, designed to manage the bias-variance
trade-off that rotting rewards create. Our proposed algorithm achieves tight
regret bounds for both the slow-rotting and abrupt-rotting scenarios. Lastly, we
demonstrate the performance of our algorithm on synthetic datasets.
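
To make the idea of UCB with an adaptive sliding window concrete, the following is a minimal sketch and not the algorithm analyzed in this work. It assumes doubling window sizes, a confidence radius of the form sigma * sqrt(2 log T / h), and a discard threshold on the index; the names `adaptive_window_ucb`, `run`, `rot`, `threshold`, and `sigma` are hypothetical. A short window tracks a rotted mean closely but is noisy (low bias, high variance), a long window is the opposite, and taking the minimum over windows lets the index adapt between the two.

```python
import math
import random


def adaptive_window_ucb(rewards, horizon, sigma=1.0):
    """Smallest upper confidence bound over doubling windows of recent rewards.

    Assumed form (illustrative only): for each window size h = 1, 2, 4, ...
    the bound is mean(last h rewards) + sigma * sqrt(2 * log(horizon) / h).
    """
    if not rewards:
        return math.inf  # an unpulled arm is treated optimistically
    best = math.inf
    h = 1
    while h <= len(rewards):
        window = rewards[-h:]
        mean = sum(window) / h
        radius = sigma * math.sqrt(2.0 * math.log(horizon) / h)
        best = min(best, mean + radius)
        h *= 2
    return best


def run(horizon=2000, rot=0.002, threshold=0.7, sigma=0.1, seed=0):
    """Toy slow-rotting simulation with an unlimited supply of fresh arms.

    Assumptions: initial means are uniform on [0, 1], noise is Gaussian,
    and every pull lowers the pulled arm's mean by the fixed amount `rot`.
    """
    rng = random.Random(seed)
    mean = rng.random()  # mean reward of the first arm
    history = []         # rewards observed from the current arm
    arms_used = 1
    total = 0.0
    for _ in range(horizon):
        reward = mean + rng.gauss(0.0, sigma)
        total += reward
        history.append(reward)
        mean -= rot      # the pulled arm rots
        # Discard the arm once its index falls below the threshold.
        if adaptive_window_ucb(history, horizon, sigma) < threshold:
            mean = rng.random()
            history = []
            arms_used += 1
    return total, arms_used
```

Calling `run()` returns the cumulative reward and the number of arms sampled. In this sketch the choice of `threshold` governs how quickly an arm is abandoned once it has rotted.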
