We provide an algorithm that achieves the optimal (up to constants) finite-time regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime or the time horizon. The result provides a negative answer to the open problem of whether an extra price has to be paid for the lack of information about the adversariality/stochasticity of the environment. We provide a complete characterization of online mirror descent algorithms based on Tsallis entropy and show that the power $\alpha = \frac{1}{2}$ achieves the goal. In addition, the proposed algorithm enjoys improved regret guarantees in two intermediate regimes: the moderately contaminated stochastic regime defined by Seldin and Slivkins (2014) and the stochastically constrained adversary studied by Wei and Luo (2018). The algorithm also obtains adversarial and stochastic optimality in the utility-based dueling bandit setting.
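To make the online mirror descent step with power $\alpha = \frac{1}{2}$ concrete, the following is a minimal sketch rather than the authors' reference implementation. It assumes the regularizer $-\frac{4}{\eta}\sum_i \sqrt{w_i}$ (one common scaling of the Tsallis entropy with power $\frac{1}{2}$), for which the minimizer over the simplex has the form $w_i = 4\,(\eta(\hat{L}_i - x))^{-2}$ with a normalizing constant $x$. The importance-weighted loss estimator, the learning-rate schedule $\eta_t = c/\sqrt{t}$, the constant `c`, the bisection solver, and the names `tsallis_inf_weights` and `run_tsallis_inf` are all illustrative assumptions not stated in the text above.

```python
import numpy as np


def tsallis_inf_weights(cum_loss_est, eta, iters=100):
    """Sampling distribution w_i = 4 / (eta * (L_i - x))^2 on the simplex.

    The normalizer x < min(L) is found by bisection (an assumed solver).
    """
    L = np.asarray(cum_loss_est, dtype=float)
    k = L.size
    l_min = L.min()
    # At lo every term is at most 1/k, so the weights sum to at most 1.
    lo = l_min - 2.0 * np.sqrt(k) / eta
    # At hi the best arm alone already has weight 1, so the sum is at least 1.
    hi = l_min - 2.0 / eta
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = np.sum(4.0 / (eta * (L - mid)) ** 2)
        if total > 1.0:
            hi = mid  # the sum grows with x, so x must be smaller
        else:
            lo = mid
    w = 4.0 / (eta * (L - 0.5 * (lo + hi))) ** 2
    return w / w.sum()  # remove residual numerical error


def run_tsallis_inf(loss_fn, k, horizon, c=2.0, seed=0):
    """Play `horizon` rounds; loss_fn(t, arm, rng) returns a loss in [0, 1]."""
    rng = np.random.default_rng(seed)
    cum_loss_est = np.zeros(k)
    pulls = np.zeros(k, dtype=int)
    for t in range(1, horizon + 1):
        eta = c / np.sqrt(t)  # assumed anytime schedule, no horizon needed
        w = tsallis_inf_weights(cum_loss_est, eta)
        arm = rng.choice(k, p=w)
        loss = loss_fn(t, arm, rng)
        # Importance-weighted estimate: only the played arm is updated.
        cum_loss_est[arm] += loss / w[arm]
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    means = [0.2, 0.8]  # hypothetical Bernoulli loss means
    bernoulli = lambda t, arm, rng: float(rng.random() < means[arm])
    print(run_tsallis_inf(bernoulli, k=2, horizon=2000))
```

Because the learning rate depends only on the round index $t$, the sketch needs neither the time horizon nor knowledge of whether the losses are stochastic or adversarial, which mirrors the setting described above.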

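By combining online mirror descent (OMD) with Tsallis entropy regularization, this paper proposes an algorithm that achieves optimal pseudo-regret in the adversarial and stochastic regimes simultaneously. The algorithm extends to several more general settings, namely the adversarial regime with a self-bounding constraint, the stochastically constrained adversarial regime, and the stochastic regime with adversarial corruptions, and it retains both the adversarial regret guarantee and the logarithmic regret guarantee in these settings. It also achieves adversarial and stochastic optimality in the utility-based dueling bandit setting, and it shows very high robustness and a performance advantage in practical tests.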

Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits