October 2018

Risk-Averse Stochastic Convex Bandit

Adrian Rivera Cardoso, Huan Xu

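TL;DR: This paper studies online convex optimization in which the decision maker is risk-averse. We provide two algorithms for this problem. The first is a descent algorithm that is easy to implement. The second, which combines the ellipsoid method with a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge, this is the first attempt to address risk aversion in the online convex bandit problem.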
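To make "descent under bandit feedback" concrete, the sketch below shows a generic risk-averse zeroth-order descent loop. It is an illustration under stated assumptions, not the authors' algorithm: it assumes a mean-variance risk measure ρ(x) = E[ℓ(x, ξ)] + λ Var[ℓ(x, ξ)] (this excerpt does not specify the risk measure), assumes each perturbed point can be played `batch` times so that a sample variance is available, and uses the standard one-point spherical gradient estimator of Flaxman, Kalai and McMahan on a Euclidean ball. All function names, parameters and the example loss `noisy_loss` are hypothetical.

```python
import math
import random


def project_to_ball(x, radius):
    """Euclidean projection of x onto the ball {z : ||z|| <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def random_unit_vector(d, rng):
    """Draw u uniformly from the unit sphere in R^d (normalized Gaussian)."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm > 0.0:
            return [v / norm for v in g]


def mean_variance(samples, lam):
    """ASSUMED risk measure: sample mean + lam * unbiased sample variance."""
    m = len(samples)
    mu = sum(samples) / m
    var = sum((s - mu) ** 2 for s in samples) / (m - 1)
    return mu + lam * var


def risk_averse_bandit_descent(sample_loss, d, radius, rounds, batch=5,
                               delta=0.1, eta=0.001, lam=1.0, seed=0):
    """Hypothetical projected descent that sees only loss values (bandit feedback)."""
    if batch < 2:
        raise ValueError("batch >= 2 is needed for a sample variance")
    rng = random.Random(seed)
    x = [0.0] * d
    for _ in range(rounds):
        u = random_unit_vector(d, rng)
        y = [xi + delta * ui for xi, ui in zip(x, u)]  # point actually played
        # Bandit feedback: only noisy loss values at y are observed, no gradients.
        losses = [sample_loss(y, rng) for _ in range(batch)]
        rho = mean_variance(losses, lam)
        # One-point estimate of the gradient of the delta-smoothed risk.
        g = [(d / delta) * rho * ui for ui in u]
        # Project onto the shrunken ball so that y = x + delta * u stays feasible.
        x = project_to_ball([xi - eta * gi for xi, gi in zip(x, g)],
                            radius - delta)
    return x


def noisy_loss(y, rng):
    """Hypothetical 1-D loss: mean (y - 0.5)^2, noise std 0.1 + |y|."""
    return (y[0] - 0.5) ** 2 + rng.gauss(0.0, 0.1 + abs(y[0]))


x_neutral = risk_averse_bandit_descent(noisy_loss, d=1, radius=1.0,
                                       rounds=20000, lam=0.0)
x_averse = risk_averse_bandit_descent(noisy_loss, d=1, radius=1.0,
                                      rounds=20000, lam=1.0)
print(x_neutral, x_averse)
```

With λ = 1 the assumed objective is (y − 0.5)² + (0.1 + |y|)², which is minimized at y = 0.2, so the risk-averse run should settle near 0.2, while the risk-neutral run (λ = 0) should settle near 0.5; the exact values depend on the seed and step size.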

Abstract

Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this