Abbas Kazerouni, Mohammad Ghavamzadeh, Benjamin Van Roy
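TL;DR: Studies safety in contextual linear bandits and proposes the Conservative Linear UCB (CLUCB) algorithm, which minimizes regret while guaranteeing safety by keeping its performance above a fixed percentage of a baseline's performance.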
Safety is a desirable property that can immensely increase the applicability of learning algorithms in real-world decision-making problems. It is much easier for a company to deploy an algorithm that is safe, i.e., guaranteed to perform at least as well as a baseline. In this paper, we