BriefGPT.xyz
Jun, 2023
A Convex Relaxation Approach to Bayesian Regret Minimization in Offline Bandits
Mohammad Ghavamzadeh, Marek Petrik, Guy Tennenholtz
TL;DR
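This paper proposes a new approach to optimizing decisions from offline data in uncertain environments. It directly minimizes upper bounds on Bayesian regret using efficient conic optimization solvers, derives those bounds from connections to Value-at-Risk (VaR) and chance-constrained optimization, and compares the approach with existing algorithms.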
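To make the VaR connection concrete, the sketch below estimates the Value-at-Risk of a policy's regret from posterior samples. It is only an illustration under assumptions that are not from the paper: the function name `regret_var`, the independent Gaussian posterior over arm rewards, and the specific means and standard deviations are all hypothetical, and the paper's own method minimizes convex upper bounds rather than this sampled quantile.

```python
import math
import random


def regret_var(policy, posterior_samples, alpha=0.9):
    """Empirical alpha-quantile (VaR) of the regret of a randomized policy.

    policy: list of action probabilities (hypothetical representation).
    posterior_samples: list of reward vectors, one per posterior draw.
    """
    regrets = []
    for rewards in posterior_samples:
        # Regret in this draw: best achievable reward minus the policy's expected reward.
        expected = sum(p * r for p, r in zip(policy, rewards))
        regrets.append(max(rewards) - expected)
    regrets.sort()
    # Smallest sampled regret with at least an alpha fraction of draws at or below it.
    index = max(0, math.ceil(alpha * len(regrets)) - 1)
    return regrets[index]


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed posterior: independent Gaussians per arm (illustrative values only).
    means = [0.5, 0.6, 0.4]
    stds = [0.05, 0.30, 0.10]
    samples = [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(5000)]

    for name, policy in [("arm 0", [1, 0, 0]), ("arm 1", [0, 1, 0]), ("uniform", [1 / 3] * 3)]:
        print(name, round(regret_var(policy, samples, alpha=0.9), 3))
```

The quantile level `alpha` plays the role of the confidence level: a policy with a slightly lower posterior mean but much less uncertainty can have a smaller high-confidence regret than the arm with the highest mean.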
Abstract
Algorithms for offline bandits must optimize decisions in uncertain environments using only offline data. A compelling and increasingly popular objective in offline bandits is to learn a policy which achieves low
→