Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback

Feb, 2023

Fares Fourati, Vaneet Aggarwal, Christopher John Quinn, Mohamed-Slim Alouini

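TL;DR: This paper studies unconstrained combinatorial multi-armed bandits with full-bandit feedback and stochastic rewards. It proposes the Randomized Greedy Learning (RGL) algorithm and proves that, for horizon T and number of arms n, it achieves a 1/2-regret upper bound of Õ(nT^(2/3)). Experiments show that RGL outperforms other full-bandit variants in both submodular and non-submodular settings.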
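To make the "randomized greedy" idea concrete, here is a minimal sketch of the classical offline randomized double greedy procedure for unconstrained non-monotone submodular maximization, adapted so that each set value is replaced by a sample mean of noisy evaluations. It is an illustration under stated assumptions, not the paper's RGL algorithm: the function name `randomized_double_greedy`, the callback `noisy_reward`, and the per-set sample count `m` are all hypothetical, and the summary above does not specify how RGL schedules exploration.

```python
import random


def randomized_double_greedy(n, noisy_reward, m, rng):
    """Sketch: randomized double greedy using sample-mean value estimates.

    n            -- number of arms (ground set is {0, ..., n-1})
    noisy_reward -- hypothetical callback: frozenset of arms -> one noisy reward
    m            -- hypothetical number of samples averaged per evaluated set
    rng          -- a random.Random instance, for reproducibility
    """

    def estimate(subset):
        # Average m noisy evaluations of the same set.
        s = frozenset(subset)
        return sum(noisy_reward(s) for _ in range(m)) / m

    lower, upper = set(), set(range(n))
    for i in range(n):
        # Estimated gain of adding arm i to the growing set, clipped at 0.
        gain_add = max(estimate(lower | {i}) - estimate(lower), 0.0)
        # Estimated gain of removing arm i from the shrinking set, clipped at 0.
        gain_remove = max(estimate(upper - {i}) - estimate(upper), 0.0)
        total = gain_add + gain_remove
        # Keep arm i with probability proportional to its estimated add gain.
        p_keep = 1.0 if total == 0.0 else gain_add / total
        if rng.random() < p_keep:
            lower.add(i)
        else:
            upper.discard(i)
    # After the loop both sets coincide; return the selected arms.
    return lower
```

In a full-bandit setting every call to `noisy_reward` would consume one round of play, so this sketch spends 4·n·m rounds on estimation before a set can be committed to. Caching the estimates of `lower` and `upper` between iterations would reduce that cost; the sketch re-queries them only to stay short.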

Abstract

We investigate the problem of unconstrained combinatorial multi-armed bandits with full-bandit feedback and stochastic rewards for submodular max