We investigate online bandit learning of monotone multi-linear DR-submodular functions and design the algorithm $\mathtt{BanditMLSM}$, which attains a $(1-1/e)$-regret of $O(T^{2/3}\log T)$. We then reduce the submodular bandit with a partition matroid constraint and bandit sequential monotone maximization to online bandit learning of monotone multi-linear DR-submodular functions. This yields a $(1-1/e)$-regret of $O(T^{2/3}\log T)$ for both problems and improves on existing results. To the best of our knowledge, ours is the first sublinear-regret algorithm for the submodular bandit with a partition matroid constraint. A special case of this problem was studied by Streeter et al. (2009), who proved an $O(T^{4/5})$ upper bound on the $(1-1/e)$-regret. For bandit sequential submodular maximization, the existing work (Niazadeh et al., 2021) proves an $O(T^{2/3})$ regret, but only with a suboptimal approximation ratio of $1/2$.
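To make the objects above concrete, here is a minimal sketch under stated assumptions. It does not implement $\mathtt{BanditMLSM}$, and it is not claimed to be the paper's exact reduction. It builds one natural multi-linear function over a partition matroid. Each block $i$ independently contributes item $j$ with probability $x_{i,j}$, or nothing with probability $1-\sum_j x_{i,j}$, and $F(x)=\mathbb{E}[f(S)]$. $F$ is affine in each block's vector $x_i$. When $f$ is monotone submodular, its cross-block second derivatives are nonpositive, which is the DR-submodular property. The sketch also uses the standard definition of $\alpha$-regret, $\alpha \max_{x} \sum_t f_t(x) - \sum_t f_t(x_t)$. All names (`multilinear_partition`, `alpha_regret`, `coverage`, `COVER`, `BLOCKS`) are hypothetical.

```python
import itertools
from typing import Callable, FrozenSet, List, Sequence

# A set function over item labels; assumed monotone submodular so that
# the extension below is monotone and DR-submodular.
SetFn = Callable[[FrozenSet[str]], float]


def multilinear_partition(f: SetFn, blocks: Sequence[Sequence[str]],
                          x: Sequence[Sequence[float]]) -> float:
    """Exact F(x) = E[f(S)]: block i independently adds item blocks[i][j]
    with probability x[i][j], or nothing with probability 1 - sum(x[i]).
    Enumerates every outcome, so it is only meant for tiny instances."""
    # Options per block: an item index j, or None for "no item chosen".
    options = [list(range(len(b))) + [None] for b in blocks]
    total = 0.0
    for choice in itertools.product(*options):
        prob = 1.0
        chosen = set()
        for i, j in enumerate(choice):
            if j is None:
                prob *= 1.0 - sum(x[i])
            else:
                prob *= x[i][j]
                chosen.add(blocks[i][j])
        total += prob * f(frozenset(chosen))
    return total


def alpha_regret(alpha: float, best_total: float, played: List[float]) -> float:
    """alpha * (total reward of the best fixed decision) - (learner's total reward)."""
    return alpha * best_total - sum(played)


# Toy monotone submodular function: coverage of ground elements.
COVER = {"a": {1, 2}, "b": {2, 3}, "c": {3}, "d": {4}}


def coverage(s: FrozenSet[str]) -> float:
    covered = set()
    for item in s:
        covered |= COVER[item]
    return float(len(covered))


# A partition matroid with two blocks: pick at most one item from each.
BLOCKS = [["a", "b"], ["c", "d"]]

if __name__ == "__main__":
    # At an integral point, F equals f of the chosen set {a, d}.
    print(multilinear_partition(coverage, BLOCKS, [[1.0, 0.0], [0.0, 1.0]]))
```

Because $F$ is affine in each block, its value at a 50/50 mix of two settings of one block equals the average of the two endpoint values. Moving probability mass into one block can only shrink the marginal gain of items in another block. The exhaustive enumeration is exponential in the number of blocks and only serves to check these properties on small examples.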

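We study online bandit learning of monotone multi-linear DR-submodular functions and design the algorithm BanditMLSM, which attains a $(1-1/e)$-regret of $O(T^{2/3}\log T)$. Reducing the submodular bandit with a partition matroid constraint and bandit sequential monotone maximization to this problem yields a $(1-1/e)$-regret of $O(T^{2/3}\log T)$ for both problems, which is better than existing results. We give the first sublinear-regret algorithm for the submodular bandit with a partition matroid constraint.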

A Bandit Algorithm for Multi-linear DR-Submodular Maximization and Its Applications to Adversarial Submodular Bandits