We consider learning to maximize reward in combinatorial cascading bandits, a new learning setting that unifies cascading and combinatorial bandits. Unifying these frameworks presents unique challenges in the analysis but allows for modeling a rich set of partial monitoring problems, such as learning to route in a communication network to minimize the probability of losing routed packets, and recommending diverse items. We propose CombCascade, a computationally efficient UCB-like algorithm for solving our problem, and derive gap-dependent and gap-free upper bounds on its regret. Our analysis builds on recent results in stochastic combinatorial semi-bandits but also addresses two novel challenges of our learning setting: a non-linear objective and partial observability. We evaluate CombCascade on two real-world problems and demonstrate that it performs well even when our modeling assumptions are violated. We also demonstrate that our setting requires new learning algorithms.
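To make the combination of a UCB-like rule, a non-linear objective, and partial observability concrete, the following is a minimal Python sketch and not the paper's algorithm as published. Everything specific in it is an assumption made for illustration: Bernoulli item weights, a feasible set consisting of all k-subsets, a reward equal to 1 only when every chosen item has weight 1, feedback that stops at the first zero, a confidence radius of sqrt(1.5 log t / s), and the hypothetical function name `comb_cascade_sketch`.

```python
import math
import random
from itertools import combinations


def comb_cascade_sketch(weights, k, horizon, seed=0):
    """Toy UCB-like loop for a conjunctive cascading objective.

    Assumptions (not taken from the abstract): Bernoulli item weights,
    all k-subsets are feasible, the reward is 1 only if every chosen
    item has weight 1, and the learner observes items only up to and
    including the first zero.
    """
    rng = random.Random(seed)
    n = len(weights)
    pulls = [0] * n    # number of observations per item
    means = [0.0] * n  # empirical mean weight per item
    total_reward = 0

    for t in range(1, horizon + 1):
        # Optimistic estimate per item, capped at 1.
        # Items that were never observed get the maximal value 1.
        ucb = [
            1.0 if pulls[e] == 0
            else min(1.0, means[e] + math.sqrt(1.5 * math.log(t) / pulls[e]))
            for e in range(n)
        ]

        # Non-linear objective: product of the optimistic weights.
        # Brute force over k-subsets, which is only viable for tiny n.
        chosen = max(
            combinations(range(n), k),
            key=lambda subset: math.prod(ucb[e] for e in subset),
        )

        # Partial observability: stop at the first item with weight 0.
        reward = 1
        for e in chosen:
            w = 1 if rng.random() < weights[e] else 0
            pulls[e] += 1
            means[e] += (w - means[e]) / pulls[e]
            if w == 0:
                reward = 0
                break  # items after this one stay unobserved
        total_reward += reward

    return means, pulls, total_reward


if __name__ == "__main__":
    means, pulls, total = comb_cascade_sketch([0.9, 0.8, 0.3, 0.2], k=2, horizon=500)
    print("estimated weights:", [round(m, 2) for m in means])
    print("observations per item:", pulls)
    print("total reward:", total)
```

The brute-force maximization stands in for whatever optimization oracle the real feasible set admits, for example a shortest-path routine in the routing application. The early `break` is what separates this feedback model from semi-bandit feedback, where every chosen item would be observed.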

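The paper proposes combinatorial cascading bandits, a setting for stochastic problems under combinatorial constraints with a non-linear reward function and partial observability. It gives a UCB-based algorithm for this setting and proves gap-dependent and gap-free upper bounds on its regret. In experiments on two real-world problems the algorithm performs well, which indicates that it remains robust even when its modeling assumptions are violated. The results also show that this setting requires new learning algorithms.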

Combinatorial cascading bandits