We introduce the factored bandits model, a framework for learning with limited (bandit) feedback in which actions can be decomposed into a Cartesian product of atomic actions. Factored bandits include rank-1 bandits as a special case, but significantly relax the assumptions on the form of the reward function. We provide an anytime algorithm for stochastic factored bandits, together with upper and lower regret bounds for the problem that match up to constants. Furthermore, we show that, with a slight modification, the proposed algorithm can be applied to utility-based dueling bandits. We obtain an improvement in the additive terms of the regret bound compared to state-of-the-art algorithms (the additive terms dominate up to time horizons that are exponential in the number of arms).
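
To make the setting concrete, the following is a minimal sketch of a factored action set with bandit feedback. It illustrates only the problem structure, not the algorithm proposed in this work. The atomic set sizes, the parameter vectors `U` and `V`, and the helper names `mean_reward` and `pull` are all hypothetical, and the reward is chosen to be the rank-1 special case (a product of per-factor parameters) with Bernoulli noise.

```python
import itertools
import random

# Hypothetical atomic action sets (sizes chosen only for illustration).
ATOMIC_SETS = [range(3), range(4)]

# Joint actions are the Cartesian product of the atomic action sets.
ACTIONS = list(itertools.product(*ATOMIC_SETS))

# Assumed rank-1 special case: mean reward is a product of per-factor parameters.
U = [0.2, 0.9, 0.5]
V = [0.3, 0.6, 0.8, 0.1]


def mean_reward(action):
    """Expected reward of a joint action (i, j) in the assumed rank-1 model."""
    i, j = action
    return U[i] * V[j]


def pull(action, rng):
    """Bandit feedback: only a noisy Bernoulli reward of the chosen action is seen."""
    return 1 if rng.random() < mean_reward(action) else 0


# The learner does not know this; it is computed here only to show the structure.
best = max(ACTIONS, key=mean_reward)
```

In this positive rank-1 instance the best joint action combines the best atomic action of each factor, which is the kind of structure a learner can exploit instead of treating all 12 joint actions as unrelated arms.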

Factored Bandits