A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed for small numbers of arms, while the second, Scalable Copeland Bandits (SCB), works better for large-scale problems. We provide theoretical results bounding the regret accumulated by CCB and SCB, both substantially improving existing results. Such existing results either offer bounds of the form $O(K \log T)$ but require restrictive assumptions, or offer bounds of the form $O(K^2 \log T)$ without requiring such assumptions. Our results offer the best of both worlds: $O(K \log T)$ bounds without restrictive assumptions.

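This work proposes two algorithms for the dueling bandit problem in the setting where a Condorcet winner may not exist. The algorithms seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first algorithm, CCB, is suited to small numbers of arms, while the second, SCB, performs better on large-scale problems. The work provides theoretical bounds on the regret accumulated by CCB and SCB. These bounds substantially improve on existing results, achieving $O(K \log T)$ regret without restrictive assumptions.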
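To make the distinction between the two notions of winner concrete, here is a minimal sketch that computes Copeland scores from a known preference matrix. It is not CCB or SCB, which must learn these preferences from noisy duels. It only illustrates why a Copeland winner always exists while a Condorcet winner may not. The function names, the helper `make_preferences`, and the five-arm example are assumptions introduced for illustration. An arm's Copeland score is taken here as the number of other arms it beats with probability greater than 0.5.

```python
def make_preferences(num_arms, beats, p_win=0.6):
    """Build a hypothetical preference matrix P, where P[i][j] is the
    probability that arm i beats arm j. `beats` lists (winner, loser) pairs."""
    P = [[0.5] * num_arms for _ in range(num_arms)]
    for winner, loser in beats:
        P[winner][loser] = p_win
        P[loser][winner] = 1.0 - p_win
    return P


def copeland_scores(P):
    """Copeland score of arm i: how many other arms it beats with prob > 0.5."""
    K = len(P)
    return [sum(1 for j in range(K) if j != i and P[i][j] > 0.5)
            for i in range(K)]


def copeland_winners(P):
    """All arms with the maximal Copeland score; never empty for K >= 1."""
    scores = copeland_scores(P)
    best = max(scores)
    return [i for i, s in enumerate(scores) if s == best]


def condorcet_winner(P):
    """The arm that beats every other arm, or None if no such arm exists."""
    K = len(P)
    for i, s in enumerate(copeland_scores(P)):
        if s == K - 1:
            return i
    return None


# Illustrative instance (an assumption, not from the paper):
# arm 0 beats arms 1, 2, 3 but loses to arm 4;
# arms 1, 2, 3 form a cycle and each beats arm 4.
beats = [(0, 1), (0, 2), (0, 3), (4, 0),
         (1, 2), (2, 3), (3, 1),
         (1, 4), (2, 4), (3, 4)]
P = make_preferences(5, beats)

print(copeland_scores(P))    # Copeland score of each arm
print(copeland_winners(P))   # arms with the highest score
print(condorcet_winner(P))   # None when no arm beats all others
```

In this instance no arm beats all four others, so there is no Condorcet winner, yet arm 0 has the highest Copeland score and is therefore the Copeland winner that regret would be measured against.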

Copeland Dueling Bandits