This paper considers contextual bandits with a finite number of arms, where the contexts are independent and identically distributed $d$-dimensional random vectors, and the expected rewards are linear in both the arm parameters and contexts. The LinUCB algorithm, which is near minimax optimal for related linear bandits, is shown to have a cumulative regret that is suboptimal in both the dimension $d$ and time horizon $T$, due to its over-exploration. A truncated version of LinUCB is proposed and termed "Tr-LinUCB", which follows LinUCB up to a truncation time $S$ and performs pure exploitation afterwards. The Tr-LinUCB algorithm is shown to achieve $O(d\log(T))$ regret if $S = Cd\log(T)$ for a sufficiently large constant $C$, and a matching lower bound is established, which shows the rate optimality of Tr-LinUCB in both $d$ and $T$ under a low dimensional regime. Further, if $S = d\log^{\kappa}(T)$ for some $\kappa>1$, the loss compared to the optimal is a multiplicative $\log\log(T)$ factor, which does not depend on $d$. This insensitivity to overshooting in choosing the truncation time of Tr-LinUCB is of practical importance.
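
The truncation rule described above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes one ridge-regression estimate per arm, standard Gaussian contexts, Gaussian noise, and that estimates keep being updated after time $S$. The names `ArmModel`, `tr_linucb_choose`, `simulate`, and the parameters `lam`, `alpha`, and `noise_sd` are hypothetical choices made for illustration.

```python
import math
import random


class ArmModel:
    """Ridge-regression state for one arm (hypothetical helper, not from the paper)."""

    def __init__(self, d, lam=1.0):
        # A_inv starts as (lam * I)^{-1}; b accumulates reward-weighted contexts.
        self.A_inv = [[(1.0 / lam if i == j else 0.0) for j in range(d)] for i in range(d)]
        self.b = [0.0] * d

    def _A_inv_times(self, x):
        return [sum(row[j] * x[j] for j in range(len(x))) for row in self.A_inv]

    def estimate(self, x):
        # theta_hat^T x with theta_hat = A_inv b (A_inv is symmetric).
        v = self._A_inv_times(x)
        return sum(bi * vi for bi, vi in zip(self.b, v))

    def width(self, x):
        # Confidence width sqrt(x^T A_inv x).
        v = self._A_inv_times(x)
        return math.sqrt(max(sum(xi * vi for xi, vi in zip(x, v)), 0.0))

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A_inv for A <- A + x x^T.
        v = self._A_inv_times(x)
        denom = 1.0 + sum(xi * vi for xi, vi in zip(x, v))
        d = len(x)
        for i in range(d):
            for j in range(d):
                self.A_inv[i][j] -= v[i] * v[j] / denom
        for i in range(d):
            self.b[i] += reward * x[i]


def tr_linucb_choose(models, x, t, S, alpha=1.0):
    """UCB index while t <= S, greedy (pure exploitation) afterwards."""
    bonus = alpha if t <= S else 0.0
    scores = [m.estimate(x) + bonus * m.width(x) for m in models]
    return max(range(len(models)), key=scores.__getitem__)


def simulate(thetas, T, S, alpha=1.0, noise_sd=0.1, seed=0):
    """Run the sketch on synthetic data and return the cumulative regret."""
    rng = random.Random(seed)
    d = len(thetas[0])
    models = [ArmModel(d) for _ in thetas]
    regret = 0.0
    for t in range(1, T + 1):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # i.i.d. context (assumed Gaussian)
        means = [sum(a * c for a, c in zip(th, x)) for th in thetas]
        k = tr_linucb_choose(models, x, t, S, alpha)
        reward = means[k] + rng.gauss(0.0, noise_sd)
        models[k].update(x, reward)  # assumption: keep updating after S
        regret += max(means) - means[k]
    return regret


if __name__ == "__main__":
    T, d, C = 2000, 2, 5.0  # C is an arbitrary illustrative constant
    S = math.ceil(C * d * math.log(T))
    print(S, simulate([[1.0, 0.0], [0.0, 1.0]], T=T, S=S))
```

Setting `S = T` in this sketch recovers plain LinUCB, so running `simulate` with both values of `S` gives a rough empirical comparison of the two rules on synthetic data.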

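This paper studies contextual bandits in which the contexts are independent and identically distributed $d$-dimensional random vectors and the expected rewards are linear in both the arm parameters and the contexts. It proposes a truncated version of LinUCB, termed Tr-LinUCB, which follows LinUCB up to a truncation time $S$ and performs pure exploitation afterwards. With $S = Cd\log(T)$, Tr-LinUCB achieves $O(d\log(T))$ regret; if $S = d\log^{\kappa}(T)$ for some $\kappa>1$, the loss relative to the optimum is a multiplicative $\log\log(T)$ factor. This insensitivity of Tr-LinUCB to overshooting the truncation time is of practical importance.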

Truncated LinUCB for Stochastic Linear Bandits