We study the generalized linear contextual bandit problem under the
constraints of limited adaptivity. In this paper, we present two algorithms,
\texttt{B-GLinCB} and \texttt{RS-GLinCB}, that address, respectively, two
prevalent models of limited adaptivity: batch learning with stochastic contexts
and rare policy switches with adversarial contexts. For both models, we
establish essentially tight regret bounds. Notably, our bounds eliminate the
dependence on a key parameter $\kappa$, which captures the non-linearity of the
underlying reward model. With $\Omega(\log\log T)$ batches, our batch learning
algorithm \texttt{B-GLinCB} achieves regret $\tilde{O}(\sqrt{T})$. Further, we
establish that our
rarely switching algorithm \texttt{RS-GLinCB} updates its policy at most
$\tilde{O}(\log^2 T)$ times and achieves a regret of $\tilde{O}(\sqrt{T})$. Our
approach to removing the dependence on $\kappa$ in generalized linear
contextual bandits may be of independent interest.
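For reference, a common definition of $\kappa$ in the generalized linear bandit literature is sketched below. It assumes a reward model $\mathbb{E}[r \mid x] = \mu(x^\top\theta^\ast)$ with a differentiable, increasing link function $\mu$, an arm set $\mathcal{X}$, and a parameter set $\Theta$. The notation and the exact sets over which this paper takes the supremum are assumptions here.

\[
\kappa \;=\; \sup_{x \in \mathcal{X},\ \theta \in \Theta} \frac{1}{\dot{\mu}(x^\top \theta)}.
\]

For the logistic link $\mu(z) = 1/(1+e^{-z})$ we have $\dot{\mu}(z) = \mu(z)\,(1-\mu(z))$, which is symmetric in $z$ and decreasing in $|z|$. If $|x^\top\theta| \le S$ and this bound is attained, then

\[
\kappa \;=\; \frac{1}{\mu(S)\,\bigl(1-\mu(S)\bigr)} \;=\; (1+e^{S})(1+e^{-S}) \;=\; 2 + e^{S} + e^{-S} \;\ge\; e^{S},
\]

so $\kappa$ can grow exponentially in the norm bound $S$. This is why regret bounds free of $\kappa$ are significant.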

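To make the role of $\log\log T$ batches concrete, the following Python sketch computes the geometric batch grid commonly used for batched bandits: $t_1 = a$ and $t_k = a\sqrt{t_{k-1}}$, with $a = T^{1/(2 - 2^{1-M})}$, so that $t_M = T$. This grid is an illustrative assumption. The schedule used by \texttt{B-GLinCB} may differ, and the function names \texttt{leading\_term} and \texttt{batch\_grid} are hypothetical. Heuristically, batch $k$ contributes regret of order $t_k/\sqrt{t_{k-1}} = a$, ignoring dimension and logarithmic factors, so the total is about $M a$.

```python
import math


def leading_term(T: int, M: int) -> float:
    """a = T ** (1 / (2 - 2 ** (1 - M))): the first batch end-point of the
    (assumed) geometric grid and, heuristically, the per-batch regret scale."""
    return T ** (1.0 / (2.0 - 2.0 ** (1 - M)))


def batch_grid(T: int, M: int) -> list[int]:
    """End-points t_1 < ... < t_M = T of the grid t_1 = a, t_k = a * sqrt(t_{k-1}).

    Hypothetical helper for illustration; not the paper's B-GLinCB schedule.
    """
    a = leading_term(T, M)
    grid = []
    t = a
    for _ in range(M - 1):
        grid.append(int(t))
        t = a * math.sqrt(t)  # next end-point, computed from the unrounded previous one
    grid.append(T)  # by construction t_M = a ** (2 - 2 ** (1 - M)) = T
    return grid


if __name__ == "__main__":
    T = 10**6
    M = 1 + math.ceil(math.log2(math.log2(T)))  # roughly log log T batches
    print("M =", M, "grid =", batch_grid(T, M))
    # Ratio a / sqrt(T) with ~log log T batches versus with only 2 batches.
    print(leading_term(T, M) / math.sqrt(T), leading_term(T, 2) / math.sqrt(T))
```

Taking $M - 1 \ge \log_2\log_2 T$ gives $2^{1-M}\log_2 T \le 1$, hence $a \le \sqrt{2T}$. By contrast, $M = 2$ batches only give $a = T^{2/3}$.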