We study the generalized linear contextual bandit problem under limited adaptivity constraints. We present two algorithms, \texttt{B-GLinCB} and \texttt{RS-GLinCB}, that address, respectively, two prevalent limited adaptivity models: batch learning with stochastic contexts and rare policy switches with adversarial contexts. For both models, we establish essentially tight regret bounds. Notably, our bounds eliminate the dependence on a key parameter $\kappa$, which captures the non-linearity of the underlying reward model. For our batch learning algorithm \texttt{B-GLinCB}, with $\Omega\left( \log{\log T} \right)$ batches, the regret scales as $\tilde{O}(\sqrt{T})$. Further, we establish that our rarely switching algorithm \texttt{RS-GLinCB} updates its policy at most $\tilde{O}(\log^2 T)$ times and achieves a regret of $\tilde{O}(\sqrt{T})$. Our approach for removing the dependence on $\kappa$ for generalized linear contextual bandits might be of independent interest.

Optimal Regret with Limited Adaptivity for Generalized Linear Contextual Bandits