Bandit convex optimization (BCO) is a general framework for online decision making under uncertainty. While tight regret bounds for general convex losses have been established, existing algorithms achieving these bounds have prohibitive computational costs for high-dimensional data. In this paper, we propose a simple and practical BCO algorithm inspired by the online Newton step algorithm. We show that our algorithm achieves optimal (in terms of the horizon) regret bounds for a large class of convex functions that we call $\kappa$-convex. This class contains a wide range of practically relevant loss functions, including linear, quadratic, and generalized linear models. In addition to attaining optimal regret, this method is the most efficient known algorithm for several well-studied applications, including bandit logistic regression. Furthermore, we investigate the adaptation of our second-order bandit algorithm to online convex optimization with memory. We show that for loss functions with a certain affine structure, the extended algorithm attains optimal regret. This leads to an algorithm with optimal regret for bandit LQR/LQG problems under a fully adversarial noise model, thereby resolving an open question posed in \citep{gradu2020non} and \citep{sun2023optimal}. Finally, we show that the more general problem of BCO with (non-affine) memory is harder. We derive a $\tilde{\Omega}(T^{2/3})$ regret lower bound, even under the assumption of smooth and quadratic losses.

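This paper presents a simple and practical BCO algorithm inspired by the online Newton step. The algorithm attains optimal (in terms of the horizon) regret for a class of convex functions called $\kappa$-convex, which covers a wide range of practical loss functions including linear, quadratic, and generalized linear models, and it is the most efficient known method for several well-studied applications. We also study how this second-order bandit algorithm adapts to online convex optimization with memory: for loss functions with a certain affine structure, the extended algorithm attains optimal regret, which resolves the open question posed in \citep{gradu2020non} and \citep{sun2023optimal} concerning bandit LQR/LQG problems under a fully adversarial noise model. Finally, we show that the more general problem of BCO with (non-affine) memory is harder, deriving a $\tilde{\Omega}(T^{2/3})$ regret lower bound even under the assumption of smooth and quadratic losses.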

Second-Order Methods for Bandit Optimization and Control