We present improved algorithms with worst-case regret guarantees for the
stochastic linear bandit problem. The widely used "optimism in the face of
uncertainty" principle reduces a stochastic bandit problem to the construction
of a confidence sequence for the unknown reward function. The performance of
the resulting bandit algorithm depends on the size of the confidence sequence,
with smaller confidence sets yielding better empirical performance and stronger
regret guarantees. In this work, we use a novel tail bound for adaptive
martingale mixtures to construct confidence sequences which are suitable for
stochastic bandits. These confidence sequences allow for efficient action
selection via convex programming. We prove that a linear bandit algorithm based
on our confidence sequences is guaranteed to achieve competitive worst-case
regret. We show that our confidence sequences are tighter than competing ones,
both empirically and theoretically. Finally, we demonstrate that our tighter
confidence sequences give improved performance in several hyperparameter tuning
tasks.
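
As a minimal sketch of the optimism principle described above, in standard notation: the symbols $\mathcal{X}$ (action set), $\theta^*$ (unknown reward parameter), $\mathcal{C}_t$ (confidence set after round $t$) and $\delta$ (failure probability) are assumptions introduced here for illustration and do not appear in the abstract.

$$
\Pr\left[\forall t \ge 0:\ \theta^* \in \mathcal{C}_t\right] \ge 1 - \delta,
\qquad
x_t \in \arg\max_{x \in \mathcal{X}} \ \max_{\theta \in \mathcal{C}_{t-1}} \langle x, \theta \rangle .
$$

Under these assumptions, when $\mathcal{C}_{t-1}$ is convex the inner maximization is a convex program for each fixed $x$, and a smaller $\mathcal{C}_{t-1}$ gives a less inflated optimistic value.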

We present improved algorithms with worst-case regret guarantees for the stochastic linear bandit problem.

New Improved Algorithms for Stochastic Linear Bandits Using Tail Bounds

Improved Algorithms for Stochastic Linear Bandits Using Tail Bounds for Martingale Mixtures

In this paper, we study the stochastic linear bandit problem under the
additional requirements of differential privacy, robustness and batched
observations. In particular, we assume an adversary randomly chooses a constant
fraction of the observed rewards in each batch, replacing them with arbitrary
numbers. We present differentially private and robust variants of the arm
elimination algorithm using a logarithmic number of batch queries under two privacy models
and provide regret bounds in both settings. In the first model, every reward in
each round is reported by a potentially different client, which reduces to
standard local differential privacy (LDP). In the second model, every action is
"owned" by a different client, who may aggregate the rewards over multiple
queries and privatize the aggregate response instead. To the best of our
knowledge, our algorithms are the first simultaneously providing differential
privacy and adversarial robustness in the stochastic linear bandit problem.
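
For reference, the textbook definition of pure local differential privacy can be written as follows. This is a sketch under assumptions: the mechanism $M$, the privacy parameter $\varepsilon$, and the choice of the pure (rather than approximate) variant are not specified in the abstract.

$$
\Pr\left[M(r) \in S\right] \le e^{\varepsilon} \, \Pr\left[M(r') \in S\right]
\quad \text{for all rewards } r, r' \text{ and all measurable output sets } S .
$$

In the first model above, $M$ would be applied to each individual reward before it is reported. In the second model, a client could instead apply a privatizing mechanism once to the aggregate of its rewards.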

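This work proposes differentially private and robust variants of the arm elimination algorithm that tolerate arbitrary corruptions, using a logarithmic number of batch queries under two privacy models. The algorithms simultaneously provide differential privacy and adversarial robustness in the stochastic linear bandit problem, and come with corresponding regret bounds.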

Robust and Differentially Private Stochastic Linear Bandits

Robust and differentially private stochastic linear bandits

We consider a stochastic linear bandit problem in which the rewards are not
only subject to random noise, but also adversarial attacks subject to a
suitable budget $C$ (i.e., an upper bound on the sum of corruption magnitudes
across the time horizon). We provide two variants of a Robust Phased
Elimination algorithm, one that knows $C$ and one that does not. Both variants
are shown to attain near-optimal regret in the non-corrupted case $C = 0$,
while incurring additional additive terms respectively having a linear and
quadratic dependency on $C$ in general. We present algorithm-independent lower
bounds showing that these additive terms are near-optimal. In addition, in a
contextual setting, we revisit a setup of diverse contexts, and show that a
simple greedy algorithm is provably robust with a near-optimal additive regret
term, despite performing no explicit exploration and not knowing $C$.
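
One way to formalize the corruption budget described above is sketched below. The symbols $x_t$ (chosen action), $\theta^*$ (unknown parameter), $\eta_t$ (random noise), $c_t$ (adversarial corruption in round $t$), $y_t$ (observed reward) and $T$ (time horizon) are assumptions introduced for illustration. The abstract only states that $C$ bounds the sum of corruption magnitudes across the time horizon.

$$
y_t = \langle x_t, \theta^* \rangle + \eta_t + c_t,
\qquad
\sum_{t=1}^{T} |c_t| \le C .
$$

Setting $c_t = 0$ for all $t$ recovers the non-corrupted case $C = 0$.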

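We study the stochastic linear bandit problem under adversarial attacks and propose two variants of a Robust Phased Elimination algorithm. Both attain near-optimal regret in the non-corrupted case, and the additional additive terms they incur are shown to be near-optimal. In addition, in a setting with diverse contexts, a simple greedy algorithm is shown to be robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.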

Stochastic Linear Bandit Algorithms Resistant to Adversarial Attacks

Stochastic Linear Bandits Robust to Adversarial Attacks

The stochastic linear bandit problem proceeds in rounds where at each round
the algorithm selects a vector from a decision set after which it receives a
noisy linear loss parameterized by an unknown vector. The goal in such a
problem is to minimize the (pseudo) regret which is the difference between the
total expected loss of the algorithm and the total expected loss of the best
fixed vector in hindsight. In this paper, we consider settings where the
unknown parameter has structure, e.g., sparse, group sparse, low-rank, which
can be captured by a norm, e.g., $L_1$, $L_{(1,2)}$, nuclear norm. We focus on
constructing confidence ellipsoids which contain the unknown parameter across
all rounds with high probability. We show that the radius of such ellipsoids depends
on the Gaussian width of sets associated with the norm capturing the structure.
Such characterization leads to tighter confidence ellipsoids and, therefore,
sharper regret bounds compared to bounds in the existing literature which are
based on the ambient dimensionality.
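
The quantities mentioned above can be written out as a minimal sketch. The notation is assumed for illustration and is not taken from the abstract: $\mathcal{X}$ is a fixed decision set, $x_t$ the vector selected in round $t$, $\theta^*$ the unknown parameter, $T$ the number of rounds, and $g$ a standard Gaussian vector in $\mathbb{R}^d$.

$$
R_T = \sum_{t=1}^{T} \langle x_t, \theta^* \rangle - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} \langle x, \theta^* \rangle,
\qquad
w(A) = \mathbb{E}_{g \sim \mathcal{N}(0, I_d)} \left[ \sup_{u \in A} \langle g, u \rangle \right] .
$$

Here $R_T$ is the pseudo-regret and $w(A)$ is the Gaussian width of a set $A \subseteq \mathbb{R}^d$. As a standard illustration of why the width can be far smaller than the ambient dimension suggests, the unit $L_1$ ball has $w \le \sqrt{2 \log(2d)}$, whereas the unit Euclidean ball has $w \le \sqrt{d}$. The specific sets whose widths enter the paper's ellipsoid radius are not given in the abstract.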

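We study how to construct confidence ellipsoids for stochastic linear bandit problems in which the unknown parameter has structure (e.g., sparse, group sparse, low-rank), yielding tighter confidence sets and sharper regret bounds.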