A well-studied generalization of standard online convex optimization
(OCO) is constrained online convex optimization (COCO). In COCO, on every
round, a convex cost function and a convex constraint function are revealed to
the learner after the action for that round is chosen. The objective is to
design an online policy that simultaneously achieves a small regret while
ensuring small cumulative constraint violation (CCV) against an adaptive
adversary. A long-standing open question in COCO is whether an online policy
can simultaneously achieve $O(\sqrt{T})$ regret and $O(\sqrt{T})$ CCV without
any restrictive assumptions. For the first time, we answer this in the
affirmative and show that an online policy can simultaneously achieve
$O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ CCV. We establish this result by
effectively combining the adaptive regret bound of the AdaGrad algorithm with
Lyapunov optimization, a classic tool from control theory. Surprisingly, the
analysis is short and elegant.
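
The COCO protocol described above can be illustrated with a minimal sketch. This is not the paper's algorithm: the linear costs and constraints, the quadratic Lyapunov weighting (the running violation `queue` multiplies the constraint gradient), and the norm-based AdaGrad-style step size are all assumptions chosen for illustration. Because `x = 0` is feasible in every round and has zero cost in this toy instance, the returned cumulative cost equals the regret against that one fixed feasible point, which is not necessarily the best one.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def run_coco(T=1000, dim=2, seed=0):
    """Play T rounds; return (cumulative cost, CCV, final action)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    queue = 0.0        # running sum of positive constraint violations (the CCV)
    grad_sq_sum = 0.0  # AdaGrad-style accumulator of squared gradient norms
    total_cost = 0.0
    for _ in range(T):
        # The learner has committed to x; only now are f_t and g_t revealed.
        c = rng.uniform(-1.0, 1.0, dim)  # hypothetical cost f_t(x) = <c, x>
        a = rng.uniform(-1.0, 1.0, dim)  # hypothetical constraint g_t(x) = <a, x> - 0.1
        violation = max(float(a @ x) - 0.1, 0.0)
        total_cost += float(c @ x)
        queue += violation
        # Surrogate gradient: cost gradient plus the queue-weighted gradient of
        # the clipped constraint max(g_t, 0), which vanishes when g_t(x) <= 0.
        grad = c + queue * a * (violation > 0.0)
        grad_sq_sum += float(grad @ grad)
        step = 2.0 / np.sqrt(grad_sq_sum + 1e-12)  # domain diameter / sqrt(sum)
        x = project_ball(x - step * grad)
    return total_cost, queue, x
```

Calling `run_coco(T=1000)` returns the cumulative cost, the cumulative constraint violation, and the last action, so the two quantities bounded in the abstract can be inspected side by side for this toy adversary.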

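A research paper on online convex optimization and constrained online convex optimization. It proves that an online policy can simultaneously achieve $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ cumulative constraint violation, a result obtained by combining the adaptive regret bound of the AdaGrad algorithm with Lyapunov optimization.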

Tight Bounds for Online Convex Optimization with Adversarial Constraints

This paper considers stochastic-constrained stochastic optimization, where the
stochastic constraint requires that the expectation of a random function
be below a certain threshold. In particular, we study the setting where data
samples are drawn from a Markov chain and thus are not independent and
identically distributed. We generalize the drift-plus-penalty framework, a
primal-dual stochastic gradient method developed for the i.i.d. case, to the
Markov chain sampling setting. We propose two variants of drift-plus-penalty;
one is for the case when the mixing time of the underlying Markov chain is
known while the other is for the case of unknown mixing time. In fact, our
algorithms apply to a more general setting of constrained online convex
optimization where the sequence of constraint functions follows a Markov chain.
Both algorithms are adaptive in that the first works without knowledge of the
time horizon while the second uses AdaGrad-style algorithm parameters, which is
of independent interest. We demonstrate the effectiveness of our proposed
methods through numerical experiments on classification with fairness
constraints.
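
As a rough illustration of the drift-plus-penalty idea with Markovian samples, the sketch below uses one common form of the update: a projected gradient step on the penalty-weighted objective plus the queue-weighted constraint, followed by a virtual-queue update. It does not reproduce the paper's two variants or their handling of the mixing time. The two-state chain `P`, the quadratic objective, the linear constraint, and the parameter choices `V = sqrt(T)` and `alpha = T` are all assumptions made for the example.

```python
import numpy as np

def dpp_step(x, q, grad_f, g_val, grad_g, V, alpha, lo, hi):
    """One drift-plus-penalty update for a scalar decision in [lo, hi]."""
    direction = V * grad_f + q * grad_g  # penalty term plus queue-weighted drift term
    x_new = float(np.clip(x - direction / (2.0 * alpha), lo, hi))
    # Virtual queue: grows with the linearized constraint value, floored at 0.
    q_new = max(q + g_val + grad_g * (x_new - x), 0.0)
    return x_new, q_new

def run_markov_dpp(T=5000, seed=0):
    """Run T steps with samples drawn from a 2-state Markov chain."""
    rng = np.random.default_rng(seed)
    P = np.array([[0.9, 0.1], [0.1, 0.9]])  # hypothetical transition matrix
    targets = np.array([1.0, 3.0])          # f(x; s) = (x - targets[s])**2
    limits = np.array([1.0, 2.0])           # g(x; s) = x - limits[s]
    V, alpha = np.sqrt(T), float(T)         # illustrative parameter choices
    s, x, q = 0, 0.0, 0.0
    xs = []
    for _ in range(T):
        grad_f = 2.0 * (x - targets[s])
        g_val = x - limits[s]
        x, q = dpp_step(x, q, grad_f, g_val, 1.0, V, alpha, 0.0, 4.0)
        xs.append(x)
        s = rng.choice(2, p=P[s])  # the next sample depends on the current state
    return np.array(xs), q
```

The chain's stationary distribution is uniform, so under it the expected constraint reads `x - 1.5 <= 0`. Consecutive samples are correlated, which is the non-i.i.d. feature the abstract addresses.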

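This paper studies stochastic-constrained stochastic optimization under Markov chain sampling. It generalizes the drift-plus-penalty method to this setting and proposes two variants, one for known and one for unknown mixing time. The algorithms also apply to the more general setting in which the sequence of constraint functions follows a Markov chain. Numerical experiments on classification with fairness constraints demonstrate the effectiveness of the proposed methods.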

Stochastic-Constrained Stochastic Optimization with Markovian Data

In this paper we propose a framework for solving constrained online convex
optimization problems. Our motivation stems from the observation that most
algorithms proposed for online convex optimization require a projection onto
the convex set $\mathcal{K}$ from which the decisions are made. While for
simple shapes (e.g., a Euclidean ball) the projection is straightforward, for
arbitrary complex sets this is the main computational challenge and may be
inefficient in practice. In this paper, we consider an alternative online
convex optimization problem. Instead of requiring that decisions belong to
$\mathcal{K}$ for all rounds, we only require that the constraints which define
the set $\mathcal{K}$ be satisfied in the long run. We show that our framework
can be utilized to solve a relaxed version of online learning with side
constraints addressed in \cite{DBLP:conf/colt/MannorT06} and
\cite{DBLP:conf/aaai/KvetonYTM08}. By turning the problem into an online
convex-concave optimization problem, we propose an efficient algorithm which
achieves an $\tilde{\mathcal{O}}(\sqrt{T})$ regret bound and an
$\tilde{\mathcal{O}}(T^{3/4})$ bound for the violation of constraints. We then
modify the algorithm to guarantee that the constraints are satisfied
in the long run. This gain comes at the price of an
$\tilde{\mathcal{O}}(T^{3/4})$ regret bound. Our second algorithm, based on
the Mirror Prox method \citep{nemirovski-2005-prox} for solving variational
inequalities, achieves an $\tilde{\mathcal{O}}(T^{2/3})$ bound for
both regret and the violation of constraints when the domain $\mathcal{K}$ can be
described by a finite number of linear constraints. Finally, we extend the
result to the setting where we only have partial access to the convex set
$\mathcal{K}$ and propose a multipoint bandit feedback algorithm with the same
bounds in expectation as our first algorithm.
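
The convex-concave reformulation can be sketched as simultaneous gradient descent in the decision and gradient ascent in a dual variable. The regularized Lagrangian below, the single linear constraint, the random linear costs, and the step size `eta = 1/sqrt(T)` are assumptions made for illustration and may differ from the paper's exact construction. The point of the sketch is that projection is only ever onto a simple ball containing $\mathcal{K}$, never onto $\mathcal{K}$ itself.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Cheap projection onto a ball that contains the constraint set K."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def primal_dual_step(x, lam, grad_f, g_val, grad_g, eta, delta):
    """One descent/ascent step on the assumed regularized Lagrangian
    L_t(x, lam) = f_t(x) + lam * g(x) - (delta * eta / 2) * lam**2."""
    grad_x = grad_f + lam * grad_g        # gradient of L_t in x
    grad_lam = g_val - delta * eta * lam  # gradient of L_t in lam
    x_new = project_ball(x - eta * grad_x)
    lam_new = max(lam + eta * grad_lam, 0.0)  # dual variable stays nonnegative
    return x_new, lam_new

def run_long_term(T=1000, seed=0, delta=1.0):
    """Hypothetical instance: K = {x in unit ball : x[0] <= 0.25}."""
    rng = np.random.default_rng(seed)
    eta = 1.0 / np.sqrt(T)
    x, lam, total_g = np.zeros(2), 0.0, 0.0
    grad_g = np.array([1.0, 0.0])  # g(x) = x[0] - 0.25
    for _ in range(T):
        c = rng.uniform(-1.0, 1.0, 2)  # cost f_t(x) = <c, x>
        g_val = x[0] - 0.25
        total_g += g_val               # long-run constraint value
        x, lam = primal_dual_step(x, lam, c, g_val, grad_g, eta, delta)
    return x, lam, total_g
```

Individual iterates may leave $\mathcal{K}$; only the accumulated constraint value `total_g` is meant to stay small, which mirrors the "satisfied in the long run" relaxation.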

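This paper proposes a framework for solving constrained online convex optimization problems. By turning the problem into an online convex-concave optimization problem, it obtains an efficient algorithm with good convergence guarantees. It also provides a solution for constrained online convex optimization under multipoint bandit feedback.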