We study a generalization of the classic Online Convex Optimization (OCO)
framework by considering additional long-term adversarial constraints.
Specifically, on each round, after the online policy commits to an action, the
adversary reveals a convex cost function together with a set of $k$ convex
constraint functions. The cost and constraint functions may change arbitrarily
over time, and no information about future functions is assumed to be
available. In this paper, we propose a meta-policy that simultaneously achieves
a sublinear cumulative constraint violation and a sublinear regret. This is
achieved via a black-box reduction of the constrained problem to the standard
OCO problem on a recursively constructed sequence of surrogate cost functions.
We show that optimal performance bounds can be achieved by solving the
surrogate problem with any adaptive OCO policy that enjoys a standard
data-dependent regret bound. We also present a new Lyapunov-based proof
technique that, through a novel decomposition result, reveals a connection
between regret and certain sequential inequalities. We conclude by highlighting
applications to online multi-task learning and network control problems.
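
The reduction described above can be illustrated with a minimal sketch. The
abstract does not specify the surrogate functions, so everything here is an
assumption: a one-dimensional decision set X = [-1, 1], a surrogate of the form
V * f_t(x) + Phi'(Q) * g_t^+(x) with the quadratic Lyapunov function
Phi(Q) = Q^2 / 2 (so Phi'(Q) = Q), where Q is the cumulative constraint
violation and the k constraints are merged into the convex function
max_i max(0, g_i(x)), and AdaGrad-style projected online gradient descent as
the adaptive base policy. The names `run_meta_policy`, `AdaGradOGD`,
`quadratic_cost`, `upper_bound`, and the trade-off parameter `V` are
hypothetical and are not taken from the paper.

```python
import math


def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the (assumed) decision set X = [lo, hi]."""
    return max(lo, min(hi, x))


class AdaGradOGD:
    """Projected online gradient descent with AdaGrad-style step sizes.

    eta_t = D / sqrt(2 * sum_s g_s^2) gives a regret bound that scales with the
    observed gradients (a data-dependent bound), which is the property the
    reduction asks of its base OCO policy. Hypothetical helper.
    """

    def __init__(self, diameter=2.0):
        self.x = 0.0
        self.diameter = diameter
        self.sum_sq = 0.0

    def predict(self):
        return self.x

    def update(self, grad):
        self.sum_sq += grad * grad
        if self.sum_sq > 0.0:
            eta = self.diameter / math.sqrt(2.0 * self.sum_sq)
            self.x = project(self.x - eta * grad)


def quadratic_cost(a):
    """Hypothetical cost f(x) = (x - a)^2 and its derivative 2(x - a)."""
    return (lambda x: (x - a) ** 2, lambda x: 2.0 * (x - a))


def upper_bound(b):
    """Hypothetical constraint g(x) = x - b <= 0 and its derivative 1."""
    return (lambda x: x - b, lambda x: 1.0)


def run_meta_policy(costs, constraints, V=1.0):
    """Hypothetical sketch: constrained OCO reduced to OCO on surrogate costs.

    costs[t] is a pair (f_t, df_t); constraints[t] is a list of k pairs (g, dg).
    Assumed surrogate: V * f_t(x) + Q * g_t^+(x), i.e. Phi(Q) = Q^2 / 2.
    Returns (total cost, cumulative violation Q, list of actions).
    """
    learner = AdaGradOGD()
    Q, total_cost, actions = 0.0, 0.0, []
    for (f, df), cons in zip(costs, constraints):
        x = learner.predict()  # commit before f_t and g_t are revealed
        actions.append(x)
        total_cost += f(x)
        # Merge the k constraints into one convex function max_i g_i^+(x).
        values = [g(x) for g, _ in cons]
        i_star = max(range(len(cons)), key=values.__getitem__)
        violation = max(0.0, values[i_star])
        Q += violation  # recursive update of the cumulative violation
        # A subgradient of g^+ at x: dg of the most-violated constraint, else 0.
        sub = cons[i_star][1](x) if violation > 0.0 else 0.0
        # The base policy only ever sees gradients of the surrogate cost.
        learner.update(V * df(x) + Q * sub)
    return total_cost, Q, actions
```

In this sketch the base learner never sees the constraints directly; it only
receives gradients of the surrogate, whose penalty weight grows with the
accumulated violation. That is what makes the reduction black-box: any other
adaptive OCO policy with a data-dependent regret bound could be swapped in for
`AdaGradOGD`.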