We propose a novel differentially private algorithm for online federated
learning that employs temporally correlated noise to improve the utility while
ensuring the privacy of the continuously released models. To address challenges
stemming from DP noise and local updates with streaming non-IID data, we develop
a perturbed iterate analysis to control the impact of the DP noise on the
utility. Moreover, we demonstrate how the drift errors from local updates can
be effectively managed under a quasi-strong convexity condition. Subject to an
$(\epsilon, \delta)$-DP budget, we establish a dynamic regret bound over the
entire time horizon that quantifies the impact of key parameters and the
intensity of changes in dynamic environments. Numerical experiments validate
the efficacy of the proposed algorithm.
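
The abstract does not specify how the noise is correlated across rounds, so the following is only a minimal sketch of the general idea, under assumptions of our own: the noise at round t is w_t = z_t - lam * z_{t-1} with i.i.d. Gaussian z_t, and the hypothetical parameter `lam` controls the correlation. It compares how much noise accumulates in a running sum (a stand-in for the released model) against independent noise. Calibrating `sigma` to an $(\epsilon, \delta)$-DP budget is deliberately not modeled here.

```python
import numpy as np

def independent_noise(T, d, sigma, rng):
    """i.i.d. Gaussian noise, one d-dimensional vector per round."""
    return rng.normal(0.0, sigma, size=(T, d))

def correlated_noise(T, d, sigma, lam, rng):
    """Assumed correlation structure: w_t = z_t - lam * z_{t-1}, with z_{-1} = 0."""
    z = rng.normal(0.0, sigma, size=(T, d))
    w = z.copy()
    w[1:] -= lam * z[:-1]  # subtract a scaled copy of the previous round's noise
    return w

def accumulated_noise_norm(noise):
    """Euclidean norm of the running sum of the noise after each round."""
    return np.linalg.norm(np.cumsum(noise, axis=0), axis=1)

rng = np.random.default_rng(0)
T, d, sigma = 500, 10, 1.0
ind = accumulated_noise_norm(independent_noise(T, d, sigma, rng))
cor = accumulated_noise_norm(correlated_noise(T, d, sigma, lam=0.9, rng=rng))
print(f"mean accumulated noise norm: independent={ind.mean():.2f}, "
      f"correlated={cor.mean():.2f}")
```

With `lam = 1` the running sum telescopes to the latest z_t alone, so the accumulated noise stops growing with the number of rounds. A fair comparison would also have to account for how the correlation changes the noise scale required for a given privacy budget.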

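We propose a novel differentially private algorithm for online federated learning that uses temporally correlated noise to improve utility while ensuring the privacy of the continuously released models.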

Differentially Private Online Federated Learning with Correlated Noise

In this work, we consider a sequence of stochastic optimization problems
following a time-varying distribution via the lens of online optimization.
Assuming that the loss function satisfies the Polyak-Łojasiewicz (PL) condition,
we apply online stochastic gradient descent and establish its dynamic regret
bound that is composed of cumulative distribution drifts and cumulative
gradient biases caused by stochasticity. The distribution metric we adopt is the
Wasserstein distance, which remains well defined even without an absolute
continuity assumption or when the support set is time-varying. We also establish
a regret bound for online stochastic proximal gradient descent when the
objective function is regularized. Moreover, we show that the above framework
can be applied to the Conditional Value-at-Risk (CVaR) learning problem. In
particular, we improve an existing proof of the PL condition for the CVaR
problem, which yields a regret bound for online stochastic gradient descent.
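
As a rough illustration of the setting (not the paper's experiments), the sketch below runs online stochastic gradient descent on a toy loss of our own choosing, f(x; xi) = 0.5 * ||x - xi||^2 with xi ~ N(m_t, s^2 I), whose expected loss is strongly convex and therefore satisfies the PL condition. Because consecutive distributions differ only by a shift of the mean, their Wasserstein distance equals ||m_{t+1} - m_t||, which makes the cumulative distribution drift easy to compute. The function names, step size, and drift rates are all hypothetical.

```python
import numpy as np

def online_sgd(means, step, noise_std, rng):
    """Online SGD on f(x; xi) = 0.5 * ||x - xi||^2 with xi ~ N(means[t], noise_std^2 I).

    Returns the dynamic regret sum_t [F_t(x_t) - min F_t]. The expected loss F_t
    is minimized at means[t], so each term equals 0.5 * ||x_t - means[t]||^2.
    """
    x = np.zeros(means.shape[1])
    regret = 0.0
    for m in means:
        regret += 0.5 * np.sum((x - m) ** 2)  # loss gap of the current iterate
        xi = rng.normal(m, noise_std)         # one sample from the current distribution
        x = x - step * (x - xi)               # stochastic gradient step
    return regret

def cumulative_drift(means):
    """Sum of Wasserstein distances between consecutive shifted distributions."""
    return np.sum(np.linalg.norm(np.diff(means, axis=0), axis=1))

rng = np.random.default_rng(0)
T, d = 1000, 5
for rate in (0.01, 0.1):  # slow and fast drift of the mean
    means = np.cumsum(np.full((T, d), rate), axis=0)
    print(f"drift rate {rate}: cumulative drift={cumulative_drift(means):.1f}, "
          f"dynamic regret={online_sgd(means, 0.5, 0.1, rng):.1f}")
```

Running it with the two drift rates gives a feel for how the regret grows with the cumulative drift, while the sampling noise contributes a separate term that is present even when the distribution is stationary.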

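Application of Distributionally Time-Varying Online Stochastic Optimization to Conditional Value-at-Risk Statistical Learning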

Distributionally Time-Varying Online Stochastic Optimization under Polyak-Łojasiewicz Condition with Application in Conditional Value-at-Risk Statistical Learning

We consider primal-dual-based reinforcement learning (RL) in episodic
constrained Markov decision processes (CMDPs) with non-stationary objectives
and constraints, which plays a central role in ensuring the safety of RL in
time-varying environments. In this problem, the reward/utility functions and
the state transition functions are both allowed to vary arbitrarily over time
as long as their cumulative variations do not exceed certain known variation
budgets. Designing safe RL algorithms in time-varying environments is
particularly challenging because it requires integrating constraint violation
reduction, safe exploration, and adaptation to non-stationarity.
To this end, we identify two alternative conditions on the time-varying
constraints under which safety can be guaranteed in the long run. We also
propose the \underline{P}eriodically \underline{R}estarted
\underline{O}ptimistic \underline{P}rimal-\underline{D}ual \underline{P}roximal
\underline{P}olicy \underline{O}ptimization (PROPD-PPO) algorithm that can
accommodate both conditions. Furthermore, a dynamic regret bound and a
constraint violation bound are established for the proposed algorithm in both
the linear kernel CMDP function approximation setting and the tabular CMDP
setting under the two alternative conditions. This paper provides the first
provably efficient algorithm for non-stationary CMDPs with safe exploration.
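
The abstract gives no update rules, so the following is only a generic sketch of one ingredient of a primal-dual method with periodic restarts: a projected dual (Lagrange multiplier) step for a constraint of the form "utility value >= threshold", reset every few episodes so that stale information from an earlier environment is discarded. The names `dual_update` and `run_dual_with_restarts`, the reset-to-zero rule, and all parameters are assumptions for illustration and are not taken from PROPD-PPO; the policy (primal) update is omitted.

```python
def dual_update(lam, utility_value, threshold, step, lam_max):
    """Projected dual step: raise lam when the constraint utility_value >= threshold
    is violated, lower it otherwise, and clip the result to [0, lam_max]."""
    return min(max(lam + step * (threshold - utility_value), 0.0), lam_max)

def run_dual_with_restarts(utility_values, threshold, step, lam_max, restart_every):
    """Return the multiplier used in each episode, resetting it every restart_every episodes."""
    lam = 0.0
    history = []
    for k, value in enumerate(utility_values):
        if k % restart_every == 0:
            lam = 0.0  # assumed restart rule: forget the multiplier from the previous period
        history.append(lam)
        lam = dual_update(lam, value, threshold, step, lam_max)
    return history

# The constraint is violated in every episode, so lam grows until the next restart.
print(run_dual_with_restarts([0.0] * 6, threshold=1.0, step=0.5, lam_max=10.0,
                             restart_every=3))
```

In a non-stationary problem the restart period would typically be chosen in relation to the variation budgets, trading off adaptation speed against the cost of relearning.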

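This paper studies primal-dual reinforcement learning in constrained Markov decision processes with non-stationary objectives and constraints, proposes a safe and adaptive RL algorithm for time-varying environments, and establishes a dynamic regret bound and a constraint violation bound.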